From pierre at joliv.et Tue Mar 1 01:36:53 2022 From: pierre at joliv.et (Pierre Jolivet) Date: Tue, 1 Mar 2022 08:36:53 +0100 Subject: [petsc-users] Preconditioner for LSQR In-Reply-To: References: Message-ID: <62D453D6-75FD-4FAD-9E52-35DCC24FD41F@joliv.et> Hello Lucas, In your sequence of systems, is A changing? Are all right-hand sides available from the get-go? In that case, you can solve everything in a block fashion and that?s how you could get real improvements. Also, instead of PCCHOLESKY on A^T * A + KSPCG, you could use PCQR on A + KSPPREONLY, but this may not be needed, cf. Jed?s answer. Thanks, Pierre > On 1 Mar 2022, at 12:54 AM, Lucas Banting wrote: > > Hello, > > I have an MPIDENSE matrix of size about 200,000 x 200, using KSPLSQR on my machine a solution takes about 15 s. I typically run with six to eight processors. > I have to solve the system several times, typically 4-30, and was looking for recommendations on reusable preconditioners to use with KSPLSQR to increase speed. > > Would it make the most sense to use PCCHOLESKY on the smaller system A^T * A? > > Thanks, > Lucas -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Tue Mar 1 04:49:08 2022 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 1 Mar 2022 11:49:08 +0100 Subject: [petsc-users] Preconditioner for LSQR In-Reply-To: <62D453D6-75FD-4FAD-9E52-35DCC24FD41F@joliv.et> References: <62D453D6-75FD-4FAD-9E52-35DCC24FD41F@joliv.et> Message-ID: <8168BD74-2615-4596-9F6C-E061E9039ADB@dsic.upv.es> To use SLEPc's TSQR one would do something like this: ierr = BVCreateFromMat(A,&X);CHKERRQ(ierr); ierr = BVSetFromOptions(X);CHKERRQ(ierr); ierr = BVSetOrthogonalization(X,BV_ORTHOG_CGS,BV_ORTHOG_REFINE_IFNEEDED,PETSC_DEFAULT,BV_ORTHOG_BLOCK_TSQR);CHKERRQ(ierr); ierr = BVOrthogonalize(X,R);CHKERRQ(ierr); But then one would have to use BVDotVec() to obtain Q'*b and finally solve a triangular system with R. Jose > El 1 mar 2022, a las 8:36, Pierre Jolivet escribi?: > > Hello Lucas, > In your sequence of systems, is A changing? > Are all right-hand sides available from the get-go? > In that case, you can solve everything in a block fashion and that?s how you could get real improvements. > Also, instead of PCCHOLESKY on A^T * A + KSPCG, you could use PCQR on A + KSPPREONLY, but this may not be needed, cf. Jed?s answer. > > Thanks, > Pierre > >> On 1 Mar 2022, at 12:54 AM, Lucas Banting wrote: >> >> Hello, >> >> I have an MPIDENSE matrix of size about 200,000 x 200, using KSPLSQR on my machine a solution takes about 15 s. I typically run with six to eight processors. >> I have to solve the system several times, typically 4-30, and was looking for recommendations on reusable preconditioners to use with KSPLSQR to increase speed. >> >> Would it make the most sense to use PCCHOLESKY on the smaller system A^T * A? >> >> Thanks, >> Lucas > From jeremy at seamplex.com Tue Mar 1 06:04:19 2022 From: jeremy at seamplex.com (Jeremy Theler) Date: Tue, 01 Mar 2022 09:04:19 -0300 Subject: [petsc-users] set MUMPS OOC_TMPDIR In-Reply-To: References: <997F651C-BB2E-43FE-82D9-47353F5D5F46@petsc.dev> <4C042735-2FAC-45DF-A0C4-463974AABC68@petsc.dev> Message-ID: I've known at least one other person that had this very same confusion. Maybe it's worth adding Barry's explanation to the manual page: This function can be called BEFORE PetscInitialize(), but it does not NEED to be called before PetscInitialize(). You should be able to call PetscOptionsSetValue() anytime you want. 
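For reference, a minimal sketch of the pattern being discussed (the ICNTL(22) flag is the documented MUMPS out-of-core switch; the spelling of the temporary-directory option is an assumption here and should be checked against the MATSOLVERMUMPS manual page for your PETSc version):

  #include <petsc.h>

  int main(int argc, char **argv)
  {
    /* Options may be registered before PetscInitialize(); they are stored
       and picked up once the options database is created. */
    PetscOptionsSetValue(NULL, "-mat_mumps_icntl_22", "1");          /* enable MUMPS out-of-core factorization */
    PetscOptionsSetValue(NULL, "-mat_mumps_ooc_tmpdir", "/scratch"); /* assumed option name for OOC_TMPDIR */
    PetscInitialize(&argc, &argv, NULL, NULL);
    /* ... set up and solve with -pc_factor_mat_solver_type mumps as usual ... */
    PetscFinalize();
    return 0;
  }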
On Mon, 2022-02-28 at 16:39 -0800, Sam Guo wrote: > Yes, " This function can be called BEFORE PetscInitialize()" confused > me. Thanks for the clarification.? > > On Mon, Feb 28, 2022 at 4:38 PM Barry Smith wrote: > > > > ? You should be able to call PetscOptionsSetValue() anytime you > > want, as I said between different uses of MUMPS you can call it to > > use different directories. > > > > ? Perhaps this confused you? > > > > ? ? ?Note: > > ? ?This function can be called BEFORE PetscInitialize() > > > > ? It is one of the very few functions that can be called before > > PetscInitialize() but it does not NEED to be called before > > PetscInitialize(). > > > > ? Barry From zhangqingwen at live.cn Tue Mar 1 03:19:38 2022 From: zhangqingwen at live.cn (Zhang Qingwen) Date: Tue, 1 Mar 2022 09:19:38 +0000 Subject: [petsc-users] How to solve N-S equation with SIMPLE-like prediction-correction method? Message-ID: Hi, all, In the field of CFD, the prediction-correction method (e.g., the SIMPLE algorithm) has been widely used to solved the Navier-Stokes equations with finite volume method. However, I have no idea about how it can be implemented with PETSc. Let me describe a simplified case of creeping flow (e.g., mantle flow), in which the inertia term in the N-S equ. can be neglected and reduced to the so-called Stokes equ. This equation only contains a diffusion term and source term of pressure gradient, and there are two coupled equations to solved: * The equation of momentum conservation * The equation of mass conservation The SIMPLE does not solve these coupled equations simultaneously. Instead, this can be done in a iterative fashion with SEPERATE steps until the solutions of velocity and pressure are accurate enough: 1. set initial u*, v*, P* 2. solve the momentum equation to get the velocity 3. solve the continuity equation to get the pressure correction P' 4. correct the velocity and the pressure with P' to get the new u, v, P 5. update the solutions: u*=u, v*=v, P*=P 6. go to step 2 if the solutions are not accurate enough (A document is attached to clearify the governing equation and the SIMPLE algorithm, please check the attachment) There is a cavity flow example using TS that demonstrate how to solve N-S equations simultaneously, which solve four coupled equations with nonlinear iterations inside SNES and TS, and does not involves separate prediction-correction steps like the SIMPLE algorithm above. So, any hints about how can the SIMPLE algorithm be implemented in PETSc? Particularly, how can I solve these equations in separate steps, so I can solve the governing equations in a prediction-correction fashion? Best, Qw -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: SIMPLE-like_algorithm.pdf Type: application/pdf Size: 227159 bytes Desc: SIMPLE-like_algorithm.pdf URL: From bsmith at petsc.dev Tue Mar 1 10:16:44 2022 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 1 Mar 2022 11:16:44 -0500 Subject: [petsc-users] How to solve N-S equation with SIMPLE-like prediction-correction method? In-Reply-To: References: Message-ID: The PCFIELDSPLIT preconditioner provides a general framework for solving coupled systems by using inner solvers on individual fields. It can be used for some instances of SIMPLE-like algorithms but I do not know if it exactly handles your specific instance. src/snes/tutorials/ex70.c demonstrates its usage on Poiseuille flow problem. 
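As a rough sketch only (the best inner solvers depend on your discretization, and it assumes field 0 is the velocity block and field 1 the pressure block in the PCFIELDSPLIT ordering), a SIMPLE-like Schur-complement splitting can be requested entirely from the options database:

  -pc_type fieldsplit -pc_fieldsplit_type schur
  -pc_fieldsplit_schur_fact_type upper
  -pc_fieldsplit_schur_precondition selfp
  -fieldsplit_0_ksp_type preonly -fieldsplit_0_pc_type gamg
  -fieldsplit_1_ksp_type gmres -fieldsplit_1_pc_type jacobi

Here selfp builds the Schur-complement preconditioner from A11 - A10 inv(diag(A00)) A01, which is essentially the approximation used in SIMPLE.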
Some of the online tutorials and slides also discuss the SIMPLE algorithm and have example usage with PCFIELDSPLIT. If you have specific questions with the usage please let us know. Barry > On Mar 1, 2022, at 4:19 AM, Zhang Qingwen wrote: > > Hi, all, > > In the field of CFD, the prediction-correction method (e.g., the SIMPLE algorithm) has been widely used to solved the Navier-Stokes equations with finite volume method. > > However, I have no idea about how it can be implemented with PETSc. > > Let me describe a simplified case of creeping flow (e.g., mantle flow), in which the inertia term in the N-S equ. can be neglected and reduced to the so-called Stokes equ. > This equation only contains a diffusion term and source term of pressure gradient, and there are two coupled equations to solved: > The equation of momentum conservation > The equation of mass conservation > The SIMPLE does not solve these coupled equations simultaneously. Instead, this can be done in a iterative fashion with SEPERATE steps until the solutions of velocity and pressure are accurate enough: > > 1. set initial u*, v*, P* > 2. solve the momentum equation to get the velocity > 3. solve the continuity equation to get the pressure correction P' > 4. correct the velocity and the pressure with P' to get the new u, v, P > 5. update the solutions: u*=u, v*=v, P*=P > 6. go to step 2 if the solutions are not accurate enough > > (A document is attached to clearify the governing equation and the SIMPLE algorithm, please check the attachment) > > There is a cavity flow example using TS that demonstrate how to solve N-S equations simultaneously, > which solve four coupled equations with nonlinear iterations inside SNES and TS, and does not involves separate prediction-correction steps like the SIMPLE algorithm above. > > So, any hints about how can the SIMPLE algorithm be implemented in PETSc? > Particularly, how can I solve these equations in separate steps, so I can solve the governing equations in a prediction-correction fashion? > > Best, > Qw > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bantingl at myumanitoba.ca Tue Mar 1 11:06:53 2022 From: bantingl at myumanitoba.ca (Lucas Banting) Date: Tue, 1 Mar 2022 17:06:53 +0000 Subject: [petsc-users] Preconditioner for LSQR In-Reply-To: <8168BD74-2615-4596-9F6C-E061E9039ADB@dsic.upv.es> References: <62D453D6-75FD-4FAD-9E52-35DCC24FD41F@joliv.et> <8168BD74-2615-4596-9F6C-E061E9039ADB@dsic.upv.es> Message-ID: Thanks everyone, QR makes the most sense for my application. Jose, Once I get R from BVOrthogonalize, how I should I solve the upper triangular system? Is the returned Mat R setup to be used in MatBackwardSolve? Pierre, I can reuse A for many iterations, but I cannot do a matmatsolve as I need the resulting solution to produce the next right hand side vector. Thanks again, Lucas ________________________________ From: Jose E. Roman Sent: Tuesday, March 1, 2022 4:49 AM To: Pierre Jolivet Cc: Lucas Banting ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Preconditioner for LSQR Caution: This message was sent from outside the University of Manitoba. 
To use SLEPc's TSQR one would do something like this: ierr = BVCreateFromMat(A,&X);CHKERRQ(ierr); ierr = BVSetFromOptions(X);CHKERRQ(ierr); ierr = BVSetOrthogonalization(X,BV_ORTHOG_CGS,BV_ORTHOG_REFINE_IFNEEDED,PETSC_DEFAULT,BV_ORTHOG_BLOCK_TSQR);CHKERRQ(ierr); ierr = BVOrthogonalize(X,R);CHKERRQ(ierr); But then one would have to use BVDotVec() to obtain Q'*b and finally solve a triangular system with R. Jose > El 1 mar 2022, a las 8:36, Pierre Jolivet escribi?: > > Hello Lucas, > In your sequence of systems, is A changing? > Are all right-hand sides available from the get-go? > In that case, you can solve everything in a block fashion and that?s how you could get real improvements. > Also, instead of PCCHOLESKY on A^T * A + KSPCG, you could use PCQR on A + KSPPREONLY, but this may not be needed, cf. Jed?s answer. > > Thanks, > Pierre > >> On 1 Mar 2022, at 12:54 AM, Lucas Banting wrote: >> >> Hello, >> >> I have an MPIDENSE matrix of size about 200,000 x 200, using KSPLSQR on my machine a solution takes about 15 s. I typically run with six to eight processors. >> I have to solve the system several times, typically 4-30, and was looking for recommendations on reusable preconditioners to use with KSPLSQR to increase speed. >> >> Would it make the most sense to use PCCHOLESKY on the smaller system A^T * A? >> >> Thanks, >> Lucas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Tue Mar 1 11:18:24 2022 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 1 Mar 2022 18:18:24 +0100 Subject: [petsc-users] Preconditioner for LSQR In-Reply-To: References: <62D453D6-75FD-4FAD-9E52-35DCC24FD41F@joliv.et> <8168BD74-2615-4596-9F6C-E061E9039ADB@dsic.upv.es> Message-ID: <0C4D7258-88C2-47C3-BBD9-8A4CFF7438E2@dsic.upv.es> > El 1 mar 2022, a las 18:06, Lucas Banting escribi?: > > Thanks everyone, QR makes the most sense for my application. > > Jose, > Once I get R from BVOrthogonalize, how I should I solve the upper triangular system? > Is the returned Mat R setup to be used in MatBackwardSolve? MatBackwardSolve() would be the operation to use, but unfortunately it seems to be implemented only for SBAIJ matrices, not for dense matrices. You will have to get the array with MatDenseGetArray()/MatDenseRestoreArray() and then call BLAStrsm_ The KSP/PC interface is more convenient. Jose > > Pierre, > I can reuse A for many iterations, but I cannot do a matmatsolve as I need the resulting solution to produce the next right hand side vector. > > Thanks again, > Lucas > From: Jose E. Roman > Sent: Tuesday, March 1, 2022 4:49 AM > To: Pierre Jolivet > Cc: Lucas Banting ; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Preconditioner for LSQR > > Caution: This message was sent from outside the University of Manitoba. > > > To use SLEPc's TSQR one would do something like this: > > ierr = BVCreateFromMat(A,&X);CHKERRQ(ierr); > ierr = BVSetFromOptions(X);CHKERRQ(ierr); > ierr = BVSetOrthogonalization(X,BV_ORTHOG_CGS,BV_ORTHOG_REFINE_IFNEEDED,PETSC_DEFAULT,BV_ORTHOG_BLOCK_TSQR);CHKERRQ(ierr); > ierr = BVOrthogonalize(X,R);CHKERRQ(ierr); > > But then one would have to use BVDotVec() to obtain Q'*b and finally solve a triangular system with R. > Jose > > > El 1 mar 2022, a las 8:36, Pierre Jolivet escribi?: > > > > Hello Lucas, > > In your sequence of systems, is A changing? > > Are all right-hand sides available from the get-go? > > In that case, you can solve everything in a block fashion and that?s how you could get real improvements. 
> > Also, instead of PCCHOLESKY on A^T * A + KSPCG, you could use PCQR on A + KSPPREONLY, but this may not be needed, cf. Jed?s answer. > > > > Thanks, > > Pierre > > > >> On 1 Mar 2022, at 12:54 AM, Lucas Banting wrote: > >> > >> Hello, > >> > >> I have an MPIDENSE matrix of size about 200,000 x 200, using KSPLSQR on my machine a solution takes about 15 s. I typically run with six to eight processors. > >> I have to solve the system several times, typically 4-30, and was looking for recommendations on reusable preconditioners to use with KSPLSQR to increase speed. > >> > >> Would it make the most sense to use PCCHOLESKY on the smaller system A^T * A? > >> > >> Thanks, > >> Lucas > > From FERRANJ2 at my.erau.edu Wed Mar 2 14:10:49 2022 From: FERRANJ2 at my.erau.edu (Ferrand, Jesus A.) Date: Wed, 2 Mar 2022 20:10:49 +0000 Subject: [petsc-users] Missing Function MatSeqAIJGetArray_C Message-ID: Greetings: I need to carry out a matrix product of the form P^tAP, where matrix P is always sparse, but matrix A can either be sparse or dense. To handle both cases, I define two (2) separate matrix products using the sequence. MatProductCreate() MatProductSetType() MatProductSetAlgorithm() MatProductSetFill() MatProductSetFromOptions() MatProductSymbolic() And then when I need to carry out the multiplication I call. MatProductReplaceMats() MatProductNumeric() When matrix A is sparse, the code runs flawlessly. When matrix A is dense, however, I get this error that says that there is a missing function "MatSeqAIJGetArray_C." I have no clue as to why there could be a dependency on the type of my matrix A. I also tried MatPtAP() which essentially does the same as the "MatProduct" callouts I mentioned and the error is the same. [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: Cannot locate function MatSeqAIJGetArray_C in object [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.16.0, unknown [0]PETSC ERROR: ./par on a linux-c-dbg named F86 by jesus Wed Mar 2 14:55:05 2022 [0]PETSC ERROR: Configure options --with-32bits-pci-domain=1 --with-debugging =1 --download-ptscotch --download-metis --download-parmetis --download-chaco --download-hdf5 [0]PETSC ERROR: #1 MatSeqAIJGetArray() at /home/jesus/SAND/PETSc_install/petsc/src/mat/impls/aij/seq/aij.c:4550 [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: No support for this operation for this object type [1]PETSC ERROR: Cannot locate function MatSeqAIJGetArray_C in object [1]PETSC ERROR: [0]PETSC ERROR: #2 KAFormKE2D() at /home/jesus/SAND/FEA/3D/PARALLEL.c:1845 [0]PETSC ERROR: #3 MESHTraverseDepth() at /home/jesus/SAND/FEA/3D/PARALLEL.c:411 [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[1]PETSC ERROR: Petsc Release Version 3.16.0, unknown [1]PETSC ERROR: #4 KAFormK() at /home/jesus/SAND/FEA/3D/PARALLEL.c:1881 [0]PETSC ERROR: #5 main() at /home/jesus/SAND/FEA/3D/PARALLEL.c:2394 ./par on a linux-c-dbg named F86 by jesus Wed Mar 2 14:55:05 2022 [1]PETSC ERROR: [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -benchmark_iter 2 Configure options --with-32bits-pci-domain=1 --with-debugging =1 --download-ptscotch --download-metis --download-parmetis --download-chaco --download-hdf5 [1]PETSC ERROR: #1 MatSeqAIJGetArray() at /home/jesus/SAND/PETSc_install/petsc/src/mat/impls/aij/seq/aij.c:4550 [0]PETSC ERROR: -mesh_filename L.msh [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- [1]PETSC ERROR: #2 KAFormKE2D() at /home/jesus/SAND/FEA/3D/PARALLEL.c:1845 [1]PETSC ERROR: #3 MESHTraverseDepth() at /home/jesus/SAND/FEA/3D/PARALLEL.c:411 application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0 [1]PETSC ERROR: #4 KAFormK() at /home/jesus/SAND/FEA/3D/PARALLEL.c:1881 [1]PETSC ERROR: #5 main() at /home/jesus/SAND/FEA/3D/PARALLEL.c:2394 [1]PETSC ERROR: PETSc Option Table entries: [1]PETSC ERROR: -benchmark_iter 2 [1]PETSC ERROR: -mesh_filename L.msh [1]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- application called MPI_Abort(MPI_COMM_WORLD, 56) - process 1 Sincerely: J.A. Ferrand Embry-Riddle Aeronautical University - Daytona Beach FL M.Sc. Aerospace Engineering | May 2022 B.Sc. Aerospace Engineering B.Sc. Computational Mathematics Sigma Gamma Tau Tau Beta Pi Honors Program Phone: (386)-843-1829 Email(s): ferranj2 at my.erau.edu jesus.ferrand at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Wed Mar 2 17:04:37 2022 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Wed, 2 Mar 2022 23:04:37 +0000 Subject: [petsc-users] Missing Function MatSeqAIJGetArray_C In-Reply-To: References: Message-ID: Ferrand, We do not have support for PtAP with P in aij format and A in dense format yet. MatProductReplaceMats() is used only when a new matrix has the SAME non-zero pattern as the one to be replace, which is not your case. You may do: B = A*P (MatMatMult) C = Pt*B (MatTransposeMatMult) in which, you simply set P as aij format and A as dense. These option should work. Let me know if this works or not. If it works, I can add this support to petsc. Hong ________________________________ From: petsc-users on behalf of Ferrand, Jesus A. Sent: Wednesday, March 2, 2022 2:10 PM To: petsc-users at mcs.anl.gov Subject: [petsc-users] Missing Function MatSeqAIJGetArray_C Greetings: I need to carry out a matrix product of the form P^tAP, where matrix P is always sparse, but matrix A can either be sparse or dense. To handle both cases, I define two (2) separate matrix products using the sequence. MatProductCreate() MatProductSetType() MatProductSetAlgorithm() MatProductSetFill() MatProductSetFromOptions() MatProductSymbolic() And then when I need to carry out the multiplication I call. MatProductReplaceMats() MatProductNumeric() When matrix A is sparse, the code runs flawlessly. When matrix A is dense, however, I get this error that says that there is a missing function "MatSeqAIJGetArray_C." I have no clue as to why there could be a dependency on the type of my matrix A. 
I also tried MatPtAP() which essentially does the same as the "MatProduct" callouts I mentioned and the error is the same. [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: Cannot locate function MatSeqAIJGetArray_C in object [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.16.0, unknown [0]PETSC ERROR: ./par on a linux-c-dbg named F86 by jesus Wed Mar 2 14:55:05 2022 [0]PETSC ERROR: Configure options --with-32bits-pci-domain=1 --with-debugging =1 --download-ptscotch --download-metis --download-parmetis --download-chaco --download-hdf5 [0]PETSC ERROR: #1 MatSeqAIJGetArray() at /home/jesus/SAND/PETSc_install/petsc/src/mat/impls/aij/seq/aij.c:4550 [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: No support for this operation for this object type [1]PETSC ERROR: Cannot locate function MatSeqAIJGetArray_C in object [1]PETSC ERROR: [0]PETSC ERROR: #2 KAFormKE2D() at /home/jesus/SAND/FEA/3D/PARALLEL.c:1845 [0]PETSC ERROR: #3 MESHTraverseDepth() at /home/jesus/SAND/FEA/3D/PARALLEL.c:411 [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [1]PETSC ERROR: Petsc Release Version 3.16.0, unknown [1]PETSC ERROR: #4 KAFormK() at /home/jesus/SAND/FEA/3D/PARALLEL.c:1881 [0]PETSC ERROR: #5 main() at /home/jesus/SAND/FEA/3D/PARALLEL.c:2394 ./par on a linux-c-dbg named F86 by jesus Wed Mar 2 14:55:05 2022 [1]PETSC ERROR: [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -benchmark_iter 2 Configure options --with-32bits-pci-domain=1 --with-debugging =1 --download-ptscotch --download-metis --download-parmetis --download-chaco --download-hdf5 [1]PETSC ERROR: #1 MatSeqAIJGetArray() at /home/jesus/SAND/PETSc_install/petsc/src/mat/impls/aij/seq/aij.c:4550 [0]PETSC ERROR: -mesh_filename L.msh [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- [1]PETSC ERROR: #2 KAFormKE2D() at /home/jesus/SAND/FEA/3D/PARALLEL.c:1845 [1]PETSC ERROR: #3 MESHTraverseDepth() at /home/jesus/SAND/FEA/3D/PARALLEL.c:411 application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0 [1]PETSC ERROR: #4 KAFormK() at /home/jesus/SAND/FEA/3D/PARALLEL.c:1881 [1]PETSC ERROR: #5 main() at /home/jesus/SAND/FEA/3D/PARALLEL.c:2394 [1]PETSC ERROR: PETSc Option Table entries: [1]PETSC ERROR: -benchmark_iter 2 [1]PETSC ERROR: -mesh_filename L.msh [1]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- application called MPI_Abort(MPI_COMM_WORLD, 56) - process 1 Sincerely: J.A. Ferrand Embry-Riddle Aeronautical University - Daytona Beach FL M.Sc. Aerospace Engineering | May 2022 B.Sc. Aerospace Engineering B.Sc. Computational Mathematics Sigma Gamma Tau Tau Beta Pi Honors Program Phone: (386)-843-1829 Email(s): ferranj2 at my.erau.edu jesus.ferrand at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From giavancini at usp.br Thu Mar 3 10:24:41 2022 From: giavancini at usp.br (Giovane Avancini) Date: Thu, 3 Mar 2022 13:24:41 -0300 Subject: [petsc-users] [KSP] PETSc not reporting a KSP fail when true residual is NaN In-Reply-To: <850B6DA1-9FB8-4139-ADDF-B32F118A5EA3@petsc.dev> References: <850B6DA1-9FB8-4139-ADDF-B32F118A5EA3@petsc.dev> Message-ID: Sorry for my late reply Barry, Sure I can share the code with you, but unfortunately I don't know how to make docker images. If you don't mind, you can clone the code from github through this link: git at github.com:giavancini/runPFEM.git It can be easily compiled with cmake, and you can see the dependencies in README.md. Please let me know if you need any other information. Kind regards, Giovane Em sex., 25 de fev. de 2022 ?s 18:22, Barry Smith escreveu: > > Hmm, this is going to be tricky to debug why it the Inf/Nan is not > found when it should be. > > In a debugger you can catch/trap floating point exceptions (how to do > this depends on your debugger) and then step through the code after that to > see why PETSc KSP is not properly noting the Inf/Nan and returning. This > may be cumbersome to do if you don't know PETSc well. Is your code easy to > build, would be willing to share it to me so I can run it and debug > directly? If you know how to make docker images or something you might be > able to give it to me easily. > > Barry > > > On Feb 25, 2022, at 3:59 PM, Giovane Avancini wrote: > > Mark, Matthew and Barry, > > Thank you all for the quick responses. > > Others might have a better idea, but you could run with '-info :ksp' and > see if you see any messages like "Linear solver has created a not a number > (NaN) as the residual norm, declaring divergence \n" > You could also run with -log_trace and see if it is > using KSPConvergedDefault. I'm not sure if this is the method used given > your parameters, but I think it is. > > Mark, I ran with both options. I didn't get any messages like "linear > solver has created a not a number..." when using -info: ksp. When turning > on -log_trace, I could verify that it is using KSPConvergedDefault but what > does it mean exactly? When FGMRES converges with the true residual being > NaN, I get the following message: [0] KSPConvergedDefault(): Linear solver > has converged. Residual norm 8.897908325511e-05 is less than relative > tolerance 1.000000000000e-08 times initial right hand side norm > 1.466597558465e+04 at iteration 53. No information about NaN whatsoever. > > We check for NaN or Inf, for example, in KSPCheckDot(). if you have the > KSP set to error ( > https://petsc.org/main/docs/manualpages/KSP/KSPSetErrorIfNotConverged.html > ) > then we throw an error, but the return codes do not seem to be checked in > your implementation. If not, then we set the flag for divergence. > > Matthew, I do not check the return code in this case because I don't want > PETSc to stop if an error occurs during the solving step. I just want to > know that it didn't converge and treat this error inside my code. The > problem is that the flag for divergence is not always being set when FGMRES > is not converging. I was just wondering why it was set during time step 921 > and why not for time step 922 as well. > > Thanks for the complete report. It looks like we may be missing a check in > our FGMRES implementation that allows the iteration to continue after a > NaN/Inf. > > I will explain how we handle the checking and then attach a patch that > you can apply to see if it resolves the problem. 
Whenever our KSP solvers > compute a norm we > check after that calculation to verify that the norm is not an Inf or Nan. > This is an inexpensive global check across all MPI ranks because > immediately after the norm computation all ranks that share the KSP have > the same value. If the norm is a Inf or Nan we "short-circuit" the KSP > solve and return immediately with an appropriate not converged code. A > quick eye-ball inspection of the FGMRES code found a missing check. > > You can apply the attached patch file in the PETSC_DIR with > > patch -p1 < fgmres.patch > make libs > > then rerun your code and see if it now handles the Inf/NaN correctly. If > so we'll patch our release branch with the fix. > > Thank you for checking this, Barry. I applied the patch exactly the way > you instructed, however, the problem is still happening. Is there a way to > check if the patch was in fact applied? You can see in the attached > screenshot the terminal information. > > Kind regards, > > Giovane > > Em sex., 25 de fev. de 2022 ?s 13:48, Barry Smith > escreveu: > >> >> Giovane, >> >> Thanks for the complete report. It looks like we may be missing a >> check in our FGMRES implementation that allows the iteration to continue >> after a NaN/Inf. >> >> I will explain how we handle the checking and then attach a patch >> that you can apply to see if it resolves the problem. Whenever our KSP >> solvers compute a norm we >> check after that calculation to verify that the norm is not an Inf or >> Nan. This is an inexpensive global check across all MPI ranks because >> immediately after the norm computation all ranks that share the KSP have >> the same value. If the norm is a Inf or Nan we "short-circuit" the KSP >> solve and return immediately with an appropriate not converged code. A >> quick eye-ball inspection of the FGMRES code found a missing check. >> >> You can apply the attached patch file in the PETSC_DIR with >> >> patch -p1 < fgmres.patch >> make libs >> >> then rerun your code and see if it now handles the Inf/NaN correctly. If >> so we'll patch our release branch with the fix. >> >> Barry >> >> >> >> Giovane >> >> >> >> On Feb 25, 2022, at 11:06 AM, Giovane Avancini via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >> Dear PETSc users, >> >> I'm working on an inhouse code that solves the Navier-Stokes equation in >> a Lagrangian fashion for free surface flows. Because of the large >> distortions and pressure gradients, it is quite common to encounter some >> issues with iterative solvers for some time steps, and because of that, I >> implemented a function that changes the solver type based on the flag >> KSPConvergedReason. If this flag is negative after a call to KSPSolve, I >> solve the same linear system again using a direct method. >> >> The problem is that, sometimes, KSP keeps converging even though the >> residual is NaN, and because of that, I'm not able to identify the problem >> and change the solver, which leads to a solution vector equals to INF and >> obviously the code ends up crashing. Is it normal to observe this kind of >> behaviour? >> >> Please find attached the log produced with the options >> -ksp_monitor_lg_residualnorm -ksp_log -ksp_view -ksp_monitor_true_residual >> -ksp_converged_reason and the function that changes the solver. I'm >> currently using FGMRES and BJACOBI preconditioner with LU for each block. >> The problem still happens with ILU for example. 
We can see in the log file >> that for the time step 921, the true residual is NaN and within just one >> iteration, the solver fails and it gives the reason DIVERGED_PC_FAILED. I >> simply changed the solver to MUMPS and it converged for that time step. >> However, when solving time step 922 we can see that FGMRES converges while >> the true residual is NaN. Why is that possible? I would appreciate it if >> someone could clarify this issue to me. >> >> Kind regards, >> Giovane >> >> >> >> -- >> Giovane Avancini >> Doutorando em Engenharia de Estruturas - Escola de Engenharia de S?o >> Carlos, USP >> >> PhD researcher in Structural Engineering - School of Engineering of S?o >> Carlos. USP >> >> >> >> > > -- > Giovane Avancini > Doutorando em Engenharia de Estruturas - Escola de Engenharia de S?o > Carlos, USP > > PhD researcher in Structural Engineering - School of Engineering of S?o > Carlos. USP > > > > -- Giovane Avancini Doutorando em Engenharia de Estruturas - Escola de Engenharia de S?o Carlos, USP PhD researcher in Structural Engineering - School of Engineering of S?o Carlos. USP -------------- next part -------------- An HTML attachment was scrubbed... URL: From yi.ethan.jiang at gmail.com Thu Mar 3 21:32:39 2022 From: yi.ethan.jiang at gmail.com (Yi Jiang) Date: Fri, 4 Mar 2022 11:32:39 +0800 Subject: [petsc-users] Loading labels from a hdf5 mesh file in parallel Message-ID: Dear Petsc developers, We are trying to use the HDF5_XDMF parallel I/O feature to read/write unstructured meshes. By trying some tests, we found that the topology and geometry data (i.e., cells and vertices) can be efficiently loaded in a scalable way, which is very impressive! However, we also found that all labels (such as `cell sets', `face sets') in the .h5 file are ignored (the .h5 file was converted from an EXODUSII mesh, by using a PetscViewer with PETSC_VIEWER_HDF5_XDMF format). Hence, we are wondering, does the latest Petsc also support to import these labels in parallel? In particular, we would like to redistribute the mesh after it is parallelly loaded by the naive partition. If so, could you please show me where to find an example to learn the techniques? Thank you very much for devoting the continuous efforts to the community and keep developing these wonderful features. Best regards, YJ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Mar 4 12:54:22 2022 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 4 Mar 2022 12:54:22 -0600 (CST) Subject: [petsc-users] petsc-3.16.5 now available Message-ID: <20a1975e-6075-1fe9-723e-3627e68382f1@mcs.anl.gov> Dear PETSc users, The patch release petsc-3.16.5 is now available for download. https://petsc.org/release/download/ Satish From varunhiremath at gmail.com Fri Mar 4 13:07:25 2022 From: varunhiremath at gmail.com (Varun Hiremath) Date: Fri, 4 Mar 2022 11:07:25 -0800 Subject: [petsc-users] SLEPc solve: progress info and abort option Message-ID: Hi All, We use SLEPc to compute eigenvalues of big problems which typically takes a long time. We want to add a progress bar to inform the user of the estimated time remaining to finish the computation. In addition, we also want to add an option for the user to abort the computation midway if needed. To some extent, I am able to do these by attaching a custom function to EPSSetStoppingTestFunction and using nconv/nev as an indication of progress, and throwing an exception when the user decides to abort the computation. 
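For reference, a minimal sketch of such a stopping test (user_wants_to_abort() stands in for the application's own GUI/abort hook and is not a SLEPc function; setting the reason to EPS_CONVERGED_USER is an alternative to throwing an exception):

  PetscErrorCode MyStoppingTest(EPS eps, PetscInt its, PetscInt max_it, PetscInt nconv,
                                PetscInt nev, EPSConvergedReason *reason, void *ctx)
  {
    PetscErrorCode ierr;
    /* nconv/nev gives the crude progress estimate mentioned above */
    ierr = EPSStoppingBasic(eps, its, max_it, nconv, nev, reason, ctx);CHKERRQ(ierr);
    if (*reason == EPS_CONVERGED_ITERATING && user_wants_to_abort())
      *reason = EPS_CONVERGED_USER; /* stop cleanly, keeping the nconv pairs computed so far */
    return 0;
  }

  /* attached once before EPSSolve(): */
  ierr = EPSSetStoppingTestFunction(eps, MyStoppingTest, NULL, NULL);CHKERRQ(ierr);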
However, since this function gets called only once every iteration, for very big problems it takes a long time for the program to respond. I was wondering if there is any other function to which I can attach, which gets called more frequently and can provide more fine-grained information on the progress. Thanks, Varun -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Mar 4 13:16:42 2022 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 4 Mar 2022 14:16:42 -0500 Subject: [petsc-users] SLEPc solve: progress info and abort option In-Reply-To: References: Message-ID: On Fri, Mar 4, 2022 at 2:07 PM Varun Hiremath wrote: > Hi All, > > We use SLEPc to compute eigenvalues of big problems which typically takes > a long time. We want to add a progress bar to inform the user of the > estimated time remaining to finish the computation. In addition, we also > want to add an option for the user to abort the computation midway if > needed. > > To some extent, I am able to do these by attaching a custom function to > EPSSetStoppingTestFunction > and > using nconv/nev as an indication of progress, and throwing an exception > when the user decides to abort the computation. However, since this > function gets called only once every iteration, for very big problems it > takes a long time for the program to respond. I was wondering if there is > any other function to which I can attach, which gets called more frequently > and can provide more fine-grained information on the progress. > I believe (Jose can correct me) that the bulk of the time in an iterate would be in the linear solve. You can insert something into a KSPMonitor. If you know the convergence tolerance and assume a linear convergence rate I guess you could estimate the "amount done". Thanks, Matt > Thanks, > Varun > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Fri Mar 4 13:36:15 2022 From: jroman at dsic.upv.es (Jose E. Roman) Date: Fri, 4 Mar 2022 20:36:15 +0100 Subject: [petsc-users] SLEPc solve: progress info and abort option In-Reply-To: References: Message-ID: Yes, assuming that the eigensolver is calling KSPSolve(), you can set a monitor with KSPMonitorSet(). This will be called more often than the callback for EPSSetStoppingTestFunction(). Jose > El 4 mar 2022, a las 20:16, Matthew Knepley escribi?: > > > On Fri, Mar 4, 2022 at 2:07 PM Varun Hiremath wrote: > Hi All, > > We use SLEPc to compute eigenvalues of big problems which typically takes a long time. We want to add a progress bar to inform the user of the estimated time remaining to finish the computation. In addition, we also want to add an option for the user to abort the computation midway if needed. > > To some extent, I am able to do these by attaching a custom function to EPSSetStoppingTestFunction and using nconv/nev as an indication of progress, and throwing an exception when the user decides to abort the computation. However, since this function gets called only once every iteration, for very big problems it takes a long time for the program to respond. I was wondering if there is any other function to which I can attach, which gets called more frequently and can provide more fine-grained information on the progress. 
> > I believe (Jose can correct me) that the bulk of the time in an iterate would be in the linear solve. You can insert something into a KSPMonitor. If you know the convergence tolerance and assume a linear convergence rate I guess you could estimate the "amount done". > > Thanks, > > Matt > > Thanks, > Varun > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From tangqi at msu.edu Sat Mar 5 12:26:57 2022 From: tangqi at msu.edu (Tang, Qi) Date: Sat, 5 Mar 2022 18:26:57 +0000 Subject: [petsc-users] Control adaptive time step Message-ID: <7F668160-5A51-41D2-8CD2-E1C965FA54BE@msu.edu> Hi, Is there a simple way to control the first few adaptive time step? Currently, I am using the following options but dt grows too fast initially. The first few time steps increase dt by a factor of 10 in each time step. Is there a way to do a slow start, say, a factor of 2 instead of 10? Thanks. -ts_type arkimex -ts_arkimex_type 2e -ts_arkimex_fully_implicit -ts_adapt_dt_min 1.0e-5 -ts_adapt_dt_max 1.0e-2 -ts_adapt_type basic Qi T-5 at LANL From hongzhang at anl.gov Sat Mar 5 12:55:09 2022 From: hongzhang at anl.gov (Zhang, Hong) Date: Sat, 5 Mar 2022 18:55:09 +0000 Subject: [petsc-users] Control adaptive time step In-Reply-To: <7F668160-5A51-41D2-8CD2-E1C965FA54BE@msu.edu> References: <7F668160-5A51-41D2-8CD2-E1C965FA54BE@msu.edu> Message-ID: <2FC5AB26-594C-4C1D-B961-2896C665FCD1@anl.gov> You can control the factors with -ts_adapt_clip ,- to set admissible time step decrease and increase factors The default setting uses low=0.1, high=10. https://petsc.org/release/docs/manualpages/TS/TSAdaptSetClip.html#TSAdaptSetClip Hong (Mr.) On Mar 5, 2022, at 12:26 PM, Tang, Qi wrote: Hi, Is there a simple way to control the first few adaptive time step? Currently, I am using the following options but dt grows too fast initially. The first few time steps increase dt by a factor of 10 in each time step. Is there a way to do a slow start, say, a factor of 2 instead of 10? Thanks. -ts_type arkimex -ts_arkimex_type 2e -ts_arkimex_fully_implicit -ts_adapt_dt_min 1.0e-5 -ts_adapt_dt_max 1.0e-2 -ts_adapt_type basic Qi T-5 at LANL -------------- next part -------------- An HTML attachment was scrubbed... URL: From tangqi at msu.edu Sat Mar 5 13:07:33 2022 From: tangqi at msu.edu (Tang, Qi) Date: Sat, 5 Mar 2022 19:07:33 +0000 Subject: [petsc-users] Control adaptive time step In-Reply-To: <2FC5AB26-594C-4C1D-B961-2896C665FCD1@anl.gov> References: <7F668160-5A51-41D2-8CD2-E1C965FA54BE@msu.edu> <2FC5AB26-594C-4C1D-B961-2896C665FCD1@anl.gov> Message-ID: Thanks, Hong. That?s exactly what I need. On Mar 5, 2022, at 11:55 AM, Zhang, Hong > wrote: You can control the factors with -ts_adapt_clip ,- to set admissible time step decrease and increase factors The default setting uses low=0.1, high=10. https://petsc.org/release/docs/manualpages/TS/TSAdaptSetClip.html#TSAdaptSetClip Hong (Mr.) On Mar 5, 2022, at 12:26 PM, Tang, Qi > wrote: Hi, Is there a simple way to control the first few adaptive time step? Currently, I am using the following options but dt grows too fast initially. The first few time steps increase dt by a factor of 10 in each time step. Is there a way to do a slow start, say, a factor of 2 instead of 10? Thanks. 
-ts_type arkimex -ts_arkimex_type 2e -ts_arkimex_fully_implicit -ts_adapt_dt_min 1.0e-5 -ts_adapt_dt_max 1.0e-2 -ts_adapt_type basic

Qi
T-5 at LANL

From hongzhang at anl.gov Sat Mar 5 12:55:09 2022
From: hongzhang at anl.gov (Zhang, Hong)
Date: Sat, 5 Mar 2022 18:55:09 +0000
Subject: [petsc-users] Control adaptive time step
In-Reply-To: <7F668160-5A51-41D2-8CD2-E1C965FA54BE@msu.edu>
References: <7F668160-5A51-41D2-8CD2-E1C965FA54BE@msu.edu>
Message-ID: <2FC5AB26-594C-4C1D-B961-2896C665FCD1@anl.gov>

You can control the factors with -ts_adapt_clip <low>,<high> to set admissible time step decrease and increase factors. The default setting uses low=0.1, high=10.

https://petsc.org/release/docs/manualpages/TS/TSAdaptSetClip.html#TSAdaptSetClip

Hong (Mr.)

On Mar 5, 2022, at 12:26 PM, Tang, Qi wrote:

Hi,
Is there a simple way to control the first few adaptive time step? Currently, I am using the following options but dt grows too fast initially. The first few time steps increase dt by a factor of 10 in each time step. Is there a way to do a slow start, say, a factor of 2 instead of 10? Thanks.

-ts_type arkimex -ts_arkimex_type 2e -ts_arkimex_fully_implicit -ts_adapt_dt_min 1.0e-5 -ts_adapt_dt_max 1.0e-2 -ts_adapt_type basic

Qi
T-5 at LANL

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tangqi at msu.edu Sat Mar 5 13:07:33 2022
From: tangqi at msu.edu (Tang, Qi)
Date: Sat, 5 Mar 2022 19:07:33 +0000
Subject: [petsc-users] Control adaptive time step
In-Reply-To: <2FC5AB26-594C-4C1D-B961-2896C665FCD1@anl.gov>
References: <7F668160-5A51-41D2-8CD2-E1C965FA54BE@msu.edu> <2FC5AB26-594C-4C1D-B961-2896C665FCD1@anl.gov>
Message-ID: 

Thanks, Hong. That's exactly what I need.

On Mar 5, 2022, at 11:55 AM, Zhang, Hong wrote:

You can control the factors with -ts_adapt_clip <low>,<high> to set admissible time step decrease and increase factors. The default setting uses low=0.1, high=10.

https://petsc.org/release/docs/manualpages/TS/TSAdaptSetClip.html#TSAdaptSetClip

Hong (Mr.)

On Mar 5, 2022, at 12:26 PM, Tang, Qi wrote:

Hi,
Is there a simple way to control the first few adaptive time step? Currently, I am using the following options but dt grows too fast initially. The first few time steps increase dt by a factor of 10 in each time step. Is there a way to do a slow start, say, a factor of 2 instead of 10? Thanks.

-ts_type arkimex -ts_arkimex_type 2e -ts_arkimex_fully_implicit -ts_adapt_dt_min 1.0e-5 -ts_adapt_dt_max 1.0e-2 -ts_adapt_type basic

Qi
T-5 at LANL

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From varunhiremath at gmail.com Mon Mar 7 05:00:47 2022
From: varunhiremath at gmail.com (Varun Hiremath)
Date: Mon, 7 Mar 2022 03:00:47 -0800
Subject: [petsc-users] SLEPc solve: progress info and abort option
In-Reply-To: 
References: 
Message-ID: 

Thanks, Matt and Jose!
I have added a custom function to KSPMonitorSet, and that improves the response time for the abort option, however, it is still a bit slow for very big problems, but I think that is probably because I am using the MUMPS direct solver so likely a large amount of time is spent inside MUMPS. And I am guessing there is no way to get the progress info of MUMPS from PETSc? > > Jose, for the progress bar I am using the number of converged eigenvalues (nconv) as obtained using EPSMonitorSet function. But this is slow as it is called only once every iteration, and typically many eigenvalues converge within an iteration, so is there any way to get more detailed/finer info on the solver progress? It is typical that Krylov solvers converge several eigenvalues at once. You can look at the residual norm of the first uncoverged eigenvalue to see "how far" you are from convergence. But convergence may be irregular. You can also try reducing the ncv parameter, so that the monitor is called more often, but this will probably slow down convergence. Jose > > Many thanks for your help. > > Thanks, > Varun > > On Fri, Mar 4, 2022 at 11:36 AM Jose E. Roman > wrote: > Yes, assuming that the eigensolver is calling KSPSolve(), you can set a monitor with KSPMonitorSet(). This will be called more often than the callback for EPSSetStoppingTestFunction(). > > Jose > > > El 4 mar 2022, a las 20:16, Matthew Knepley > escribi?: > > > > > > On Fri, Mar 4, 2022 at 2:07 PM Varun Hiremath > wrote: > > Hi All, > > > > We use SLEPc to compute eigenvalues of big problems which typically takes a long time. We want to add a progress bar to inform the user of the estimated time remaining to finish the computation. In addition, we also want to add an option for the user to abort the computation midway if needed. > > > > To some extent, I am able to do these by attaching a custom function to EPSSetStoppingTestFunction and using nconv/nev as an indication of progress, and throwing an exception when the user decides to abort the computation. However, since this function gets called only once every iteration, for very big problems it takes a long time for the program to respond. I was wondering if there is any other function to which I can attach, which gets called more frequently and can provide more fine-grained information on the progress. > > > > I believe (Jose can correct me) that the bulk of the time in an iterate would be in the linear solve. You can insert something into a KSPMonitor. If you know the convergence tolerance and assume a linear convergence rate I guess you could estimate the "amount done". > > > > Thanks, > > > > Matt > > > > Thanks, > > Varun > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > From knepley at gmail.com Mon Mar 7 05:36:26 2022 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 7 Mar 2022 06:36:26 -0500 Subject: [petsc-users] SLEPc solve: progress info and abort option In-Reply-To: <52E7AF73-EE43-4F23-A978-43A9F6882F24@dsic.upv.es> References: <52E7AF73-EE43-4F23-A978-43A9F6882F24@dsic.upv.es> Message-ID: On Mon, Mar 7, 2022 at 6:23 AM Jose E. Roman wrote: > > > > El 7 mar 2022, a las 12:00, Varun Hiremath > escribi?: > > > > Thanks, Matt and Jose! 
I have added a custom function to KSPMonitorSet, > and that improves the response time for the abort option, however, it is > still a bit slow for very big problems, but I think that is probably > because I am using the MUMPS direct solver so likely a large amount of time > is spent inside MUMPS. And I am guessing there is no way to get the > progress info of MUMPS from PETSc? > Yes, we do not have a way of looking into MUMPS. You might see if they have a suggestion. Thanks, Matt > > Jose, for the progress bar I am using the number of converged > eigenvalues (nconv) as obtained using EPSMonitorSet function. But this is > slow as it is called only once every iteration, and typically many > eigenvalues converge within an iteration, so is there any way to get more > detailed/finer info on the solver progress? > > It is typical that Krylov solvers converge several eigenvalues at once. > You can look at the residual norm of the first uncoverged eigenvalue to see > "how far" you are from convergence. But convergence may be irregular. You > can also try reducing the ncv parameter, so that the monitor is called more > often, but this will probably slow down convergence. > > Jose > > > > > > Many thanks for your help. > > > > Thanks, > > Varun > > > > On Fri, Mar 4, 2022 at 11:36 AM Jose E. Roman > wrote: > > Yes, assuming that the eigensolver is calling KSPSolve(), you can set a > monitor with KSPMonitorSet(). This will be called more often than the > callback for EPSSetStoppingTestFunction(). > > > > Jose > > > > > El 4 mar 2022, a las 20:16, Matthew Knepley > escribi?: > > > > > > > > > On Fri, Mar 4, 2022 at 2:07 PM Varun Hiremath > wrote: > > > Hi All, > > > > > > We use SLEPc to compute eigenvalues of big problems which typically > takes a long time. We want to add a progress bar to inform the user of the > estimated time remaining to finish the computation. In addition, we also > want to add an option for the user to abort the computation midway if > needed. > > > > > > To some extent, I am able to do these by attaching a custom function > to EPSSetStoppingTestFunction and using nconv/nev as an indication of > progress, and throwing an exception when the user decides to abort the > computation. However, since this function gets called only once every > iteration, for very big problems it takes a long time for the program to > respond. I was wondering if there is any other function to which I can > attach, which gets called more frequently and can provide more fine-grained > information on the progress. > > > > > > I believe (Jose can correct me) that the bulk of the time in an > iterate would be in the linear solve. You can insert something into a > KSPMonitor. If you know the convergence tolerance and assume a linear > convergence rate I guess you could estimate the "amount done". > > > > > > Thanks, > > > > > > Matt > > > > > > Thanks, > > > Varun > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > > -- Norbert Wiener > > > > > > https://www.cse.buffalo.edu/~knepley/ < > https://www.cse.buffalo.edu/~knepley/> > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Mon Mar 7 15:08:09 2022 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 7 Mar 2022 16:08:09 -0500 Subject: [petsc-users] [KSP] PETSc not reporting a KSP fail when true residual is NaN In-Reply-To: References: <850B6DA1-9FB8-4139-ADDF-B32F118A5EA3@petsc.dev> Message-ID: <084502CB-39DD-4A9C-B2AC-CBCD49CA7CF3@petsc.dev> The fix for the problem Geiovane encountered is in https://gitlab.com/petsc/petsc/-/merge_requests/4934 > On Mar 3, 2022, at 11:24 AM, Giovane Avancini wrote: > > Sorry for my late reply Barry, > > Sure I can share the code with you, but unfortunately I don't know how to make docker images. If you don't mind, you can clone the code from github through this link: git at github.com:giavancini/runPFEM.git > It can be easily compiled with cmake, and you can see the dependencies in README.md. Please let me know if you need any other information. > > Kind regards, > > Giovane > > Em sex., 25 de fev. de 2022 ?s 18:22, Barry Smith > escreveu: > > Hmm, this is going to be tricky to debug why it the Inf/Nan is not found when it should be. > > In a debugger you can catch/trap floating point exceptions (how to do this depends on your debugger) and then step through the code after that to see why PETSc KSP is not properly noting the Inf/Nan and returning. This may be cumbersome to do if you don't know PETSc well. Is your code easy to build, would be willing to share it to me so I can run it and debug directly? If you know how to make docker images or something you might be able to give it to me easily. > > Barry > > >> On Feb 25, 2022, at 3:59 PM, Giovane Avancini > wrote: >> >> Mark, Matthew and Barry, >> >> Thank you all for the quick responses. >> >> Others might have a better idea, but you could run with '-info :ksp' and see if you see any messages like "Linear solver has created a not a number (NaN) as the residual norm, declaring divergence \n" >> You could also run with -log_trace and see if it is using KSPConvergedDefault. I'm not sure if this is the method used given your parameters, but I think it is. >> Mark, I ran with both options. I didn't get any messages like "linear solver has created a not a number..." when using -info: ksp. When turning on -log_trace, I could verify that it is using KSPConvergedDefault but what does it mean exactly? When FGMRES converges with the true residual being NaN, I get the following message: [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 8.897908325511e-05 is less than relative tolerance 1.000000000000e-08 times initial right hand side norm 1.466597558465e+04 at iteration 53. No information about NaN whatsoever. >> >> We check for NaN or Inf, for example, in KSPCheckDot(). if you have the KSP set to error (https://petsc.org/main/docs/manualpages/KSP/KSPSetErrorIfNotConverged.html ) >> then we throw an error, but the return codes do not seem to be checked in your implementation. If not, then we set the flag for divergence. >> Matthew, I do not check the return code in this case because I don't want PETSc to stop if an error occurs during the solving step. I just want to know that it didn't converge and treat this error inside my code. The problem is that the flag for divergence is not always being set when FGMRES is not converging. I was just wondering why it was set during time step 921 and why not for time step 922 as well. >> >> Thanks for the complete report. It looks like we may be missing a check in our FGMRES implementation that allows the iteration to continue after a NaN/Inf. 
>> >> I will explain how we handle the checking and then attach a patch that you can apply to see if it resolves the problem. Whenever our KSP solvers compute a norm we >> check after that calculation to verify that the norm is not an Inf or Nan. This is an inexpensive global check across all MPI ranks because immediately after the norm computation all ranks that share the KSP have the same value. If the norm is a Inf or Nan we "short-circuit" the KSP solve and return immediately with an appropriate not converged code. A quick eye-ball inspection of the FGMRES code found a missing check. >> >> You can apply the attached patch file in the PETSC_DIR with >> >> patch -p1 < fgmres.patch >> make libs >> >> then rerun your code and see if it now handles the Inf/NaN correctly. If so we'll patch our release branch with the fix. >> Thank you for checking this, Barry. I applied the patch exactly the way you instructed, however, the problem is still happening. Is there a way to check if the patch was in fact applied? You can see in the attached screenshot the terminal information. >> >> Kind regards, >> >> Giovane >> >> Em sex., 25 de fev. de 2022 ?s 13:48, Barry Smith > escreveu: >> >> Giovane, >> >> Thanks for the complete report. It looks like we may be missing a check in our FGMRES implementation that allows the iteration to continue after a NaN/Inf. >> >> I will explain how we handle the checking and then attach a patch that you can apply to see if it resolves the problem. Whenever our KSP solvers compute a norm we >> check after that calculation to verify that the norm is not an Inf or Nan. This is an inexpensive global check across all MPI ranks because immediately after the norm computation all ranks that share the KSP have the same value. If the norm is a Inf or Nan we "short-circuit" the KSP solve and return immediately with an appropriate not converged code. A quick eye-ball inspection of the FGMRES code found a missing check. >> >> You can apply the attached patch file in the PETSC_DIR with >> >> patch -p1 < fgmres.patch >> make libs >> >> then rerun your code and see if it now handles the Inf/NaN correctly. If so we'll patch our release branch with the fix. >> >> Barry >> >> >> >>> Giovane >> >> >>> On Feb 25, 2022, at 11:06 AM, Giovane Avancini via petsc-users > wrote: >>> >>> Dear PETSc users, >>> >>> I'm working on an inhouse code that solves the Navier-Stokes equation in a Lagrangian fashion for free surface flows. Because of the large distortions and pressure gradients, it is quite common to encounter some issues with iterative solvers for some time steps, and because of that, I implemented a function that changes the solver type based on the flag KSPConvergedReason. If this flag is negative after a call to KSPSolve, I solve the same linear system again using a direct method. >>> >>> The problem is that, sometimes, KSP keeps converging even though the residual is NaN, and because of that, I'm not able to identify the problem and change the solver, which leads to a solution vector equals to INF and obviously the code ends up crashing. Is it normal to observe this kind of behaviour? >>> >>> Please find attached the log produced with the options -ksp_monitor_lg_residualnorm -ksp_log -ksp_view -ksp_monitor_true_residual -ksp_converged_reason and the function that changes the solver. I'm currently using FGMRES and BJACOBI preconditioner with LU for each block. The problem still happens with ILU for example. 
We can see in the log file that for the time step 921, the true residual is NaN and within just one iteration, the solver fails and it gives the reason DIVERGED_PC_FAILED. I simply changed the solver to MUMPS and it converged for that time step. However, when solving time step 922 we can see that FGMRES converges while the true residual is NaN. Why is that possible? I would appreciate it if someone could clarify this issue to me. >>> >>> Kind regards, >>> Giovane >>> >>> >>> >>> -- >>> Giovane Avancini >>> Doutorando em Engenharia de Estruturas - Escola de Engenharia de S?o Carlos, USP >>> >>> PhD researcher in Structural Engineering - School of Engineering of S?o Carlos. USP >>> >> >> >> >> -- >> Giovane Avancini >> Doutorando em Engenharia de Estruturas - Escola de Engenharia de S?o Carlos, USP >> >> PhD researcher in Structural Engineering - School of Engineering of S?o Carlos. USP >> > > > > -- > Giovane Avancini > Doutorando em Engenharia de Estruturas - Escola de Engenharia de S?o Carlos, USP > > PhD researcher in Structural Engineering - School of Engineering of S?o Carlos. USP -------------- next part -------------- An HTML attachment was scrubbed... URL: From berend.vanwachem at ovgu.de Tue Mar 8 10:08:31 2022 From: berend.vanwachem at ovgu.de (Berend van Wachem) Date: Tue, 8 Mar 2022 17:08:31 +0100 Subject: [petsc-users] DMView and DMLoad In-Reply-To: References: <56ce2135-9757-4292-e33b-c7eea8fb7b2e@ovgu.de> <056E066F-D596-4254-A44A-60BFFD30FE82@erdw.ethz.ch> <6c4e0656-db99-e9da-000f-ab9f7dd62c07@ovgu.de> <0845e501-e2cd-d7cc-58be-2803ee5ef6cd@ovgu.de> Message-ID: <49329c6f-dfa7-b1ca-1d16-d79f1234c1df@ovgu.de> Dear Koki, Many thanks for your help - that was very useful! We have been able to save/load non-periodic meshes. Unfortunately, the periodic ones are still not working for us, and I attach a small example code and output. As you suggested, the "-dm_plex_view_hdf5_storage_version 2.0.0" option gets rid of the previous error message ("Number of coordinates loaded 3168 does not match number of vertices 1000") but the the cell values are either shifted or overwritten after loading. To illustrate that the cell IDs and values for a periodic box with 4 cells in each direction and consecutive cell values is included below. Two CPUs are used. It is clear to see that the saved and loaded fields are different. Do you think is there a way to make it work for periodic cases? I have attached the small code that I used for this test. Many thanks and best regards, Berend. 
Output: ---- Save ----------------- ----- Load ----------------- CellIndex CellValue CPU 0 : CellIndex CellValue CPU 0 : 0 0 0 8 1 1 1 9 2 2 2 10 3 3 3 11 4 4 4 12 5 5 5 13 6 6 6 14 7 7 7 15 8 8 8 24 9 9 9 25 10 10 10 26 11 11 11 27 12 12 12 28 13 13 13 29 14 14 14 30 15 15 15 31 16 16 16 8 17 17 17 9 18 18 18 10 19 19 19 11 20 20 20 12 21 21 21 13 22 22 22 14 23 23 23 15 24 24 24 24 25 25 25 25 26 26 26 26 27 27 27 27 28 28 28 28 29 29 29 29 30 30 30 30 31 31 31 31 CellIndex CellValue CPU 1 : CellIndex CellValue CPU 1 : 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 6 7 7 7 7 8 8 8 16 9 9 9 17 10 10 10 18 11 11 11 19 12 12 12 20 13 13 13 21 14 14 14 22 15 15 15 23 16 16 16 0 17 17 17 1 18 18 18 2 19 19 19 3 20 20 20 4 21 21 21 5 22 22 22 6 23 23 23 7 24 24 24 16 25 25 25 17 26 26 26 18 27 27 27 19 28 28 28 20 29 29 29 21 30 30 30 22 31 31 31 23 On 2/24/22 12:07, Sagiyama, Koki wrote: > Dear Berend, > > DMClone() on a DMPlex object does not clone the PetscSection that that > DMPlex object carries > (https://petsc.org/main/docs/manualpages/DM/DMClone.html > ). I think you > intended to do something like the following: > ``` > DMClone(dm, &sdm); > PetscObjectSetName((PetscObject)sdm, "dmA"); > DMSetLocalSection(sdm, section); > ... > DMCreateGlobalVector(sdm, &xGlobalVector); > ... > ``` > > Regarding save/load, current default I/O seems not working for some > reason for periodic meshes as you reported. The latest implementation, > however, seems working, so you can try using > `-dm_plex_view_hdf5_storage_version 2.0.0` option when saving and see if > it works. > > Thanks, > Koki > > ------------------------------------------------------------------------ > *From:* Berend van Wachem > *Sent:* Thursday, February 17, 2022 9:06 AM > *To:* Sagiyama, Koki ; Hapla Vaclav > ; PETSc users list ; > Lawrence Mitchell > *Subject:* Re: [petsc-users] DMView and DMLoad > Dear Koki, > > Many thanks for your help and sorry for the slow reply. > > I haven't been able to get it to work successfully. I have attached a > small example that replicates the main features of our code. In this > example a Box with one random field is generated, saved and loaded. The > case works for non-periodic domains and fails for periodic ones. I've > also included the error output at the bottom of this email. > > To switch between periodic and non-periodic, please comment/uncomment > lines 47 to 52 in src/main.c. To compile, the files "compile" and > "CMakeLists.txt" are included in a separate tar file, if you want to use > this. Your library paths should be updated in the latter file. The PETSc > main distribution is used. > > Many thanks for your help! > > Thanks and best regards, > > Berend. > > > > The error message with --with-debugging=no --with-errorchecking=no: > > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------------------ > [0]PETSC ERROR: Invalid argument > [0]PETSC ERROR: Number of coordinates loaded 3168 does not match number > of vertices 1000 > [0]PETSC ERROR: See https://petsc.org/release/faq/ > for trouble shooting. 
> [0]PETSC ERROR: Petsc Development GIT revision: v3.16.2-499-g9039b796d1 > GIT Date: 2021-12-24 23:23:09 +0000 > [0]PETSC ERROR: ./bin/restart_periodic on a arch-linux-c-opt named james > by serbenlo Thu Dec 30 20:53:22 2021 > [0]PETSC ERROR: Configure options --with-debugging=no > --with-errorchecking=no --with-clean --download-metis=yes > --download-parmetis=yes --download-hdf5 --download-p4est > --download-triangle --download-tetgen > --with-zlib-lib=/usr/lib/x86_64-linux-gnu/libz.a > --with-zlib-include=/usr/include --with-mpi=yes --with-mpi-dir=/usr > --with-mpiexec=/usr/bin/mpiexec > [0]PETSC ERROR: #1 DMPlexCoordinatesLoad_HDF5_V0_Private() at > /usr/local/petsc_main/src/dm/impls/plex/plexhdf5.c:1387 > [0]PETSC ERROR: #2 DMPlexCoordinatesLoad_HDF5_Internal() at > /usr/local/petsc_main/src/dm/impls/plex/plexhdf5.c:1419 > [0]PETSC ERROR: #3 DMPlexCoordinatesLoad() at > /usr/local/petsc_main/src/dm/impls/plex/plex.c:2070 > [0]PETSC ERROR: #4 main() at > /media/MANNHEIM/Arbeit/OvGU_PostDoc_2021/Projects/MF_Restart/Periodic_DM/restart-periodic/src/main.c:229 > [0]PETSC ERROR: No PETSc Option Table entries > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_WORLD, 62) - process 0 > > > The error message with --with-debugging=yes --with-errorchecking=yes: > > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------------------- > [0]PETSC ERROR: Null argument, when expecting valid pointer > [1]PETSC ERROR: --------------------- Error Message > ------------------------------------------------- > [1]PETSC ERROR: Null argument, when expecting valid pointer > [1]PETSC ERROR: Null Object: Parameter # 1 > [1]PETSC ERROR: See https://petsc.org/release/faq/ > for trouble shooting. > [1]PETSC ERROR: Petsc Development GIT revision: v3.16.2-499-g9039b796d1 > GIT Date: 2021-12-24 23:23:09 +0000 > [1]PETSC ERROR: ./bin/restart_periodic on a arch-linux-c-opt named james > by serbenlo Thu Dec 30 20:17:22 2021 > [1]PETSC ERROR: Configure options --with-debugging=yes > --with-errorchecking=yes --with-clean --download-metis=yes > --download-parmetis=yes --download-hdf5 --download-p4est > --download-triangle --download-tetgen > --with-zlib-lib=/usr/lib/x86_64-linux-gnu/libz.a > --with-zlib-include=/usr/include --with-mpi=yes --with-mpi-dir=/usr > --with-mpiexec=/usr/bin/mpiexec > [1]PETSC ERROR: #1 PetscSectionGetDof() at > /usr/local/petsc_main/src/vec/is/section/interface/section.c:807 > [1]PETSC ERROR: [0]PETSC ERROR: Null Object: Parameter # 1 > [0]PETSC ERROR: See https://petsc.org/release/faq/ > for trouble shooting. 
> [0]PETSC ERROR: Petsc Development GIT revision: v3.16.2-499-g9039b796d1 > GIT Date: 2021-12-24 23:23:09 +0000 > [0]PETSC ERROR: ./bin/restart_periodic on a arch-linux-c-opt named james > by serbenlo Thu Dec 30 20:17:22 2021 > [0]PETSC ERROR: Configure options --with-debugging=yes > --with-errorchecking=yes --with-clean --download-metis=yes > --download-parmetis=yes --download-hdf5 --download-p4est > --download-triangle --download-tetgen > --with-zlib-lib=/usr/lib/x86_64-linux-gnu/libz.a > --with-zlib-include=/usr/include --with-mpi=yes --with-mpi-dir=/usr > --with-mpiexec=/usr/bin/mpiexec > #2 DMDefaultSectionCheckConsistency_Internal() at > /usr/local/petsc_main/src/dm/interface/dm.c:4489 > [1]PETSC ERROR: #3 DMSetGlobalSection() at > /usr/local/petsc_main/src/dm/interface/dm.c:4583 > [1]PETSC ERROR: [0]PETSC ERROR: #1 PetscSectionGetDof() at > /usr/local/petsc_main/src/vec/is/section/interface/section.c:807 > [0]PETSC ERROR: #4 main() at > /media/MANNHEIM/Arbeit/OvGU_PostDoc_2021/Projects/MF_Restart/Periodic_DM/restart-periodic/src/main.c:164 > [1]PETSC ERROR: No PETSc Option Table entries > [1]PETSC ERROR: #2 DMDefaultSectionCheckConsistency_Internal() at > /usr/local/petsc_main/src/dm/interface/dm.c:4489 > [0]PETSC ERROR: #3 DMSetGlobalSection() at > /usr/local/petsc_main/src/dm/interface/dm.c:4583 > ----------------End of Error Message -------send entire error message to > petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_WORLD, 85) - process 1 > [0]PETSC ERROR: #4 main() at > /media/MANNHEIM/Arbeit/OvGU_PostDoc_2021/Projects/MF_Restart/Periodic_DM/restart-periodic/src/main.c:164 > [0]PETSC ERROR: No PETSc Option Table entries > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_WORLD, 85) - process 0 > > > > On 12/7/21 16:50, Sagiyama, Koki wrote: >> Hi Berend, >> >> I made some small changes to your code to successfully compile it and >> defined a periodic dm using DMPlexCreateBoxMesh(), but otherwise your >> code worked fine. >> I think we would like to see a complete minimal failing example. Can you >> make the working example that I pasted in earlier email fail just by >> modifying the dm(i.e., using the periodic mesh you are actually using)? >> >> Thanks, >> Koki >> ------------------------------------------------------------------------ >> *From:* Berend van Wachem >> *Sent:* Monday, December 6, 2021 3:39 PM >> *To:* Sagiyama, Koki ; Hapla Vaclav >> ; PETSc users list ; >> Lawrence Mitchell >> *Subject:* Re: [petsc-users] DMView and DMLoad >> Dear Koki, >> >> Thanks for your email. In the example of your last email >> DMPlexCoordinatesLoad() takes sF0 (PetscSF) as a third argument. In our >> code this modification does not fix the error when loading a periodic >> dm. Are we doing something wrong? I've included an example code at the >> bottom of this email, including the error output. 
>> >> Thanks and best regards, >> Berend >> >> >> /**** Write DM + Vec restart ****/ >> PetscViewerHDF5Open(PETSC_COMM_WORLD, "result", FILE_MODE_WRITE, &H5Viewer); >> PetscObjectSetName((PetscObject)dm, "plexA"); >> PetscViewerPushFormat(H5Viewer, PETSC_VIEWER_HDF5_PETSC); >> DMPlexTopologyView(dm, H5Viewer); >> DMPlexLabelsView(dm, H5Viewer); >> DMPlexCoordinatesView(dm, H5Viewer); >> PetscViewerPopFormat(H5Viewer); >> >> DM sdm; >> PetscSection s; >> >> DMClone(dm, &sdm); >> PetscObjectSetName((PetscObject)sdm, "dmA"); >> DMGetGlobalSection(dm, &s); >> DMSetGlobalSection(sdm, s); >> DMPlexSectionView(dm, H5Viewer, sdm); >> >> Vec? vec, vecOld; >> PetscScalar *array, *arrayOld, *xVecArray, *xVecArrayOld; >> PetscInt numPoints; >> >> DMGetGlobalVector(sdm, &vec); >> DMGetGlobalVector(sdm, &vecOld); >> >> /*** Fill the vectors vec and vecOld? ***/ >> VecGetArray(vec, &array); >> VecGetArray(vecOld, &arrayOld); >> VecGetLocalSize(xGlobalVector, &numPoints); >> VecGetArray(xGlobalVector, &xVecArray); >> VecGetArray(xOldGlobalVector, &xVecArrayOld); >> >> for (i = 0; i < numPoints; i++) /* Loop over all internal mesh points */ >> { >>? ???? array[i]??? = xVecArray[i]; >>? ???? arrayOld[i] = xVecArrayOld[i]; >> } >> >> VecRestoreArray(vec, &array); >> VecRestoreArray(vecOld, &arrayOld); >> VecRestoreArray(xGlobalVector, &xVecArray); >> VecRestoreArray(xOldGlobalVector, &xVecArrayOld); >> >> PetscObjectSetName((PetscObject)vec, "vecA"); >> PetscObjectSetName((PetscObject)vecOld, "vecB"); >> DMPlexGlobalVectorView(dm, H5Viewer, sdm, vec); >> DMPlexGlobalVectorView(dm, H5Viewer, sdm, vecOld); >> PetscViewerDestroy(&H5Viewer); >> /*** end of writing ****/ >> >> /*** Load ***/ >> PetscViewerHDF5Open(PETSC_COMM_WORLD, "result", FILE_MODE_READ, &H5Viewer); >> DMCreate(PETSC_COMM_WORLD, &dm); >> DMSetType(dm, DMPLEX); >> PetscObjectSetName((PetscObject)dm, "plexA"); >> PetscViewerPushFormat(H5Viewer, PETSC_VIEWER_HDF5_PETSC); >> DMPlexTopologyLoad(dm, H5Viewer, &sfO); >> DMPlexLabelsLoad(dm, H5Viewer); >> DMPlexCoordinatesLoad(dm, H5Viewer, sfO); >> PetscViewerPopFormat(H5Viewer); >> >> DMPlexDistribute(dm, Options->Mesh.overlap, &sfDist, &distributedDM); >> if (distributedDM) { >>? ???? DMDestroy(&dm); >>? ???? dm = distributedDM; >>? ???? PetscObjectSetName((PetscObject)dm, "plexA"); >> } >> >> PetscSFCompose(sfO, sfDist, &sf); >> PetscSFDestroy(&sfO); >> PetscSFDestroy(&sfDist); >> >> DMClone(dm, &sdm); >> PetscObjectSetName((PetscObject)sdm, "dmA"); >> DMPlexSectionLoad(dm, H5Viewer, sdm, sf, &globalDataSF, &localDataSF); >> >> /** Load the Vectors **/ >> DMGetGlobalVector(sdm, &Restart_xGlobalVector); >> VecSet(Restart_xGlobalVector,0.0); >> >> PetscObjectSetName((PetscObject)Restart_xGlobalVector, "vecA"); >> DMPlexGlobalVectorLoad(dm, H5Viewer, sdm, >> globalDataSF,Restart_xGlobalVector); >> DMGetGlobalVector(sdm, &Restart_xOldGlobalVector); >> VecSet(Restart_xOldGlobalVector,0.0); >> >> PetscObjectSetName((PetscObject)Restart_xOldGlobalVector, "vecB"); >> DMPlexGlobalVectorLoad(dm, H5Viewer, sdm, globalDataSF, >> Restart_xOldGlobalVector); >> >> PetscViewerDestroy(&H5Viewer); >> >> >> /**** The error message when loading is the following ************/ >> >> Creating and distributing mesh >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------- >> [0]PETSC ERROR: Invalid argument >> [0]PETSC ERROR: Number of coordinates loaded 17128 does not match number >> of vertices 8000 >> [0]PETSC ERROR: See https://petsc.org/release/faq/ >> > for > trouble shooting. 
>> [0]PETSC ERROR: Petsc Development GIT revision: v3.16.1-435-g007f11b901 >> GIT Date: 2021-12-01 14:31:21 +0000 >> [0]PETSC ERROR: ./MF3 on a linux-gcc-openmpi-opt named >> ivt24.ads.uni-magdeburg.de by berend Mon Dec? 6 16:11:21 2021 >> [0]PETSC ERROR: Configure options --with-p4est=yes --with-partemis >> --with-metis --with-debugging=no --download-metis=yes >> --download-parmetis=yes --with-errorchecking=no --download-hdf5 >> --download-zlib --download-p4est >> [0]PETSC ERROR: #1 DMPlexCoordinatesLoad_HDF5_V0_Private() at >> /home/berend/src/petsc_main/src/dm/impls/plex/plexhdf5.c:1387 >> [0]PETSC ERROR: #2 DMPlexCoordinatesLoad_HDF5_Internal() at >> /home/berend/src/petsc_main/src/dm/impls/plex/plexhdf5.c:1419 >> [0]PETSC ERROR: #3 DMPlexCoordinatesLoad() at >> /home/berend/src/petsc_main/src/dm/impls/plex/plex.c:2070 >> [0]PETSC ERROR: #4 RestartMeshDM() at >> /home/berend/src/eclipseworkspace/multiflow/src/io/restartmesh.c:81 >> [0]PETSC ERROR: #5 CreateMeshDM() at >> /home/berend/src/eclipseworkspace/multiflow/src/mesh/createmesh.c:61 >> [0]PETSC ERROR: #6 main() at >> /home/berend/src/eclipseworkspace/multiflow/src/general/main.c:132 >> [0]PETSC ERROR: PETSc Option Table entries: >> [0]PETSC ERROR: --download-hdf5 >> [0]PETSC ERROR: --download-metis=yes >> [0]PETSC ERROR: --download-p4est >> [0]PETSC ERROR: --download-parmetis=yes >> [0]PETSC ERROR: --download-zlib >> [0]PETSC ERROR: --with-debugging=no >> [0]PETSC ERROR: --with-errorchecking=no >> [0]PETSC ERROR: --with-metis >> [0]PETSC ERROR: --with-p4est=yes >> [0]PETSC ERROR: --with-partemis >> [0]PETSC ERROR: -d results >> [0]PETSC ERROR: -o run.mf >> [0]PETSC ERROR: ----------------End of Error Message -------send entire >> error message to petsc-maint at mcs.anl.gov---------- >> -------------------------------------------------------------------------- >> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD >> with errorcode 62. >> >> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. >> You may or may not see output from other processes, depending on >> exactly when Open MPI kills them. >> -------------------------------------------------------------------------- >> >> >> >> >> >> On 11/19/21 00:26, Sagiyama, Koki wrote: >>> Hi Berend, >>> >>> I was not able to reproduce the issue you are having, but the following >>> 1D example (and similar 2D examples) worked fine for me using the latest >>> PETSc. Please note that DMPlexCoordinatesLoad() now takes a PetscSF >>> object as the third argument, but the default behavior is unchanged. >>> >>> /* test_periodic_io.c */ >>> >>> #include >>> #include >>> #include >>> >>> int main(int argc, char **argv) >>> { >>>? ? DM ? ? ? ? ? ? ? ? dm; >>>? ? Vec ? ? ? ? ? ? ? ?coordinates; >>>? ? PetscViewer ? ? ? ?viewer; >>>? ? PetscViewerFormat ?format = PETSC_VIEWER_HDF5_PETSC; >>>? ? PetscSF ? ? ? ? ? ?sfO; >>>? ? PetscErrorCode ? ? ierr; >>> >>>? ? ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr; >>>? ? /* Save */ >>>? ? ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "periodic_example.h5", >>> FILE_MODE_WRITE, &viewer);CHKERRQ(ierr); >>>? ? { >>>? ? ? DM ? ? ? ? ? ? ?pdm; >>>? ? ? PetscInt ? ? ? ?dim = 1; >>>? ? ? const PetscInt ?faces[1] = {4}; >>>? ? ? DMBoundaryType ?periodicity[] = {DM_BOUNDARY_PERIODIC}; >>>? ? ? PetscInt ? ? ? ?overlap = 1; >>> >>>? ? ? ierr = DMPlexCreateBoxMesh(PETSC_COMM_WORLD, dim, PETSC_FALSE, >>> faces, NULL, NULL, periodicity, PETSC_TRUE, &dm);CHKERRQ(ierr); >>>? ? ? 
ierr = DMPlexDistribute(dm, overlap, NULL, &pdm);CHKERRQ(ierr); >>>? ? ? ierr = DMDestroy(&dm);CHKERRQ(ierr); >>>? ? ? dm = pdm; >>>? ? ? ierr = PetscObjectSetName((PetscObject)dm, "periodicDM");CHKERRQ(ierr); >>>? ? } >>>? ? ierr = DMGetCoordinates(dm, &coordinates);CHKERRQ(ierr); >>>? ? ierr = PetscPrintf(PETSC_COMM_WORLD, "Coordinates before >>> saving:\n");CHKERRQ(ierr); >>>? ? ierr = VecView(coordinates, NULL);CHKERRQ(ierr); >>>? ? ierr = PetscViewerPushFormat(viewer, format);CHKERRQ(ierr); >>>? ? ierr = DMPlexTopologyView(dm, viewer);CHKERRQ(ierr); >>>? ? ierr = DMPlexCoordinatesView(dm, viewer);CHKERRQ(ierr); >>>? ? ierr = PetscViewerPopFormat(viewer);CHKERRQ(ierr); >>>? ? ierr = DMDestroy(&dm);CHKERRQ(ierr); >>>? ? ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr); >>>? ? /* Load */ >>>? ? ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "periodic_example.h5", >>> FILE_MODE_READ, &viewer);CHKERRQ(ierr); >>>? ? ierr = DMCreate(PETSC_COMM_WORLD, &dm);CHKERRQ(ierr); >>>? ? ierr = DMSetType(dm, DMPLEX);CHKERRQ(ierr); >>>? ? ierr = PetscObjectSetName((PetscObject)dm, "periodicDM");CHKERRQ(ierr); >>>? ? ierr = PetscViewerPushFormat(viewer, format);CHKERRQ(ierr); >>>? ? ierr = DMPlexTopologyLoad(dm, viewer, &sfO);CHKERRQ(ierr); >>>? ? ierr = DMPlexCoordinatesLoad(dm, viewer, sfO);CHKERRQ(ierr); >>>? ? ierr = PetscViewerPopFormat(viewer);CHKERRQ(ierr); >>>? ? ierr = DMGetCoordinates(dm, &coordinates);CHKERRQ(ierr); >>>? ? ierr = PetscPrintf(PETSC_COMM_WORLD, "Coordinates after >>> loading:\n");CHKERRQ(ierr); >>>? ? ierr = VecView(coordinates, NULL);CHKERRQ(ierr); >>>? ? ierr = PetscSFDestroy(&sfO);CHKERRQ(ierr); >>>? ? ierr = DMDestroy(&dm);CHKERRQ(ierr); >>>? ? ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr); >>>? ? ierr = PetscFinalize(); >>>? ? return ierr; >>> } >>> >>> mpiexec -n 2 ./test_periodic_io >>> >>> Coordinates before saving: >>> Vec Object: coordinates 2 MPI processes >>>? ? type: mpi >>> Process [0] >>> 0. >>> Process [1] >>> 0.25 >>> 0.5 >>> 0.75 >>> Coordinates after loading: >>> Vec Object: vertices 2 MPI processes >>>? ? type: mpi >>> Process [0] >>> 0. >>> 0.25 >>> 0.5 >>> 0.75 >>> Process [1] >>> >>> I would also like to note that, with the latest update, we can >>> optionally load coordinates directly on the distributed dm as (using >>> your notation): >>> >>>? ? /* Distribute dm */ >>>? ? ... >>>? ? PetscSFCompose(sfO, sfDist, &sf); >>>? ? DMPlexCoordinatesLoad(dm, viewer, sf); >>> >>> To use this feature, we need to pass "-dm_plex_view_hdf5_storage_version >>> 2.0.0" option when saving topology/coordinates. >>> >>> >>> Thanks, >>> Koki >>> ------------------------------------------------------------------------ >>> *From:* Berend van Wachem >>> *Sent:* Wednesday, November 17, 2021 3:16 PM >>> *To:* Hapla Vaclav ; PETSc users list >>> ; Lawrence Mitchell ; Sagiyama, >>> Koki >>> *Subject:* Re: [petsc-users] DMView and DMLoad >>> >>> ******************* >>> This email originates from outside Imperial. Do not click on links and >>> attachments unless you recognise the sender. >>> If you trust the sender, add them to your safe senders list >>> https://spam.ic.ac.uk/SpamConsole/Senders.aspx > >> > >>> > >> to disable email >>> stamping for this address. >>> ******************* >>> Dear Vaclav, Lawrence, Koki, >>> >>> Thanks for your help! 
Following your advice and following your example >>> (https://petsc.org/main/docs/manual/dmplex/#saving-and-loading-data-with-hdf5 > >> >>> >> >>) > >> >>> >>> we are able to save and load the DM with a wrapped Vector in h5 format >>> (PETSC_VIEWER_HDF5_PETSC) successfully. >>> >>> For saving, we use something similar to: >>> >>>? ???? DMPlexTopologyView(dm, viewer); >>>? ???? DMClone(dm, &sdm); >>>? ???? ... >>>? ???? DMPlexSectionView(dm, viewer, sdm); >>>? ???? DMGetLocalVector(sdm, &vec); >>>? ???? ... >>>? ???? DMPlexLocalVectorView(dm, viewer, sdm, vec); >>> >>> and for loading: >>> >>>? ???? DMCreate(PETSC_COMM_WORLD, &dm); >>>? ???? DMSetType(dm, DMPLEX); >>>? ???????? ... >>>? ?????? PetscViewerPushFormat(viewer, PETSC_VIEWER_HDF5_PETSC); >>>? ???? DMPlexTopologyLoad(dm, viewer, &sfO); >>>? ???? DMPlexLabelsLoad(dm, viewer); >>>? ???? DMPlexCoordinatesLoad(dm, viewer); >>>? ???? PetscViewerPopFormat(viewer); >>>? ???? ... >>>? ???? PetscSFCompose(sfO, sfDist, &sf); >>>? ???? ... >>>? ???? DMClone(dm, &sdm); >>>? ???? DMPlexSectionLoad(dm, viewer, sdm, sf, &globalDataSF, &localDataSF); >>>? ???? DMGetLocalVector(sdm, &vec); >>>? ???? ... >>>? ???? DMPlexLocalVectorLoad(dm, viewer, sdm, localDataSF, vec); >>> >>> >>> This works fine for non-periodic DMs but for periodic cases the line: >>> >>>? ???? DMPlexCoordinatesLoad(dm, H5Viewer); >>> >>> delivers the error message: invalid argument and the number of loaded >>> coordinates does not match the number of vertices. >>> >>> Is this a known shortcoming, or have we forgotten something to load >>> periodic DMs? >>> >>> Best regards, >>> >>> Berend. >>> >>> >>> >>> On 9/22/21 20:59, Hapla Vaclav wrote: >>>> To avoid confusions here, Berend seems to be specifically demanding XDMF >>>> (PETSC_VIEWER_HDF5_XDMF). The stuff we are now working on is parallel >>>> checkpointing in our own HDF5 format?(PETSC_VIEWER_HDF5_PETSC), I will >>>> make a series of MRs on this topic in the following days. >>>> >>>> For XDMF, we are specifically missing the ability to write/load DMLabels >>>> properly. XDMF uses specific cell-local numbering for faces for >>>> specification of face sets, and face-local numbering for specification >>>> of edge sets, which is not great wrt DMPlex design. And ParaView doesn't >>>> show any of these properly so it's hard to debug. Matt, we should talk >>>> about this soon. >>>> >>>> Berend, for now, could you just load the mesh initially from XDMF and >>>> then use our PETSC_VIEWER_HDF5_PETSC format for subsequent saving/loading? >>>> >>>> Thanks, >>>> >>>> Vaclav >>>> >>>>> On 17 Sep 2021, at 15:46, Lawrence Mitchell >>>> >>>> wrote: >>>>> >>>>> Hi Berend, >>>>> >>>>>> On 14 Sep 2021, at 12:23, Matthew Knepley >>>>> > >>>> wrote: >>>>>> >>>>>> On Tue, Sep 14, 2021 at 5:15 AM Berend van Wachem >>>>>> > >>>> wrote: >>>>>> Dear PETSc-team, >>>>>> >>>>>> We are trying to save and load distributed DMPlex and its associated >>>>>> physical fields (created with DMCreateGlobalVector) ?(Uvelocity, >>>>>> VVelocity, ?...) in HDF5_XDMF format. 
To achieve this, we do the >>>>>> following: >>>>>> >>>>>> 1) save in the same xdmf.h5 file: >>>>>> DMView( DM ????????, H5_XDMF_Viewer ); >>>>>> VecView( UVelocity, H5_XDMF_Viewer ); >>>>>> >>>>>> 2) load the dm: >>>>>> DMPlexCreateFromfile(PETSC_COMM_WORLD, Filename, PETSC_TRUE, DM); >>>>>> >>>>>> 3) load the physical field: >>>>>> VecLoad( UVelocity, H5_XDMF_Viewer ); >>>>>> >>>>>> There are no errors in the execution, but the loaded DM is distributed >>>>>> differently to the original one, which results in the incorrect >>>>>> placement of the values of the physical fields (UVelocity etc.) in the >>>>>> domain. >>>>>> >>>>>> This approach is used to restart the simulation with the last saved DM. >>>>>> Is there something we are missing, or there exists alternative routes to >>>>>> this goal? Can we somehow get the IS of the redistribution, so we can >>>>>> re-distribute the vector data as well? >>>>>> >>>>>> Many thanks, best regards, >>>>>> >>>>>> Hi Berend, >>>>>> >>>>>> We are in the midst of rewriting this. We want to support saving >>>>>> multiple meshes, with fields attached to each, >>>>>> and preserving the discretization (section) information, and allowing >>>>>> us to load up on a different number of >>>>>> processes. We plan to be done by October. Vaclav and I are doing this >>>>>> in collaboration with Koki Sagiyama, >>>>>> David Ham, and Lawrence Mitchell from the Firedrake team. >>>>> >>>>> The core load/save cycle functionality is now in PETSc main. So if >>>>> you're using main rather than a release, you can get access to it now. >>>>> This section of the manual shows an example of how to do >>>>> thingshttps://petsc.org/main/docs/manual/dmplex/#saving-and-loading-data-with-hdf5 >>>>> >> >>> >> >>> >>>>> >>>>> Let us know if things aren't clear! >>>>> >>>>> Thanks, >>>>> >>>>> Lawrence >>>> -------------- next part -------------- A non-text attachment was scrubbed... Name: examplecode.c Type: text/x-csrc Size: 9877 bytes Desc: not available URL: From david.knezevic at akselos.com Tue Mar 8 10:29:27 2022 From: david.knezevic at akselos.com (David Knezevic) Date: Tue, 8 Mar 2022 11:29:27 -0500 Subject: [petsc-users] Two questions regarding SNESLinesearchPrecheck Message-ID: We're using SNESLinesearchPrecheck in order to implement some nonlinear continuation methods, and we had a couple of questions: 1. The "search direction" that is handed to the pre-check function is referred to as "y" in the documentation. We had assumed that we would have "y = delta_x_k", where delta_x_k is as shown in the attached screenshot from the PETSc manual. But after doing some testing it seems that in fact we have "y = -delta_x_k", i.e. y is the NEGATIVE of the Newton step. Is this correct? 2. In the documentation for SNESLineSearchPreCheck it says that "x" is the "current solution". Referring again to the attached screenshot, does this mean that it's x_k+1 or x_k? I assume it means x_k+1 but I wanted to confirm this. Thanks for your help! Regards, David -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: newton.png Type: image/png Size: 47332 bytes Desc: not available URL: From jed at jedbrown.org Tue Mar 8 10:43:17 2022 From: jed at jedbrown.org (Jed Brown) Date: Tue, 08 Mar 2022 09:43:17 -0700 Subject: [petsc-users] Two questions regarding SNESLinesearchPrecheck In-Reply-To: References: Message-ID: <87ee3ctmxm.fsf@jedbrown.org> I think SNESLineSearchApply_Basic will clarify these points. Note that the pre-check is applied before "taking the step", so X is x_k. You're right that the sign is flipped on search direction Y, as it's using -lambda below. /* precheck */ ierr = SNESLineSearchPreCheck(linesearch,X,Y,&changed_y);CHKERRQ(ierr); /* update */ ierr = VecWAXPY(W,-lambda,Y,X);CHKERRQ(ierr); [...] ierr = VecCopy(W, X);CHKERRQ(ierr); David Knezevic writes: > We're using SNESLinesearchPrecheck in order to implement some nonlinear > continuation methods, and we had a couple of questions: > > 1. The "search direction" that is handed to the pre-check function is > referred to as "y" in the documentation. We had assumed that we would have > "y = delta_x_k", where delta_x_k is as shown in the attached screenshot > from the PETSc manual. But after doing some testing it seems that in fact > we have "y = -delta_x_k", i.e. y is the NEGATIVE of the Newton step. Is > this correct? > > 2. In the documentation for SNESLineSearchPreCheck it says that "x" is the > "current solution". Referring again to the attached screenshot, does this > mean that it's x_k+1 or x_k? I assume it means x_k+1 but I wanted to > confirm this. > > Thanks for your help! > > Regards, > David From david.knezevic at akselos.com Tue Mar 8 10:47:42 2022 From: david.knezevic at akselos.com (David Knezevic) Date: Tue, 8 Mar 2022 11:47:42 -0500 Subject: [petsc-users] Two questions regarding SNESLinesearchPrecheck In-Reply-To: <87ee3ctmxm.fsf@jedbrown.org> References: <87ee3ctmxm.fsf@jedbrown.org> Message-ID: OK, that's clear, thank you! David On Tue, Mar 8, 2022 at 11:43 AM Jed Brown wrote: > I think SNESLineSearchApply_Basic will clarify these points. Note that the > pre-check is applied before "taking the step", so X is x_k. You're right > that the sign is flipped on search direction Y, as it's using -lambda below. > > /* precheck */ > ierr = SNESLineSearchPreCheck(linesearch,X,Y,&changed_y);CHKERRQ(ierr); > > /* update */ > ierr = VecWAXPY(W,-lambda,Y,X);CHKERRQ(ierr); > [...] > ierr = VecCopy(W, X);CHKERRQ(ierr); > > David Knezevic writes: > > > We're using SNESLinesearchPrecheck in order to implement some nonlinear > > continuation methods, and we had a couple of questions: > > > > 1. The "search direction" that is handed to the pre-check function is > > referred to as "y" in the documentation. We had assumed that we would > have > > "y = delta_x_k", where delta_x_k is as shown in the attached screenshot > > from the PETSc manual. But after doing some testing it seems that in fact > > we have "y = -delta_x_k", i.e. y is the NEGATIVE of the Newton step. Is > > this correct? > > > > 2. In the documentation for SNESLineSearchPreCheck it says that "x" is > the > > "current solution". Referring again to the attached screenshot, does this > > mean that it's x_k+1 or x_k? I assume it means x_k+1 but I wanted to > > confirm this. > > > > Thanks for your help! > > > > Regards, > > David > -------------- next part -------------- An HTML attachment was scrubbed... 
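(For reference, a minimal sketch of a pre-check callback consistent with the conventions clarified above: X is the current iterate x_k, and Y carries the step with the sign convention x_{k+1} = x_k - lambda*Y, i.e. Y = -delta_x_k. The norm-based damping criterion and the names are illustrative assumptions, not code from this thread.)

static PetscErrorCode MyPreCheck(SNESLineSearch linesearch, Vec X, Vec Y, PetscBool *changed_y, void *ctx)
{
  PetscErrorCode  ierr;
  PetscReal       ynorm;
  const PetscReal maxnorm = 1.0; /* illustrative damping threshold */

  PetscFunctionBeginUser;
  *changed_y = PETSC_FALSE;
  ierr = VecNorm(Y, NORM_2, &ynorm);CHKERRQ(ierr);
  if (ynorm > maxnorm) { /* damp overly large steps; recall Y = -delta_x_k */
    ierr = VecScale(Y, maxnorm/ynorm);CHKERRQ(ierr);
    *changed_y = PETSC_TRUE;
  }
  PetscFunctionReturn(0);
}

/* registration, assuming an existing SNES named snes */
SNESLineSearch linesearch;
ierr = SNESGetLineSearch(snes, &linesearch);CHKERRQ(ierr);
ierr = SNESLineSearchSetPreCheck(linesearch, MyPreCheck, NULL);CHKERRQ(ierr);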
URL: From k.sagiyama at imperial.ac.uk Wed Mar 9 06:28:28 2022 From: k.sagiyama at imperial.ac.uk (Sagiyama, Koki) Date: Wed, 9 Mar 2022 12:28:28 +0000 Subject: [petsc-users] DMView and DMLoad In-Reply-To: <49329c6f-dfa7-b1ca-1d16-d79f1234c1df@ovgu.de> References: <56ce2135-9757-4292-e33b-c7eea8fb7b2e@ovgu.de> <056E066F-D596-4254-A44A-60BFFD30FE82@erdw.ethz.ch> <6c4e0656-db99-e9da-000f-ab9f7dd62c07@ovgu.de> <0845e501-e2cd-d7cc-58be-2803ee5ef6cd@ovgu.de> <49329c6f-dfa7-b1ca-1d16-d79f1234c1df@ovgu.de> Message-ID: Dear Berend, Those numbers in general do not match for various reasons; e.g., even if we use the same number of processes for saving and loading, we might end up having different plex partitions. The easiest way to check correctness might be to actually run some simple problem (e.g., compute some global quantity) before saving and after loading, and see if you get consistent results. Thanks, Koki ________________________________ From: Berend van Wachem Sent: Tuesday, March 8, 2022 4:08 PM To: Sagiyama, Koki ; Hapla Vaclav ; PETSc users list ; Lawrence Mitchell Subject: Re: [petsc-users] DMView and DMLoad Dear Koki, Many thanks for your help - that was very useful! We have been able to save/load non-periodic meshes. Unfortunately, the periodic ones are still not working for us, and I attach a small example code and output. As you suggested, the "-dm_plex_view_hdf5_storage_version 2.0.0" option gets rid of the previous error message ("Number of coordinates loaded 3168 does not match number of vertices 1000") but the the cell values are either shifted or overwritten after loading. To illustrate that the cell IDs and values for a periodic box with 4 cells in each direction and consecutive cell values is included below. Two CPUs are used. It is clear to see that the saved and loaded fields are different. Do you think is there a way to make it work for periodic cases? I have attached the small code that I used for this test. Many thanks and best regards, Berend. Output: ---- Save ----------------- ----- Load ----------------- CellIndex CellValue CPU 0 : CellIndex CellValue CPU 0 : 0 0 0 8 1 1 1 9 2 2 2 10 3 3 3 11 4 4 4 12 5 5 5 13 6 6 6 14 7 7 7 15 8 8 8 24 9 9 9 25 10 10 10 26 11 11 11 27 12 12 12 28 13 13 13 29 14 14 14 30 15 15 15 31 16 16 16 8 17 17 17 9 18 18 18 10 19 19 19 11 20 20 20 12 21 21 21 13 22 22 22 14 23 23 23 15 24 24 24 24 25 25 25 25 26 26 26 26 27 27 27 27 28 28 28 28 29 29 29 29 30 30 30 30 31 31 31 31 CellIndex CellValue CPU 1 : CellIndex CellValue CPU 1 : 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 6 7 7 7 7 8 8 8 16 9 9 9 17 10 10 10 18 11 11 11 19 12 12 12 20 13 13 13 21 14 14 14 22 15 15 15 23 16 16 16 0 17 17 17 1 18 18 18 2 19 19 19 3 20 20 20 4 21 21 21 5 22 22 22 6 23 23 23 7 24 24 24 16 25 25 25 17 26 26 26 18 27 27 27 19 28 28 28 20 29 29 29 21 30 30 30 22 31 31 31 23 On 2/24/22 12:07, Sagiyama, Koki wrote: > Dear Berend, > > DMClone() on a DMPlex object does not clone the PetscSection that that > DMPlex object carries > (https://petsc.org/main/docs/manualpages/DM/DMClone.html > ). I think you > intended to do something like the following: > ``` > DMClone(dm, &sdm); > PetscObjectSetName((PetscObject)sdm, "dmA"); > DMSetLocalSection(sdm, section); > ... > DMCreateGlobalVector(sdm, &xGlobalVector); > ... > ``` > > Regarding save/load, current default I/O seems not working for some > reason for periodic meshes as you reported. 
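(A minimal sketch of the consistency check suggested at the top of this message: compute a global quantity of the cell field before saving and again after loading, and compare. It assumes the data lives in the global Vecs vec and Restart_xGlobalVector used in the code snippets earlier in this thread.)

PetscScalar sumBefore, sumAfter;

ierr = VecSum(vec, &sumBefore);CHKERRQ(ierr);                  /* before DMPlexGlobalVectorView() */
/* ... save to HDF5, restart, load into Restart_xGlobalVector ... */
ierr = VecSum(Restart_xGlobalVector, &sumAfter);CHKERRQ(ierr); /* after DMPlexGlobalVectorLoad() */
ierr = PetscPrintf(PETSC_COMM_WORLD, "global sum before save %g, after load %g\n", (double)PetscRealPart(sumBefore), (double)PetscRealPart(sumAfter));CHKERRQ(ierr);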
The latest implementation, > however, seems working, so you can try using > `-dm_plex_view_hdf5_storage_version 2.0.0` option when saving and see if > it works. > > Thanks, > Koki > > ------------------------------------------------------------------------ > *From:* Berend van Wachem > *Sent:* Thursday, February 17, 2022 9:06 AM > *To:* Sagiyama, Koki ; Hapla Vaclav > ; PETSc users list ; > Lawrence Mitchell > *Subject:* Re: [petsc-users] DMView and DMLoad > Dear Koki, > > Many thanks for your help and sorry for the slow reply. > > I haven't been able to get it to work successfully. I have attached a > small example that replicates the main features of our code. In this > example a Box with one random field is generated, saved and loaded. The > case works for non-periodic domains and fails for periodic ones. I've > also included the error output at the bottom of this email. > > To switch between periodic and non-periodic, please comment/uncomment > lines 47 to 52 in src/main.c. To compile, the files "compile" and > "CMakeLists.txt" are included in a separate tar file, if you want to use > this. Your library paths should be updated in the latter file. The PETSc > main distribution is used. > > Many thanks for your help! > > Thanks and best regards, > > Berend. > > > > The error message with --with-debugging=no --with-errorchecking=no: > > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------------------ > [0]PETSC ERROR: Invalid argument > [0]PETSC ERROR: Number of coordinates loaded 3168 does not match number > of vertices 1000 > [0]PETSC ERROR: See https://petsc.org/release/faq/ > for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.16.2-499-g9039b796d1 > GIT Date: 2021-12-24 23:23:09 +0000 > [0]PETSC ERROR: ./bin/restart_periodic on a arch-linux-c-opt named james > by serbenlo Thu Dec 30 20:53:22 2021 > [0]PETSC ERROR: Configure options --with-debugging=no > --with-errorchecking=no --with-clean --download-metis=yes > --download-parmetis=yes --download-hdf5 --download-p4est > --download-triangle --download-tetgen > --with-zlib-lib=/usr/lib/x86_64-linux-gnu/libz.a > --with-zlib-include=/usr/include --with-mpi=yes --with-mpi-dir=/usr > --with-mpiexec=/usr/bin/mpiexec > [0]PETSC ERROR: #1 DMPlexCoordinatesLoad_HDF5_V0_Private() at > /usr/local/petsc_main/src/dm/impls/plex/plexhdf5.c:1387 > [0]PETSC ERROR: #2 DMPlexCoordinatesLoad_HDF5_Internal() at > /usr/local/petsc_main/src/dm/impls/plex/plexhdf5.c:1419 > [0]PETSC ERROR: #3 DMPlexCoordinatesLoad() at > /usr/local/petsc_main/src/dm/impls/plex/plex.c:2070 > [0]PETSC ERROR: #4 main() at > /media/MANNHEIM/Arbeit/OvGU_PostDoc_2021/Projects/MF_Restart/Periodic_DM/restart-periodic/src/main.c:229 > [0]PETSC ERROR: No PETSc Option Table entries > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_WORLD, 62) - process 0 > > > The error message with --with-debugging=yes --with-errorchecking=yes: > > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------------------- > [0]PETSC ERROR: Null argument, when expecting valid pointer > [1]PETSC ERROR: --------------------- Error Message > ------------------------------------------------- > [1]PETSC ERROR: Null argument, when expecting valid pointer > [1]PETSC ERROR: Null Object: Parameter # 1 > [1]PETSC ERROR: See https://petsc.org/release/faq/ > for trouble shooting. 
> [1]PETSC ERROR: Petsc Development GIT revision: v3.16.2-499-g9039b796d1 > GIT Date: 2021-12-24 23:23:09 +0000 > [1]PETSC ERROR: ./bin/restart_periodic on a arch-linux-c-opt named james > by serbenlo Thu Dec 30 20:17:22 2021 > [1]PETSC ERROR: Configure options --with-debugging=yes > --with-errorchecking=yes --with-clean --download-metis=yes > --download-parmetis=yes --download-hdf5 --download-p4est > --download-triangle --download-tetgen > --with-zlib-lib=/usr/lib/x86_64-linux-gnu/libz.a > --with-zlib-include=/usr/include --with-mpi=yes --with-mpi-dir=/usr > --with-mpiexec=/usr/bin/mpiexec > [1]PETSC ERROR: #1 PetscSectionGetDof() at > /usr/local/petsc_main/src/vec/is/section/interface/section.c:807 > [1]PETSC ERROR: [0]PETSC ERROR: Null Object: Parameter # 1 > [0]PETSC ERROR: See https://petsc.org/release/faq/ > for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.16.2-499-g9039b796d1 > GIT Date: 2021-12-24 23:23:09 +0000 > [0]PETSC ERROR: ./bin/restart_periodic on a arch-linux-c-opt named james > by serbenlo Thu Dec 30 20:17:22 2021 > [0]PETSC ERROR: Configure options --with-debugging=yes > --with-errorchecking=yes --with-clean --download-metis=yes > --download-parmetis=yes --download-hdf5 --download-p4est > --download-triangle --download-tetgen > --with-zlib-lib=/usr/lib/x86_64-linux-gnu/libz.a > --with-zlib-include=/usr/include --with-mpi=yes --with-mpi-dir=/usr > --with-mpiexec=/usr/bin/mpiexec > #2 DMDefaultSectionCheckConsistency_Internal() at > /usr/local/petsc_main/src/dm/interface/dm.c:4489 > [1]PETSC ERROR: #3 DMSetGlobalSection() at > /usr/local/petsc_main/src/dm/interface/dm.c:4583 > [1]PETSC ERROR: [0]PETSC ERROR: #1 PetscSectionGetDof() at > /usr/local/petsc_main/src/vec/is/section/interface/section.c:807 > [0]PETSC ERROR: #4 main() at > /media/MANNHEIM/Arbeit/OvGU_PostDoc_2021/Projects/MF_Restart/Periodic_DM/restart-periodic/src/main.c:164 > [1]PETSC ERROR: No PETSc Option Table entries > [1]PETSC ERROR: #2 DMDefaultSectionCheckConsistency_Internal() at > /usr/local/petsc_main/src/dm/interface/dm.c:4489 > [0]PETSC ERROR: #3 DMSetGlobalSection() at > /usr/local/petsc_main/src/dm/interface/dm.c:4583 > ----------------End of Error Message -------send entire error message to > petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_WORLD, 85) - process 1 > [0]PETSC ERROR: #4 main() at > /media/MANNHEIM/Arbeit/OvGU_PostDoc_2021/Projects/MF_Restart/Periodic_DM/restart-periodic/src/main.c:164 > [0]PETSC ERROR: No PETSc Option Table entries > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_WORLD, 85) - process 0 > > > > On 12/7/21 16:50, Sagiyama, Koki wrote: >> Hi Berend, >> >> I made some small changes to your code to successfully compile it and >> defined a periodic dm using DMPlexCreateBoxMesh(), but otherwise your >> code worked fine. >> I think we would like to see a complete minimal failing example. Can you >> make the working example that I pasted in earlier email fail just by >> modifying the dm(i.e., using the periodic mesh you are actually using)? >> >> Thanks, >> Koki >> ------------------------------------------------------------------------ >> *From:* Berend van Wachem >> *Sent:* Monday, December 6, 2021 3:39 PM >> *To:* Sagiyama, Koki ; Hapla Vaclav >> ; PETSc users list ; >> Lawrence Mitchell >> *Subject:* Re: [petsc-users] DMView and DMLoad >> Dear Koki, >> >> Thanks for your email. 
In the example of your last email >> DMPlexCoordinatesLoad() takes sF0 (PetscSF) as a third argument. In our >> code this modification does not fix the error when loading a periodic >> dm. Are we doing something wrong? I've included an example code at the >> bottom of this email, including the error output. >> >> Thanks and best regards, >> Berend >> >> >> /**** Write DM + Vec restart ****/ >> PetscViewerHDF5Open(PETSC_COMM_WORLD, "result", FILE_MODE_WRITE, &H5Viewer); >> PetscObjectSetName((PetscObject)dm, "plexA"); >> PetscViewerPushFormat(H5Viewer, PETSC_VIEWER_HDF5_PETSC); >> DMPlexTopologyView(dm, H5Viewer); >> DMPlexLabelsView(dm, H5Viewer); >> DMPlexCoordinatesView(dm, H5Viewer); >> PetscViewerPopFormat(H5Viewer); >> >> DM sdm; >> PetscSection s; >> >> DMClone(dm, &sdm); >> PetscObjectSetName((PetscObject)sdm, "dmA"); >> DMGetGlobalSection(dm, &s); >> DMSetGlobalSection(sdm, s); >> DMPlexSectionView(dm, H5Viewer, sdm); >> >> Vec vec, vecOld; >> PetscScalar *array, *arrayOld, *xVecArray, *xVecArrayOld; >> PetscInt numPoints; >> >> DMGetGlobalVector(sdm, &vec); >> DMGetGlobalVector(sdm, &vecOld); >> >> /*** Fill the vectors vec and vecOld ***/ >> VecGetArray(vec, &array); >> VecGetArray(vecOld, &arrayOld); >> VecGetLocalSize(xGlobalVector, &numPoints); >> VecGetArray(xGlobalVector, &xVecArray); >> VecGetArray(xOldGlobalVector, &xVecArrayOld); >> >> for (i = 0; i < numPoints; i++) /* Loop over all internal mesh points */ >> { >> array[i] = xVecArray[i]; >> arrayOld[i] = xVecArrayOld[i]; >> } >> >> VecRestoreArray(vec, &array); >> VecRestoreArray(vecOld, &arrayOld); >> VecRestoreArray(xGlobalVector, &xVecArray); >> VecRestoreArray(xOldGlobalVector, &xVecArrayOld); >> >> PetscObjectSetName((PetscObject)vec, "vecA"); >> PetscObjectSetName((PetscObject)vecOld, "vecB"); >> DMPlexGlobalVectorView(dm, H5Viewer, sdm, vec); >> DMPlexGlobalVectorView(dm, H5Viewer, sdm, vecOld); >> PetscViewerDestroy(&H5Viewer); >> /*** end of writing ****/ >> >> /*** Load ***/ >> PetscViewerHDF5Open(PETSC_COMM_WORLD, "result", FILE_MODE_READ, &H5Viewer); >> DMCreate(PETSC_COMM_WORLD, &dm); >> DMSetType(dm, DMPLEX); >> PetscObjectSetName((PetscObject)dm, "plexA"); >> PetscViewerPushFormat(H5Viewer, PETSC_VIEWER_HDF5_PETSC); >> DMPlexTopologyLoad(dm, H5Viewer, &sfO); >> DMPlexLabelsLoad(dm, H5Viewer); >> DMPlexCoordinatesLoad(dm, H5Viewer, sfO); >> PetscViewerPopFormat(H5Viewer); >> >> DMPlexDistribute(dm, Options->Mesh.overlap, &sfDist, &distributedDM); >> if (distributedDM) { >> DMDestroy(&dm); >> dm = distributedDM; >> PetscObjectSetName((PetscObject)dm, "plexA"); >> } >> >> PetscSFCompose(sfO, sfDist, &sf); >> PetscSFDestroy(&sfO); >> PetscSFDestroy(&sfDist); >> >> DMClone(dm, &sdm); >> PetscObjectSetName((PetscObject)sdm, "dmA"); >> DMPlexSectionLoad(dm, H5Viewer, sdm, sf, &globalDataSF, &localDataSF); >> >> /** Load the Vectors **/ >> DMGetGlobalVector(sdm, &Restart_xGlobalVector); >> VecSet(Restart_xGlobalVector,0.0); >> >> PetscObjectSetName((PetscObject)Restart_xGlobalVector, "vecA"); >> DMPlexGlobalVectorLoad(dm, H5Viewer, sdm, >> globalDataSF,Restart_xGlobalVector); >> DMGetGlobalVector(sdm, &Restart_xOldGlobalVector); >> VecSet(Restart_xOldGlobalVector,0.0); >> >> PetscObjectSetName((PetscObject)Restart_xOldGlobalVector, "vecB"); >> DMPlexGlobalVectorLoad(dm, H5Viewer, sdm, globalDataSF, >> Restart_xOldGlobalVector); >> >> PetscViewerDestroy(&H5Viewer); >> >> >> /**** The error message when loading is the following ************/ >> >> Creating and distributing mesh >> [0]PETSC ERROR: 
--------------------- Error Message >> -------------------------- >> [0]PETSC ERROR: Invalid argument >> [0]PETSC ERROR: Number of coordinates loaded 17128 does not match number >> of vertices 8000 >> [0]PETSC ERROR: See https://petsc.org/release/faq/ >> > for > trouble shooting. >> [0]PETSC ERROR: Petsc Development GIT revision: v3.16.1-435-g007f11b901 >> GIT Date: 2021-12-01 14:31:21 +0000 >> [0]PETSC ERROR: ./MF3 on a linux-gcc-openmpi-opt named >> ivt24.ads.uni-magdeburg.de by berend Mon Dec 6 16:11:21 2021 >> [0]PETSC ERROR: Configure options --with-p4est=yes --with-partemis >> --with-metis --with-debugging=no --download-metis=yes >> --download-parmetis=yes --with-errorchecking=no --download-hdf5 >> --download-zlib --download-p4est >> [0]PETSC ERROR: #1 DMPlexCoordinatesLoad_HDF5_V0_Private() at >> /home/berend/src/petsc_main/src/dm/impls/plex/plexhdf5.c:1387 >> [0]PETSC ERROR: #2 DMPlexCoordinatesLoad_HDF5_Internal() at >> /home/berend/src/petsc_main/src/dm/impls/plex/plexhdf5.c:1419 >> [0]PETSC ERROR: #3 DMPlexCoordinatesLoad() at >> /home/berend/src/petsc_main/src/dm/impls/plex/plex.c:2070 >> [0]PETSC ERROR: #4 RestartMeshDM() at >> /home/berend/src/eclipseworkspace/multiflow/src/io/restartmesh.c:81 >> [0]PETSC ERROR: #5 CreateMeshDM() at >> /home/berend/src/eclipseworkspace/multiflow/src/mesh/createmesh.c:61 >> [0]PETSC ERROR: #6 main() at >> /home/berend/src/eclipseworkspace/multiflow/src/general/main.c:132 >> [0]PETSC ERROR: PETSc Option Table entries: >> [0]PETSC ERROR: --download-hdf5 >> [0]PETSC ERROR: --download-metis=yes >> [0]PETSC ERROR: --download-p4est >> [0]PETSC ERROR: --download-parmetis=yes >> [0]PETSC ERROR: --download-zlib >> [0]PETSC ERROR: --with-debugging=no >> [0]PETSC ERROR: --with-errorchecking=no >> [0]PETSC ERROR: --with-metis >> [0]PETSC ERROR: --with-p4est=yes >> [0]PETSC ERROR: --with-partemis >> [0]PETSC ERROR: -d results >> [0]PETSC ERROR: -o run.mf >> [0]PETSC ERROR: ----------------End of Error Message -------send entire >> error message to petsc-maint at mcs.anl.gov---------- >> -------------------------------------------------------------------------- >> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD >> with errorcode 62. >> >> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. >> You may or may not see output from other processes, depending on >> exactly when Open MPI kills them. >> -------------------------------------------------------------------------- >> >> >> >> >> >> On 11/19/21 00:26, Sagiyama, Koki wrote: >>> Hi Berend, >>> >>> I was not able to reproduce the issue you are having, but the following >>> 1D example (and similar 2D examples) worked fine for me using the latest >>> PETSc. Please note that DMPlexCoordinatesLoad() now takes a PetscSF >>> object as the third argument, but the default behavior is unchanged. 
>>> >>> /* test_periodic_io.c */ >>> >>> #include >>> #include >>> #include >>> >>> int main(int argc, char **argv) >>> { >>> DM dm; >>> Vec coordinates; >>> PetscViewer viewer; >>> PetscViewerFormat format = PETSC_VIEWER_HDF5_PETSC; >>> PetscSF sfO; >>> PetscErrorCode ierr; >>> >>> ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr; >>> /* Save */ >>> ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "periodic_example.h5", >>> FILE_MODE_WRITE, &viewer);CHKERRQ(ierr); >>> { >>> DM pdm; >>> PetscInt dim = 1; >>> const PetscInt faces[1] = {4}; >>> DMBoundaryType periodicity[] = {DM_BOUNDARY_PERIODIC}; >>> PetscInt overlap = 1; >>> >>> ierr = DMPlexCreateBoxMesh(PETSC_COMM_WORLD, dim, PETSC_FALSE, >>> faces, NULL, NULL, periodicity, PETSC_TRUE, &dm);CHKERRQ(ierr); >>> ierr = DMPlexDistribute(dm, overlap, NULL, &pdm);CHKERRQ(ierr); >>> ierr = DMDestroy(&dm);CHKERRQ(ierr); >>> dm = pdm; >>> ierr = PetscObjectSetName((PetscObject)dm, "periodicDM");CHKERRQ(ierr); >>> } >>> ierr = DMGetCoordinates(dm, &coordinates);CHKERRQ(ierr); >>> ierr = PetscPrintf(PETSC_COMM_WORLD, "Coordinates before >>> saving:\n");CHKERRQ(ierr); >>> ierr = VecView(coordinates, NULL);CHKERRQ(ierr); >>> ierr = PetscViewerPushFormat(viewer, format);CHKERRQ(ierr); >>> ierr = DMPlexTopologyView(dm, viewer);CHKERRQ(ierr); >>> ierr = DMPlexCoordinatesView(dm, viewer);CHKERRQ(ierr); >>> ierr = PetscViewerPopFormat(viewer);CHKERRQ(ierr); >>> ierr = DMDestroy(&dm);CHKERRQ(ierr); >>> ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr); >>> /* Load */ >>> ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "periodic_example.h5", >>> FILE_MODE_READ, &viewer);CHKERRQ(ierr); >>> ierr = DMCreate(PETSC_COMM_WORLD, &dm);CHKERRQ(ierr); >>> ierr = DMSetType(dm, DMPLEX);CHKERRQ(ierr); >>> ierr = PetscObjectSetName((PetscObject)dm, "periodicDM");CHKERRQ(ierr); >>> ierr = PetscViewerPushFormat(viewer, format);CHKERRQ(ierr); >>> ierr = DMPlexTopologyLoad(dm, viewer, &sfO);CHKERRQ(ierr); >>> ierr = DMPlexCoordinatesLoad(dm, viewer, sfO);CHKERRQ(ierr); >>> ierr = PetscViewerPopFormat(viewer);CHKERRQ(ierr); >>> ierr = DMGetCoordinates(dm, &coordinates);CHKERRQ(ierr); >>> ierr = PetscPrintf(PETSC_COMM_WORLD, "Coordinates after >>> loading:\n");CHKERRQ(ierr); >>> ierr = VecView(coordinates, NULL);CHKERRQ(ierr); >>> ierr = PetscSFDestroy(&sfO);CHKERRQ(ierr); >>> ierr = DMDestroy(&dm);CHKERRQ(ierr); >>> ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr); >>> ierr = PetscFinalize(); >>> return ierr; >>> } >>> >>> mpiexec -n 2 ./test_periodic_io >>> >>> Coordinates before saving: >>> Vec Object: coordinates 2 MPI processes >>> type: mpi >>> Process [0] >>> 0. >>> Process [1] >>> 0.25 >>> 0.5 >>> 0.75 >>> Coordinates after loading: >>> Vec Object: vertices 2 MPI processes >>> type: mpi >>> Process [0] >>> 0. >>> 0.25 >>> 0.5 >>> 0.75 >>> Process [1] >>> >>> I would also like to note that, with the latest update, we can >>> optionally load coordinates directly on the distributed dm as (using >>> your notation): >>> >>> /* Distribute dm */ >>> ... >>> PetscSFCompose(sfO, sfDist, &sf); >>> DMPlexCoordinatesLoad(dm, viewer, sf); >>> >>> To use this feature, we need to pass "-dm_plex_view_hdf5_storage_version >>> 2.0.0" option when saving topology/coordinates. 
>>> >>> >>> Thanks, >>> Koki >>> ------------------------------------------------------------------------ >>> *From:* Berend van Wachem >>> *Sent:* Wednesday, November 17, 2021 3:16 PM >>> *To:* Hapla Vaclav ; PETSc users list >>> ; Lawrence Mitchell ; Sagiyama, >>> Koki >>> *Subject:* Re: [petsc-users] DMView and DMLoad >>> >>> ******************* >>> This email originates from outside Imperial. Do not click on links and >>> attachments unless you recognise the sender. >>> If you trust the sender, add them to your safe senders list >>> https://spam.ic.ac.uk/SpamConsole/Senders.aspx > >> > >>> > >> to disable email >>> stamping for this address. >>> ******************* >>> Dear Vaclav, Lawrence, Koki, >>> >>> Thanks for your help! Following your advice and following your example >>> (https://petsc.org/main/docs/manual/dmplex/#saving-and-loading-data-with-hdf5 > >> >>> >> >>) > >> >>> >>> we are able to save and load the DM with a wrapped Vector in h5 format >>> (PETSC_VIEWER_HDF5_PETSC) successfully. >>> >>> For saving, we use something similar to: >>> >>> DMPlexTopologyView(dm, viewer); >>> DMClone(dm, &sdm); >>> ... >>> DMPlexSectionView(dm, viewer, sdm); >>> DMGetLocalVector(sdm, &vec); >>> ... >>> DMPlexLocalVectorView(dm, viewer, sdm, vec); >>> >>> and for loading: >>> >>> DMCreate(PETSC_COMM_WORLD, &dm); >>> DMSetType(dm, DMPLEX); >>> ... >>> PetscViewerPushFormat(viewer, PETSC_VIEWER_HDF5_PETSC); >>> DMPlexTopologyLoad(dm, viewer, &sfO); >>> DMPlexLabelsLoad(dm, viewer); >>> DMPlexCoordinatesLoad(dm, viewer); >>> PetscViewerPopFormat(viewer); >>> ... >>> PetscSFCompose(sfO, sfDist, &sf); >>> ... >>> DMClone(dm, &sdm); >>> DMPlexSectionLoad(dm, viewer, sdm, sf, &globalDataSF, &localDataSF); >>> DMGetLocalVector(sdm, &vec); >>> ... >>> DMPlexLocalVectorLoad(dm, viewer, sdm, localDataSF, vec); >>> >>> >>> This works fine for non-periodic DMs but for periodic cases the line: >>> >>> DMPlexCoordinatesLoad(dm, H5Viewer); >>> >>> delivers the error message: invalid argument and the number of loaded >>> coordinates does not match the number of vertices. >>> >>> Is this a known shortcoming, or have we forgotten something to load >>> periodic DMs? >>> >>> Best regards, >>> >>> Berend. >>> >>> >>> >>> On 9/22/21 20:59, Hapla Vaclav wrote: >>>> To avoid confusions here, Berend seems to be specifically demanding XDMF >>>> (PETSC_VIEWER_HDF5_XDMF). The stuff we are now working on is parallel >>>> checkpointing in our own HDF5 format (PETSC_VIEWER_HDF5_PETSC), I will >>>> make a series of MRs on this topic in the following days. >>>> >>>> For XDMF, we are specifically missing the ability to write/load DMLabels >>>> properly. XDMF uses specific cell-local numbering for faces for >>>> specification of face sets, and face-local numbering for specification >>>> of edge sets, which is not great wrt DMPlex design. And ParaView doesn't >>>> show any of these properly so it's hard to debug. Matt, we should talk >>>> about this soon. >>>> >>>> Berend, for now, could you just load the mesh initially from XDMF and >>>> then use our PETSC_VIEWER_HDF5_PETSC format for subsequent saving/loading? 
>>>> >>>> Thanks, >>>> >>>> Vaclav >>>> >>>>> On 17 Sep 2021, at 15:46, Lawrence Mitchell >>>> >>>> wrote: >>>>> >>>>> Hi Berend, >>>>> >>>>>> On 14 Sep 2021, at 12:23, Matthew Knepley >>>>> > >>>> wrote: >>>>>> >>>>>> On Tue, Sep 14, 2021 at 5:15 AM Berend van Wachem >>>>>> > >>>> wrote: >>>>>> Dear PETSc-team, >>>>>> >>>>>> We are trying to save and load distributed DMPlex and its associated >>>>>> physical fields (created with DMCreateGlobalVector) (Uvelocity, >>>>>> VVelocity, ...) in HDF5_XDMF format. To achieve this, we do the >>>>>> following: >>>>>> >>>>>> 1) save in the same xdmf.h5 file: >>>>>> DMView( DM , H5_XDMF_Viewer ); >>>>>> VecView( UVelocity, H5_XDMF_Viewer ); >>>>>> >>>>>> 2) load the dm: >>>>>> DMPlexCreateFromfile(PETSC_COMM_WORLD, Filename, PETSC_TRUE, DM); >>>>>> >>>>>> 3) load the physical field: >>>>>> VecLoad( UVelocity, H5_XDMF_Viewer ); >>>>>> >>>>>> There are no errors in the execution, but the loaded DM is distributed >>>>>> differently to the original one, which results in the incorrect >>>>>> placement of the values of the physical fields (UVelocity etc.) in the >>>>>> domain. >>>>>> >>>>>> This approach is used to restart the simulation with the last saved DM. >>>>>> Is there something we are missing, or there exists alternative routes to >>>>>> this goal? Can we somehow get the IS of the redistribution, so we can >>>>>> re-distribute the vector data as well? >>>>>> >>>>>> Many thanks, best regards, >>>>>> >>>>>> Hi Berend, >>>>>> >>>>>> We are in the midst of rewriting this. We want to support saving >>>>>> multiple meshes, with fields attached to each, >>>>>> and preserving the discretization (section) information, and allowing >>>>>> us to load up on a different number of >>>>>> processes. We plan to be done by October. Vaclav and I are doing this >>>>>> in collaboration with Koki Sagiyama, >>>>>> David Ham, and Lawrence Mitchell from the Firedrake team. >>>>> >>>>> The core load/save cycle functionality is now in PETSc main. So if >>>>> you're using main rather than a release, you can get access to it now. >>>>> This section of the manual shows an example of how to do >>>>> thingshttps://petsc.org/main/docs/manual/dmplex/#saving-and-loading-data-with-hdf5 >>>>> >> >>> >> >>> >>>>> >>>>> Let us know if things aren't clear! >>>>> >>>>> Thanks, >>>>> >>>>> Lawrence >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From nabw91 at gmail.com Wed Mar 9 14:42:39 2022 From: nabw91 at gmail.com (=?UTF-8?Q?Nicol=C3=A1s_Barnafi?=) Date: Wed, 9 Mar 2022 21:42:39 +0100 Subject: [petsc-users] Arbitrary ownership IS for a matrix Message-ID: Hi community, I have an application with polytopal meshes (elements of arbitrary shape) where the distribution of dofs is not PETSc friendly, meaning that it is not true that cpu0 owns dofs [0,a), then cpu1 owns [a,b) and so on, but instead the distribution is in fact random. Another important detail is that boundary dofs are shared, meaning that if dof 150 is on the boundary, each subdomain vector has dof 150. Under this considerations: i) Is it possible to give an arbitrary mapping to the matrix structure or is the blocked distribution hard coded? ii) Are the repeated boundary dofs an issue when computing a Fieldsplit preconditioner in parallel? Best regards, Nicolas -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aduarteg at utexas.edu Wed Mar 9 15:50:00 2022 From: aduarteg at utexas.edu (Alfredo J Duarte Gomez) Date: Wed, 9 Mar 2022 15:50:00 -0600 Subject: [petsc-users] TSBDF prr-load higher order solution Message-ID: Good morning PETSC team, I am currently using a TSBDF object, which is working very well. However, I am running into trouble restarting higher order BDF methods. My problem is highly nonlinear, and when restarted for higher order BDF methods (using the TSBDF_Restart function), wiggles appear in a specific region of the solution. Is there any way I can initialize the higher order BDF restart loading previous solutions from a data file? I took a look at the code, but there is no obvious way to do this. Thanks, -Alfredo -- Alfredo Duarte Graduate Research Assistant The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Mar 9 16:12:59 2022 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 9 Mar 2022 17:12:59 -0500 Subject: [petsc-users] Arbitrary ownership IS for a matrix In-Reply-To: References: Message-ID: <6F08A21C-B411-4B23-93A5-22800E861014@petsc.dev> You need to do a mapping of your global numbering to the standard PETSc numbering and use the PETSc numbering for all access to vectors and matrices. https://petsc.org/release/docs/manualpages/AO/AOCreate.html provides one approach to managing the renumbering. Barry > On Mar 9, 2022, at 3:42 PM, Nicol?s Barnafi wrote: > > Hi community, > > I have an application with polytopal meshes (elements of arbitrary shape) where the distribution of dofs is not PETSc friendly, meaning that it is not true that cpu0 owns dofs [0,a), then cpu1 owns [a,b) and so on, but instead the distribution is in fact random. Another important detail is that boundary dofs are shared, meaning that if dof 150 is on the boundary, each subdomain vector has dof 150. > > Under this considerations: > > i) Is it possible to give an arbitrary mapping to the matrix structure or is the blocked distribution hard coded? > ii) Are the repeated boundary dofs an issue when computing a Fieldsplit preconditioner in parallel? > > Best regards, > Nicolas -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Mar 9 16:24:38 2022 From: jed at jedbrown.org (Jed Brown) Date: Wed, 09 Mar 2022 15:24:38 -0700 Subject: [petsc-users] TSBDF prr-load higher order solution In-Reply-To: References: Message-ID: <874k46zrvd.fsf@jedbrown.org> Can you restart using small low-order steps? Hong, does (or should) your trajectory stuff support an exact checkpointing scheme for BDF? I think we could add an interface to access the stored steps, but there are few things other than checkpointing that would make sense mathematically. Would you be up for making a merge request to add TSBDFGetStepVecs(TS ts, PetscInt *num_steps, const PetscReal **times, const Vec *vecs) and the respective setter? Alfredo J Duarte Gomez writes: > Good morning PETSC team, > > I am currently using a TSBDF object, which is working very well. > > However, I am running into trouble restarting higher order BDF methods. > > My problem is highly nonlinear, and when restarted for higher order BDF > methods (using the TSBDF_Restart function), wiggles appear in a specific > region of the solution. > > Is there any way I can initialize the higher order BDF restart loading > previous solutions from a data file? I took a look at the code, but there > is no obvious way to do this. 
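[A minimal sketch of the AO-based renumbering Barry points to above, assuming the application already stores its own arbitrary global numbering; nlocal, app_idx, nrows and rows are placeholders for those application-side arrays, while the AO calls themselves are standard PETSc API.]

#include <petscao.h>

  AO        ao;
  PetscInt  nlocal;    /* number of dofs owned by this rank */
  PetscInt *app_idx;   /* their indices in the application's (arbitrary) numbering */
  PetscInt  nrows;
  PetscInt *rows;      /* application indices to translate before assembly */

  /* Map the application numbering onto PETSc's contiguous per-rank numbering.
     Passing NULL for the PETSc indices lets AOCreateBasic assign 0..N-1 in the
     order the application indices are listed. */
  AOCreateBasic(PETSC_COMM_WORLD, nlocal, app_idx, NULL, &ao);

  /* Translate application indices in place to PETSc indices, then assemble with
     the translated indices, e.g. MatSetValues(A, nrows, rows, ncols, cols, v, ADD_VALUES). */
  AOApplicationToPetsc(ao, nrows, rows);

  AODestroy(&ao);

[Regarding the shared boundary dofs: each shared dof should map to a single PETSc index owned by exactly one rank; the other ranks that hold a copy simply contribute to that index with ADD_VALUES, and the contributions are combined during MatAssemblyBegin/End.]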
> > Thanks, > > -Alfredo > > -- > Alfredo Duarte > Graduate Research Assistant > The University of Texas at Austin From hongzhang at anl.gov Wed Mar 9 16:49:15 2022 From: hongzhang at anl.gov (Zhang, Hong) Date: Wed, 9 Mar 2022 22:49:15 +0000 Subject: [petsc-users] TSBDF prr-load higher order solution In-Reply-To: <874k46zrvd.fsf@jedbrown.org> References: <874k46zrvd.fsf@jedbrown.org> Message-ID: <4C1804B4-0B18-4540-ACCE-3390F373FD1F@anl.gov> TSTrajectory supports checkpointing for multistage methods and can certainly be extended to multistep methods. But I doubt it is the best solution to Alfredo?s problem. Alfredo, can you elaborate a bit on what you would like to do? TSBDF_Restart is already using the previous solution to restart the integration with first-order BDF. Hong(Mr.) > On Mar 9, 2022, at 4:24 PM, Jed Brown wrote: > > Can you restart using small low-order steps? > > Hong, does (or should) your trajectory stuff support an exact checkpointing scheme for BDF? > > I think we could add an interface to access the stored steps, but there are few things other than checkpointing that would make sense mathematically. Would you be up for making a merge request to add TSBDFGetStepVecs(TS ts, PetscInt *num_steps, const PetscReal **times, const Vec *vecs) and the respective setter? > > Alfredo J Duarte Gomez writes: > >> Good morning PETSC team, >> >> I am currently using a TSBDF object, which is working very well. >> >> However, I am running into trouble restarting higher order BDF methods. >> >> My problem is highly nonlinear, and when restarted for higher order BDF >> methods (using the TSBDF_Restart function), wiggles appear in a specific >> region of the solution. >> >> Is there any way I can initialize the higher order BDF restart loading >> previous solutions from a data file? I took a look at the code, but there >> is no obvious way to do this. >> >> Thanks, >> >> -Alfredo >> >> -- >> Alfredo Duarte >> Graduate Research Assistant >> The University of Texas at Austin From knepley at gmail.com Wed Mar 9 18:19:00 2022 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 9 Mar 2022 19:19:00 -0500 Subject: [petsc-users] Arbitrary ownership IS for a matrix In-Reply-To: <6F08A21C-B411-4B23-93A5-22800E861014@petsc.dev> References: <6F08A21C-B411-4B23-93A5-22800E861014@petsc.dev> Message-ID: On Wed, Mar 9, 2022 at 5:13 PM Barry Smith wrote: > > You need to do a mapping of your global numbering to the standard PETSc > numbering and use the PETSc numbering for all access to vectors and > matrices. > > https://petsc.org/release/docs/manualpages/AO/AOCreate.html provides > one approach to managing the renumbering. > You can think of this as the mapping to offsets that you would need in any event to store your values (they could not be directly addressed with your random indices). Thanks, Matt > Barry > > > On Mar 9, 2022, at 3:42 PM, Nicol?s Barnafi wrote: > > Hi community, > > I have an application with polytopal meshes (elements of arbitrary shape) > where the distribution of dofs is not PETSc friendly, meaning that it is > not true that cpu0 owns dofs [0,a), then cpu1 owns [a,b) and so on, but > instead the distribution is in fact random. Another important detail is > that boundary dofs are shared, meaning that if dof 150 is on the boundary, > each subdomain vector has dof 150. > > Under this considerations: > > i) Is it possible to give an arbitrary mapping to the matrix structure or > is the blocked distribution hard coded? 
> ii) Are the repeated boundary dofs an issue when computing a Fieldsplit > preconditioner in parallel? > > Best regards, > Nicolas > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From nabw91 at gmail.com Wed Mar 9 19:50:52 2022 From: nabw91 at gmail.com (=?UTF-8?Q?Nicol=C3=A1s_Barnafi?=) Date: Wed, 9 Mar 2022 22:50:52 -0300 Subject: [petsc-users] Arbitrary ownership IS for a matrix In-Reply-To: References: <6F08A21C-B411-4B23-93A5-22800E861014@petsc.dev> Message-ID: Thank you both very much, it is exactly what I needed. Best regards On Wed, Mar 9, 2022, 21:19 Matthew Knepley wrote: > On Wed, Mar 9, 2022 at 5:13 PM Barry Smith wrote: > >> >> You need to do a mapping of your global numbering to the standard PETSc >> numbering and use the PETSc numbering for all access to vectors and >> matrices. >> >> https://petsc.org/release/docs/manualpages/AO/AOCreate.html provides >> one approach to managing the renumbering. >> > > You can think of this as the mapping to offsets that you would need in any > event to store your values (they could not be directly addressed with your > random indices). > > Thanks, > > Matt > > >> Barry >> >> >> On Mar 9, 2022, at 3:42 PM, Nicol?s Barnafi wrote: >> >> Hi community, >> >> I have an application with polytopal meshes (elements of arbitrary shape) >> where the distribution of dofs is not PETSc friendly, meaning that it is >> not true that cpu0 owns dofs [0,a), then cpu1 owns [a,b) and so on, but >> instead the distribution is in fact random. Another important detail is >> that boundary dofs are shared, meaning that if dof 150 is on the boundary, >> each subdomain vector has dof 150. >> >> Under this considerations: >> >> i) Is it possible to give an arbitrary mapping to the matrix structure or >> is the blocked distribution hard coded? >> ii) Are the repeated boundary dofs an issue when computing a Fieldsplit >> preconditioner in parallel? >> >> Best regards, >> Nicolas >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From M.Deij at marin.nl Thu Mar 10 05:50:06 2022 From: M.Deij at marin.nl (Deij-van Rijswijk, Menno) Date: Thu, 10 Mar 2022 11:50:06 +0000 Subject: [petsc-users] Building with CMAKE_GENERATOR set Message-ID: <988cf2e240344a80bfe11642787dfe3e@MAR190n2.marin.local> Good morning, I find that when I set the environment variable CMAKE_GENERATOR to something that is not "Unix Makefiles" (e.g. Ninja) that building of subprojects like METIS fails. METIS gets configured with CMake, and writes out ninja.build build instructions. Then PETSc calls make/gmake to build and it can't find the makefiles because they're not generated. It would be nice if PETSc could handle other build tools supported by CMake, like for example Ninja. Best regards, Menno Deij - van Rijswijk dr. ir. Menno A. Deij-van Rijswijk | Researcher | Research & Development | Vrijdag vrij. In het algemeen op maandag en woensdag vanuit huis werkend, en op dinsdag en donderdag op kantoor. Actuele beschikbaarheid staat in mijn agenda. 
MARIN | T +31 317 49 35 06 | M.Deij at marin.nl | www.marin.nl [LinkedIn] [YouTube] [Twitter] [Facebook] MARIN news: -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image1150b9.PNG Type: image/png Size: 293 bytes Desc: image1150b9.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image5b3f56.PNG Type: image/png Size: 331 bytes Desc: image5b3f56.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image401cc0.PNG Type: image/png Size: 333 bytes Desc: image401cc0.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image87ad97.PNG Type: image/png Size: 253 bytes Desc: image87ad97.PNG URL: From bsmith at petsc.dev Thu Mar 10 08:24:59 2022 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 10 Mar 2022 09:24:59 -0500 Subject: [petsc-users] Building with CMAKE_GENERATOR set In-Reply-To: <988cf2e240344a80bfe11642787dfe3e@MAR190n2.marin.local> References: <988cf2e240344a80bfe11642787dfe3e@MAR190n2.marin.local> Message-ID: <4C7A4518-087E-406A-9F21-CFC1A891B183@petsc.dev> Would it be enough that PETSc's ./configure build of external packages that use cmake temporarily turns off the value of CMAKE_GENERATOR when it runs cmake for the external packages? Barry > On Mar 10, 2022, at 6:50 AM, Deij-van Rijswijk, Menno wrote: > > > Good morning, > > I find that when I set the environment variable CMAKE_GENERATOR to something that is not "Unix Makefiles" (e.g. Ninja) that building of subprojects like METIS fails. METIS gets configured with CMake, and writes out ninja.build build instructions. Then PETSc calls make/gmake to build and it can't find the makefiles because they're not generated. It would be nice if PETSc could handle other build tools supported by CMake, like for example Ninja. > > Best regards, > > > Menno Deij - van Rijswijk > > > dr. ir. Menno A. Deij-van Rijswijk | Researcher | Research & Development | Vrijdag vrij. > In het algemeen op maandag en woensdag vanuit huis werkend, en op dinsdag en donderdag op kantoor. Actuele beschikbaarheid staat in mijn agenda. > MARIN | T +31 317 49 35 06 | M.Deij at marin.nl | www.marin.nl > > > MARIN news: -------------- next part -------------- An HTML attachment was scrubbed... URL: From M.Deij at marin.nl Thu Mar 10 08:29:17 2022 From: M.Deij at marin.nl (Deij-van Rijswijk, Menno) Date: Thu, 10 Mar 2022 14:29:17 +0000 Subject: [petsc-users] Building with CMAKE_GENERATOR set In-Reply-To: <4C7A4518-087E-406A-9F21-CFC1A891B183@petsc.dev> References: <988cf2e240344a80bfe11642787dfe3e@MAR190n2.marin.local> <4C7A4518-087E-406A-9F21-CFC1A891B183@petsc.dev> Message-ID: Yes, I think that would work. All the best, Menno dr. ir. Menno A. Deij-van Rijswijk | Researcher | Research & Development | Vrijdag vrij. In het algemeen op maandag en woensdag vanuit huis werkend, en op dinsdag en donderdag op kantoor. Actuele beschikbaarheid staat in mijn agenda. MARIN | T +31 317 49 35 06 | M.Deij at marin.nl | www.marin.nl [LinkedIn] [YouTube] [Twitter] [Facebook] MARIN news: From: Barry Smith Sent: Thursday, March 10, 2022 3:25 PM To: Deij-van Rijswijk, Menno Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Building with CMAKE_GENERATOR set Would it be enough that PETSc's ./configure build of external packages that use cmake temporarily turns off the value of CMAKE_GENERATOR when it runs cmake for the external packages? 
Barry On Mar 10, 2022, at 6:50 AM, Deij-van Rijswijk, Menno > wrote: Good morning, I find that when I set the environment variable CMAKE_GENERATOR to something that is not "Unix Makefiles" (e.g. Ninja) that building of subprojects like METIS fails. METIS gets configured with CMake, and writes out ninja.build build instructions. Then PETSc calls make/gmake to build and it can't find the makefiles because they're not generated. It would be nice if PETSc could handle other build tools supported by CMake, like for example Ninja. Best regards, Menno Deij - van Rijswijk dr. ir. Menno A. Deij-van Rijswijk | Researcher | Research & Development | Vrijdag vrij. In het algemeen op maandag en woensdag vanuit huis werkend, en op dinsdag en donderdag op kantoor. Actuele beschikbaarheid staat in mijn agenda. MARIN | T +31 317 49 35 06 | M.Deij at marin.nl | www.marin.nl MARIN news: Help us improve the spam filter. If this message contains SPAM, click here to report. Thank you, MARIN Digital Services -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image58ada6.PNG Type: image/png Size: 293 bytes Desc: image58ada6.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image8693ea.PNG Type: image/png Size: 331 bytes Desc: image8693ea.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image11d6c4.PNG Type: image/png Size: 333 bytes Desc: image11d6c4.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: imagef67c7d.PNG Type: image/png Size: 253 bytes Desc: imagef67c7d.PNG URL: From aduarteg at utexas.edu Thu Mar 10 10:05:01 2022 From: aduarteg at utexas.edu (Alfredo J Duarte Gomez) Date: Thu, 10 Mar 2022 10:05:01 -0600 Subject: [petsc-users] TSBDF prr-load higher order solution In-Reply-To: <4C1804B4-0B18-4540-ACCE-3390F373FD1F@anl.gov> References: <874k46zrvd.fsf@jedbrown.org> <4C1804B4-0B18-4540-ACCE-3390F373FD1F@anl.gov> Message-ID: Hello Zhang and Hong, Thank you for your reply. As I described, I simply wanted to be able to restart a higher order BDF from a previous solution. For example, if I want to restart a BDF-2 solution I can simply load times (n is current time step ) t_n-1, t_n, load solutions y_n, and yn-1 from a restart file and continue integration with a BDF-2 formula as if it never stopped. This would replace the current default approach, which starts from a single time tn, solution yn and uses lower order BDF steps as you build up to the selected order. I am not sure why, but an abrupt change in integration order or time step leads to unwanted numerical noise in my solution, which I blame on the high nonlinearity of the system (I have tested extensively to rule out bugs). Thank you and let me know if you have any questions, -Alfredo On Wed, Mar 9, 2022 at 4:49 PM Zhang, Hong wrote: > TSTrajectory supports checkpointing for multistage methods and can > certainly be extended to multistep methods. But I doubt it is the best > solution to Alfredo?s problem. Alfredo, can you elaborate a bit on what you > would like to do? TSBDF_Restart is already using the previous solution to > restart the integration with first-order BDF. > > Hong(Mr.) > > > On Mar 9, 2022, at 4:24 PM, Jed Brown wrote: > > > > Can you restart using small low-order steps? > > > > Hong, does (or should) your trajectory stuff support an exact > checkpointing scheme for BDF? 
> > > > I think we could add an interface to access the stored steps, but there > are few things other than checkpointing that would make sense > mathematically. Would you be up for making a merge request to add > TSBDFGetStepVecs(TS ts, PetscInt *num_steps, const PetscReal **times, const > Vec *vecs) and the respective setter? > > > > Alfredo J Duarte Gomez writes: > > > >> Good morning PETSC team, > >> > >> I am currently using a TSBDF object, which is working very well. > >> > >> However, I am running into trouble restarting higher order BDF methods. > >> > >> My problem is highly nonlinear, and when restarted for higher order BDF > >> methods (using the TSBDF_Restart function), wiggles appear in a specific > >> region of the solution. > >> > >> Is there any way I can initialize the higher order BDF restart loading > >> previous solutions from a data file? I took a look at the code, but > there > >> is no obvious way to do this. > >> > >> Thanks, > >> > >> -Alfredo > >> > >> -- > >> Alfredo Duarte > >> Graduate Research Assistant > >> The University of Texas at Austin > > -- Alfredo Duarte Graduate Research Assistant The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Mar 10 10:09:32 2022 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 10 Mar 2022 11:09:32 -0500 Subject: [petsc-users] Building with CMAKE_GENERATOR set In-Reply-To: References: <988cf2e240344a80bfe11642787dfe3e@MAR190n2.marin.local> <4C7A4518-087E-406A-9F21-CFC1A891B183@petsc.dev> Message-ID: <3CF818F7-620E-4A1B-9C90-72599A421B18@petsc.dev> https://gitlab.com/petsc/petsc/-/merge_requests/4952 > On Mar 10, 2022, at 9:29 AM, Deij-van Rijswijk, Menno wrote: > > > Yes, I think that would work. > > All the best, > Menno > > > dr. ir. Menno A. Deij-van Rijswijk | Researcher | Research & Development | Vrijdag vrij. > In het algemeen op maandag en woensdag vanuit huis werkend, en op dinsdag en donderdag op kantoor. Actuele beschikbaarheid staat in mijn agenda. > MARIN | T +31 317 49 35 06 | M.Deij at marin.nl | www.marin.nl > > > MARIN news: > > > > From: Barry Smith > > Sent: Thursday, March 10, 2022 3:25 PM > To: Deij-van Rijswijk, Menno > > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Building with CMAKE_GENERATOR set > > > Would it be enough that PETSc's ./configure build of external packages that use cmake temporarily turns off the value of CMAKE_GENERATOR when it runs cmake for the external packages? > > Barry > > > > On Mar 10, 2022, at 6:50 AM, Deij-van Rijswijk, Menno > wrote: > > > Good morning, > > I find that when I set the environment variable CMAKE_GENERATOR to something that is not "Unix Makefiles" (e.g. Ninja) that building of subprojects like METIS fails. METIS gets configured with CMake, and writes out ninja.build build instructions. Then PETSc calls make/gmake to build and it can't find the makefiles because they're not generated. It would be nice if PETSc could handle other build tools supported by CMake, like for example Ninja. > > Best regards, > > > Menno Deij - van Rijswijk > > > dr. ir. Menno A. Deij-van Rijswijk | Researcher | Research & Development | Vrijdag vrij. > In het algemeen op maandag en woensdag vanuit huis werkend, en op dinsdag en donderdag op kantoor. Actuele beschikbaarheid staat in mijn agenda. > MARIN | T +31 317 49 35 06 | M.Deij at marin.nl | www.marin.nl > > > MARIN news: > > > > Help us improve the spam filter. If this message contains SPAM, click here to report. 
Thank you, MARIN Digital Services > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Mar 10 10:13:25 2022 From: jed at jedbrown.org (Jed Brown) Date: Thu, 10 Mar 2022 09:13:25 -0700 Subject: [petsc-users] TSBDF prr-load higher order solution In-Reply-To: References: <874k46zrvd.fsf@jedbrown.org> <4C1804B4-0B18-4540-ACCE-3390F373FD1F@anl.gov> Message-ID: <87czitu6oq.fsf@jedbrown.org> I think this is a reasonable thing to ask for, despite the IO data sizes being larger. I'd also note that the adaptive controller can have more "memory" in how it selects steps, so it'd be nice to have TSAdaptView/TSAdaptLoad so you can restart from a checkpoint without any impact on the trajectory. Alfredo J Duarte Gomez writes: > Hello Zhang and Hong, > > Thank you for your reply. > > As I described, I simply wanted to be able to restart a higher order BDF > from a previous solution. > > For example, if I want to restart a BDF-2 solution I can simply load times > (n is current time step ) t_n-1, t_n, load solutions y_n, and yn-1 from a > restart file and continue integration with a BDF-2 formula as if it never > stopped. > > This would replace the current default approach, which starts from a single > time tn, solution yn and uses lower order BDF steps as you build up to the > selected order. > > I am not sure why, but an abrupt change in integration order or time step > leads to unwanted numerical noise in my solution, which I blame on the high > nonlinearity of the system (I have tested extensively to rule out bugs). > > Thank you and let me know if you have any questions, > > -Alfredo > > On Wed, Mar 9, 2022 at 4:49 PM Zhang, Hong wrote: > >> TSTrajectory supports checkpointing for multistage methods and can >> certainly be extended to multistep methods. But I doubt it is the best >> solution to Alfredo?s problem. Alfredo, can you elaborate a bit on what you >> would like to do? TSBDF_Restart is already using the previous solution to >> restart the integration with first-order BDF. >> >> Hong(Mr.) >> >> > On Mar 9, 2022, at 4:24 PM, Jed Brown wrote: >> > >> > Can you restart using small low-order steps? >> > >> > Hong, does (or should) your trajectory stuff support an exact >> checkpointing scheme for BDF? >> > >> > I think we could add an interface to access the stored steps, but there >> are few things other than checkpointing that would make sense >> mathematically. Would you be up for making a merge request to add >> TSBDFGetStepVecs(TS ts, PetscInt *num_steps, const PetscReal **times, const >> Vec *vecs) and the respective setter? >> > >> > Alfredo J Duarte Gomez writes: >> > >> >> Good morning PETSC team, >> >> >> >> I am currently using a TSBDF object, which is working very well. >> >> >> >> However, I am running into trouble restarting higher order BDF methods. >> >> >> >> My problem is highly nonlinear, and when restarted for higher order BDF >> >> methods (using the TSBDF_Restart function), wiggles appear in a specific >> >> region of the solution. >> >> >> >> Is there any way I can initialize the higher order BDF restart loading >> >> previous solutions from a data file? I took a look at the code, but >> there >> >> is no obvious way to do this. 
>> >> >> >> Thanks, >> >> >> >> -Alfredo >> >> >> >> -- >> >> Alfredo Duarte >> >> Graduate Research Assistant >> >> The University of Texas at Austin >> >> > > -- > Alfredo Duarte > Graduate Research Assistant > The University of Texas at Austin From bsmith at petsc.dev Thu Mar 10 10:11:58 2022 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 10 Mar 2022 11:11:58 -0500 Subject: [petsc-users] TSBDF prr-load higher order solution In-Reply-To: References: <874k46zrvd.fsf@jedbrown.org> <4C1804B4-0B18-4540-ACCE-3390F373FD1F@anl.gov> Message-ID: <6C398D8F-2FB5-403E-9EAA-018CD4EEFB4B@petsc.dev> This seems completely reasonable. We should have a simple mechanism for multi-step methods to save the appropriate multisteps after an integration and to load them into a TS before a new integration. Barry > On Mar 10, 2022, at 11:05 AM, Alfredo J Duarte Gomez wrote: > > Hello Zhang and Hong, > > Thank you for your reply. > > As I described, I simply wanted to be able to restart a higher order BDF from a previous solution. > > For example, if I want to restart a BDF-2 solution I can simply load times (n is current time step ) t_n-1, t_n, load solutions y_n, and yn-1 from a restart file and continue integration with a BDF-2 formula as if it never stopped. > > This would replace the current default approach, which starts from a single time tn, solution yn and uses lower order BDF steps as you build up to the selected order. > > I am not sure why, but an abrupt change in integration order or time step leads to unwanted numerical noise in my solution, which I blame on the high nonlinearity of the system (I have tested extensively to rule out bugs). > > Thank you and let me know if you have any questions, > > -Alfredo > > On Wed, Mar 9, 2022 at 4:49 PM Zhang, Hong > wrote: > TSTrajectory supports checkpointing for multistage methods and can certainly be extended to multistep methods. But I doubt it is the best solution to Alfredo?s problem. Alfredo, can you elaborate a bit on what you would like to do? TSBDF_Restart is already using the previous solution to restart the integration with first-order BDF. > > Hong(Mr.) > > > On Mar 9, 2022, at 4:24 PM, Jed Brown > wrote: > > > > Can you restart using small low-order steps? > > > > Hong, does (or should) your trajectory stuff support an exact checkpointing scheme for BDF? > > > > I think we could add an interface to access the stored steps, but there are few things other than checkpointing that would make sense mathematically. Would you be up for making a merge request to add TSBDFGetStepVecs(TS ts, PetscInt *num_steps, const PetscReal **times, const Vec *vecs) and the respective setter? > > > > Alfredo J Duarte Gomez > writes: > > > >> Good morning PETSC team, > >> > >> I am currently using a TSBDF object, which is working very well. > >> > >> However, I am running into trouble restarting higher order BDF methods. > >> > >> My problem is highly nonlinear, and when restarted for higher order BDF > >> methods (using the TSBDF_Restart function), wiggles appear in a specific > >> region of the solution. > >> > >> Is there any way I can initialize the higher order BDF restart loading > >> previous solutions from a data file? I took a look at the code, but there > >> is no obvious way to do this. 
> >> > >> Thanks, > >> > >> -Alfredo > >> > >> -- > >> Alfredo Duarte > >> Graduate Research Assistant > >> The University of Texas at Austin > > > > -- > Alfredo Duarte > Graduate Research Assistant > The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Thu Mar 10 11:03:57 2022 From: hongzhang at anl.gov (Zhang, Hong) Date: Thu, 10 Mar 2022 17:03:57 +0000 Subject: [petsc-users] TSBDF prr-load higher order solution In-Reply-To: References: <874k46zrvd.fsf@jedbrown.org> <4C1804B4-0B18-4540-ACCE-3390F373FD1F@anl.gov> Message-ID: On Mar 10, 2022, at 10:05 AM, Alfredo J Duarte Gomez > wrote: Hello Zhang and Hong, Thank you for your reply. As I described, I simply wanted to be able to restart a higher order BDF from a previous solution. For example, if I want to restart a BDF-2 solution I can simply load times (n is current time step ) t_n-1, t_n, load solutions y_n, and yn-1 from a restart file and continue integration with a BDF-2 formula as if it never stopped. Are y_n and yn-1 from the restart file different from the states saved internally for BDF-2? Are you trying to modify these states yourself? A restart is needed typically when you have discontinuities in the system, so the solutions before the discontinuity point have to be discarded. If you simply want to modify the states, there should be better ways than using checkpoint-restart. Hong (Mr.) This would replace the current default approach, which starts from a single time tn, solution yn and uses lower order BDF steps as you build up to the selected order. I am not sure why, but an abrupt change in integration order or time step leads to unwanted numerical noise in my solution, which I blame on the high nonlinearity of the system (I have tested extensively to rule out bugs). Thank you and let me know if you have any questions, -Alfredo On Wed, Mar 9, 2022 at 4:49 PM Zhang, Hong > wrote: TSTrajectory supports checkpointing for multistage methods and can certainly be extended to multistep methods. But I doubt it is the best solution to Alfredo?s problem. Alfredo, can you elaborate a bit on what you would like to do? TSBDF_Restart is already using the previous solution to restart the integration with first-order BDF. Hong(Mr.) > On Mar 9, 2022, at 4:24 PM, Jed Brown > wrote: > > Can you restart using small low-order steps? > > Hong, does (or should) your trajectory stuff support an exact checkpointing scheme for BDF? > > I think we could add an interface to access the stored steps, but there are few things other than checkpointing that would make sense mathematically. Would you be up for making a merge request to add TSBDFGetStepVecs(TS ts, PetscInt *num_steps, const PetscReal **times, const Vec *vecs) and the respective setter? > > Alfredo J Duarte Gomez > writes: > >> Good morning PETSC team, >> >> I am currently using a TSBDF object, which is working very well. >> >> However, I am running into trouble restarting higher order BDF methods. >> >> My problem is highly nonlinear, and when restarted for higher order BDF >> methods (using the TSBDF_Restart function), wiggles appear in a specific >> region of the solution. >> >> Is there any way I can initialize the higher order BDF restart loading >> previous solutions from a data file? I took a look at the code, but there >> is no obvious way to do this. 
>> >> Thanks, >> >> -Alfredo >> >> -- >> Alfredo Duarte >> Graduate Research Assistant >> The University of Texas at Austin -- Alfredo Duarte Graduate Research Assistant The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... URL: From aduarteg at utexas.edu Thu Mar 10 11:35:12 2022 From: aduarteg at utexas.edu (Alfredo J Duarte Gomez) Date: Thu, 10 Mar 2022 11:35:12 -0600 Subject: [petsc-users] TSBDF prr-load higher order solution In-Reply-To: References: <874k46zrvd.fsf@jedbrown.org> <4C1804B4-0B18-4540-ACCE-3390F373FD1F@anl.gov> Message-ID: Hello, The solutions y_n and y_n-1 are the same ones as those saved internally for the BDF-2. The practical problem is for example: run a simulation with a BDF-2 for 10 hrs, see that everything looks good and decide that this simulation is worth continuing for another 10 hrs. Except now when I start from that last solution, the TSBDF starts from a single solution and steps its way up to the BDF-2 (numerical noise appears), instead of simply being able to load the last two solutions from my HDF5 output file and continue as if it had not stopped. Maybe I confused you by referencing the TSBDF_Restart, instead of the "starting" procedure for higher order BDFs, I thought they were the same but maybe not. Thank you, -Alfredo On Thu, Mar 10, 2022 at 11:04 AM Zhang, Hong wrote: > > > On Mar 10, 2022, at 10:05 AM, Alfredo J Duarte Gomez > wrote: > > Hello Zhang and Hong, > > Thank you for your reply. > > As I described, I simply wanted to be able to restart a higher order BDF > from a previous solution. > > For example, if I want to restart a BDF-2 solution I can simply load times > (n is current time step ) t_n-1, t_n, load solutions y_n, and yn-1 from a > restart file and continue integration with a BDF-2 formula as if it never > stopped. > > > Are y_n and yn-1 from the restart file different from the states saved > internally for BDF-2? Are you trying to modify these states yourself? A > restart is needed typically when you have discontinuities in the system, so > the solutions before the discontinuity point have to be discarded. If you > simply want to modify the states, there should be better ways than using > checkpoint-restart. > > Hong (Mr.) > > > > This would replace the current default approach, which starts from a > single time tn, solution yn and uses lower order BDF steps as you build up > to the selected order. > > I am not sure why, but an abrupt change in integration order or time step > leads to unwanted numerical noise in my solution, which I blame on the high > nonlinearity of the system (I have tested extensively to rule out bugs). > > Thank you and let me know if you have any questions, > > -Alfredo > > On Wed, Mar 9, 2022 at 4:49 PM Zhang, Hong wrote: > >> TSTrajectory supports checkpointing for multistage methods and can >> certainly be extended to multistep methods. But I doubt it is the best >> solution to Alfredo?s problem. Alfredo, can you elaborate a bit on what you >> would like to do? TSBDF_Restart is already using the previous solution to >> restart the integration with first-order BDF. >> >> Hong(Mr.) >> >> > On Mar 9, 2022, at 4:24 PM, Jed Brown wrote: >> > >> > Can you restart using small low-order steps? >> > >> > Hong, does (or should) your trajectory stuff support an exact >> checkpointing scheme for BDF? >> > >> > I think we could add an interface to access the stored steps, but there >> are few things other than checkpointing that would make sense >> mathematically. 
Would you be up for making a merge request to add >> TSBDFGetStepVecs(TS ts, PetscInt *num_steps, const PetscReal **times, const >> Vec *vecs) and the respective setter? >> > >> > Alfredo J Duarte Gomez writes: >> > >> >> Good morning PETSC team, >> >> >> >> I am currently using a TSBDF object, which is working very well. >> >> >> >> However, I am running into trouble restarting higher order BDF methods. >> >> >> >> My problem is highly nonlinear, and when restarted for higher order BDF >> >> methods (using the TSBDF_Restart function), wiggles appear in a >> specific >> >> region of the solution. >> >> >> >> Is there any way I can initialize the higher order BDF restart loading >> >> previous solutions from a data file? I took a look at the code, but >> there >> >> is no obvious way to do this. >> >> >> >> Thanks, >> >> >> >> -Alfredo >> >> >> >> -- >> >> Alfredo Duarte >> >> Graduate Research Assistant >> >> The University of Texas at Austin >> >> > > -- > Alfredo Duarte > Graduate Research Assistant > The University of Texas at Austin > > > -- Alfredo Duarte Graduate Research Assistant The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Thu Mar 10 14:00:52 2022 From: hongzhang at anl.gov (Zhang, Hong) Date: Thu, 10 Mar 2022 20:00:52 +0000 Subject: [petsc-users] TSBDF prr-load higher order solution In-Reply-To: References: <874k46zrvd.fsf@jedbrown.org> <4C1804B4-0B18-4540-ACCE-3390F373FD1F@anl.gov> Message-ID: It is clear to me now. You need TSBDFGetStepVecs(TS ts, PetscInt *num_steps, const PetscReal **times, const Vec *vecs) as Jed suggested so that you can access the work vectors for TSBDF. In addition, you need to save the step size and the current order to jumpstart BDF (without starting from BDF-1). Setting the step size is straightforward with TSSetTimeStep(). To access/recover the order, you need something like TSBDF{Set|Get}CurrentOrder(TS ts, PetscInt **order). Thanks, Hong (Mr.) On Mar 10, 2022, at 11:35 AM, Alfredo J Duarte Gomez > wrote: Hello, The solutions y_n and y_n-1 are the same ones as those saved internally for the BDF-2. The practical problem is for example: run a simulation with a BDF-2 for 10 hrs, see that everything looks good and decide that this simulation is worth continuing for another 10 hrs. Except now when I start from that last solution, the TSBDF starts from a single solution and steps its way up to the BDF-2 (numerical noise appears), instead of simply being able to load the last two solutions from my HDF5 output file and continue as if it had not stopped. Maybe I confused you by referencing the TSBDF_Restart, instead of the "starting" procedure for higher order BDFs, I thought they were the same but maybe not. Thank you, -Alfredo On Thu, Mar 10, 2022 at 11:04 AM Zhang, Hong > wrote: On Mar 10, 2022, at 10:05 AM, Alfredo J Duarte Gomez > wrote: Hello Zhang and Hong, Thank you for your reply. As I described, I simply wanted to be able to restart a higher order BDF from a previous solution. For example, if I want to restart a BDF-2 solution I can simply load times (n is current time step ) t_n-1, t_n, load solutions y_n, and yn-1 from a restart file and continue integration with a BDF-2 formula as if it never stopped. Are y_n and yn-1 from the restart file different from the states saved internally for BDF-2? Are you trying to modify these states yourself? 
A restart is needed typically when you have discontinuities in the system, so the solutions before the discontinuity point have to be discarded. If you simply want to modify the states, there should be better ways than using checkpoint-restart. Hong (Mr.) This would replace the current default approach, which starts from a single time tn, solution yn and uses lower order BDF steps as you build up to the selected order. I am not sure why, but an abrupt change in integration order or time step leads to unwanted numerical noise in my solution, which I blame on the high nonlinearity of the system (I have tested extensively to rule out bugs). Thank you and let me know if you have any questions, -Alfredo On Wed, Mar 9, 2022 at 4:49 PM Zhang, Hong > wrote: TSTrajectory supports checkpointing for multistage methods and can certainly be extended to multistep methods. But I doubt it is the best solution to Alfredo?s problem. Alfredo, can you elaborate a bit on what you would like to do? TSBDF_Restart is already using the previous solution to restart the integration with first-order BDF. Hong(Mr.) > On Mar 9, 2022, at 4:24 PM, Jed Brown > wrote: > > Can you restart using small low-order steps? > > Hong, does (or should) your trajectory stuff support an exact checkpointing scheme for BDF? > > I think we could add an interface to access the stored steps, but there are few things other than checkpointing that would make sense mathematically. Would you be up for making a merge request to add TSBDFGetStepVecs(TS ts, PetscInt *num_steps, const PetscReal **times, const Vec *vecs) and the respective setter? > > Alfredo J Duarte Gomez > writes: > >> Good morning PETSC team, >> >> I am currently using a TSBDF object, which is working very well. >> >> However, I am running into trouble restarting higher order BDF methods. >> >> My problem is highly nonlinear, and when restarted for higher order BDF >> methods (using the TSBDF_Restart function), wiggles appear in a specific >> region of the solution. >> >> Is there any way I can initialize the higher order BDF restart loading >> previous solutions from a data file? I took a look at the code, but there >> is no obvious way to do this. >> >> Thanks, >> >> -Alfredo >> >> -- >> Alfredo Duarte >> Graduate Research Assistant >> The University of Texas at Austin -- Alfredo Duarte Graduate Research Assistant The University of Texas at Austin -- Alfredo Duarte Graduate Research Assistant The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Mar 10 14:51:59 2022 From: jed at jedbrown.org (Jed Brown) Date: Thu, 10 Mar 2022 13:51:59 -0700 Subject: [petsc-users] TSBDF prr-load higher order solution In-Reply-To: References: <874k46zrvd.fsf@jedbrown.org> <4C1804B4-0B18-4540-ACCE-3390F373FD1F@anl.gov> Message-ID: <874k45ttsg.fsf@jedbrown.org> Would the order be inferred by the number of vectors in TSBDFSetStepVecs(TS ts, PetscInt num_steps, const PetscReal *times, const Vec *vecs)? "Zhang, Hong" writes: > It is clear to me now. You need TSBDFGetStepVecs(TS ts, PetscInt *num_steps, const PetscReal **times, const Vec *vecs) as Jed suggested so that you can access the work vectors for TSBDF. In addition, you need to save the step size and the current order to jumpstart BDF (without starting from BDF-1). Setting the step size is straightforward with TSSetTimeStep(). To access/recover the order, you need something like TSBDF{Set|Get}CurrentOrder(TS ts, PetscInt **order). > > Thanks, > Hong (Mr.) 
> > On Mar 10, 2022, at 11:35 AM, Alfredo J Duarte Gomez > wrote: > > Hello, > > The solutions y_n and y_n-1 are the same ones as those saved internally for the BDF-2. > > The practical problem is for example: run a simulation with a BDF-2 for 10 hrs, see that everything looks good and decide that this simulation is worth continuing for another 10 hrs. Except now when I start from that last solution, the TSBDF starts from a single solution and steps its way up to the BDF-2 (numerical noise appears), instead of simply being able to load the last two solutions from my HDF5 output file and continue as if it had not stopped. > > Maybe I confused you by referencing the TSBDF_Restart, instead of the "starting" procedure for higher order BDFs, I thought they were the same but maybe not. > > Thank you, > > -Alfredo > > > > On Thu, Mar 10, 2022 at 11:04 AM Zhang, Hong > wrote: > > > On Mar 10, 2022, at 10:05 AM, Alfredo J Duarte Gomez > wrote: > > Hello Zhang and Hong, > > Thank you for your reply. > > As I described, I simply wanted to be able to restart a higher order BDF from a previous solution. > > For example, if I want to restart a BDF-2 solution I can simply load times (n is current time step ) t_n-1, t_n, load solutions y_n, and yn-1 from a restart file and continue integration with a BDF-2 formula as if it never stopped. > > Are y_n and yn-1 from the restart file different from the states saved internally for BDF-2? Are you trying to modify these states yourself? A restart is needed typically when you have discontinuities in the system, so the solutions before the discontinuity point have to be discarded. If you simply want to modify the states, there should be better ways than using checkpoint-restart. > > Hong (Mr.) > > > > This would replace the current default approach, which starts from a single time tn, solution yn and uses lower order BDF steps as you build up to the selected order. > > I am not sure why, but an abrupt change in integration order or time step leads to unwanted numerical noise in my solution, which I blame on the high nonlinearity of the system (I have tested extensively to rule out bugs). > > Thank you and let me know if you have any questions, > > -Alfredo > > On Wed, Mar 9, 2022 at 4:49 PM Zhang, Hong > wrote: > TSTrajectory supports checkpointing for multistage methods and can certainly be extended to multistep methods. But I doubt it is the best solution to Alfredo?s problem. Alfredo, can you elaborate a bit on what you would like to do? TSBDF_Restart is already using the previous solution to restart the integration with first-order BDF. > > Hong(Mr.) > >> On Mar 9, 2022, at 4:24 PM, Jed Brown > wrote: >> >> Can you restart using small low-order steps? >> >> Hong, does (or should) your trajectory stuff support an exact checkpointing scheme for BDF? >> >> I think we could add an interface to access the stored steps, but there are few things other than checkpointing that would make sense mathematically. Would you be up for making a merge request to add TSBDFGetStepVecs(TS ts, PetscInt *num_steps, const PetscReal **times, const Vec *vecs) and the respective setter? >> >> Alfredo J Duarte Gomez > writes: >> >>> Good morning PETSC team, >>> >>> I am currently using a TSBDF object, which is working very well. >>> >>> However, I am running into trouble restarting higher order BDF methods. 
>>> >>> My problem is highly nonlinear, and when restarted for higher order BDF >>> methods (using the TSBDF_Restart function), wiggles appear in a specific >>> region of the solution. >>> >>> Is there any way I can initialize the higher order BDF restart loading >>> previous solutions from a data file? I took a look at the code, but there >>> is no obvious way to do this. >>> >>> Thanks, >>> >>> -Alfredo >>> >>> -- >>> Alfredo Duarte >>> Graduate Research Assistant >>> The University of Texas at Austin > > > > -- > Alfredo Duarte > Graduate Research Assistant > The University of Texas at Austin > > > > -- > Alfredo Duarte > Graduate Research Assistant > The University of Texas at Austin From hongzhang at anl.gov Thu Mar 10 15:40:19 2022 From: hongzhang at anl.gov (Zhang, Hong) Date: Thu, 10 Mar 2022 21:40:19 +0000 Subject: [petsc-users] TSBDF prr-load higher order solution In-Reply-To: <874k45ttsg.fsf@jedbrown.org> References: <874k46zrvd.fsf@jedbrown.org> <4C1804B4-0B18-4540-ACCE-3390F373FD1F@anl.gov> <874k45ttsg.fsf@jedbrown.org> Message-ID: <4243BDF3-76C4-41FB-AB68-7D77A5FB3CD3@anl.gov> > On Mar 10, 2022, at 2:51 PM, Jed Brown wrote: > > Would the order be inferred by the number of vectors in TSBDFSetStepVecs(TS ts, PetscInt num_steps, const PetscReal *times, const Vec *vecs)? Ah! Yes. num_steps could be used to set bdf->k which is the current order of BDF. Hong (Mr.) > "Zhang, Hong" writes: > >> It is clear to me now. You need TSBDFGetStepVecs(TS ts, PetscInt *num_steps, const PetscReal **times, const Vec *vecs) as Jed suggested so that you can access the work vectors for TSBDF. In addition, you need to save the step size and the current order to jumpstart BDF (without starting from BDF-1). Setting the step size is straightforward with TSSetTimeStep(). To access/recover the order, you need something like TSBDF{Set|Get}CurrentOrder(TS ts, PetscInt **order). >> >> Thanks, >> Hong (Mr.) >> >> On Mar 10, 2022, at 11:35 AM, Alfredo J Duarte Gomez > wrote: >> >> Hello, >> >> The solutions y_n and y_n-1 are the same ones as those saved internally for the BDF-2. >> >> The practical problem is for example: run a simulation with a BDF-2 for 10 hrs, see that everything looks good and decide that this simulation is worth continuing for another 10 hrs. Except now when I start from that last solution, the TSBDF starts from a single solution and steps its way up to the BDF-2 (numerical noise appears), instead of simply being able to load the last two solutions from my HDF5 output file and continue as if it had not stopped. >> >> Maybe I confused you by referencing the TSBDF_Restart, instead of the "starting" procedure for higher order BDFs, I thought they were the same but maybe not. >> >> Thank you, >> >> -Alfredo >> >> >> >> On Thu, Mar 10, 2022 at 11:04 AM Zhang, Hong > wrote: >> >> >> On Mar 10, 2022, at 10:05 AM, Alfredo J Duarte Gomez > wrote: >> >> Hello Zhang and Hong, >> >> Thank you for your reply. >> >> As I described, I simply wanted to be able to restart a higher order BDF from a previous solution. >> >> For example, if I want to restart a BDF-2 solution I can simply load times (n is current time step ) t_n-1, t_n, load solutions y_n, and yn-1 from a restart file and continue integration with a BDF-2 formula as if it never stopped. >> >> Are y_n and yn-1 from the restart file different from the states saved internally for BDF-2? Are you trying to modify these states yourself? 
A restart is needed typically when you have discontinuities in the system, so the solutions before the discontinuity point have to be discarded. If you simply want to modify the states, there should be better ways than using checkpoint-restart. >> >> Hong (Mr.) >> >> >> >> This would replace the current default approach, which starts from a single time tn, solution yn and uses lower order BDF steps as you build up to the selected order. >> >> I am not sure why, but an abrupt change in integration order or time step leads to unwanted numerical noise in my solution, which I blame on the high nonlinearity of the system (I have tested extensively to rule out bugs). >> >> Thank you and let me know if you have any questions, >> >> -Alfredo >> >> On Wed, Mar 9, 2022 at 4:49 PM Zhang, Hong > wrote: >> TSTrajectory supports checkpointing for multistage methods and can certainly be extended to multistep methods. But I doubt it is the best solution to Alfredo?s problem. Alfredo, can you elaborate a bit on what you would like to do? TSBDF_Restart is already using the previous solution to restart the integration with first-order BDF. >> >> Hong(Mr.) >> >>> On Mar 9, 2022, at 4:24 PM, Jed Brown > wrote: >>> >>> Can you restart using small low-order steps? >>> >>> Hong, does (or should) your trajectory stuff support an exact checkpointing scheme for BDF? >>> >>> I think we could add an interface to access the stored steps, but there are few things other than checkpointing that would make sense mathematically. Would you be up for making a merge request to add TSBDFGetStepVecs(TS ts, PetscInt *num_steps, const PetscReal **times, const Vec *vecs) and the respective setter? >>> >>> Alfredo J Duarte Gomez > writes: >>> >>>> Good morning PETSC team, >>>> >>>> I am currently using a TSBDF object, which is working very well. >>>> >>>> However, I am running into trouble restarting higher order BDF methods. >>>> >>>> My problem is highly nonlinear, and when restarted for higher order BDF >>>> methods (using the TSBDF_Restart function), wiggles appear in a specific >>>> region of the solution. >>>> >>>> Is there any way I can initialize the higher order BDF restart loading >>>> previous solutions from a data file? I took a look at the code, but there >>>> is no obvious way to do this. >>>> >>>> Thanks, >>>> >>>> -Alfredo >>>> >>>> -- >>>> Alfredo Duarte >>>> Graduate Research Assistant >>>> The University of Texas at Austin >> >> >> >> -- >> Alfredo Duarte >> Graduate Research Assistant >> The University of Texas at Austin >> >> >> >> -- >> Alfredo Duarte >> Graduate Research Assistant >> The University of Texas at Austin From sam.guo at cd-adapco.com Thu Mar 10 17:27:34 2022 From: sam.guo at cd-adapco.com (Sam Guo) Date: Thu, 10 Mar 2022 15:27:34 -0800 Subject: [petsc-users] How to call MUMPS for general symmetric matrix? Message-ID: Dear PETSc dev team, I have a general symmetric matrix which I want to call MUMPS symbolic to get memory estimate. I see MatLUFactorSymbolic and MatCholeskyFactor but there is no API for general symmetric. In MUMPS there are three types SYM: 0 unsymmetric; 1: symmetric positive definite; 2: general symmetric. I am wondering if PETSc API is missing or if I am missing something. Thanks, Sam -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Mar 10 20:05:26 2022 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 10 Mar 2022 21:05:26 -0500 Subject: [petsc-users] How to call MUMPS for general symmetric matrix? 
In-Reply-To: References: Message-ID: If you call the Cholesky matrix factorization it treats the matrix as general symmetric, if you call MatSetOption(mat,MAT_SPD,PETSC_TRUE); on the PETSc matrix before using it with MUMPS it will treat it as positive definite. > On Mar 10, 2022, at 6:27 PM, Sam Guo wrote: > > Dear PETSc dev team, > I have a general symmetric matrix which I want to call MUMPS symbolic to get memory estimate. I see MatLUFactorSymbolic and MatCholeskyFactor but there is no API for general symmetric. In MUMPS there are three types SYM: 0 unsymmetric; 1: symmetric positive definite; 2: general symmetric. I am wondering if PETSc API is missing or if I am missing something. > > Thanks, > Sam From aduarteg at utexas.edu Fri Mar 11 11:09:24 2022 From: aduarteg at utexas.edu (Alfredo J Duarte Gomez) Date: Fri, 11 Mar 2022 11:09:24 -0600 Subject: [petsc-users] TSBDF prr-load higher order solution In-Reply-To: <4243BDF3-76C4-41FB-AB68-7D77A5FB3CD3@anl.gov> References: <874k46zrvd.fsf@jedbrown.org> <4C1804B4-0B18-4540-ACCE-3390F373FD1F@anl.gov> <874k45ttsg.fsf@jedbrown.org> <4243BDF3-76C4-41FB-AB68-7D77A5FB3CD3@anl.gov> Message-ID: Thank you so much, This is exactly what I need. Just to clarify, these functions that you are mentioning already exist? Or you are simply sketching out what is going to be needed? Thank you, -Alfredo On Thu, Mar 10, 2022 at 3:40 PM Zhang, Hong wrote: > > > On Mar 10, 2022, at 2:51 PM, Jed Brown wrote: > > > > Would the order be inferred by the number of vectors in > TSBDFSetStepVecs(TS ts, PetscInt num_steps, const PetscReal *times, const > Vec *vecs)? > > Ah! Yes. num_steps could be used to set bdf->k which is the current order > of BDF. > > Hong (Mr.) > > > "Zhang, Hong" writes: > > > >> It is clear to me now. You need TSBDFGetStepVecs(TS ts, PetscInt > *num_steps, const PetscReal **times, const Vec *vecs) as Jed suggested so > that you can access the work vectors for TSBDF. In addition, you need to > save the step size and the current order to jumpstart BDF (without starting > from BDF-1). Setting the step size is straightforward with TSSetTimeStep(). > To access/recover the order, you need something like > TSBDF{Set|Get}CurrentOrder(TS ts, PetscInt **order). > >> > >> Thanks, > >> Hong (Mr.) > >> > >> On Mar 10, 2022, at 11:35 AM, Alfredo J Duarte Gomez < > aduarteg at utexas.edu> wrote: > >> > >> Hello, > >> > >> The solutions y_n and y_n-1 are the same ones as those saved internally > for the BDF-2. > >> > >> The practical problem is for example: run a simulation with a BDF-2 for > 10 hrs, see that everything looks good and decide that this simulation is > worth continuing for another 10 hrs. Except now when I start from that last > solution, the TSBDF starts from a single solution and steps its way up to > the BDF-2 (numerical noise appears), instead of simply being able to load > the last two solutions from my HDF5 output file and continue as if it had > not stopped. > >> > >> Maybe I confused you by referencing the TSBDF_Restart, instead of the > "starting" procedure for higher order BDFs, I thought they were the same > but maybe not. > >> > >> Thank you, > >> > >> -Alfredo > >> > >> > >> > >> On Thu, Mar 10, 2022 at 11:04 AM Zhang, Hong hongzhang at anl.gov>> wrote: > >> > >> > >> On Mar 10, 2022, at 10:05 AM, Alfredo J Duarte Gomez < > aduarteg at utexas.edu> wrote: > >> > >> Hello Zhang and Hong, > >> > >> Thank you for your reply. > >> > >> As I described, I simply wanted to be able to restart a higher order > BDF from a previous solution. 
> >> > >> For example, if I want to restart a BDF-2 solution I can simply load > times (n is current time step ) t_n-1, t_n, load solutions y_n, and yn-1 > from a restart file and continue integration with a BDF-2 formula as if it > never stopped. > >> > >> Are y_n and yn-1 from the restart file different from the states saved > internally for BDF-2? Are you trying to modify these states yourself? A > restart is needed typically when you have discontinuities in the system, so > the solutions before the discontinuity point have to be discarded. If you > simply want to modify the states, there should be better ways than using > checkpoint-restart. > >> > >> Hong (Mr.) > >> > >> > >> > >> This would replace the current default approach, which starts from a > single time tn, solution yn and uses lower order BDF steps as you build up > to the selected order. > >> > >> I am not sure why, but an abrupt change in integration order or time > step leads to unwanted numerical noise in my solution, which I blame on the > high nonlinearity of the system (I have tested extensively to rule out > bugs). > >> > >> Thank you and let me know if you have any questions, > >> > >> -Alfredo > >> > >> On Wed, Mar 9, 2022 at 4:49 PM Zhang, Hong hongzhang at anl.gov>> wrote: > >> TSTrajectory supports checkpointing for multistage methods and can > certainly be extended to multistep methods. But I doubt it is the best > solution to Alfredo?s problem. Alfredo, can you elaborate a bit on what you > would like to do? TSBDF_Restart is already using the previous solution to > restart the integration with first-order BDF. > >> > >> Hong(Mr.) > >> > >>> On Mar 9, 2022, at 4:24 PM, Jed Brown jed at jedbrown.org>> wrote: > >>> > >>> Can you restart using small low-order steps? > >>> > >>> Hong, does (or should) your trajectory stuff support an exact > checkpointing scheme for BDF? > >>> > >>> I think we could add an interface to access the stored steps, but > there are few things other than checkpointing that would make sense > mathematically. Would you be up for making a merge request to add > TSBDFGetStepVecs(TS ts, PetscInt *num_steps, const PetscReal **times, const > Vec *vecs) and the respective setter? > >>> > >>> Alfredo J Duarte Gomez > > writes: > >>> > >>>> Good morning PETSC team, > >>>> > >>>> I am currently using a TSBDF object, which is working very well. > >>>> > >>>> However, I am running into trouble restarting higher order BDF > methods. > >>>> > >>>> My problem is highly nonlinear, and when restarted for higher order > BDF > >>>> methods (using the TSBDF_Restart function), wiggles appear in a > specific > >>>> region of the solution. > >>>> > >>>> Is there any way I can initialize the higher order BDF restart loading > >>>> previous solutions from a data file? I took a look at the code, but > there > >>>> is no obvious way to do this. > >>>> > >>>> Thanks, > >>>> > >>>> -Alfredo > >>>> > >>>> -- > >>>> Alfredo Duarte > >>>> Graduate Research Assistant > >>>> The University of Texas at Austin > >> > >> > >> > >> -- > >> Alfredo Duarte > >> Graduate Research Assistant > >> The University of Texas at Austin > >> > >> > >> > >> -- > >> Alfredo Duarte > >> Graduate Research Assistant > >> The University of Texas at Austin > > -- Alfredo Duarte Graduate Research Assistant The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... 
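[Until an interface like the proposed TSBDFSetStepVecs exists, a hedged sketch of the interim workaround suggested earlier in the thread: restart from the checkpoint but take a deliberately small first step so the low-order BDF start-up pollutes the solution less. U, t_restart, step_restart, dt_checkpoint and t_final are assumed to be read back from the application's own restart file; the TS calls are standard API.]

  TSSetSolution(ts, U);                    /* checkpointed state */
  TSSetTime(ts, t_restart);
  TSSetStepNumber(ts, step_restart);
  TSSetTimeStep(ts, 1.e-3*dt_checkpoint);  /* small first step while BDF rebuilds its history */
  TSSetMaxTime(ts, t_final);
  TSSolve(ts, U);

[If a TSAdapt controller is active it should grow the step back once the higher-order history is available again.]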
URL: From jed at jedbrown.org Fri Mar 11 12:48:59 2022 From: jed at jedbrown.org (Jed Brown) Date: Fri, 11 Mar 2022 11:48:59 -0700 Subject: [petsc-users] TSBDF prr-load higher order solution In-Reply-To: References: <874k46zrvd.fsf@jedbrown.org> <4C1804B4-0B18-4540-ACCE-3390F373FD1F@anl.gov> <874k45ttsg.fsf@jedbrown.org> <4243BDF3-76C4-41FB-AB68-7D77A5FB3CD3@anl.gov> Message-ID: <87cziss4tg.fsf@jedbrown.org> They don't exist, but would be simple to implement. Is that something you might be able to contribute (we can advise) or that you want us to do? Alfredo J Duarte Gomez writes: > Thank you so much, > > This is exactly what I need. Just to clarify, these functions that you are > mentioning already exist? > > Or you are simply sketching out what is going to be needed? > > Thank you, > > -Alfredo > > > On Thu, Mar 10, 2022 at 3:40 PM Zhang, Hong wrote: > >> >> > On Mar 10, 2022, at 2:51 PM, Jed Brown wrote: >> > >> > Would the order be inferred by the number of vectors in >> TSBDFSetStepVecs(TS ts, PetscInt num_steps, const PetscReal *times, const >> Vec *vecs)? >> >> Ah! Yes. num_steps could be used to set bdf->k which is the current order >> of BDF. >> >> Hong (Mr.) >> >> > "Zhang, Hong" writes: >> > >> >> It is clear to me now. You need TSBDFGetStepVecs(TS ts, PetscInt >> *num_steps, const PetscReal **times, const Vec *vecs) as Jed suggested so >> that you can access the work vectors for TSBDF. In addition, you need to >> save the step size and the current order to jumpstart BDF (without starting >> from BDF-1). Setting the step size is straightforward with TSSetTimeStep(). >> To access/recover the order, you need something like >> TSBDF{Set|Get}CurrentOrder(TS ts, PetscInt **order). >> >> >> >> Thanks, >> >> Hong (Mr.) >> >> >> >> On Mar 10, 2022, at 11:35 AM, Alfredo J Duarte Gomez < >> aduarteg at utexas.edu> wrote: >> >> >> >> Hello, >> >> >> >> The solutions y_n and y_n-1 are the same ones as those saved internally >> for the BDF-2. >> >> >> >> The practical problem is for example: run a simulation with a BDF-2 for >> 10 hrs, see that everything looks good and decide that this simulation is >> worth continuing for another 10 hrs. Except now when I start from that last >> solution, the TSBDF starts from a single solution and steps its way up to >> the BDF-2 (numerical noise appears), instead of simply being able to load >> the last two solutions from my HDF5 output file and continue as if it had >> not stopped. >> >> >> >> Maybe I confused you by referencing the TSBDF_Restart, instead of the >> "starting" procedure for higher order BDFs, I thought they were the same >> but maybe not. >> >> >> >> Thank you, >> >> >> >> -Alfredo >> >> >> >> >> >> >> >> On Thu, Mar 10, 2022 at 11:04 AM Zhang, Hong > hongzhang at anl.gov>> wrote: >> >> >> >> >> >> On Mar 10, 2022, at 10:05 AM, Alfredo J Duarte Gomez < >> aduarteg at utexas.edu> wrote: >> >> >> >> Hello Zhang and Hong, >> >> >> >> Thank you for your reply. >> >> >> >> As I described, I simply wanted to be able to restart a higher order >> BDF from a previous solution. >> >> >> >> For example, if I want to restart a BDF-2 solution I can simply load >> times (n is current time step ) t_n-1, t_n, load solutions y_n, and yn-1 >> from a restart file and continue integration with a BDF-2 formula as if it >> never stopped. >> >> >> >> Are y_n and yn-1 from the restart file different from the states saved >> internally for BDF-2? Are you trying to modify these states yourself? 
A >> restart is needed typically when you have discontinuities in the system, so >> the solutions before the discontinuity point have to be discarded. If you >> simply want to modify the states, there should be better ways than using >> checkpoint-restart. >> >> >> >> Hong (Mr.) >> >> >> >> >> >> >> >> This would replace the current default approach, which starts from a >> single time tn, solution yn and uses lower order BDF steps as you build up >> to the selected order. >> >> >> >> I am not sure why, but an abrupt change in integration order or time >> step leads to unwanted numerical noise in my solution, which I blame on the >> high nonlinearity of the system (I have tested extensively to rule out >> bugs). >> >> >> >> Thank you and let me know if you have any questions, >> >> >> >> -Alfredo >> >> >> >> On Wed, Mar 9, 2022 at 4:49 PM Zhang, Hong > hongzhang at anl.gov>> wrote: >> >> TSTrajectory supports checkpointing for multistage methods and can >> certainly be extended to multistep methods. But I doubt it is the best >> solution to Alfredo?s problem. Alfredo, can you elaborate a bit on what you >> would like to do? TSBDF_Restart is already using the previous solution to >> restart the integration with first-order BDF. >> >> >> >> Hong(Mr.) >> >> >> >>> On Mar 9, 2022, at 4:24 PM, Jed Brown > jed at jedbrown.org>> wrote: >> >>> >> >>> Can you restart using small low-order steps? >> >>> >> >>> Hong, does (or should) your trajectory stuff support an exact >> checkpointing scheme for BDF? >> >>> >> >>> I think we could add an interface to access the stored steps, but >> there are few things other than checkpointing that would make sense >> mathematically. Would you be up for making a merge request to add >> TSBDFGetStepVecs(TS ts, PetscInt *num_steps, const PetscReal **times, const >> Vec *vecs) and the respective setter? >> >>> >> >>> Alfredo J Duarte Gomez > >> writes: >> >>> >> >>>> Good morning PETSC team, >> >>>> >> >>>> I am currently using a TSBDF object, which is working very well. >> >>>> >> >>>> However, I am running into trouble restarting higher order BDF >> methods. >> >>>> >> >>>> My problem is highly nonlinear, and when restarted for higher order >> BDF >> >>>> methods (using the TSBDF_Restart function), wiggles appear in a >> specific >> >>>> region of the solution. >> >>>> >> >>>> Is there any way I can initialize the higher order BDF restart loading >> >>>> previous solutions from a data file? I took a look at the code, but >> there >> >>>> is no obvious way to do this. >> >>>> >> >>>> Thanks, >> >>>> >> >>>> -Alfredo >> >>>> >> >>>> -- >> >>>> Alfredo Duarte >> >>>> Graduate Research Assistant >> >>>> The University of Texas at Austin >> >> >> >> >> >> >> >> -- >> >> Alfredo Duarte >> >> Graduate Research Assistant >> >> The University of Texas at Austin >> >> >> >> >> >> >> >> -- >> >> Alfredo Duarte >> >> Graduate Research Assistant >> >> The University of Texas at Austin >> >> > > -- > Alfredo Duarte > Graduate Research Assistant > The University of Texas at Austin From tangqi at msu.edu Fri Mar 11 14:31:49 2022 From: tangqi at msu.edu (Tang, Qi) Date: Fri, 11 Mar 2022 20:31:49 +0000 Subject: [petsc-users] Fieldsplit for 4 fields Message-ID: <46D54D47-9A0B-4078-B1D2-2313562BD909@msu.edu> Hi, I am trying to solve a four field system with three nested fieldsplit (I am in petsc/dmstag directly). I think I have all the IS info in the original system. I am wondering how to set up IS for the split system. 
More specifically, I would like to call something like this -pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_0_fields 0 -pc_fieldsplit_1_fields 1,2,3 -fieldsplit_1_ksp_type fgmres -fieldsplit_1_pc_type fieldsplit -fieldsplit_1_pc_fieldsplit_type multiplicative -fieldsplit_1_fieldsplit_1_ksp_type fgmres -fieldsplit_1_fieldsplit_1_pc_type fieldsplit -fieldsplit_1_fieldsplit_1_pc_fieldsplit_type schur I know the first level probably would work. But the second and third levels would not. We have two components living on one type of dofs. So the natural split 0,1,2,3 do not work. Therefore, at the first level I am setting up split through PCFieldSplitSetIS(pc, ?i", myis[i]); How could I know the sub ISs and set it up correctly? Thanks. Qi Tang T5 at LANL From luke.roskop at hpe.com Fri Mar 11 14:34:00 2022 From: luke.roskop at hpe.com (Roskop, Luke B) Date: Fri, 11 Mar 2022 20:34:00 +0000 Subject: [petsc-users] Problem building petsc +rocm variant Message-ID: Hi, I?m hoping you can help me figure out how to build PETSc targeting AMDGPUs (gfx90a GPU). I?m attempting to build on the CrayEx ORNL system called crusher, using ROCmCC (AMD?s compiler), and cray-mpich. In case it helps, I?m using spack to build petsc with the ?petsc at main%rocmcc+batch+rocm amdgpu_target=gfx90a? spec. Spack ends up invoking the following configure for PETSc: '/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/python-3.9.10-7y7mxajn5rywz5xdnba4azphcdodxiub/bin/python3.9' 'configure' '--prefix=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/petsc-main-mccbycx66la7rlx6jv44f6zd63cmdzm7' '--with-ssl=0' '--download-c2html=0' '--download-sowing=0' '--download-hwloc=0' 'CFLAGS=' 'FFLAGS=-fPIC' 'CXXFLAGS=' 'LDFLAGS=-Wl,-z,notext' '--with-cc=/opt/cray/pe/mpich/8.1.12/ofi/amd/4.4/bin/mpicc' '--with-cxx=/opt/cray/pe/mpich/8.1.12/ofi/amd/4.4/bin/mpicxx' '--with-fc=/opt/cray/pe/mpich/8.1.12/ofi/amd/4.4/bin/mpif90' '--with-precision=double' '--with-scalar-type=real' '--with-shared-libraries=1' '--with-debugging=0' '--with-openmp=0' '--with-64-bit-indices=0' 'COPTFLAGS=' 'FOPTFLAGS=' 'CXXOPTFLAGS=' '--with-blas-lapack-lib=/opt/cray/pe/libsci/21.08.1.2/AMD/4.0/x86_64/lib/libsci_amd.so' '--with-batch=1' '--with-x=0' '--with-clanguage=C' '--with-cuda=0' '--with-hip=1' '--with-hip-dir=/opt/rocm-4.5.0/hip' '--with-metis=1' '--with-metis-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/metis-5.1.0-zn5rn5srr7qzxyo5tq36d46adcsyc5a7/include' '--with-metis-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/metis-5.1.0-zn5rn5srr7qzxyo5tq36d46adcsyc5a7/lib/libmetis.so' '--with-hypre=1' '--with-hypre-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hypre-develop-vgfx3lhhloq4cnethsrrpz7iez7x6wad/include' '--with-hypre-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hypre-develop-vgfx3lhhloq4cnethsrrpz7iez7x6wad/lib/libHYPRE.so' '--with-parmetis=1' '--with-parmetis-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/parmetis-4.0.3-6jqxqmt7qqq73rxmx3beu5ba4vj3r253/include' '--with-parmetis-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/parmetis-4.0.3-6jqxqmt7qqq73rxmx3beu5ba4vj3r253/lib/libparmetis.so' '--with-superlu_dist=1' 
'--with-superlu_dist-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/superlu-dist-develop-mpiyhomp4k72bilqn6xk7uol36ulsdve/include' '--with-superlu_dist-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/superlu-dist-develop-mpiyhomp4k72bilqn6xk7uol36ulsdve/lib/libsuperlu_dist.so' '--with-ptscotch=0' '--with-suitesparse=0' '--with-hdf5=1' '--with-hdf5-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hdf5-1.12.1-dp5vqo4tjh6oi7szpcsqkdlifgjxknzf/include' '--with-hdf5-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hdf5-1.12.1-dp5vqo4tjh6oi7szpcsqkdlifgjxknzf/lib/libhdf5.so' '--with-zlib=1' '--with-zlib-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/zlib-1.2.11-2ciasfxwyxanyohroisdpvidg4gs2fdy/include' '--with-zlib-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/zlib-1.2.11-2ciasfxwyxanyohroisdpvidg4gs2fdy/lib/libz.so' '--with-mumps=0' '--with-trilinos=0' '--with-fftw=0' '--with-valgrind=0' '--with-gmp=0' '--with-libpng=0' '--with-giflib=0' '--with-mpfr=0' '--with-netcdf=0' '--with-pnetcdf=0' '--with-moab=0' '--with-random123=0' '--with-exodusii=0' '--with-cgns=0' '--with-memkind=0' '--with-p4est=0' '--with-saws=0' '--with-yaml=0' '--with-hwloc=0' '--with-libjpeg=0' '--with-scalapack=1' '--with-scalapack-lib=/opt/cray/pe/libsci/21.08.1.2/AMD/4.0/x86_64/lib/libsci_amd.so' '--with-strumpack=0' '--with-mmg=0' '--with-parmmg=0' '--with-tetgen=0' '--with-cxx-dialect=C++11' Using spack, I see this error at compile time: /tmp/lukebr/spack-stage/spack-stage-petsc-main-5jlv6jcfdaa37iy5zm77umvb6uvgwdo7/spack-src/src/vec/is/sf/impls/basic/sfpack.c:463:19: error: static declaration of 'MPI_Type_dup' follows non-static declaration static inline int MPI_Type_dup(MPI_Datatype datatype,MPI_Datatype *newtype) ^ /opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/include/mpi.h:1291:5: note: previous declaration is here int MPI_Type_dup(MPI_Datatype oldtype, MPI_Datatype *newtype) MPICH_API_PUBLIC; ^ 1 error generated. To get around this error, I pass ?-DPETSC_HAVE_MPI_TYPE_DUP? but then I see the following lining error: CLINKER arch-linux-c-opt/lib/libpetsc.so.3.016.5 ld.lld: error: undefined hidden symbol: PetscSFCreate_Window >>> referenced by sfregi.c >>> arch-linux-c-opt/obj/vec/is/sf/interface/sfregi.o:(PetscSFRegisterAll) clang-13: error: linker command failed with exit code 1 (use -v to see invocation) gmake[3]: *** [gmakefile:113: arch-linux-c-opt/lib/libpetsc.so.3.016.5] Error 1 gmake[2]: *** [/tmp/lukebr/spack-stage/spack-stage-petsc-main-5jlv6jcfdaa37iy5zm77umvb6uvgwdo7/spack-src/lib/petsc/conf/rules:56: libs] Error 2 Before I continue, is there a preferred way to build PETSc on an AMDGPU system? Could you share this? Thanks, Luke -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Mar 11 14:47:31 2022 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 11 Mar 2022 15:47:31 -0500 Subject: [petsc-users] Problem building petsc +rocm variant In-Reply-To: References: Message-ID: Please send configure.log and make.log to petsc-maint at mcs.anl.gov For some reason PETSc's configure does not detect that the MPI does provide MPI_Type_dup() even though it is prototyped in the MPI include file. 
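Since the prototype is clearly there in mpi.h, one quick way to check the library side independently of PETSc's configure is a tiny standalone program compiled with the same mpicc; this is only a hand-run sanity check, not anything PETSc-specific:

#include <mpi.h>
#include <stdio.h>

int main(int argc,char **argv)
{
  MPI_Datatype newtype;

  MPI_Init(&argc,&argv);
  MPI_Type_dup(MPI_DOUBLE,&newtype);   /* the routine configure failed to detect */
  MPI_Type_free(&newtype);
  printf("MPI_Type_dup() compiles, links, and runs\n");
  MPI_Finalize();
  return 0;
}

If that builds and runs under the same compiler wrappers, the MPI library does provide the routine and the problem is on the configure-detection side.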
> On Mar 11, 2022, at 3:34 PM, Roskop, Luke B wrote: > > Hi, I?m hoping you can help me figure out how to build PETSc targeting AMDGPUs (gfx90a GPU). > > I?m attempting to build on the CrayEx ORNL system called crusher, using ROCmCC (AMD?s compiler), and cray-mpich. In case it helps, I?m using spack to build petsc with the ?petsc at main%rocmcc+batch+rocm amdgpu_target=gfx90a? spec. > > Spack ends up invoking the following configure for PETSc: > '/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/python-3.9.10-7y7mxajn5rywz5xdnba4azphcdodxiub/bin/python3.9' 'configure' '--prefix=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/petsc-main-mccbycx66la7rlx6jv44f6zd63cmdzm7' '--with-ssl=0' '--download-c2html=0' '--download-sowing=0' '--download-hwloc=0' 'CFLAGS=' 'FFLAGS=-fPIC' 'CXXFLAGS=' 'LDFLAGS=-Wl,-z,notext' '--with-cc=/opt/cray/pe/mpich/8.1.12/ofi/amd/4.4/bin/mpicc' '--with-cxx=/opt/cray/pe/mpich/8.1.12/ofi/amd/4.4/bin/mpicxx' '--with-fc=/opt/cray/pe/mpich/8.1.12/ofi/amd/4.4/bin/mpif90' '--with-precision=double' '--with-scalar-type=real' '--with-shared-libraries=1' '--with-debugging=0' '--with-openmp=0' '--with-64-bit-indices=0' 'COPTFLAGS=' 'FOPTFLAGS=' 'CXXOPTFLAGS=' '--with-blas-lapack-lib=/opt/cray/pe/libsci/21.08.1.2/AMD/4.0/x86_64/lib/libsci_amd.so' '--with-batch=1' '--with-x=0' '--with-clanguage=C' '--with-cuda=0' '--with-hip=1' '--with-hip-dir=/opt/rocm-4.5.0/hip' '--with-metis=1' '--with-metis-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/metis-5.1.0-zn5rn5srr7qzxyo5tq36d46adcsyc5a7/include' '--with-metis-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/metis-5.1.0-zn5rn5srr7qzxyo5tq36d46adcsyc5a7/lib/libmetis.so' '--with-hypre=1' '--with-hypre-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hypre-develop-vgfx3lhhloq4cnethsrrpz7iez7x6wad/include' '--with-hypre-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hypre-develop-vgfx3lhhloq4cnethsrrpz7iez7x6wad/lib/libHYPRE.so' '--with-parmetis=1' '--with-parmetis-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/parmetis-4.0.3-6jqxqmt7qqq73rxmx3beu5ba4vj3r253/include' '--with-parmetis-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/parmetis-4.0.3-6jqxqmt7qqq73rxmx3beu5ba4vj3r253/lib/libparmetis.so' '--with-superlu_dist=1' '--with-superlu_dist-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/superlu-dist-develop-mpiyhomp4k72bilqn6xk7uol36ulsdve/include' '--with-superlu_dist-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/superlu-dist-develop-mpiyhomp4k72bilqn6xk7uol36ulsdve/lib/libsuperlu_dist.so' '--with-ptscotch=0' '--with-suitesparse=0' '--with-hdf5=1' '--with-hdf5-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hdf5-1.12.1-dp5vqo4tjh6oi7szpcsqkdlifgjxknzf/include' '--with-hdf5-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hdf5-1.12.1-dp5vqo4tjh6oi7szpcsqkdlifgjxknzf/lib/libhdf5.so' '--with-zlib=1' 
'--with-zlib-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/zlib-1.2.11-2ciasfxwyxanyohroisdpvidg4gs2fdy/include' '--with-zlib-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/zlib-1.2.11-2ciasfxwyxanyohroisdpvidg4gs2fdy/lib/libz.so' '--with-mumps=0' '--with-trilinos=0' '--with-fftw=0' '--with-valgrind=0' '--with-gmp=0' '--with-libpng=0' '--with-giflib=0' '--with-mpfr=0' '--with-netcdf=0' '--with-pnetcdf=0' '--with-moab=0' '--with-random123=0' '--with-exodusii=0' '--with-cgns=0' '--with-memkind=0' '--with-p4est=0' '--with-saws=0' '--with-yaml=0' '--with-hwloc=0' '--with-libjpeg=0' '--with-scalapack=1' '--with-scalapack-lib=/opt/cray/pe/libsci/21.08.1.2/AMD/4.0/x86_64/lib/libsci_amd.so' '--with-strumpack=0' '--with-mmg=0' '--with-parmmg=0' '--with-tetgen=0' '--with-cxx-dialect=C++11' > > Using spack, I see this error at compile time: > > /tmp/lukebr/spack-stage/spack-stage-petsc-main-5jlv6jcfdaa37iy5zm77umvb6uvgwdo7/spack-src/src/vec/is/sf/impls/basic/sfpack.c:463:19: error: static declaration of 'MPI_Type_dup' follows non-static declaration > static inline int MPI_Type_dup(MPI_Datatype datatype,MPI_Datatype *newtype) > ^ > /opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/include/mpi.h:1291:5: note: previous declaration is here > int MPI_Type_dup(MPI_Datatype oldtype, MPI_Datatype *newtype) MPICH_API_PUBLIC; > ^ > 1 error generated. > > To get around this error, I pass ?-DPETSC_HAVE_MPI_TYPE_DUP? but then I see the following lining error: > > CLINKER arch-linux-c-opt/lib/libpetsc.so.3.016.5 > ld.lld: error: undefined hidden symbol: PetscSFCreate_Window > >>> referenced by sfregi.c > >>> arch-linux-c-opt/obj/vec/is/sf/interface/sfregi.o:(PetscSFRegisterAll) > clang-13: error: linker command failed with exit code 1 (use -v to see invocation) > gmake[3]: *** [gmakefile:113: arch-linux-c-opt/lib/libpetsc.so.3.016.5] Error 1 > gmake[2]: *** [/tmp/lukebr/spack-stage/spack-stage-petsc-main-5jlv6jcfdaa37iy5zm77umvb6uvgwdo7/spack-src/lib/petsc/conf/rules:56: libs] Error 2 > > > Before I continue, is there a preferred way to build PETSc on an AMDGPU system? Could you share this? > > Thanks, > Luke -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Mar 11 14:48:08 2022 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 11 Mar 2022 14:48:08 -0600 (CST) Subject: [petsc-users] Problem building petsc +rocm variant In-Reply-To: References: Message-ID: <366937b8-7cac-f2ff-ec8b-6c12a43d6b8c@mcs.anl.gov> Hm - I have a successful build via spack with: ./bin/spack install petsc at main%cce at 13.0.0~hdf5~hypre~metis~superlu-dist+rocm+kokkos amdgpu_target=gfx90a ^kokkos at 3.5.01 amdgpu_target=gfx90a ^kokkos-kernels at 3.5.01 [with some spack patches to get kokkos/kokkos-kernels built] One difference I see here is "%cce at 13.0.0" vs "%rocmcc" - don't know if that would result in this error. [the other is: ~hdf5~hypre~metis~superlu-dist] Also my builds are with rocm/4.5.2 [also 5.0.0] But the preferred build mode is native [without spack] - check config/examples/config/examples/arch-olcf-crusher.py Satish On Fri, 11 Mar 2022, Roskop, Luke B wrote: > Hi, I?m hoping you can help me figure out how to build PETSc targeting AMDGPUs (gfx90a GPU). > > I?m attempting to build on the CrayEx ORNL system called crusher, using ROCmCC (AMD?s compiler), and cray-mpich. 
In case it helps, I?m using spack to build petsc with the ?petsc at main%rocmcc+batch+rocm amdgpu_target=gfx90a? spec. > > Spack ends up invoking the following configure for PETSc: > > '/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/python-3.9.10-7y7mxajn5rywz5xdnba4azphcdodxiub/bin/python3.9' 'configure' '--prefix=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/petsc-main-mccbycx66la7rlx6jv44f6zd63cmdzm7' '--with-ssl=0' '--download-c2html=0' '--download-sowing=0' '--download-hwloc=0' 'CFLAGS=' 'FFLAGS=-fPIC' 'CXXFLAGS=' 'LDFLAGS=-Wl,-z,notext' '--with-cc=/opt/cray/pe/mpich/8.1.12/ofi/amd/4.4/bin/mpicc' '--with-cxx=/opt/cray/pe/mpich/8.1.12/ofi/amd/4.4/bin/mpicxx' '--with-fc=/opt/cray/pe/mpich/8.1.12/ofi/amd/4.4/bin/mpif90' '--with-precision=double' '--with-scalar-type=real' '--with-shared-libraries=1' '--with-debugging=0' '--with-openmp=0' '--with-64-bit-indices=0' 'COPTFLAGS=' 'FOPTFLAGS=' 'CXXOPTFLAGS=' '--with-blas-lapack-lib=/opt/cray/pe/libsci/21.08.1.2/AMD/4.0/x86_64/lib/libsci_amd.so' '--with-batch=1' '--with-x=0' '--with-clanguage=C' '--with-cuda=0' '-- with-hip =1' '--with-hip-dir=/opt/rocm-4.5.0/hip' '--with-metis=1' '--with-metis-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/metis-5.1.0-zn5rn5srr7qzxyo5tq36d46adcsyc5a7/include' '--with-metis-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/metis-5.1.0-zn5rn5srr7qzxyo5tq36d46adcsyc5a7/lib/libmetis.so' '--with-hypre=1' '--with-hypre-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hypre-develop-vgfx3lhhloq4cnethsrrpz7iez7x6wad/include' '--with-hypre-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hypre-develop-vgfx3lhhloq4cnethsrrpz7iez7x6wad/lib/libHYPRE.so' '--with-parmetis=1' '--with-parmetis-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/parmetis-4.0.3-6jqxqmt7qqq73rxmx3beu5ba4vj3r253/include' '--with-parmetis-lib= /gpfs/al pine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/parmetis-4.0.3-6jqxqmt7qqq73rxmx3beu5ba4vj3r253/lib/libparmetis.so' '--with-superlu_dist=1' '--with-superlu_dist-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/superlu-dist-develop-mpiyhomp4k72bilqn6xk7uol36ulsdve/include' '--with-superlu_dist-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/superlu-dist-develop-mpiyhomp4k72bilqn6xk7uol36ulsdve/lib/libsuperlu_dist.so' '--with-ptscotch=0' '--with-suitesparse=0' '--with-hdf5=1' '--with-hdf5-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hdf5-1.12.1-dp5vqo4tjh6oi7szpcsqkdlifgjxknzf/include' '--with-hdf5-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hdf5-1.12.1-dp5vqo4tjh6oi7szpcsqkdlifgjxknzf/lib/libhdf5.so' '--with-zlib=1' '--with -zlib-in clude=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/zlib-1.2.11-2ciasfxwyxanyohroisdpvidg4gs2fdy/include' '--with-zlib-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/zlib-1.2.11-2ciasfxwyxanyohroisdpvidg4gs2fdy/lib/libz.so' '--with-mumps=0' '--with-trilinos=0' 
'--with-fftw=0' '--with-valgrind=0' '--with-gmp=0' '--with-libpng=0' '--with-giflib=0' '--with-mpfr=0' '--with-netcdf=0' '--with-pnetcdf=0' '--with-moab=0' '--with-random123=0' '--with-exodusii=0' '--with-cgns=0' '--with-memkind=0' '--with-p4est=0' '--with-saws=0' '--with-yaml=0' '--with-hwloc=0' '--with-libjpeg=0' '--with-scalapack=1' '--with-scalapack-lib=/opt/cray/pe/libsci/21.08.1.2/AMD/4.0/x86_64/lib/libsci_amd.so' '--with-strumpack=0' '--with-mmg=0' '--with-parmmg=0' '--with-tetgen=0' '--with-cxx-dialect=C++11' > > Using spack, I see this error at compile time: > > > /tmp/lukebr/spack-stage/spack-stage-petsc-main-5jlv6jcfdaa37iy5zm77umvb6uvgwdo7/spack-src/src/vec/is/sf/impls/basic/sfpack.c:463:19: error: static declaration of 'MPI_Type_dup' follows non-static declaration > > static inline int MPI_Type_dup(MPI_Datatype datatype,MPI_Datatype *newtype) > > ^ > > /opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/include/mpi.h:1291:5: note: previous declaration is here > > int MPI_Type_dup(MPI_Datatype oldtype, MPI_Datatype *newtype) MPICH_API_PUBLIC; > > ^ > > 1 error generated. > > To get around this error, I pass ?-DPETSC_HAVE_MPI_TYPE_DUP? but then I see the following lining error: > > > CLINKER arch-linux-c-opt/lib/libpetsc.so.3.016.5 > > ld.lld: error: undefined hidden symbol: PetscSFCreate_Window > > >>> referenced by sfregi.c > > >>> arch-linux-c-opt/obj/vec/is/sf/interface/sfregi.o:(PetscSFRegisterAll) > > clang-13: error: linker command failed with exit code 1 (use -v to see invocation) > > gmake[3]: *** [gmakefile:113: arch-linux-c-opt/lib/libpetsc.so.3.016.5] Error 1 > > gmake[2]: *** [/tmp/lukebr/spack-stage/spack-stage-petsc-main-5jlv6jcfdaa37iy5zm77umvb6uvgwdo7/spack-src/lib/petsc/conf/rules:56: libs] Error 2 > > > Before I continue, is there a preferred way to build PETSc on an AMDGPU system? Could you share this? > > Thanks, > Luke > > From luke.roskop at hpe.com Fri Mar 11 15:34:53 2022 From: luke.roskop at hpe.com (Roskop, Luke B) Date: Fri, 11 Mar 2022 21:34:53 +0000 Subject: [petsc-users] Problem building petsc +rocm variant In-Reply-To: References: Message-ID: As requested, I attached the configure.log and make.log files Thanks! Luke From: Barry Smith Date: Friday, March 11, 2022 at 2:47 PM To: "Roskop, Luke B" Cc: "petsc-users at mcs.anl.gov" Subject: Re: [petsc-users] Problem building petsc +rocm variant Please send configure.log and make.log to petsc-maint at mcs.anl.gov For some reason PETSc's configure does not detect that the MPI does provide MPI_Type_dup() even though it is prototyped in the MPI include file. On Mar 11, 2022, at 3:34 PM, Roskop, Luke B > wrote: Hi, I?m hoping you can help me figure out how to build PETSc targeting AMDGPUs (gfx90a GPU). I?m attempting to build on the CrayEx ORNL system called crusher, using ROCmCC (AMD?s compiler), and cray-mpich. In case it helps, I?m using spack to build petsc with the ?petsc at main%rocmcc+batch+rocm amdgpu_target=gfx90a? spec. 
Spack ends up invoking the following configure for PETSc: '/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/python-3.9.10-7y7mxajn5rywz5xdnba4azphcdodxiub/bin/python3.9' 'configure' '--prefix=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/petsc-main-mccbycx66la7rlx6jv44f6zd63cmdzm7' '--with-ssl=0' '--download-c2html=0' '--download-sowing=0' '--download-hwloc=0' 'CFLAGS=' 'FFLAGS=-fPIC' 'CXXFLAGS=' 'LDFLAGS=-Wl,-z,notext' '--with-cc=/opt/cray/pe/mpich/8.1.12/ofi/amd/4.4/bin/mpicc' '--with-cxx=/opt/cray/pe/mpich/8.1.12/ofi/amd/4.4/bin/mpicxx' '--with-fc=/opt/cray/pe/mpich/8.1.12/ofi/amd/4.4/bin/mpif90' '--with-precision=double' '--with-scalar-type=real' '--with-shared-libraries=1' '--with-debugging=0' '--with-openmp=0' '--with-64-bit-indices=0' 'COPTFLAGS=' 'FOPTFLAGS=' 'CXXOPTFLAGS=' '--with-blas-lapack-lib=/opt/cray/pe/libsci/21.08.1.2/AMD/4.0/x86_64/lib/libsci_amd.so' '--with-batch=1' '--with-x=0' '--with-clanguage=C' '--with-cuda=0' '--with-hip=1' '--with-hip-dir=/opt/rocm-4.5.0/hip' '--with-metis=1' '--with-metis-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/metis-5.1.0-zn5rn5srr7qzxyo5tq36d46adcsyc5a7/include' '--with-metis-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/metis-5.1.0-zn5rn5srr7qzxyo5tq36d46adcsyc5a7/lib/libmetis.so' '--with-hypre=1' '--with-hypre-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hypre-develop-vgfx3lhhloq4cnethsrrpz7iez7x6wad/include' '--with-hypre-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hypre-develop-vgfx3lhhloq4cnethsrrpz7iez7x6wad/lib/libHYPRE.so' '--with-parmetis=1' '--with-parmetis-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/parmetis-4.0.3-6jqxqmt7qqq73rxmx3beu5ba4vj3r253/include' '--with-parmetis-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/parmetis-4.0.3-6jqxqmt7qqq73rxmx3beu5ba4vj3r253/lib/libparmetis.so' '--with-superlu_dist=1' '--with-superlu_dist-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/superlu-dist-develop-mpiyhomp4k72bilqn6xk7uol36ulsdve/include' '--with-superlu_dist-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/superlu-dist-develop-mpiyhomp4k72bilqn6xk7uol36ulsdve/lib/libsuperlu_dist.so' '--with-ptscotch=0' '--with-suitesparse=0' '--with-hdf5=1' '--with-hdf5-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hdf5-1.12.1-dp5vqo4tjh6oi7szpcsqkdlifgjxknzf/include' '--with-hdf5-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hdf5-1.12.1-dp5vqo4tjh6oi7szpcsqkdlifgjxknzf/lib/libhdf5.so' '--with-zlib=1' '--with-zlib-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/zlib-1.2.11-2ciasfxwyxanyohroisdpvidg4gs2fdy/include' '--with-zlib-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/zlib-1.2.11-2ciasfxwyxanyohroisdpvidg4gs2fdy/lib/libz.so' '--with-mumps=0' '--with-trilinos=0' '--with-fftw=0' '--with-valgrind=0' '--with-gmp=0' '--with-libpng=0' '--with-giflib=0' '--with-mpfr=0' '--with-netcdf=0' '--with-pnetcdf=0' 
'--with-moab=0' '--with-random123=0' '--with-exodusii=0' '--with-cgns=0' '--with-memkind=0' '--with-p4est=0' '--with-saws=0' '--with-yaml=0' '--with-hwloc=0' '--with-libjpeg=0' '--with-scalapack=1' '--with-scalapack-lib=/opt/cray/pe/libsci/21.08.1.2/AMD/4.0/x86_64/lib/libsci_amd.so' '--with-strumpack=0' '--with-mmg=0' '--with-parmmg=0' '--with-tetgen=0' '--with-cxx-dialect=C++11' Using spack, I see this error at compile time: /tmp/lukebr/spack-stage/spack-stage-petsc-main-5jlv6jcfdaa37iy5zm77umvb6uvgwdo7/spack-src/src/vec/is/sf/impls/basic/sfpack.c:463:19: error: static declaration of 'MPI_Type_dup' follows non-static declaration static inline int MPI_Type_dup(MPI_Datatype datatype,MPI_Datatype *newtype) ^ /opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/include/mpi.h:1291:5: note: previous declaration is here int MPI_Type_dup(MPI_Datatype oldtype, MPI_Datatype *newtype) MPICH_API_PUBLIC; ^ 1 error generated. To get around this error, I pass ?-DPETSC_HAVE_MPI_TYPE_DUP? but then I see the following lining error: CLINKER arch-linux-c-opt/lib/libpetsc.so.3.016.5 ld.lld: error: undefined hidden symbol: PetscSFCreate_Window >>> referenced by sfregi.c >>> arch-linux-c-opt/obj/vec/is/sf/interface/sfregi.o:(PetscSFRegisterAll) clang-13: error: linker command failed with exit code 1 (use -v to see invocation) gmake[3]: *** [gmakefile:113: arch-linux-c-opt/lib/libpetsc.so.3.016.5] Error 1 gmake[2]: *** [/tmp/lukebr/spack-stage/spack-stage-petsc-main-5jlv6jcfdaa37iy5zm77umvb6uvgwdo7/spack-src/lib/petsc/conf/rules:56: libs] Error 2 Before I continue, is there a preferred way to build PETSc on an AMDGPU system? Could you share this? Thanks, Luke -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 1198364 bytes Desc: configure.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log Type: application/octet-stream Size: 50072 bytes Desc: make.log URL: From balay at mcs.anl.gov Fri Mar 11 16:52:54 2022 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 11 Mar 2022 16:52:54 -0600 (CST) Subject: [petsc-users] Problem building petsc +rocm variant In-Reply-To: References: Message-ID: Looks like an issue with --with-batch=1 [where the MPI checks are skipped] Can you try the build without: '+batch'? Satish On Fri, 11 Mar 2022, Roskop, Luke B wrote: > As requested, I attached the configure.log and make.log files > Thanks! > Luke > > > From: Barry Smith > Date: Friday, March 11, 2022 at 2:47 PM > To: "Roskop, Luke B" > Cc: "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] Problem building petsc +rocm variant > > > Please send configure.log and make.log to petsc-maint at mcs.anl.gov For some reason PETSc's configure does not detect that the MPI does provide MPI_Type_dup() even though it is prototyped in the MPI include file. > > > > On Mar 11, 2022, at 3:34 PM, Roskop, Luke B > wrote: > > Hi, I?m hoping you can help me figure out how to build PETSc targeting AMDGPUs (gfx90a GPU). > > I?m attempting to build on the CrayEx ORNL system called crusher, using ROCmCC (AMD?s compiler), and cray-mpich. In case it helps, I?m using spack to build petsc with the ?petsc at main%rocmcc+batch+rocm amdgpu_target=gfx90a? spec. 
> > Spack ends up invoking the following configure for PETSc: > '/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/python-3.9.10-7y7mxajn5rywz5xdnba4azphcdodxiub/bin/python3.9' 'configure' '--prefix=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/petsc-main-mccbycx66la7rlx6jv44f6zd63cmdzm7' '--with-ssl=0' '--download-c2html=0' '--download-sowing=0' '--download-hwloc=0' 'CFLAGS=' 'FFLAGS=-fPIC' 'CXXFLAGS=' 'LDFLAGS=-Wl,-z,notext' '--with-cc=/opt/cray/pe/mpich/8.1.12/ofi/amd/4.4/bin/mpicc' '--with-cxx=/opt/cray/pe/mpich/8.1.12/ofi/amd/4.4/bin/mpicxx' '--with-fc=/opt/cray/pe/mpich/8.1.12/ofi/amd/4.4/bin/mpif90' '--with-precision=double' '--with-scalar-type=real' '--with-shared-libraries=1' '--with-debugging=0' '--with-openmp=0' '--with-64-bit-indices=0' 'COPTFLAGS=' 'FOPTFLAGS=' 'CXXOPTFLAGS=' '--with-blas-lapack-lib=/opt/cray/pe/libsci/21.08.1.2/AMD/4.0/x86_64/lib/libsci_amd.so' '--with-batch=1' '--with-x=0' '--with-clanguage=C' '--with-cuda=0' '-- with-hip =1' '--with-hip-dir=/opt/rocm-4.5.0/hip' '--with-metis=1' '--with-metis-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/metis-5.1.0-zn5rn5srr7qzxyo5tq36d46adcsyc5a7/include' '--with-metis-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/metis-5.1.0-zn5rn5srr7qzxyo5tq36d46adcsyc5a7/lib/libmetis.so' '--with-hypre=1' '--with-hypre-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hypre-develop-vgfx3lhhloq4cnethsrrpz7iez7x6wad/include' '--with-hypre-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hypre-develop-vgfx3lhhloq4cnethsrrpz7iez7x6wad/lib/libHYPRE.so' '--with-parmetis=1' '--with-parmetis-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/parmetis-4.0.3-6jqxqmt7qqq73rxmx3beu5ba4vj3r253/include' '--with-parmetis-lib= /gpfs/al pine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/parmetis-4.0.3-6jqxqmt7qqq73rxmx3beu5ba4vj3r253/lib/libparmetis.so' '--with-superlu_dist=1' '--with-superlu_dist-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/superlu-dist-develop-mpiyhomp4k72bilqn6xk7uol36ulsdve/include' '--with-superlu_dist-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/superlu-dist-develop-mpiyhomp4k72bilqn6xk7uol36ulsdve/lib/libsuperlu_dist.so' '--with-ptscotch=0' '--with-suitesparse=0' '--with-hdf5=1' '--with-hdf5-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hdf5-1.12.1-dp5vqo4tjh6oi7szpcsqkdlifgjxknzf/include' '--with-hdf5-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hdf5-1.12.1-dp5vqo4tjh6oi7szpcsqkdlifgjxknzf/lib/libhdf5.so' '--with-zlib=1' '--with -zlib-in clude=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/zlib-1.2.11-2ciasfxwyxanyohroisdpvidg4gs2fdy/include' '--with-zlib-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/zlib-1.2.11-2ciasfxwyxanyohroisdpvidg4gs2fdy/lib/libz.so' '--with-mumps=0' '--with-trilinos=0' '--with-fftw=0' '--with-valgrind=0' '--with-gmp=0' '--with-libpng=0' '--with-giflib=0' '--with-mpfr=0' '--with-netcdf=0' 
'--with-pnetcdf=0' '--with-moab=0' '--with-random123=0' '--with-exodusii=0' '--with-cgns=0' '--with-memkind=0' '--with-p4est=0' '--with-saws=0' '--with-yaml=0' '--with-hwloc=0' '--with-libjpeg=0' '--with-scalapack=1' '--with-scalapack-lib=/opt/cray/pe/libsci/21.08.1.2/AMD/4.0/x86_64/lib/libsci_amd.so' '--with-strumpack=0' '--with-mmg=0' '--with-parmmg=0' '--with-tetgen=0' '--with-cxx-dialect=C++11' > > Using spack, I see this error at compile time: > > /tmp/lukebr/spack-stage/spack-stage-petsc-main-5jlv6jcfdaa37iy5zm77umvb6uvgwdo7/spack-src/src/vec/is/sf/impls/basic/sfpack.c:463:19: error: static declaration of 'MPI_Type_dup' follows non-static declaration > static inline int MPI_Type_dup(MPI_Datatype datatype,MPI_Datatype *newtype) > ^ > /opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/include/mpi.h:1291:5: note: previous declaration is here > int MPI_Type_dup(MPI_Datatype oldtype, MPI_Datatype *newtype) MPICH_API_PUBLIC; > ^ > 1 error generated. > > To get around this error, I pass ?-DPETSC_HAVE_MPI_TYPE_DUP? but then I see the following lining error: > > CLINKER arch-linux-c-opt/lib/libpetsc.so.3.016.5 > ld.lld: error: undefined hidden symbol: PetscSFCreate_Window > >>> referenced by sfregi.c > >>> arch-linux-c-opt/obj/vec/is/sf/interface/sfregi.o:(PetscSFRegisterAll) > clang-13: error: linker command failed with exit code 1 (use -v to see invocation) > gmake[3]: *** [gmakefile:113: arch-linux-c-opt/lib/libpetsc.so.3.016.5] Error 1 > gmake[2]: *** [/tmp/lukebr/spack-stage/spack-stage-petsc-main-5jlv6jcfdaa37iy5zm77umvb6uvgwdo7/spack-src/lib/petsc/conf/rules:56: libs] Error 2 > > > Before I continue, is there a preferred way to build PETSc on an AMDGPU system? Could you share this? > > Thanks, > Luke > > From bhargav.subramanya at kaust.edu.sa Sat Mar 12 12:00:52 2022 From: bhargav.subramanya at kaust.edu.sa (Bhargav Subramanya) Date: Sat, 12 Mar 2022 21:00:52 +0300 Subject: [petsc-users] Reuse MUMPS factorization Message-ID: Dear All, I have the following two queries: 1. I am running simulations using MUMPS through a job submission system. Since the job run time is limited, I need to restart the simulations periodically. Since I am restarting the simulation with exactly the same parameters, for instance, number of nodes, number of tasks per node, etc., is it possible to dump the matrix factors for the first time and reuse it for the remaining restarts? 2. In the Ax=b solve, for now, the Matrix A doesn't change. However, if I implement a slightly different time integration technique, matrix A changes with each time step (sparsity pattern is still the same). Is it possible to incorporate a modifying matrix A in the mumps solver? Thanks, Bhargav -- This message and its contents, including attachments are intended solely for the original recipient. If you are not the intended recipient or have received this message in error, please notify me immediately and delete this message from your computer system. Any unauthorized use or distribution is prohibited. Please consider the environment before printing this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Mar 12 12:50:09 2022 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 12 Mar 2022 13:50:09 -0500 Subject: [petsc-users] Reuse MUMPS factorization In-Reply-To: References: Message-ID: On Sat, Mar 12, 2022 at 1:01 PM Bhargav Subramanya < bhargav.subramanya at kaust.edu.sa> wrote: > Dear All, > > I have the following two queries: > > 1. 
I am running simulations using MUMPS through a job submission system. > Since the job run time is limited, I need to restart the simulations > periodically. Since I am restarting the simulation with exactly the same > parameters, for instance, number of nodes, number of tasks per node, etc., > is it possible to dump the matrix factors for the first time and reuse it > for the remaining restarts? > We do not have access to the MUMPS internals, so this is a MUMPS mailing list question I think. > 2. In the Ax=b solve, for now, the Matrix A doesn't change. However, if I > implement a slightly different time integration technique, matrix A changes > with each time step (sparsity pattern is still the same). Is it possible to > incorporate a modifying matrix A in the mumps solver? > You can preserve the symbolic factorization, and just carry out the numerical factorization. I believe we do this automatically if you just update the values. Thanks, Matt > Thanks, > Bhargav > > ------------------------------ > This message and its contents, including attachments are intended solely > for the original recipient. If you are not the intended recipient or have > received this message in error, please notify me immediately and delete > this message from your computer system. Any unauthorized use or > distribution is prohibited. Please consider the environment before printing > this email. -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Sat Mar 12 15:34:15 2022 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Sat, 12 Mar 2022 15:34:15 -0600 Subject: [petsc-users] Problem building petsc +rocm variant In-Reply-To: References: Message-ID: Hi, Luke, I removed check of MPI_Type_dup in https://gitlab.com/petsc/petsc/-/merge_requests/4965 I hope with that you could build petsc +batch or ~batch --Junchao Zhang On Fri, Mar 11, 2022 at 4:53 PM Satish Balay via petsc-users < petsc-users at mcs.anl.gov> wrote: > Looks like an issue with --with-batch=1 [where the MPI checks are skipped] > > Can you try the build without: '+batch'? > > Satish > > > On Fri, 11 Mar 2022, Roskop, Luke B wrote: > > > As requested, I attached the configure.log and make.log files > > Thanks! > > Luke > > > > > > From: Barry Smith > > Date: Friday, March 11, 2022 at 2:47 PM > > To: "Roskop, Luke B" > > Cc: "petsc-users at mcs.anl.gov" > > Subject: Re: [petsc-users] Problem building petsc +rocm variant > > > > > > Please send configure.log and make.log to petsc-maint at mcs.anl.gov > For some reason PETSc's configure does > not detect that the MPI does provide MPI_Type_dup() even though it is > prototyped in the MPI include file. > > > > > > > > On Mar 11, 2022, at 3:34 PM, Roskop, Luke B luke.roskop at hpe.com>> wrote: > > > > Hi, I?m hoping you can help me figure out how to build PETSc targeting > AMDGPUs (gfx90a GPU). > > > > I?m attempting to build on the CrayEx ORNL system called crusher, using > ROCmCC (AMD?s compiler), and cray-mpich. In case it helps, I?m using spack > to build petsc with the ?petsc at main%rocmcc+batch+rocm > amdgpu_target=gfx90a? spec. 
> > > > Spack ends up invoking the following configure for PETSc: > > > '/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/python-3.9.10-7y7mxajn5rywz5xdnba4azphcdodxiub/bin/python3.9' > 'configure' > '--prefix=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/petsc-main-mccbycx66la7rlx6jv44f6zd63cmdzm7' > '--with-ssl=0' '--download-c2html=0' '--download-sowing=0' > '--download-hwloc=0' 'CFLAGS=' 'FFLAGS=-fPIC' 'CXXFLAGS=' > 'LDFLAGS=-Wl,-z,notext' > '--with-cc=/opt/cray/pe/mpich/8.1.12/ofi/amd/4.4/bin/mpicc' > '--with-cxx=/opt/cray/pe/mpich/8.1.12/ofi/amd/4.4/bin/mpicxx' > '--with-fc=/opt/cray/pe/mpich/8.1.12/ofi/amd/4.4/bin/mpif90' > '--with-precision=double' '--with-scalar-type=real' > '--with-shared-libraries=1' '--with-debugging=0' '--with-openmp=0' > '--with-64-bit-indices=0' 'COPTFLAGS=' 'FOPTFLAGS=' 'CXXOPTFLAGS=' > '--with-blas-lapack-lib=/opt/cray/pe/libsci/ > 21.08.1.2/AMD/4.0/x86_64/lib/libsci_amd.so' '--with-batch=1' '--with-x=0' > '--with-clanguage=C' '--with-cuda=0' '-- > with-hip > =1' '--with-hip-dir=/opt/rocm-4.5.0/hip' '--with-metis=1' > '--with-metis-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/metis-5.1.0-zn5rn5srr7qzxyo5tq36d46adcsyc5a7/include' > '--with-metis-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/metis-5.1.0-zn5rn5srr7qzxyo5tq36d46adcsyc5a7/lib/libmetis.so' > '--with-hypre=1' > '--with-hypre-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hypre-develop-vgfx3lhhloq4cnethsrrpz7iez7x6wad/include' > '--with-hypre-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hypre-develop-vgfx3lhhloq4cnethsrrpz7iez7x6wad/lib/libHYPRE.so' > '--with-parmetis=1' > '--with-parmetis-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/parmetis-4.0.3-6jqxqmt7qqq73rxmx3beu5ba4vj3r253/include' > '--with-parmetis-lib= > /gpfs/al > pine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/parmetis-4.0.3-6jqxqmt7qqq73rxmx3beu5ba4vj3r253/lib/libparmetis.so' > '--with-superlu_dist=1' > '--with-superlu_dist-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/superlu-dist-develop-mpiyhomp4k72bilqn6xk7uol36ulsdve/include' > '--with-superlu_dist-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/superlu-dist-develop-mpiyhomp4k72bilqn6xk7uol36ulsdve/lib/libsuperlu_dist.so' > '--with-ptscotch=0' '--with-suitesparse=0' '--with-hdf5=1' > '--with-hdf5-include=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hdf5-1.12.1-dp5vqo4tjh6oi7szpcsqkdlifgjxknzf/include' > '--with-hdf5-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/hdf5-1.12.1-dp5vqo4tjh6oi7szpcsqkdlifgjxknzf/lib/libhdf5.so' > '--with-zlib=1' '--with > -zlib-in > clude=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/zlib-1.2.11-2ciasfxwyxanyohroisdpvidg4gs2fdy/include' > '--with-zlib-lib=/gpfs/alpine/ven114/scratch/lukebr/confidence/spack/install_tree/cray-sles15-zen3/rocmcc-4.5.0/zlib-1.2.11-2ciasfxwyxanyohroisdpvidg4gs2fdy/lib/libz.so' > '--with-mumps=0' '--with-trilinos=0' '--with-fftw=0' '--with-valgrind=0' > 
'--with-gmp=0' '--with-libpng=0' '--with-giflib=0' '--with-mpfr=0' > '--with-netcdf=0' '--with-pnetcdf=0' '--with-moab=0' '--with-random123=0' > '--with-exodusii=0' '--with-cgns=0' '--with-memkind=0' '--with-p4est=0' > '--with-saws=0' '--with-yaml=0' '--with-hwloc=0' '--with-libjpeg=0' > '--with-scalapack=1' '--with-scalapack-lib=/opt/cray/pe/libsci/ > 21.08.1.2/AMD/4.0/x86_64/lib/libsci_amd.so' '--with-strumpack=0' > '--with-mmg=0' '--with-parmmg=0' '--with-tetgen=0' > '--with-cxx-dialect=C++11' > > > > Using spack, I see this error at compile time: > > > > > /tmp/lukebr/spack-stage/spack-stage-petsc-main-5jlv6jcfdaa37iy5zm77umvb6uvgwdo7/spack-src/src/vec/is/sf/impls/basic/sfpack.c:463:19: > error: static declaration of 'MPI_Type_dup' follows non-static declaration > > static inline int MPI_Type_dup(MPI_Datatype datatype,MPI_Datatype > *newtype) > > ^ > > /opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/include/mpi.h:1291:5: note: > previous declaration is here > > int MPI_Type_dup(MPI_Datatype oldtype, MPI_Datatype *newtype) > MPICH_API_PUBLIC; > > ^ > > 1 error generated. > > > > To get around this error, I pass ?-DPETSC_HAVE_MPI_TYPE_DUP? but then I > see the following lining error: > > > > CLINKER arch-linux-c-opt/lib/libpetsc.so.3.016.5 > > ld.lld: error: undefined hidden symbol: PetscSFCreate_Window > > >>> referenced by sfregi.c > > >>> > arch-linux-c-opt/obj/vec/is/sf/interface/sfregi.o:(PetscSFRegisterAll) > > clang-13: error: linker command failed with exit code 1 (use -v to see > invocation) > > gmake[3]: *** [gmakefile:113: arch-linux-c-opt/lib/libpetsc.so.3.016.5] > Error 1 > > gmake[2]: *** > [/tmp/lukebr/spack-stage/spack-stage-petsc-main-5jlv6jcfdaa37iy5zm77umvb6uvgwdo7/spack-src/lib/petsc/conf/rules:56: > libs] Error 2 > > > > > > Before I continue, is there a preferred way to build PETSc on an AMDGPU > system? Could you share this? > > > > Thanks, > > Luke > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Mar 12 16:21:24 2022 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 12 Mar 2022 17:21:24 -0500 Subject: [petsc-users] Fieldsplit for 4 fields In-Reply-To: <46D54D47-9A0B-4078-B1D2-2313562BD909@msu.edu> References: <46D54D47-9A0B-4078-B1D2-2313562BD909@msu.edu> Message-ID: On Fri, Mar 11, 2022 at 3:32 PM Tang, Qi wrote: > Hi, > I am trying to solve a four field system with three nested fieldsplit (I > am in petsc/dmstag directly). I think I have all the IS info in the > original system. I am wondering how to set up IS for the split system. > Some questions first: 1) Are you using DMStag? If so, the field split might be able to be automated. 2) Checking the division: Schur complement {1, 2, 3} and {0} Multiplicative {1,2} and {3} Schur complement {1} and {2} Thanks, Matt > More specifically, I would like to call something like this > -pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_0_fields 0 > -pc_fieldsplit_1_fields 1,2,3 > -fieldsplit_1_ksp_type fgmres -fieldsplit_1_pc_type fieldsplit > -fieldsplit_1_pc_fieldsplit_type multiplicative > -fieldsplit_1_fieldsplit_1_ksp_type fgmres > -fieldsplit_1_fieldsplit_1_pc_type fieldsplit > -fieldsplit_1_fieldsplit_1_pc_fieldsplit_type schur > > I know the first level probably would work. But the second and third > levels would not. > > We have two components living on one type of dofs. So the natural split > 0,1,2,3 do not work. 
Therefore, at the first level I am setting up split > through > PCFieldSplitSetIS(pc, ?i", myis[i]); > How could I know the sub ISs and set it up correctly? Thanks. > > > Qi Tang > T5 at LANL > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From tangqi at msu.edu Sat Mar 12 18:49:07 2022 From: tangqi at msu.edu (Tang, Qi) Date: Sun, 13 Mar 2022 00:49:07 +0000 Subject: [petsc-users] Fieldsplit for 4 fields In-Reply-To: References: <46D54D47-9A0B-4078-B1D2-2313562BD909@msu.edu> Message-ID: <454E49C9-C508-4B7E-B986-5855E542CE43@msu.edu> Thanks a lot, Matt. 1) Are you using DMStag? If so, the field split might be able to be automated. I think this is true only if I have one component on each vertex/face/edge. I have two dofs on vertex, corresponding to component 0 and 1. They are mixed together as I understand. Meanwhile, I am testing the three field version using automated split and two nested fieldsplit (so pc will handle vertex dofs together). I expect it will work. 2) Checking the division: I am trying to do Schur complement {1, 2, 3} and {0} Multiplicative {1} and {2, 3} Schur complement {2} and {3} The first level is a saddle point problem and the last level is parabolization. I know the diagonal operator for {1} and Schur complement for {3} works well with amg. Qi On Mar 12, 2022, at 3:21 PM, Matthew Knepley > wrote: On Fri, Mar 11, 2022 at 3:32 PM Tang, Qi > wrote: Hi, I am trying to solve a four field system with three nested fieldsplit (I am in petsc/dmstag directly). I think I have all the IS info in the original system. I am wondering how to set up IS for the split system. Some questions first: 1) Are you using DMStag? If so, the field split might be able to be automated. 2) Checking the division: Schur complement {1, 2, 3} and {0} Multiplicative {1,2} and {3} Schur complement {1} and {2} Thanks, Matt More specifically, I would like to call something like this -pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_0_fields 0 -pc_fieldsplit_1_fields 1,2,3 -fieldsplit_1_ksp_type fgmres -fieldsplit_1_pc_type fieldsplit -fieldsplit_1_pc_fieldsplit_type multiplicative -fieldsplit_1_fieldsplit_1_ksp_type fgmres -fieldsplit_1_fieldsplit_1_pc_type fieldsplit -fieldsplit_1_fieldsplit_1_pc_fieldsplit_type schur I know the first level probably would work. But the second and third levels would not. We have two components living on one type of dofs. So the natural split 0,1,2,3 do not work. Therefore, at the first level I am setting up split through PCFieldSplitSetIS(pc, ?i", myis[i]); How could I know the sub ISs and set it up correctly? Thanks. Qi Tang T5 at LANL -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bhargav.subramanya at kaust.edu.sa Sun Mar 13 10:00:23 2022 From: bhargav.subramanya at kaust.edu.sa (Bhargav Subramanya) Date: Sun, 13 Mar 2022 18:00:23 +0300 Subject: [petsc-users] Reuse MUMPS factorization In-Reply-To: References: Message-ID: Thanks a lot, Matt. 
On Sat, Mar 12, 2022 at 9:50 PM Matthew Knepley wrote: > On Sat, Mar 12, 2022 at 1:01 PM Bhargav Subramanya < > bhargav.subramanya at kaust.edu.sa> wrote: > >> Dear All, >> >> I have the following two queries: >> >> 1. I am running simulations using MUMPS through a job submission system. >> Since the job run time is limited, I need to restart the simulations >> periodically. Since I am restarting the simulation with exactly the same >> parameters, for instance, number of nodes, number of tasks per node, etc., >> is it possible to dump the matrix factors for the first time and reuse it >> for the remaining restarts? >> > > We do not have access to the MUMPS internals, so this is a MUMPS mailing > list question I think. > > >> 2. In the Ax=b solve, for now, the Matrix A doesn't change. However, if I >> implement a slightly different time integration technique, matrix A changes >> with each time step (sparsity pattern is still the same). Is it possible to >> incorporate a modifying matrix A in the mumps solver? >> > > You can preserve the symbolic factorization, and just carry out the > numerical factorization. I believe we do this automatically if you just > update the values. > > Thanks, > > Matt > > >> Thanks, >> Bhargav >> >> ------------------------------ >> This message and its contents, including attachments are intended solely >> for the original recipient. If you are not the intended recipient or have >> received this message in error, please notify me immediately and delete >> this message from your computer system. Any unauthorized use or >> distribution is prohibited. Please consider the environment before printing >> this email. > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- This message and its contents, including attachments are intended solely for the original recipient. If you are not the intended recipient or have received this message in error, please notify me immediately and delete this message from your computer system. Any unauthorized use or distribution is prohibited. Please consider the environment before printing this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From liluo at um.edu.mo Mon Mar 14 09:31:36 2022 From: liluo at um.edu.mo (liluo) Date: Mon, 14 Mar 2022 14:31:36 +0000 Subject: [petsc-users] How to obtain the local matrix from a global matrix in DMDA? Message-ID: <459a4f2fa00744d8a109884fde18c822@um.edu.mo> Dear developers, I created a DMDA object and obtain the local vector from a global vector by using DMGlobalToLocalBegin/End. How can I obtain the corresponding local matrix from the global matrix? In my case, the global matrix is of MPIBAIJ type created by DMCreateMatrix(da,MATBAIJ,&A); Bests, LI -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Mar 14 09:47:52 2022 From: jed at jedbrown.org (Jed Brown) Date: Mon, 14 Mar 2022 08:47:52 -0600 Subject: [petsc-users] How to obtain the local matrix from a global matrix in DMDA? In-Reply-To: <459a4f2fa00744d8a109884fde18c822@um.edu.mo> References: <459a4f2fa00744d8a109884fde18c822@um.edu.mo> Message-ID: <874k40r3on.fsf@jedbrown.org> Could you explain more of what you mean by "local matrix"? 
If you're thinking of finite elements, then that doesn't exist and can't readily be constructed except at assembly time (see MATIS, for example). If you mean an overlapping block, then MatGetSubMatrix() or MatGetSubMatrices(), as used inside PCASM. If you mean only the diagonal block, then see MatGetDiagonalBlock(). liluo writes: > Dear developers, > > > I created a DMDA object and obtain the local vector from a global vector by using DMGlobalToLocalBegin/End. > > How can I obtain the corresponding local matrix from the global matrix? > > In my case, the global matrix is of MPIBAIJ type created by DMCreateMatrix(da,MATBAIJ,&A); > > > Bests, > > LI From knepley at gmail.com Mon Mar 14 09:47:55 2022 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 14 Mar 2022 10:47:55 -0400 Subject: [petsc-users] How to obtain the local matrix from a global matrix in DMDA? In-Reply-To: <459a4f2fa00744d8a109884fde18c822@um.edu.mo> References: <459a4f2fa00744d8a109884fde18c822@um.edu.mo> Message-ID: On Mon, Mar 14, 2022 at 10:35 AM liluo wrote: > Dear developers, > > I created a DMDA object and obtain the local vector from a global > vector by using DMGlobalToLocalBegin/End. > > How can I obtain the corresponding local matrix from the global matrix? > > In my case, the global matrix is of MPIBAIJ type created by > DMCreateMatrix(da,MATBAIJ,&A); > Are you asking for unassembled local matrices? Thanks, Matt > Bests, > > LI > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Mon Mar 14 09:49:47 2022 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 14 Mar 2022 10:49:47 -0400 Subject: [petsc-users] How to obtain the local matrix from a global matrix in DMDA? In-Reply-To: <459a4f2fa00744d8a109884fde18c822@um.edu.mo> References: <459a4f2fa00744d8a109884fde18c822@um.edu.mo> Message-ID: You might want to explain what you want to do. A local vector has ghost values that you can use to compute local residuals. You don't usually want a matrix that conforms with this local vector. You could get the global equations of the local vector and use MatCreateSubMatrix, but I think there is a better way to do what you want. OK, Matt and Jed beat me On Mon, Mar 14, 2022 at 10:35 AM liluo wrote: > Dear developers, > > > I created a DMDA object and obtain the local vector from a global > vector by using DMGlobalToLocalBegin/End. > > How can I obtain the corresponding local matrix from the global matrix? > > In my case, the global matrix is of MPIBAIJ type created by > DMCreateMatrix(da,MATBAIJ,&A); > > > Bests, > > LI > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From liluo at um.edu.mo Mon Mar 14 10:05:30 2022 From: liluo at um.edu.mo (liluo) Date: Mon, 14 Mar 2022 15:05:30 +0000 Subject: [petsc-users] How to obtain the local matrix from a global matrix in DMDA? In-Reply-To: References: <459a4f2fa00744d8a109884fde18c822@um.edu.mo>, Message-ID: <035ac093b89044a38131b5fc5151e3de@um.edu.mo> Dear developers, I defined subdomain problems that have some layers of ghost points, which are exactly of the "local" size of a DA. And I want to generate the corresponding submatrix in each subdomain which contains those layers of ghost points. 
I know that MatGetSubMatrices can return a submatrix when determining the index set like PCASM, but how can I get the index set including those layers of ghost points? When a DA is setup, such IS for creating ltog mapping was already destroyed. Or you may have some simpler way to get this? Bests, Li Luo ________________________________ From: Mark Adams Sent: Monday, 14 March, 2022 22:49:47 To: liluo Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] How to obtain the local matrix from a global matrix in DMDA? You might want to explain what you want to do. A local vector has ghost values that you can use to compute local residuals. You don't usually want a matrix that conforms with this local vector. You could get the global equations of the local vector and use MatCreateSubMatrix, but I think there is a better way to do what you want. OK, Matt and Jed beat me On Mon, Mar 14, 2022 at 10:35 AM liluo > wrote: Dear developers, I created a DMDA object and obtain the local vector from a global vector by using DMGlobalToLocalBegin/End. How can I obtain the corresponding local matrix from the global matrix? In my case, the global matrix is of MPIBAIJ type created by DMCreateMatrix(da,MATBAIJ,&A); Bests, LI -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Mar 14 10:44:06 2022 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 14 Mar 2022 11:44:06 -0400 Subject: [petsc-users] How to obtain the local matrix from a global matrix in DMDA? In-Reply-To: <035ac093b89044a38131b5fc5151e3de@um.edu.mo> References: <459a4f2fa00744d8a109884fde18c822@um.edu.mo> <035ac093b89044a38131b5fc5151e3de@um.edu.mo> Message-ID: On Mon, Mar 14, 2022 at 11:05 AM liluo wrote: > Dear developers, > > > I defined subdomain problems that have some layers of ghost points, which > are exactly of the "local" size of a DA. > > And I want to generate the corresponding submatrix in each subdomain which > contains those layers of ghost points. > > > I know that MatGetSubMatrices can return a submatrix when determining the > index set like PCASM, but how can I get the index set including those > layers of ghost points? When a DA is setup, such IS for creating ltog > mapping was already destroyed. > > Or you may have some simpler way to get this? > You can use https://petsc.org/main/docs/manualpages/IS/ISLocalToGlobalMappingGetIndices.html to get the indices out of the L2G map, and then call MatGetSubmatrix(). Thanks, Matt > Bests, > > Li Luo > ------------------------------ > *From:* Mark Adams > *Sent:* Monday, 14 March, 2022 22:49:47 > *To:* liluo > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] How to obtain the local matrix from a global > matrix in DMDA? > > You might want to explain what you want to do. > A local vector has ghost values that you can use to compute local > residuals. You don't usually want a matrix that conforms with this local > vector. > You could get the global equations of the local vector and use > MatCreateSubMatrix, but I think there is a better way to do what you want. > > OK, Matt and Jed beat me > > On Mon, Mar 14, 2022 at 10:35 AM liluo wrote: > >> Dear developers, >> >> >> I created a DMDA object and obtain the local vector from a global >> vector by using DMGlobalToLocalBegin/End. >> >> How can I obtain the corresponding local matrix from the global matrix? 
>> >> In my case, the global matrix is of MPIBAIJ type created by >> DMCreateMatrix(da,MATBAIJ,&A); >> >> >> Bests, >> >> LI >> >> >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From liluo at um.edu.mo Mon Mar 14 11:35:14 2022 From: liluo at um.edu.mo (liluo) Date: Mon, 14 Mar 2022 16:35:14 +0000 Subject: [petsc-users] How to obtain the local matrix from a global matrix in DMDA? In-Reply-To: References: <459a4f2fa00744d8a109884fde18c822@um.edu.mo> <035ac093b89044a38131b5fc5151e3de@um.edu.mo>, Message-ID: <4a29830d9c514e06a932cf4ae28d348b@um.edu.mo> Thank you! Bests, LI ________________________________ From: Matthew Knepley Sent: Monday, 14 March, 2022 23:44:06 To: liluo Cc: Mark Adams; jed at jedbrown.org; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] How to obtain the local matrix from a global matrix in DMDA? On Mon, Mar 14, 2022 at 11:05 AM liluo > wrote: Dear developers, I defined subdomain problems that have some layers of ghost points, which are exactly of the "local" size of a DA. And I want to generate the corresponding submatrix in each subdomain which contains those layers of ghost points. I know that MatGetSubMatrices can return a submatrix when determining the index set like PCASM, but how can I get the index set including those layers of ghost points? When a DA is setup, such IS for creating ltog mapping was already destroyed. Or you may have some simpler way to get this? You can use https://petsc.org/main/docs/manualpages/IS/ISLocalToGlobalMappingGetIndices.html to get the indices out of the L2G map, and then call MatGetSubmatrix(). Thanks, Matt Bests, Li Luo ________________________________ From: Mark Adams > Sent: Monday, 14 March, 2022 22:49:47 To: liluo Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] How to obtain the local matrix from a global matrix in DMDA? You might want to explain what you want to do. A local vector has ghost values that you can use to compute local residuals. You don't usually want a matrix that conforms with this local vector. You could get the global equations of the local vector and use MatCreateSubMatrix, but I think there is a better way to do what you want. OK, Matt and Jed beat me On Mon, Mar 14, 2022 at 10:35 AM liluo > wrote: Dear developers, I created a DMDA object and obtain the local vector from a global vector by using DMGlobalToLocalBegin/End. How can I obtain the corresponding local matrix from the global matrix? In my case, the global matrix is of MPIBAIJ type created by DMCreateMatrix(da,MATBAIJ,&A); Bests, LI -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
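To make the suggestion above concrete, a minimal sketch of building the ghost-including index set from the DMDA's local-to-global mapping and extracting the overlapping block on each rank; MatCreateSubMatrices() is the current name of what this thread calls MatGetSubMatrices(), da and A are assumed to exist, and error checking is abbreviated:

  ISLocalToGlobalMapping ltog;
  const PetscInt         *gidx;
  PetscInt               nghosted;
  IS                     is;
  Mat                    *subA;

  ierr = DMGetLocalToGlobalMapping(da, &ltog);CHKERRQ(ierr);
  ierr = ISLocalToGlobalMappingGetSize(ltog, &nghosted);CHKERRQ(ierr);   /* owned + ghost dofs */
  ierr = ISLocalToGlobalMappingGetIndices(ltog, &gidx);CHKERRQ(ierr);
  ierr = ISCreateGeneral(PETSC_COMM_SELF, nghosted, gidx, PETSC_COPY_VALUES, &is);CHKERRQ(ierr);
  ierr = ISLocalToGlobalMappingRestoreIndices(ltog, &gidx);CHKERRQ(ierr);
  ierr = ISSort(is);CHKERRQ(ierr);   /* some submatrix routines expect sorted index sets */
  ierr = MatCreateSubMatrices(A, 1, &is, &is, MAT_INITIAL_MATRIX, &subA);CHKERRQ(ierr);
  /* subA[0] is a sequential matrix over this rank's owned rows/columns plus the ghost layers;
     because of the sort its ordering is by global index, not the DMDA local ordering */

This is essentially what PCASM does internally to build its overlapping subdomain problems.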
URL: From tangqi at msu.edu Mon Mar 14 16:02:40 2022 From: tangqi at msu.edu (Tang, Qi) Date: Mon, 14 Mar 2022 21:02:40 +0000 Subject: [petsc-users] Fieldsplit for 4 fields In-Reply-To: <454E49C9-C508-4B7E-B986-5855E542CE43@msu.edu> References: <46D54D47-9A0B-4078-B1D2-2313562BD909@msu.edu> <454E49C9-C508-4B7E-B986-5855E542CE43@msu.edu> Message-ID: <73220BED-2EFF-4EDB-B10F-94635E2A18D5@msu.edu> Matt and Patrick If I replace PCFieldSplitSetDetectSaddlePoint(pc,PETSC_TRUE); in https://petsc.org/main/src/dm/impls/stag/tutorials/ex3.c.html with const PetscInt ufield[] = {0,1,2},pfield[] = {3}; PCFieldSplitSetBlockSize(pc,4); PCFieldSplitSetFields(pc,"u",3,ufield,ufield); PCFieldSplitSetFields(pc,"p",1,pfield,pfield); It does not work. It seems petsc just divide all the dofs by four and assign ?components? one by one. I am not sure that is the right thing to do with dmstag. This would divide everything into two components of the same size, which is not correct either. const PetscInt ufield[] = {0},pfield[] = {1}; PCFieldSplitSetBlockSize(pc,2); PCFieldSplitSetFields(pc,"u?,1,ufield,ufield); PCFieldSplitSetFields(pc,"p",1,pfield,pfield); Is there an alternative solution to this? Qi On Mar 12, 2022, at 5:49 PM, Tang, Qi > wrote: Thanks a lot, Matt. 1) Are you using DMStag? If so, the field split might be able to be automated. I think this is true only if I have one component on each vertex/face/edge. I have two dofs on vertex, corresponding to component 0 and 1. They are mixed together as I understand. Meanwhile, I am testing the three field version using automated split and two nested fieldsplit (so pc will handle vertex dofs together). I expect it will work. 2) Checking the division: I am trying to do Schur complement {1, 2, 3} and {0} Multiplicative {1} and {2, 3} Schur complement {2} and {3} The first level is a saddle point problem and the last level is parabolization. I know the diagonal operator for {1} and Schur complement for {3} works well with amg. Qi On Mar 12, 2022, at 3:21 PM, Matthew Knepley > wrote: On Fri, Mar 11, 2022 at 3:32 PM Tang, Qi > wrote: Hi, I am trying to solve a four field system with three nested fieldsplit (I am in petsc/dmstag directly). I think I have all the IS info in the original system. I am wondering how to set up IS for the split system. Some questions first: 1) Are you using DMStag? If so, the field split might be able to be automated. 2) Checking the division: Schur complement {1, 2, 3} and {0} Multiplicative {1,2} and {3} Schur complement {1} and {2} Thanks, Matt More specifically, I would like to call something like this -pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_0_fields 0 -pc_fieldsplit_1_fields 1,2,3 -fieldsplit_1_ksp_type fgmres -fieldsplit_1_pc_type fieldsplit -fieldsplit_1_pc_fieldsplit_type multiplicative -fieldsplit_1_fieldsplit_1_ksp_type fgmres -fieldsplit_1_fieldsplit_1_pc_type fieldsplit -fieldsplit_1_fieldsplit_1_pc_fieldsplit_type schur I know the first level probably would work. But the second and third levels would not. We have two components living on one type of dofs. So the natural split 0,1,2,3 do not work. Therefore, at the first level I am setting up split through PCFieldSplitSetIS(pc, ?i", myis[i]); How could I know the sub ISs and set it up correctly? Thanks. Qi Tang T5 at LANL -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From joauma.marichal at uclouvain.be Tue Mar 15 08:33:08 2022 From: joauma.marichal at uclouvain.be (Joauma Marichal) Date: Tue, 15 Mar 2022 13:33:08 +0000 Subject: [petsc-users] DMSwarm Message-ID: Hello, I am writing to you as I am trying to implement a Lagrangian Particle Tracking method to my eulerian solver that relies on a 3D DMDA. To that end, I want to use the DMSwarm library but cannot find much documentation on it. Is there any examples that you would recommend for this specific application? I understood the very basics but do not really understand how to use the following fields: DMSwarm_pid, DMSwarmPIC_coor and DMSwarm_cellid. I also understood that particles could be moved from one processor to another using DMSwarm_rank and the migrate functions. However, is there any way to link directly the coordinates of my particle to the processor on which it should be stored? Thanks a lot for your help. Best regards, Joauma -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Mar 15 08:39:53 2022 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 15 Mar 2022 09:39:53 -0400 Subject: [petsc-users] DMSwarm In-Reply-To: References: Message-ID: On Tue, Mar 15, 2022 at 9:33 AM Joauma Marichal < joauma.marichal at uclouvain.be> wrote: > Hello, > > I am writing to you as I am trying to implement a Lagrangian Particle > Tracking method to my eulerian solver that relies on a 3D DMDA. To that > end, I want to use the DMSwarm library but cannot find much documentation > on it. Is there any examples that you would recommend for this specific > application? I understood the very basics but do not really understand how > to use the following fields: DMSwarm_pid, DMSwarmPIC_coor and > DMSwarm_cellid. > I also understood that particles could be moved from one processor to > another using DMSwarm_rank and the migrate functions. However, is there any > way to link directly the coordinates of my particle to the processor on > which it should be stored? > I was trying to do a similar thing. Here is my attempt: https://gitlab.com/petsc/petsc/-/blob/main/src/ts/tutorials/ex77.c It is simple, but hopefully how you integrate the particles in is somewhat clear. DMSwarmPIC_coor is the coordinate field, and it is updated by the user. Then you call DMSwarmMigrate() to move the particles to the appropriate process. DMSwarm_cellid is the cell number of the cell that contains each particle. It is updated by Migrate(). Thanks, Matt > Thanks a lot for your help. > > Best regards, > Joauma > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gallanaisaias at gmail.com Tue Mar 15 10:09:15 2022 From: gallanaisaias at gmail.com (=?UTF-8?Q?Isa=C3=ADas_Gallana?=) Date: Tue, 15 Mar 2022 12:09:15 -0300 Subject: [petsc-users] Petsc with Address Sanitizer Message-ID: Hi everyone. I would like to compile petsc with a few of the Google Sanitizers, in particular ASAN. I am using petsc-3.16.4, but I guess that is not relevant. 
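A minimal sketch of the advect-then-migrate cycle described in the DMSwarm answer above; swarm is the DMSwarm, and dt and the per-particle velocity array vel are illustrative placeholders that the application provides:

  PetscReal *coor;
  PetscInt  npoints, p;

  ierr = DMSwarmGetLocalSize(swarm, &npoints);CHKERRQ(ierr);
  ierr = DMSwarmGetField(swarm, "DMSwarmPIC_coor", NULL, NULL, (void**)&coor);CHKERRQ(ierr);
  for (p = 0; p < npoints; p++) {
    coor[3*p+0] += dt*vel[3*p+0];
    coor[3*p+1] += dt*vel[3*p+1];
    coor[3*p+2] += dt*vel[3*p+2];
  }
  ierr = DMSwarmRestoreField(swarm, "DMSwarmPIC_coor", NULL, NULL, (void**)&coor);CHKERRQ(ierr);
  /* particles whose new coordinates fall in another rank's subdomain are shipped there;
     there is no need to compute DMSwarm_rank or the target processor yourself */
  ierr = DMSwarmMigrate(swarm, PETSC_TRUE);CHKERRQ(ierr);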
I have used for other projects -fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -fsanitize=float-divide-by-zero -fsanitize=float-cast-overflow -fno-sanitize=null -fno-sanitize=alignment as shown in https://developers.redhat.com/blog/2021/05/05/memory-error-checking-in-c-and-c-comparing-sanitizers-and-valgrind#tldr. I could not find the proper way to do that, and I do not even know if that is possible. Does anyone know about this? What is the proper configure line? Thanks a lot -- Isa?as -------------- next part -------------- An HTML attachment was scrubbed... URL: From jacob.fai at gmail.com Tue Mar 15 10:13:40 2022 From: jacob.fai at gmail.com (Jacob Faibussowitsch) Date: Tue, 15 Mar 2022 11:13:40 -0400 Subject: [petsc-users] Petsc with Address Sanitizer In-Reply-To: References: Message-ID: <8F4D50AE-7BA4-4CA2-8C50-FF37E29CC804@gmail.com> $ ./configure --COPTFLAGS=?-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -fsanitize=float-divide-by-zero -fsanitize=float-cast-overflow -fno-sanitize=null -fno-sanitize=alignment? --CXXOPTFLAGS=?-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -fsanitize=float-divide-by-zero -fsanitize=float-cast-overflow -fno-sanitize=null -fno-sanitize=alignment" Should do what you want. Note that as of the latest release PETSc is not exactly -fsanitize=undefined clean. We have integrated -fsanitize=address into CI but have yet to do the same for the other checkers. Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) > On Mar 15, 2022, at 11:09, Isa?as Gallana wrote: > > Hi everyone. > > I would like to compile petsc with a few of the Google Sanitizers, in particular ASAN. > I am using petsc-3.16.4, but I guess that is not relevant. > > I have used for other projects > > -fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -fsanitize=float-divide-by-zero -fsanitize=float-cast-overflow -fno-sanitize=null -fno-sanitize=alignment > > as shown in https://developers.redhat.com/blog/2021/05/05/memory-error-checking-in-c-and-c-comparing-sanitizers-and-valgrind#tldr. > > I could not find the proper way to do that, and I do not even know if that is possible. > > Does anyone know about this? > What is the proper configure line? > > Thanks a lot > > > > > -- > Isa?as -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitesh.shah at amag.at Tue Mar 15 10:25:51 2022 From: vitesh.shah at amag.at (Shah Vitesh) Date: Tue, 15 Mar 2022 15:25:51 +0000 Subject: [petsc-users] Petsc configure with download hdf5 fortrtan bindings Message-ID: Hello, I am on a linux mahine which has no internet connection. I am trying to install PETSc to use it in a crystal plasticity software. I am trying to follow a script given to me for PETSc installation that would work with this software. In the script during the configuration there is an option: --download-hdf5-fortran-bindings=1 What does this option exactly do? And is it possible that I download these hdf5 fortran bindings and then point the PETSc Configure tot he download path? I see that it is possible to do for other packages like superlu etc. If yes, any ideas on where can these fortran bindings be downloaded from? Looking forward to hearing from you. With regards, Vitesh Shah AMAG ist am Standort Ranshofen nach dem ASI Chain of Custody Standard zertifiziert - weitere Informationen finden Sie unter diesem Link. 
R?ckfragen dazu sowie zu unseren ASI-zertifizierten Produkten bitte an asi at amag.at AMAG is certified according to the ASI Chain of Custody Standard at the Ranshofen location - further information can be found at this Link. If you have any questions about this or about our ASI-certified products, please send them to asi at amag.at -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue Mar 15 10:36:58 2022 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 15 Mar 2022 10:36:58 -0500 (CDT) Subject: [petsc-users] Petsc configure with download hdf5 fortrtan bindings In-Reply-To: References: Message-ID: <8ddf3770-dd1a-cfa-dfb-b24df8479b3@mcs.anl.gov> On Tue, 15 Mar 2022, Shah Vitesh via petsc-users wrote: > Hello, > > I am on a linux mahine which has no internet connection. I am trying to install PETSc to use it in a crystal plasticity software. > I am trying to follow a script given to me for PETSc installation that would work with this software. In the script during the configuration there is an option: > > --download-hdf5-fortran-bindings=1 > > What does this option exactly do? Its a modifier to --download-hdf5 - i.e it enables fortran-bindings in the hdf5 build. > And is it possible that I download these hdf5 fortran bindings and then point the PETSc Configure tot he download path? I see that it is possible to do for other packages like superlu etc. Yes. > If yes, any ideas on where can these fortran bindings be downloaded from? Its not a separate download of fortran bindings [but modifier to --download-hdf5] Here is what you would do: >>>> balay at sb /home/balay/petsc (release=) $ ./configure --download-hdf5=1 --download-hdf5-fortran-bindings=1 --with-packages-download-dir=$HOME/tmp ============================================================================================= Configuring PETSc to compile on your system ============================================================================================= Download the following packages to /home/balay/tmp hdf5 ['https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.12/hdf5-1.12.1/src/hdf5-1.12.1.tar.bz2', 'http://ftp.mcs.anl.gov/pub/petsc/externalpackages/hdf5-1.12.1.tar.bz2'] Then run the script again balay at sb /home/balay/petsc (release=) $ cd $HOME/tmp balay at sb /home/balay/tmp $ wget -q https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.12/hdf5-1.12.1/src/hdf5-1.12.1.tar.bz2 balay at sb /home/balay/tmp $ ls -l total 9500 -rw-r--r--. 1 balay balay 9724309 Jul 6 2021 hdf5-1.12.1.tar.bz2 balay at sb /home/balay/tmp $ cd - /home/balay/petsc balay at sb /home/balay/petsc (release=) $ ./configure --download-hdf5=1 --download-hdf5-fortran-bindings=1 --with-packages-download-dir=$HOME/tmp <<<<<< Satish From balay at mcs.anl.gov Tue Mar 15 10:39:31 2022 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 15 Mar 2022 10:39:31 -0500 (CDT) Subject: [petsc-users] Petsc with Address Sanitizer In-Reply-To: <8F4D50AE-7BA4-4CA2-8C50-FF37E29CC804@gmail.com> References: <8F4D50AE-7BA4-4CA2-8C50-FF37E29CC804@gmail.com> Message-ID: Likely configure will fail when all these options are used. Alternative: - run 'configure' normally - run 'make CFLAGS="-fsanitize=address -fsanitize=undefined..." And its best to do this with petsc/main branch. 
And if you are able to find issues and fix them - you can submit merge requests with fixes at https://gitlab.com/petsc/petsc/-/merge_requests Satish On Tue, 15 Mar 2022, Jacob Faibussowitsch wrote: > $ ./configure --COPTFLAGS=?-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -fsanitize=float-divide-by-zero -fsanitize=float-cast-overflow -fno-sanitize=null -fno-sanitize=alignment? --CXXOPTFLAGS=?-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -fsanitize=float-divide-by-zero -fsanitize=float-cast-overflow -fno-sanitize=null -fno-sanitize=alignment" > > Should do what you want. Note that as of the latest release PETSc is not exactly -fsanitize=undefined clean. We have integrated -fsanitize=address into CI but have yet to do the same for the other checkers. > > Best regards, > > Jacob Faibussowitsch > (Jacob Fai - booss - oh - vitch) > > > On Mar 15, 2022, at 11:09, Isa?as Gallana wrote: > > > > Hi everyone. > > > > I would like to compile petsc with a few of the Google Sanitizers, in particular ASAN. > > I am using petsc-3.16.4, but I guess that is not relevant. > > > > I have used for other projects > > > > -fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -fsanitize=float-divide-by-zero -fsanitize=float-cast-overflow -fno-sanitize=null -fno-sanitize=alignment > > > > as shown in https://developers.redhat.com/blog/2021/05/05/memory-error-checking-in-c-and-c-comparing-sanitizers-and-valgrind#tldr. > > > > I could not find the proper way to do that, and I do not even know if that is possible. > > > > Does anyone know about this? > > What is the proper configure line? > > > > Thanks a lot > > > > > > > > > > -- > > Isa?as > > From nabw91 at gmail.com Tue Mar 15 20:58:06 2022 From: nabw91 at gmail.com (=?UTF-8?Q?Nicol=C3=A1s_Barnafi?=) Date: Wed, 16 Mar 2022 02:58:06 +0100 Subject: [petsc-users] Arbitrary ownership IS for a matrix In-Reply-To: References: <6F08A21C-B411-4B23-93A5-22800E861014@petsc.dev> Message-ID: Hello, sorry to bring back this issue. I am observing some behavior that I don't understand to try to debug our code, so my question is: what happens when to set values to a matrix from dofs that don't belong to the processor? I.e. if processor 0 has [0 1 2] and proc 1 has dofs [3 4 5], if I set the value in position (3,3) in proc 0, does this not complain during assemble as long as I preallocated sufficient rows, even if these do not coincide with the ones from MatSetSizes? Thanks in advance, Nicolas On Thu, Mar 10, 2022 at 2:50 AM Nicol?s Barnafi wrote: > > Thank you both very much, it is exactly what I needed. > > Best regards > > On Wed, Mar 9, 2022, 21:19 Matthew Knepley wrote: >> >> On Wed, Mar 9, 2022 at 5:13 PM Barry Smith wrote: >>> >>> >>> You need to do a mapping of your global numbering to the standard PETSc numbering and use the PETSc numbering for all access to vectors and matrices. >>> >>> https://petsc.org/release/docs/manualpages/AO/AOCreate.html provides one approach to managing the renumbering. >> >> >> You can think of this as the mapping to offsets that you would need in any event to store your values (they could not be directly addressed with your random indices). 
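A minimal sketch of the situation being asked about, using the same numbers as the question (rank 0 inserting into global row 3, which rank 1 owns); the reply below explains why this is fine provided assembly is called and the owning rank's preallocation covers the entry:

  if (rank == 0) {   /* rank 0 owns rows 0..2, yet inserts into row 3 */
    PetscInt    row = 3, col = 3;
    PetscScalar v   = 1.0;
    ierr = MatSetValues(A, 1, &row, 1, &col, &v, ADD_VALUES);CHKERRQ(ierr);
  }
  /* off-process entries stashed by MatSetValues are sent to their owning rank here */
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);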
>> >> Thanks, >> >> Matt >> >>> >>> Barry >>> >>> >>> On Mar 9, 2022, at 3:42 PM, Nicol?s Barnafi wrote: >>> >>> Hi community, >>> >>> I have an application with polytopal meshes (elements of arbitrary shape) where the distribution of dofs is not PETSc friendly, meaning that it is not true that cpu0 owns dofs [0,a), then cpu1 owns [a,b) and so on, but instead the distribution is in fact random. Another important detail is that boundary dofs are shared, meaning that if dof 150 is on the boundary, each subdomain vector has dof 150. >>> >>> Under this considerations: >>> >>> i) Is it possible to give an arbitrary mapping to the matrix structure or is the blocked distribution hard coded? >>> ii) Are the repeated boundary dofs an issue when computing a Fieldsplit preconditioner in parallel? >>> >>> Best regards, >>> Nicolas >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ -- Nicol?s Alejandro Barnafi Wittwer From EPrudencio at slb.com Wed Mar 16 00:03:41 2022 From: EPrudencio at slb.com (Ernesto Prudencio) Date: Wed, 16 Mar 2022 05:03:41 +0000 Subject: [petsc-users] Two simple questions on building Message-ID: Hi. I have an application that uses MKL for some convolution operations. Such MKL functionality uses, I suppose, BLAS/LAPACK underneath. This same application of mine also uses PETSc for other purposes. I can supply blas and lapack to PETSc in two ways: 1. Using the configuration option--with-blaslapack-lib="-L${MKL_DIR}/lib/intel64 -lfile1 -lfile2 ... ". For reasons related to compilation environments + docker images + cloud, I am having issues with this option (a) _after_ PETSc builds successfully (both make and make install work fine). 2. Using the configuration option --download-fblaslapack=yes. This options works fine for the purpose of generating my application executable. If I use option (b), I understand that I will have two different blas/lapack codes available during the execution of my application: one from MKL, the other being the one that PETSc downloads during its configuration. Question 1) Do you foresee any potential run time issue with option (b)? Question 2) In the case PETSc, is there any problem if run "make" and "make install" without specifying PETSC_ARCH? Thank you in advance, Ernesto. Schlumberger-Private -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Mar 16 05:20:33 2022 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 16 Mar 2022 06:20:33 -0400 Subject: [petsc-users] Arbitrary ownership IS for a matrix In-Reply-To: References: <6F08A21C-B411-4B23-93A5-22800E861014@petsc.dev> Message-ID: On Tue, Mar 15, 2022 at 9:58 PM Nicol?s Barnafi wrote: > Hello, sorry to bring back this issue. > > I am observing some behavior that I don't understand to try to debug > our code, so my question is: what happens when to set values to a > matrix from dofs that don't belong to the processor? I.e. if processor > 0 has [0 1 2] and proc 1 has dofs [3 4 5], if I set the value in > position (3,3) in proc 0, does this not complain during assemble as > long as I preallocated sufficient rows, even if these do not coincide > with the ones from MatSetSizes? > Here is what happens. When you call MatSetValues(), if the value is for an off-process row, it is stored in a MatStash object. 
When you call MatAssemblyBegin(), those values are sent to the correct process, and inserted with the normal call. If there is insufficient allocation, you would get an error at that time. THanks, Matt > Thanks in advance, > Nicolas > > On Thu, Mar 10, 2022 at 2:50 AM Nicol?s Barnafi wrote: > > > > Thank you both very much, it is exactly what I needed. > > > > Best regards > > > > On Wed, Mar 9, 2022, 21:19 Matthew Knepley wrote: > >> > >> On Wed, Mar 9, 2022 at 5:13 PM Barry Smith wrote: > >>> > >>> > >>> You need to do a mapping of your global numbering to the standard > PETSc numbering and use the PETSc numbering for all access to vectors and > matrices. > >>> > >>> https://petsc.org/release/docs/manualpages/AO/AOCreate.html > provides one approach to managing the renumbering. > >> > >> > >> You can think of this as the mapping to offsets that you would need in > any event to store your values (they could not be directly addressed with > your random indices). > >> > >> Thanks, > >> > >> Matt > >> > >>> > >>> Barry > >>> > >>> > >>> On Mar 9, 2022, at 3:42 PM, Nicol?s Barnafi wrote: > >>> > >>> Hi community, > >>> > >>> I have an application with polytopal meshes (elements of arbitrary > shape) where the distribution of dofs is not PETSc friendly, meaning that > it is not true that cpu0 owns dofs [0,a), then cpu1 owns [a,b) and so on, > but instead the distribution is in fact random. Another important detail is > that boundary dofs are shared, meaning that if dof 150 is on the boundary, > each subdomain vector has dof 150. > >>> > >>> Under this considerations: > >>> > >>> i) Is it possible to give an arbitrary mapping to the matrix structure > or is the blocked distribution hard coded? > >>> ii) Are the repeated boundary dofs an issue when computing a > Fieldsplit preconditioner in parallel? > >>> > >>> Best regards, > >>> Nicolas > >>> > >>> > >> > >> > >> -- > >> What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > >> -- Norbert Wiener > >> > >> https://www.cse.buffalo.edu/~knepley/ > > > > -- > Nicol?s Alejandro Barnafi Wittwer > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Mar 16 05:45:20 2022 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 16 Mar 2022 06:45:20 -0400 Subject: [petsc-users] Two simple questions on building In-Reply-To: References: Message-ID: On Wed, Mar 16, 2022 at 1:04 AM Ernesto Prudencio via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi. > > > > I have an application that uses MKL for some convolution operations. Such > MKL functionality uses, I suppose, BLAS/LAPACK underneath. > > > > This same application of mine also uses PETSc for other purposes. I can > supply blas and lapack to PETSc in two ways: > > 1. Using the configuration > option--with-blaslapack-lib="-L${MKL_DIR}/lib/intel64 -lfile1 -lfile2 ? ". > For reasons related to compilation environments + docker images + cloud, I > am having issues with this option (a) _*after*_ PETSc builds > successfully (both make and make install work fine). > 2. Using the configuration option --download-fblaslapack=yes. This > options works fine for the purpose of generating my application executable. 
> > > > If I use option (b), I understand that I will have two different > blas/lapack codes available during the execution of my application: one > from MKL, the other being the one that PETSc downloads during its > configuration. > > > > Question 1) Do you foresee any potential run time issue with option (b)? > All those BLAS/LAPACK functions have the same name. If MKL does something slightly different in one, you could have problems. The annoying thing is that it will probably work 99% of the time. What problem do you have with a)? > Question 2) In the case PETSc, is there any problem if run ?make? and > ?make install? without specifying PETSC_ARCH? > It will choose an ARCH if you do not specify one. Thanks, Matt > > > Thank you in advance, > > > > Ernesto. > > Schlumberger-Private > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From EPrudencio at slb.com Wed Mar 16 08:30:07 2022 From: EPrudencio at slb.com (Ernesto Prudencio) Date: Wed, 16 Mar 2022 13:30:07 +0000 Subject: [petsc-users] [Ext] Re: Two simple questions on building In-Reply-To: References: Message-ID: Thanks, Matt. Regarding the blas/lapack used by PETSc via the -download-fblaslapack configuration option: are the libraries consumed as .a or as .so? My question was related to the situation where they would be .so, with my LD_LIBRARY_PATH pointing first to the MKL path and then to the PETSc path. Would that cause PETSc to use the blas/lapack from MKL at run time, instead of the blas/lapack use at compilation time? Regarding your question on (a): at the end of the PETSc building process, a docker image has to be created, but then there are some soft links in the supplied MKL library which are not resolved. The problem boils down to details on where the MKL is actually located, and the overall compilation environment used for all software, where certain rules have to be enforced for all. Cheers, Ernesto. From: Matthew Knepley Sent: Wednesday, March 16, 2022 5:45 AM To: Ernesto Prudencio Cc: PETSc users list Subject: [Ext] Re: [petsc-users] Two simple questions on building On Wed, Mar 16, 2022 at 1:04 AM Ernesto Prudencio via petsc-users > wrote: Hi. I have an application that uses MKL for some convolution operations. Such MKL functionality uses, I suppose, BLAS/LAPACK underneath. This same application of mine also uses PETSc for other purposes. I can supply blas and lapack to PETSc in two ways: 1. Using the configuration option--with-blaslapack-lib="-L${MKL_DIR}/lib/intel64 -lfile1 -lfile2 ... ". For reasons related to compilation environments + docker images + cloud, I am having issues with this option (a) _after_ PETSc builds successfully (both make and make install work fine). 2. Using the configuration option --download-fblaslapack=yes. This options works fine for the purpose of generating my application executable. If I use option (b), I understand that I will have two different blas/lapack codes available during the execution of my application: one from MKL, the other being the one that PETSc downloads during its configuration. Question 1) Do you foresee any potential run time issue with option (b)? All those BLAS/LAPACK functions have the same name. If MKL does something slightly different in one, you could have problems. 
The annoying thing is that it will probably work 99% of the time. What problem do you have with a)? Question 2) In the case PETSc, is there any problem if run "make" and "make install" without specifying PETSC_ARCH? It will choose an ARCH if you do not specify one. Thanks, Matt Thank you in advance, Ernesto. Schlumberger-Private -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ Schlumberger-Private -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Mar 16 08:56:16 2022 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 16 Mar 2022 09:56:16 -0400 Subject: [petsc-users] [Ext] Re: Two simple questions on building In-Reply-To: References: Message-ID: On Wed, Mar 16, 2022 at 9:30 AM Ernesto Prudencio via petsc-users < petsc-users at mcs.anl.gov> wrote: > Thanks, Matt. > > > Regarding the blas/lapack used by PETSc via the ?download-fblaslapack > configuration option: are the libraries consumed as .a or as .so? > Those are .a > My question was related to the situation where they would be .so, with my > LD_LIBRARY_PATH pointing first to the MKL path and then to the PETSc path. > Would that cause PETSc to use the blas/lapack from MKL at run time, instead > of the blas/lapack use at compilation time? > It would, but since BLAS/LAPACK does not have a standard ABI there can still be problems I think. > Regarding your question on (a): at the end of the PETSc building process, > a docker image has to be created, but then there are some soft links in the > supplied MKL library which are not resolved. The problem boils down to > details on where the MKL is actually located, and the overall compilation > environment used for all software, where certain rules have to be enforced > for all. > Oh, yes for a container you would have to install MKL into the container just as you would install it anywhere else. Then install PETSc in the container. I have done that for another project and got it to work. Thanks, Matt > Cheers, > > > > Ernesto. > > > > *From:* Matthew Knepley > *Sent:* Wednesday, March 16, 2022 5:45 AM > *To:* Ernesto Prudencio > *Cc:* PETSc users list > *Subject:* [Ext] Re: [petsc-users] Two simple questions on building > > > > On Wed, Mar 16, 2022 at 1:04 AM Ernesto Prudencio via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi. > > > > I have an application that uses MKL for some convolution operations. Such > MKL functionality uses, I suppose, BLAS/LAPACK underneath. > > > > This same application of mine also uses PETSc for other purposes. I can > supply blas and lapack to PETSc in two ways: > > 1. Using the configuration > option--with-blaslapack-lib="-L${MKL_DIR}/lib/intel64 -lfile1 -lfile2 ? ". > For reasons related to compilation environments + docker images + cloud, I > am having issues with this option (a) _*after*_ PETSc builds > successfully (both make and make install work fine). > 2. Using the configuration option --download-fblaslapack=yes. This > options works fine for the purpose of generating my application executable. > > > > If I use option (b), I understand that I will have two different > blas/lapack codes available during the execution of my application: one > from MKL, the other being the one that PETSc downloads during its > configuration. > > > > Question 1) Do you foresee any potential run time issue with option (b)? 
> > > > All those BLAS/LAPACK functions have the same name. If MKL does something > slightly different in one, you could have problems. The annoying thing is > that it will probably work 99% of the time. > > > > What problem do you have with a)? > > > > Question 2) In the case PETSc, is there any problem if run ?make? and > ?make install? without specifying PETSC_ARCH? > > > > It will choose an ARCH if you do not specify one. > > > > Thanks, > > > > Matt > > > > > > Thank you in advance, > > > > Ernesto. > > > > Schlumberger-Private > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > Schlumberger-Private > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Mar 16 09:17:10 2022 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 16 Mar 2022 09:17:10 -0500 (CDT) Subject: [petsc-users] [Ext] Re: Two simple questions on building In-Reply-To: References: Message-ID: <4895cb5f-e2a6-fc1f-404e-58121b7858b0@mcs.anl.gov> On Wed, 16 Mar 2022, Matthew Knepley wrote: > On Wed, Mar 16, 2022 at 9:30 AM Ernesto Prudencio via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > Thanks, Matt. > > > > > > Regarding the blas/lapack used by PETSc via the ?download-fblaslapack > > configuration option: are the libraries consumed as .a or as .so? > > > > Those are .a However - note that petsc library build defaults to shared - so it links in with these blas .a files [so a copy gets into libpetsc.so]. Alternative is to build PETSc --with-shared-libraries=0 Another alternative is to just install system blas/lapack in the container - and use that instead of download-fblaslapack > > > > My question was related to the situation where they would be .so, with my > > LD_LIBRARY_PATH pointing first to the MKL path and then to the PETSc path. > > Would that cause PETSc to use the blas/lapack from MKL at run time, instead > > of the blas/lapack use at compilation time? I would think the first resolved symbol should get used. Also one can also use LD_PRELOAD to force loading MKL libraries first. > > > > It would, but since BLAS/LAPACK does not have a standard ABI there can > still be problems I think. > > > > Regarding your question on (a): at the end of the PETSc building process, > > a docker image has to be created, but then there are some soft links in the > > supplied MKL library which are not resolved. The problem boils down to > > details on where the MKL is actually located, and the overall compilation > > environment used for all software, where certain rules have to be enforced > > for all. > > > > Oh, yes for a container you would have to install MKL into the container > just as you would install it anywhere else. Then install PETSc in the > container. > I have done that for another project and got it to work. Yeah - its best to figure out the issue here - and build PETSc with MKL. Perhaps LD_PRELOAD will help here. I would think LD_LIBRARY_PATH should enable the resolution of the paths [should be verifiable with ldd] - or perhaps there is some other subtle issue here. Satish > > Thanks, > > Matt > > > > Cheers, > > > > > > > > Ernesto. 
> > > > > > > > *From:* Matthew Knepley > > *Sent:* Wednesday, March 16, 2022 5:45 AM > > *To:* Ernesto Prudencio > > *Cc:* PETSc users list > > *Subject:* [Ext] Re: [petsc-users] Two simple questions on building > > > > > > > > On Wed, Mar 16, 2022 at 1:04 AM Ernesto Prudencio via petsc-users < > > petsc-users at mcs.anl.gov> wrote: > > > > Hi. > > > > > > > > I have an application that uses MKL for some convolution operations. Such > > MKL functionality uses, I suppose, BLAS/LAPACK underneath. > > > > > > > > This same application of mine also uses PETSc for other purposes. I can > > supply blas and lapack to PETSc in two ways: > > > > 1. Using the configuration > > option--with-blaslapack-lib="-L${MKL_DIR}/lib/intel64 -lfile1 -lfile2 ? ". > > For reasons related to compilation environments + docker images + cloud, I > > am having issues with this option (a) _*after*_ PETSc builds > > successfully (both make and make install work fine). > > 2. Using the configuration option --download-fblaslapack=yes. This > > options works fine for the purpose of generating my application executable. > > > > > > > > If I use option (b), I understand that I will have two different > > blas/lapack codes available during the execution of my application: one > > from MKL, the other being the one that PETSc downloads during its > > configuration. > > > > > > > > Question 1) Do you foresee any potential run time issue with option (b)? > > > > > > > > All those BLAS/LAPACK functions have the same name. If MKL does something > > slightly different in one, you could have problems. The annoying thing is > > that it will probably work 99% of the time. > > > > > > > > What problem do you have with a)? > > > > > > > > Question 2) In the case PETSc, is there any problem if run ?make? and > > ?make install? without specifying PETSC_ARCH? > > > > > > > > It will choose an ARCH if you do not specify one. > > > > > > > > Thanks, > > > > > > > > Matt > > > > > > > > > > > > Thank you in advance, > > > > > > > > Ernesto. > > > > > > > > Schlumberger-Private > > > > > > > > > > -- > > > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which their > > experiments lead. > > -- Norbert Wiener > > > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > Schlumberger-Private > > > > > From balay at mcs.anl.gov Wed Mar 16 09:25:40 2022 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 16 Mar 2022 09:25:40 -0500 (CDT) Subject: [petsc-users] Two simple questions on building In-Reply-To: References: Message-ID: <32c3974b-b0d-89ad-e1e0-e6caefd93d74@mcs.anl.gov> On Wed, 16 Mar 2022, Matthew Knepley wrote: > > Question 2) In the case PETSc, is there any problem if run ?make? and > > ?make install? without specifying PETSC_ARCH? > > > > It will choose an ARCH if you do not specify one. With a prefix build - think of PETSC_ARCH as an intermediate build location - before the final install at the prefix location. And after 'make install' one should point PETSC_DIR to this prefix location [if using PETSc formatted makefiles] - and PETSC_ARCH is not used (or required) anymore. 
Satish From vitesh.shah at amag.at Wed Mar 16 10:19:58 2022 From: vitesh.shah at amag.at (Shah Vitesh) Date: Wed, 16 Mar 2022 15:19:58 +0000 Subject: [petsc-users] Petsc configure with download hdf5 fortrtan bindings In-Reply-To: <8ddf3770-dd1a-cfa-dfb-b24df8479b3@mcs.anl.gov> References: <8ddf3770-dd1a-cfa-dfb-b24df8479b3@mcs.anl.gov> Message-ID: Hello, Thanks a lot fort he detailed instructions. It worked form e! With regards, Vitesh -----Urspr?ngliche Nachricht----- Von: Satish Balay Gesendet: Dienstag, 15. M?rz 2022 16:37 An: Shah Vitesh Cc: petsc-users at mcs.anl.gov Betreff: Re: [petsc-users] Petsc configure with download hdf5 fortrtan bindings On Tue, 15 Mar 2022, Shah Vitesh via petsc-users wrote: > Hello, > > I am on a linux mahine which has no internet connection. I am trying to install PETSc to use it in a crystal plasticity software. > I am trying to follow a script given to me for PETSc installation that would work with this software. In the script during the configuration there is an option: > > --download-hdf5-fortran-bindings=1 > > What does this option exactly do? Its a modifier to --download-hdf5 - i.e it enables fortran-bindings in the hdf5 build. > And is it possible that I download these hdf5 fortran bindings and then point the PETSc Configure tot he download path? I see that it is possible to do for other packages like superlu etc. Yes. > If yes, any ideas on where can these fortran bindings be downloaded from? Its not a separate download of fortran bindings [but modifier to --download-hdf5] Here is what you would do: >>>> balay at sb /home/balay/petsc (release=) $ ./configure --download-hdf5=1 --download-hdf5-fortran-bindings=1 --with-packages-download-dir=$HOME/tmp ============================================================================================= Configuring PETSc to compile on your system ============================================================================================= Download the following packages to /home/balay/tmp hdf5 ['https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsupport.hdfgroup.org%2Fftp%2FHDF5%2Freleases%2Fhdf5-1.12%2Fhdf5-1.12.1%2Fsrc%2Fhdf5-1.12.1.tar.bz2&data=04%7C01%7Cvitesh.shah%40amag.at%7Cf5245d541b2a41aff57708da0699a78b%7Ceb56916191974b369057efd40e40ee4c%7C1%7C0%7C637829557373446021%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=fXDI%2FxqjOwZ5NTron5tIyCA33%2BP9P1UbTPifPhNPJ%2FI%3D&reserved=0', 'https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fftp.mcs.anl.gov%2Fpub%2Fpetsc%2Fexternalpackages%2Fhdf5-1.12.1.tar.bz2&data=04%7C01%7Cvitesh.shah%40amag.at%7Cf5245d541b2a41aff57708da0699a78b%7Ceb56916191974b369057efd40e40ee4c%7C1%7C0%7C637829557373446021%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=m9AewLIcgPRu885K6VZxogpCvxbh2D8uubIhCZxqTtk%3D&reserved=0'] Then run the script again balay at sb /home/balay/petsc (release=) $ cd $HOME/tmp balay at sb /home/balay/tmp $ wget -q https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsupport.hdfgroup.org%2Fftp%2FHDF5%2Freleases%2Fhdf5-1.12%2Fhdf5-1.12.1%2Fsrc%2Fhdf5-1.12.1.tar.bz2&data=04%7C01%7Cvitesh.shah%40amag.at%7Cf5245d541b2a41aff57708da0699a78b%7Ceb56916191974b369057efd40e40ee4c%7C1%7C0%7C637829557373446021%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=fXDI%2FxqjOwZ5NTron5tIyCA33%2BP9P1UbTPifPhNPJ%2FI%3D&reserved=0 balay at sb /home/balay/tmp $ ls -l total 9500 
-rw-r--r--. 1 balay balay 9724309 Jul 6 2021 hdf5-1.12.1.tar.bz2 balay at sb /home/balay/tmp $ cd - /home/balay/petsc balay at sb /home/balay/petsc (release=) $ ./configure --download-hdf5=1 --download-hdf5-fortran-bindings=1 --with-packages-download-dir=$HOME/tmp <<<<<< Satish From balay at mcs.anl.gov Wed Mar 16 10:24:05 2022 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 16 Mar 2022 10:24:05 -0500 (CDT) Subject: [petsc-users] Petsc configure with download hdf5 fortrtan bindings In-Reply-To: References: <8ddf3770-dd1a-cfa-dfb-b24df8479b3@mcs.anl.gov> Message-ID: <68a86cf0-adaa-522a-3494-f77f527c665c@mcs.anl.gov> Glad it worked. Thanks for the update! Satish On Wed, 16 Mar 2022, Shah Vitesh via petsc-users wrote: > Hello, > > Thanks a lot fort he detailed instructions. It worked form e! > > With regards, > Vitesh > > -----Urspr?ngliche Nachricht----- > Von: Satish Balay > Gesendet: Dienstag, 15. M?rz 2022 16:37 > An: Shah Vitesh > Cc: petsc-users at mcs.anl.gov > Betreff: Re: [petsc-users] Petsc configure with download hdf5 fortrtan bindings > > On Tue, 15 Mar 2022, Shah Vitesh via petsc-users wrote: > > > Hello, > > > > I am on a linux mahine which has no internet connection. I am trying to install PETSc to use it in a crystal plasticity software. > > I am trying to follow a script given to me for PETSc installation that would work with this software. In the script during the configuration there is an option: > > > > --download-hdf5-fortran-bindings=1 > > > > What does this option exactly do? > > Its a modifier to --download-hdf5 - i.e it enables fortran-bindings in the hdf5 build. > > > And is it possible that I download these hdf5 fortran bindings and then point the PETSc Configure tot he download path? I see that it is possible to do for other packages like superlu etc. > > Yes. > > > > If yes, any ideas on where can these fortran bindings be downloaded from? 
> > Its not a separate download of fortran bindings [but modifier to --download-hdf5] > > Here is what you would do: > > >>>> > balay at sb /home/balay/petsc (release=) > $ ./configure --download-hdf5=1 --download-hdf5-fortran-bindings=1 --with-packages-download-dir=$HOME/tmp > ============================================================================================= > Configuring PETSc to compile on your system > ============================================================================================= > Download the following packages to /home/balay/tmp > > hdf5 ['https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsupport.hdfgroup.org%2Fftp%2FHDF5%2Freleases%2Fhdf5-1.12%2Fhdf5-1.12.1%2Fsrc%2Fhdf5-1.12.1.tar.bz2&data=04%7C01%7Cvitesh.shah%40amag.at%7Cf5245d541b2a41aff57708da0699a78b%7Ceb56916191974b369057efd40e40ee4c%7C1%7C0%7C637829557373446021%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=fXDI%2FxqjOwZ5NTron5tIyCA33%2BP9P1UbTPifPhNPJ%2FI%3D&reserved=0', 'https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fftp.mcs.anl.gov%2Fpub%2Fpetsc%2Fexternalpackages%2Fhdf5-1.12.1.tar.bz2&data=04%7C01%7Cvitesh.shah%40amag.at%7Cf5245d541b2a41aff57708da0699a78b%7Ceb56916191974b369057efd40e40ee4c%7C1%7C0%7C637829557373446021%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=m9AewLIcgPRu885K6VZxogpCvxbh2D8uubIhCZxqTtk%3D&reserved=0'] > > Then run the script again > > balay at sb /home/balay/petsc (release=) > $ cd $HOME/tmp > balay at sb /home/balay/tmp > $ wget -q https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsupport.hdfgroup.org%2Fftp%2FHDF5%2Freleases%2Fhdf5-1.12%2Fhdf5-1.12.1%2Fsrc%2Fhdf5-1.12.1.tar.bz2&data=04%7C01%7Cvitesh.shah%40amag.at%7Cf5245d541b2a41aff57708da0699a78b%7Ceb56916191974b369057efd40e40ee4c%7C1%7C0%7C637829557373446021%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=fXDI%2FxqjOwZ5NTron5tIyCA33%2BP9P1UbTPifPhNPJ%2FI%3D&reserved=0 > balay at sb /home/balay/tmp > $ ls -l > total 9500 > -rw-r--r--. 1 balay balay 9724309 Jul 6 2021 hdf5-1.12.1.tar.bz2 balay at sb /home/balay/tmp $ cd - /home/balay/petsc balay at sb /home/balay/petsc (release=) $ ./configure --download-hdf5=1 --download-hdf5-fortran-bindings=1 --with-packages-download-dir=$HOME/tmp > <<<<<< > > Satish > From bsmith at petsc.dev Wed Mar 16 11:14:52 2022 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 16 Mar 2022 12:14:52 -0400 Subject: [petsc-users] Two simple questions on building In-Reply-To: References: Message-ID: <9B3F89EA-7100-4FFF-86E5-72D991761274@petsc.dev> Hi Ernesto, I hope you are doing well. I agree with Satish. It would be best to resolve the issues with the pure MKL approach. Any "hack" that got the libraries to work mixing fblaslapack and MKL would be fragile and untrustworthy. Feel free to email to petsc-maint at mcs.anl.gov the configure.log make.log and failure information in the pure MKL approach so we can take a look at it. Since you doing this in a controlled environment presumably we can even reproduce the problem with enough information on your build process and track down the underlying cause. Barry > On Mar 16, 2022, at 1:03 AM, Ernesto Prudencio via petsc-users wrote: > > Hi. > > I have an application that uses MKL for some convolution operations. Such MKL functionality uses, I suppose, BLAS/LAPACK underneath. 
> > This same application of mine also uses PETSc for other purposes. I can supply blas and lapack to PETSc in two ways: > Using the configuration option--with-blaslapack-lib="-L${MKL_DIR}/lib/intel64 -lfile1 -lfile2 ? ". For reasons related to compilation environments + docker images + cloud, I am having issues with this option (a) _after_ PETSc builds successfully (both make and make install work fine). > Using the configuration option --download-fblaslapack=yes. This options works fine for the purpose of generating my application executable. > > If I use option (b), I understand that I will have two different blas/lapack codes available during the execution of my application: one from MKL, the other being the one that PETSc downloads during its configuration. > > Question 1) Do you foresee any potential run time issue with option (b)? > > Question 2) In the case PETSc, is there any problem if run ?make? and ?make install? without specifying PETSC_ARCH? > > Thank you in advance, > > Ernesto. > > Schlumberger-Private -------------- next part -------------- An HTML attachment was scrubbed... URL: From EPrudencio at slb.com Wed Mar 16 12:53:36 2022 From: EPrudencio at slb.com (Ernesto Prudencio) Date: Wed, 16 Mar 2022 17:53:36 +0000 Subject: [petsc-users] [Ext] Re: Two simple questions on building In-Reply-To: <9B3F89EA-7100-4FFF-86E5-72D991761274@petsc.dev> References: <9B3F89EA-7100-4FFF-86E5-72D991761274@petsc.dev> Message-ID: Thank you, Barry, Satish, and Matt, for all your answers. Per your feedbacks, we will pursue using MKL as the provider of blas/lapack for PETSc. If we continue to have issues, I will contact you via petsc-maint. I hope you are all doing well also. Very best, Ernesto. Schlumberger-Private From: Barry Smith Sent: Wednesday, March 16, 2022 11:15 AM To: Ernesto Prudencio Cc: PETSc users list Subject: [Ext] Re: [petsc-users] Two simple questions on building Hi Ernesto, I hope you are doing well. I agree with Satish. It would be best to resolve the issues with the pure MKL approach. Any "hack" that got the libraries to work mixing fblaslapack and MKL would be fragile and untrustworthy. Feel free to email to petsc-maint at mcs.anl.gov the configure.log make.log and failure information in the pure MKL approach so we can take a look at it. Since you doing this in a controlled environment presumably we can even reproduce the problem with enough information on your build process and track down the underlying cause. Barry On Mar 16, 2022, at 1:03 AM, Ernesto Prudencio via petsc-users > wrote: Hi. I have an application that uses MKL for some convolution operations. Such MKL functionality uses, I suppose, BLAS/LAPACK underneath. This same application of mine also uses PETSc for other purposes. I can supply blas and lapack to PETSc in two ways: 1. Using the configuration option--with-blaslapack-lib="-L${MKL_DIR}/lib/intel64 -lfile1 -lfile2 ... ". For reasons related to compilation environments + docker images + cloud, I am having issues with this option (a) _after_ PETSc builds successfully (both make and make install work fine). 2. Using the configuration option --download-fblaslapack=yes. This options works fine for the purpose of generating my application executable. If I use option (b), I understand that I will have two different blas/lapack codes available during the execution of my application: one from MKL, the other being the one that PETSc downloads during its configuration. Question 1) Do you foresee any potential run time issue with option (b)? 
Question 2) In the case PETSc, is there any problem if run "make" and "make install" without specifying PETSC_ARCH? Thank you in advance, Ernesto. Schlumberger-Private -------------- next part -------------- An HTML attachment was scrubbed... URL: From EPrudencio at slb.com Thu Mar 17 13:11:14 2022 From: EPrudencio at slb.com (Ernesto Prudencio) Date: Thu, 17 Mar 2022 18:11:14 +0000 Subject: [petsc-users] One question on configuring / compiling PETSc Message-ID: Hi all. When compiling PETSc with INTEL compilers, we have been using the options "-Ofast -xHost". Is there an equivalent to -xHost for GNU compilers? Thank you in advance, Ernesto. Schlumberger-Private -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Mar 17 13:37:40 2022 From: jed at jedbrown.org (Jed Brown) Date: Thu, 17 Mar 2022 12:37:40 -0600 Subject: [petsc-users] One question on configuring / compiling PETSc In-Reply-To: References: Message-ID: <87czikv30r.fsf@jedbrown.org> -march=native, which is also recognized by the new Intel compilers (icx), which are based on LLVM. Ernesto Prudencio via petsc-users writes: > Hi all. > > When compiling PETSc with INTEL compilers, we have been using the options "-Ofast -xHost". Is there an equivalent to -xHost for GNU compilers? > > Thank you in advance, > > Ernesto. > > > Schlumberger-Private From jacob.fai at gmail.com Thu Mar 17 13:59:31 2022 From: jacob.fai at gmail.com (Jacob Faibussowitsch) Date: Thu, 17 Mar 2022 13:59:31 -0500 Subject: [petsc-users] One question on configuring / compiling PETSc In-Reply-To: <87czikv30r.fsf@jedbrown.org> References: <87czikv30r.fsf@jedbrown.org> Message-ID: <32C6CECA-3632-4F7D-A00E-7F611C8CF5ED@gmail.com> You may also want to specify -mtune=. It is usually redundant (-march implies -mtune), but depending on how old your compiler is/how new your hardware is your compiler may not properly classify your hardware and still tune for ?generic?. Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) > On Mar 17, 2022, at 13:37, Jed Brown wrote: > > ?-march=native, which is also recognized by the new Intel compilers (icx), which are based on LLVM. > > Ernesto Prudencio via petsc-users writes: > >> Hi all. >> >> When compiling PETSc with INTEL compilers, we have been using the options "-Ofast -xHost". Is there an equivalent to -xHost for GNU compilers? >> >> Thank you in advance, >> >> Ernesto. >> >> >> Schlumberger-Private From sasyed at fnal.gov Thu Mar 17 15:46:03 2022 From: sasyed at fnal.gov (Sajid Ali Syed) Date: Thu, 17 Mar 2022 20:46:03 +0000 Subject: [petsc-users] Regarding the status of VecSetValues(Blocked) for GPU vectors Message-ID: Hi PETSc-developers, Is it possible to use VecSetValues with distributed-memory CUDA & Kokkos vectors from the device, i.e. can I call VecSetValues with GPU memory pointers and expect PETSc to figure out how to stash on the device it until I call VecAssemblyBegin (at which point PETSc could use GPU-aware MPI to populate off-process values) ? If this is not currently supported, is supporting this on the roadmap? Thanks in advance! Thank You, Sajid Ali (he/him) | Research Associate Scientific Computing Division Fermi National Accelerator Laboratory s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... 
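Referring back to the compiler-flag exchange above: with GNU compilers the usual counterpart of "-Ofast -xHost" is "-O3" (or "-Ofast") together with "-march=native -mtune=native". A sketch of how these are typically passed through PETSc's configure optimization-flag options (adjust to your own toolchain):

  ./configure --with-debugging=0 \
    COPTFLAGS="-O3 -march=native -mtune=native" \
    CXXOPTFLAGS="-O3 -march=native -mtune=native" \
    FOPTFLAGS="-O3 -march=native -mtune=native"
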
URL: From bsmith at petsc.dev Thu Mar 17 16:18:17 2022 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 17 Mar 2022 17:18:17 -0400 Subject: [petsc-users] Regarding the status of VecSetValues(Blocked) for GPU vectors In-Reply-To: References: Message-ID: <248499B5-63BC-4F3B-99BD-BA903B27F855@petsc.dev> We seem to be emphasizing using MatSetValuesCOO() for GPUs (can also be for CPUs); in the main branch you can find a simple example in src/mat/tutorials/ex18.c which demonstrates its use. Barry > On Mar 17, 2022, at 4:46 PM, Sajid Ali Syed wrote: > > Hi PETSc-developers, > > Is it possible to use VecSetValues with distributed-memory CUDA & Kokkos vectors from the device, i.e. can I call VecSetValues with GPU memory pointers and expect PETSc to figure out how to stash on the device it until I call VecAssemblyBegin (at which point PETSc could use GPU-aware MPI to populate off-process values) ? > > If this is not currently supported, is supporting this on the roadmap? Thanks in advance! > > Thank You, > Sajid Ali (he/him) | Research Associate > Scientific Computing Division > Fermi National Accelerator Laboratory > s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Mar 17 16:43:57 2022 From: jed at jedbrown.org (Jed Brown) Date: Thu, 17 Mar 2022 15:43:57 -0600 Subject: [petsc-users] Regarding the status of VecSetValues(Blocked) for GPU vectors In-Reply-To: <248499B5-63BC-4F3B-99BD-BA903B27F855@petsc.dev> References: <248499B5-63BC-4F3B-99BD-BA903B27F855@petsc.dev> Message-ID: <877d8suuea.fsf@jedbrown.org> The question is about vectors. I think it will work, but haven't tested. Barry Smith writes: > We seem to be emphasizing using MatSetValuesCOO() for GPUs (can also be for CPUs); in the main branch you can find a simple example in src/mat/tutorials/ex18.c which demonstrates its use. > > Barry > > >> On Mar 17, 2022, at 4:46 PM, Sajid Ali Syed wrote: >> >> Hi PETSc-developers, >> >> Is it possible to use VecSetValues with distributed-memory CUDA & Kokkos vectors from the device, i.e. can I call VecSetValues with GPU memory pointers and expect PETSc to figure out how to stash on the device it until I call VecAssemblyBegin (at which point PETSc could use GPU-aware MPI to populate off-process values) ? >> >> If this is not currently supported, is supporting this on the roadmap? Thanks in advance! >> >> Thank You, >> Sajid Ali (he/him) | Research Associate >> Scientific Computing Division >> Fermi National Accelerator Laboratory >> s-sajid-ali.github.io From mfadams at lbl.gov Thu Mar 17 17:40:32 2022 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 17 Mar 2022 18:40:32 -0400 Subject: [petsc-users] Regarding the status of VecSetValues(Blocked) for GPU vectors In-Reply-To: <877d8suuea.fsf@jedbrown.org> References: <248499B5-63BC-4F3B-99BD-BA903B27F855@petsc.dev> <877d8suuea.fsf@jedbrown.org> Message-ID: I saw "Mat" also ... we don't have device set values for vectors, I am pretty sure. But it is not hard to do for setting local data. I don't see an easy way to do off-processor data ... we have not gotten to this. See https://petsc.org/main/docs/manualpages/Vec/VecGetArrayWriteAndMemType.html You get a pointer to the raw data and pass that down to your device functions. If it is a device vector it will give you the device data. You then add to it yourself in your kernel. On Thu, Mar 17, 2022 at 5:44 PM Jed Brown wrote: > The question is about vectors. I think it will work, but haven't tested. 
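For reference, the COO assembly path mentioned above (MatSetPreallocationCOO() followed by MatSetValuesCOO(); see src/mat/tutorials/ex18.c in the main branch for the real example) looks roughly like the sketch below. The data here is made up purely for illustration, and the fragment assumes PetscInitialize() has already been called:

  Mat            A;
  PetscInt       coo_i[] = {0,0,1,1,2};              /* global row of each entry this rank computed */
  PetscInt       coo_j[] = {0,1,0,1,2};              /* global column of each entry */
  PetscScalar    v[]     = {2.0,-1.0,-1.0,2.0,1.0};  /* values; for the CUSPARSE/Kokkos types this array may live in device memory */
  PetscErrorCode ierr;

  ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
  ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,3,3);CHKERRQ(ierr);
  ierr = MatSetType(A,MATAIJCUSPARSE);CHKERRQ(ierr);             /* or MATAIJKOKKOS / MATAIJ */
  ierr = MatSetPreallocationCOO(A,5,coo_i,coo_j);CHKERRQ(ierr);  /* sparsity pattern supplied once */
  ierr = MatSetValuesCOO(A,v,ADD_VALUES);CHKERRQ(ierr);          /* values can be re-supplied every step */

The exact argument types of the preallocation call differ slightly between releases, so treat this as a sketch rather than copy-paste code.
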
> > Barry Smith writes: > > > We seem to be emphasizing using MatSetValuesCOO() for GPUs (can also > be for CPUs); in the main branch you can find a simple example in > src/mat/tutorials/ex18.c which demonstrates its use. > > > > Barry > > > > > >> On Mar 17, 2022, at 4:46 PM, Sajid Ali Syed wrote: > >> > >> Hi PETSc-developers, > >> > >> Is it possible to use VecSetValues with distributed-memory CUDA & > Kokkos vectors from the device, i.e. can I call VecSetValues with GPU > memory pointers and expect PETSc to figure out how to stash on the device > it until I call VecAssemblyBegin (at which point PETSc could use GPU-aware > MPI to populate off-process values) ? > >> > >> If this is not currently supported, is supporting this on the roadmap? > Thanks in advance! > >> > >> Thank You, > >> Sajid Ali (he/him) | Research Associate > >> Scientific Computing Division > >> Fermi National Accelerator Laboratory > >> s-sajid-ali.github.io > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Mar 17 18:18:46 2022 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 17 Mar 2022 19:18:46 -0400 Subject: [petsc-users] Regarding the status of VecSetValues(Blocked) for GPU vectors In-Reply-To: References: Message-ID: On Thu, Mar 17, 2022 at 4:46 PM Sajid Ali Syed wrote: > Hi PETSc-developers, > > Is it possible to use VecSetValues with distributed-memory CUDA & Kokkos > vectors from the device, i.e. can I call VecSetValues with GPU memory > pointers and expect PETSc to figure out how to stash on the device it until > I call VecAssemblyBegin (at which point PETSc could use GPU-aware MPI to > populate off-process values) ? > > If this is not currently supported, is supporting this on the roadmap? > Thanks in advance! > VecSetValues() will fall back to the CPU vector, so I do not think this will work on device. Usually, our assembly computes all values and puts them in a "local" vector, which you can access explicitly as Mark said. Then we call LocalToGlobal() to communicate the values, which does work directly on device using specialized code in VecScatter/PetscSF. What are you trying to do? THanks, Matt > Thank You, > Sajid Ali (he/him) | Research Associate > Scientific Computing Division > Fermi National Accelerator Laboratory > s-sajid-ali.github.io > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Mar 17 19:19:37 2022 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 17 Mar 2022 20:19:37 -0400 Subject: [petsc-users] Regarding the status of VecSetValues(Blocked) for GPU vectors In-Reply-To: References: Message-ID: LocalToGlobal is a DM thing.. Sajid, do use DM? If you need to add off procesor entries then DM could give you a local vector as Matt said that you can add to for off procesor values and then you could use the CPU communication in DM. On Thu, Mar 17, 2022 at 7:19 PM Matthew Knepley wrote: > On Thu, Mar 17, 2022 at 4:46 PM Sajid Ali Syed wrote: > >> Hi PETSc-developers, >> >> Is it possible to use VecSetValues with distributed-memory CUDA & Kokkos >> vectors from the device, i.e. 
can I call VecSetValues with GPU memory >> pointers and expect PETSc to figure out how to stash on the device it until >> I call VecAssemblyBegin (at which point PETSc could use GPU-aware MPI to >> populate off-process values) ? >> >> If this is not currently supported, is supporting this on the roadmap? >> Thanks in advance! >> > > VecSetValues() will fall back to the CPU vector, so I do not think this > will work on device. > > Usually, our assembly computes all values and puts them in a "local" > vector, which you can access explicitly as Mark said. Then > we call LocalToGlobal() to communicate the values, which does work > directly on device using specialized code in VecScatter/PetscSF. > > What are you trying to do? > > THanks, > > Matt > > >> Thank You, >> Sajid Ali (he/him) | Research Associate >> Scientific Computing Division >> Fermi National Accelerator Laboratory >> s-sajid-ali.github.io >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Mar 17 19:25:55 2022 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 17 Mar 2022 20:25:55 -0400 Subject: [petsc-users] Regarding the status of VecSetValues(Blocked) for GPU vectors In-Reply-To: References: Message-ID: On Thu, Mar 17, 2022 at 8:19 PM Mark Adams wrote: > LocalToGlobal is a DM thing.. > Sajid, do use DM? > If you need to add off procesor entries then DM could give you a local > vector as Matt said that you can add to for off procesor values and then > you could use the CPU communication in DM. > It would be GPU communication, not CPU. Matt > On Thu, Mar 17, 2022 at 7:19 PM Matthew Knepley wrote: > >> On Thu, Mar 17, 2022 at 4:46 PM Sajid Ali Syed wrote: >> >>> Hi PETSc-developers, >>> >>> Is it possible to use VecSetValues with distributed-memory CUDA & Kokkos >>> vectors from the device, i.e. can I call VecSetValues with GPU memory >>> pointers and expect PETSc to figure out how to stash on the device it until >>> I call VecAssemblyBegin (at which point PETSc could use GPU-aware MPI to >>> populate off-process values) ? >>> >>> If this is not currently supported, is supporting this on the roadmap? >>> Thanks in advance! >>> >> >> VecSetValues() will fall back to the CPU vector, so I do not think this >> will work on device. >> >> Usually, our assembly computes all values and puts them in a "local" >> vector, which you can access explicitly as Mark said. Then >> we call LocalToGlobal() to communicate the values, which does work >> directly on device using specialized code in VecScatter/PetscSF. >> >> What are you trying to do? >> >> THanks, >> >> Matt >> >> >>> Thank You, >>> Sajid Ali (he/him) | Research Associate >>> Scientific Computing Division >>> Fermi National Accelerator Laboratory >>> s-sajid-ali.github.io >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From joauma.marichal at uclouvain.be Fri Mar 18 10:02:34 2022 From: joauma.marichal at uclouvain.be (Joauma Marichal) Date: Fri, 18 Mar 2022 15:02:34 +0000 Subject: [petsc-users] DMSwarm Message-ID: Hello, I am writing to you as I am trying to implement a Lagrangian Particle Tracking method to my eulerian solver that relies on a 3D collocated DMDA. I have been using examples to develop a first basic code. The latter creates particles on rank 0 with random coordinates on the whole domain and then migrates them to the rank corresponding to these coordinates. Unfortunately, as I migrate I am loosing some particles. I came to understand that when I create a DMDA with 6 grid points in each 3 directions and then set coordinates in between 0 and 1 using ,DMDASetUniformCoordinates and running on 2 processors, I obtain the following coordinates values on each proc: [Proc 0] X = 0.000000 0.200000 0.400000 0.600000 0.800000 1.000000 [Proc 0] Y = 0.000000 0.200000 0.400000 0.600000 0.800000 1.000000 [Proc 0] Z = 0.000000 0.200000 0.400000 [Proc 1] X = 0.000000 0.200000 0.400000 0.600000 0.800000 1.000000 [Proc 1] Y = 0.000000 0.200000 0.400000 0.600000 0.800000 1.000000 [Proc 1] Z = 0.600000 0.800000 1.000000 . Furthermore, it appears that the particles that I am losing are (in the case of 2 processors) located in between z = 0.4 and z = 0.6. How can this be avoided? I attach my code to this email (I run it using mpirun -np 2 ./cobpor). Furthermore, my actual code relies on a collocated 3D DMDA, however the DMDASetUniformCoordinates seems to be working for staggered grids only... How would you advice to deal with particles in this case? Thanks a lot for your help. Best regards, Joauma -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Particle_test2.zip Type: application/zip Size: 22920 bytes Desc: Particle_test2.zip URL: From sasyed at fnal.gov Fri Mar 18 10:28:37 2022 From: sasyed at fnal.gov (Sajid Ali Syed) Date: Fri, 18 Mar 2022 15:28:37 +0000 Subject: [petsc-users] Regarding the status of VecSetValues(Blocked) for GPU vectors In-Reply-To: References: Message-ID: Hi Matt/Mark, I'm working on a Poisson solver for a distributed PIC code, where the particles are distributed over MPI ranks rather than the grid. Prior to the solve, all particles are deposited onto a (DMDA) grid. The current prototype I have is that each rank holds a full size DMDA vector and particles on that rank are deposited into it. Then, the data from all the local vectors in combined into multiple distributed DMDA vectors via VecScatters and this is followed by solving the Poisson equation. The need to have multiple subcomms, each solving the same equation is due to the fact that the grid size too small to use all the MPI ranks (beyond the strong scaling limit). The solution is then scattered back to each MPI rank via VecScatters. This first local-to-(multi)global transfer required the use of multiple VecScatters as there is no one-to-multiple scatter capability in SF. This works and is already giving a large speedup over the current allreduce baseline (which transfers more data than is necessary) which is currently used. 
I was wondering if within each subcommunicator I could directly write to the DMDA vector via VecSetValues and PETSc would take care of stashing them on the GPU until I call VecAssemblyBegin. Since this would be from within a kokkos parallel_for operation, there would be multiple (probably ~1e3) simultaneous writes that the stashing mechanism would have to support. Currently, we use Kokkos-ScatterView to do this. Thank You, Sajid Ali (he/him) | Research Associate Scientific Computing Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ________________________________ From: Matthew Knepley Sent: Thursday, March 17, 2022 7:25 PM To: Mark Adams Cc: Sajid Ali Syed ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Regarding the status of VecSetValues(Blocked) for GPU vectors On Thu, Mar 17, 2022 at 8:19 PM Mark Adams > wrote: LocalToGlobal is a DM thing.. Sajid, do use DM? If you need to add off procesor entries then DM could give you a local vector as Matt said that you can add to for off procesor values and then you could use the CPU communication in DM. It would be GPU communication, not CPU. Matt On Thu, Mar 17, 2022 at 7:19 PM Matthew Knepley > wrote: On Thu, Mar 17, 2022 at 4:46 PM Sajid Ali Syed > wrote: Hi PETSc-developers, Is it possible to use VecSetValues with distributed-memory CUDA & Kokkos vectors from the device, i.e. can I call VecSetValues with GPU memory pointers and expect PETSc to figure out how to stash on the device it until I call VecAssemblyBegin (at which point PETSc could use GPU-aware MPI to populate off-process values) ? If this is not currently supported, is supporting this on the roadmap? Thanks in advance! VecSetValues() will fall back to the CPU vector, so I do not think this will work on device. Usually, our assembly computes all values and puts them in a "local" vector, which you can access explicitly as Mark said. Then we call LocalToGlobal() to communicate the values, which does work directly on device using specialized code in VecScatter/PetscSF. What are you trying to do? THanks, Matt Thank You, Sajid Ali (he/him) | Research Associate Scientific Computing Division Fermi National Accelerator Laboratory s-sajid-ali.github.io -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From fabio.durastante at unipi.it Fri Mar 18 10:44:24 2022 From: fabio.durastante at unipi.it (Fabio Durastante) Date: Fri, 18 Mar 2022 16:44:24 +0100 Subject: [petsc-users] Fwd: Problem running ex54f with GAMG In-Reply-To: <893eb69d-c3d9-769f-ec54-29090aac07a2@mat.uniroma2.it> References: <893eb69d-c3d9-769f-ec54-29090aac07a2@mat.uniroma2.it> Message-ID: <32dffef8-cee8-da32-7992-89e5f1a42392@unipi.it> Hi everybody, I'm trying to run the rotated anisotropy example ex54f using CG and GAMG as preconditioner, I run it with the command: mpirun -np 2 ./ex54f -ne 1011 \ -theta 18.0 \ -epsilon 100.0 \ -pc_type gamg \ -pc_gamg_type agg \ -log_view \ -log_trace \ -ksp_view \ -ksp_monitor \ -ksp_type cg \ -mg_levels_pc_type jacobi \ -mg_levels_ksp_type richardson \ -mg_levels_ksp_max_it 4 \ -ksp_atol 1e-9 \ -ksp_rtol 1e-12 But the KSP CG seems to stop just after two iterations: ? 0 KSP Residual norm 6.666655711717e-02 ? 1 KSP Residual norm 9.859661350927e-03 I'm attaching the full log, the problem seems to appear when I modify the value of epsilon, if I leave it to the default (1.0) it prints ? 0 KSP Residual norm 5.862074869050e+00 ? 1 KSP Residual norm 5.132711016122e-01 ? 2 KSP Residual norm 1.198566629717e-01 ? 3 KSP Residual norm 1.992885901625e-02 ? 4 KSP Residual norm 4.919780086064e-03 ? 5 KSP Residual norm 1.417045143681e-03 ? 6 KSP Residual norm 3.559622318760e-04 ? 7 KSP Residual norm 9.270786187701e-05 ? 8 KSP Residual norm 1.886403709163e-05 ? 9 KSP Residual norm 2.940634415714e-06 ?10 KSP Residual norm 5.015043022637e-07 ?11 KSP Residual norm 9.760219712757e-08 ?12 KSP Residual norm 2.320857464659e-08 ?13 KSP Residual norm 4.563772507631e-09 ?14 KSP Residual norm 8.896675476997e-10 that is very strange because the case with epsilon 1 should be easier. Any help with this would be great. Thank you very much, Fabio Durastante -------------- next part -------------- 0 KSP Residual norm 6.666655711717e-02 1 KSP Residual norm 9.859661350927e-03 KSP Object: 2 MPI processes type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-12, absolute=1e-09, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 2 MPI processes type: gamg type is MULTIPLICATIVE, levels=6 cycles=v Cycles per PCApply=1 Using externally compute Galerkin coarse grid matrices GAMG specific options Threshold for dropping small values in graph on each level = 0. 0. 0. 0. 0. 0. Threshold scaling factor for each level not specified = 1. AGG specific options Symmetric graph false Number of levels to square graph 1 Number smoothing steps 1 Complexity: grid = 1.13597 Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 2 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 2 MPI processes type: bjacobi number of blocks = 2 Local solver information for first block is in the following KSP and PC objects on rank 0: Use -mg_coarse_ksp_view ::ascii_info_detail to display information for all blocks KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5., needed 1. Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=6, cols=6 package used to perform factorization: petsc total: nonzeros=36, allocated nonzeros=36 using I-node routines: found 2 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: (mg_coarse_sub_) 1 MPI processes type: seqaij rows=6, cols=6 total: nonzeros=36, allocated nonzeros=36 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 2 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 2 MPI processes type: mpiaij rows=6, cols=6 total: nonzeros=36, allocated nonzeros=36 total number of mallocs used during MatSetValues calls=0 using I-node (on process 0) routines: found 2 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 2 MPI processes type: richardson damping factor=1. maximum iterations=4, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_1_) 2 MPI processes type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: 2 MPI processes type: mpiaij rows=80, cols=80 total: nonzeros=2236, allocated nonzeros=2236 total number of mallocs used during MatSetValues calls=0 using nonscalable MatPtAP() implementation not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 2 MPI processes type: richardson damping factor=1. maximum iterations=4, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_2_) 2 MPI processes type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: 2 MPI processes type: mpiaij rows=1086, cols=1086 total: nonzeros=34476, allocated nonzeros=34476 total number of mallocs used during MatSetValues calls=0 using nonscalable MatPtAP() implementation not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 2 MPI processes type: richardson damping factor=1. maximum iterations=4, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_3_) 2 MPI processes type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: 2 MPI processes type: mpiaij rows=12368, cols=12368 total: nonzeros=304142, allocated nonzeros=304142 total number of mallocs used during MatSetValues calls=0 using nonscalable MatPtAP() implementation not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 4 ------------------------------- KSP Object: (mg_levels_4_) 2 MPI processes type: richardson damping factor=1. maximum iterations=4, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_4_) 2 MPI processes type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: 2 MPI processes type: mpiaij rows=77337, cols=77337 total: nonzeros=910755, allocated nonzeros=910755 total number of mallocs used during MatSetValues calls=0 using nonscalable MatPtAP() implementation not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 5 ------------------------------- KSP Object: (mg_levels_5_) 2 MPI processes type: richardson damping factor=1. maximum iterations=4, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_5_) 2 MPI processes type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: 2 MPI processes type: mpiaij rows=1024144, cols=1024144 total: nonzeros=9205156, allocated nonzeros=15362160 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: 2 MPI processes type: mpiaij rows=1024144, cols=1024144 total: nonzeros=9205156, allocated nonzeros=15362160 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./ex54f on a arch-linux-c-opt named grace with 2 processors, by fabiod Fri Mar 18 16:38:04 2022 Using Petsc Release Version 3.16.3, unknown Max Max/Min Avg Total Time (sec): 2.719e+00 1.000 2.719e+00 Objects: 5.450e+02 1.004 5.440e+02 Flop: 5.535e+08 1.001 5.532e+08 1.106e+09 Flop/sec: 2.035e+08 1.001 2.034e+08 4.069e+08 MPI Messages: 3.205e+02 1.009 3.190e+02 6.380e+02 MPI Message Lengths: 1.207e+06 1.000 3.783e+03 2.413e+06 MPI Reductions: 5.740e+02 1.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 2.7193e+00 100.0% 1.1065e+09 100.0% 6.380e+02 100.0% 3.783e+03 100.0% 5.540e+02 96.5% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSided 71 1.0 7.1392e-0214.7 0.00e+00 0.0 7.1e+01 4.0e+00 7.1e+01 1 0 11 0 12 1 0 11 0 13 0 BuildTwoSidedF 44 1.0 7.1353e-0215.9 0.00e+00 0.0 3.8e+01 2.0e+04 4.4e+01 1 0 6 31 8 1 0 6 31 8 0 MatMult 131 1.0 1.9677e-01 1.0 2.66e+08 1.0 2.8e+02 2.7e+03 5.0e+00 7 48 44 32 1 7 48 44 32 1 2704 MatMultAdd 10 1.0 8.1694e-03 1.0 5.91e+06 1.0 1.8e+01 6.9e+02 0.0e+00 0 1 3 1 0 0 1 3 1 0 1446 MatMultTranspose 10 1.0 8.6451e-03 1.0 5.91e+06 1.0 3.6e+01 4.3e+02 5.0e+00 0 1 6 1 1 0 1 6 1 1 1367 MatSolve 2 0.0 7.2710e-06 0.0 1.32e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 18 MatLUFactorSym 1 1.0 1.3105e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 1 1.0 7.9300e-06 2.1 1.29e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 16 MatConvert 5 1.0 3.5570e-02 1.0 0.00e+00 0.0 2.0e+01 7.1e+02 5.0e+00 1 0 3 1 1 1 0 3 1 1 0 MatScale 15 1.0 2.1281e-02 1.0 1.34e+07 1.0 1.0e+01 2.8e+03 0.0e+00 1 2 2 1 0 1 2 2 1 0 1260 MatResidual 10 1.0 1.6194e-02 1.0 2.09e+07 1.0 2.0e+01 2.8e+03 0.0e+00 1 4 3 2 0 1 4 3 2 0 2583 MatAssemblyBegin 84 1.0 7.2778e-0216.0 0.00e+00 0.0 3.8e+01 2.0e+04 2.9e+01 1 0 6 31 5 1 0 6 31 5 0 MatAssemblyEnd 84 1.0 1.8844e-01 1.0 7.86e+03 2.3 0.0e+00 0.0e+00 9.6e+01 7 0 0 0 17 7 0 0 0 17 0 MatGetRowIJ 1 0.0 5.4830e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCreateSubMat 2 1.0 2.3775e-04 1.0 0.00e+00 0.0 5.0e+00 7.0e+01 2.8e+01 0 0 1 0 5 0 0 1 0 5 0 MatGetOrdering 1 0.0 4.9585e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCoarsen 5 1.0 3.4196e-02 1.0 0.00e+00 0.0 5.8e+01 1.9e+03 1.4e+01 1 0 9 5 2 1 0 9 5 3 0 MatZeroEntries 5 1.0 7.8601e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 9 1.3 8.4609e-04 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00 0 0 0 0 1 0 0 0 0 1 0 MatAXPY 5 1.0 3.8052e-02 1.0 5.58e+05 1.0 0.0e+00 0.0e+00 5.0e+00 1 0 0 0 1 1 0 0 0 1 29 MatTranspose 10 1.0 1.1442e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMatMultSym 15 1.0 1.1629e-01 1.0 0.00e+00 0.0 3.0e+01 1.9e+03 4.5e+01 4 0 5 2 8 4 0 5 2 8 0 MatMatMultNum 15 1.0 4.2265e-02 1.0 2.69e+07 1.0 1.0e+01 2.8e+03 5.0e+00 2 5 2 1 1 2 5 2 1 1 1272 MatPtAPSymbolic 5 1.0 1.9834e-01 1.0 0.00e+00 0.0 6.0e+01 3.7e+03 3.5e+01 7 0 9 9 6 7 0 9 9 6 0 MatPtAPNumeric 5 1.0 7.2245e-02 1.0 4.51e+07 1.0 2.0e+01 7.8e+03 2.5e+01 3 8 3 6 4 3 8 3 6 5 1247 MatTrnMatMultSym 1 1.0 6.6188e-01 1.0 0.00e+00 0.0 1.0e+01 6.2e+04 1.2e+01 24 0 2 26 2 24 0 2 26 2 0 MatGetLocalMat 16 1.0 6.5550e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 MatGetBrAoCol 15 1.0 2.9662e-03 1.6 0.00e+00 0.0 7.0e+01 3.7e+03 0.0e+00 0 0 11 11 0 0 0 11 11 0 0 VecMDot 50 1.0 2.5929e-02 1.0 6.13e+07 1.0 0.0e+00 0.0e+00 5.0e+01 1 11 0 0 9 1 11 0 0 9 4730 VecTDot 3 1.0 1.6664e-03 1.1 3.07e+06 1.0 0.0e+00 0.0e+00 3.0e+00 0 1 0 0 1 0 1 0 0 1 3687 
VecNorm 57 1.0 4.3805e-03 1.1 1.43e+07 1.0 0.0e+00 0.0e+00 5.7e+01 0 3 0 0 10 0 3 0 0 10 6535 VecScale 55 1.0 3.1254e-03 1.1 6.13e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 3924 VecCopy 17 1.0 5.6865e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 80 1.0 2.9919e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 87 1.0 9.4864e-03 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 4428 VecAYPX 80 1.0 1.1726e-02 1.1 8.92e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 1521 VecMAXPY 55 1.0 3.7916e-02 1.0 7.25e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 13 0 0 0 1 13 0 0 0 3823 VecAssemblyBegin 16 1.0 1.0659e-03 7.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.5e+01 0 0 0 0 3 0 0 0 0 3 0 VecAssemblyEnd 16 1.0 1.6375e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 135 1.0 2.7489e-02 1.0 1.51e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 1095 VecScatterBegin 172 1.0 8.5210e-04 1.1 0.00e+00 0.0 4.0e+02 2.7e+03 1.7e+01 0 0 63 45 3 0 0 63 45 3 0 VecScatterEnd 172 1.0 2.9094e-03 2.0 9.20e+02 1.4 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1 VecSetRandom 5 1.0 1.2987e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 55 1.0 6.9701e-03 1.0 1.84e+07 1.0 0.0e+00 0.0e+00 5.5e+01 0 3 0 0 10 0 3 0 0 10 5279 SFSetGraph 35 1.0 1.4065e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetUp 27 1.0 1.0133e-03 1.4 0.00e+00 0.0 1.0e+02 8.1e+02 2.7e+01 0 0 16 3 5 0 0 16 3 5 0 SFBcastBegin 19 1.0 6.6749e-05 1.1 0.00e+00 0.0 3.8e+01 2.3e+03 0.0e+00 0 0 6 4 0 0 0 6 4 0 0 SFBcastEnd 19 1.0 1.0335e-03 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFPack 191 1.0 8.2768e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 191 1.0 3.5429e-05 1.0 9.20e+02 1.4 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 44 KSPSetUp 13 1.0 1.3468e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 2 0 0 0 0 2 0 KSPSolve 1 1.0 1.9267e-01 1.0 2.22e+08 1.0 2.2e+02 2.3e+03 2.0e+01 7 40 34 21 3 7 40 34 21 4 2302 KSPGMRESOrthog 50 1.0 5.8824e-02 1.0 1.23e+08 1.0 0.0e+00 0.0e+00 5.0e+01 2 22 0 0 9 2 22 0 0 9 4170 PCGAMGGraph_AGG 5 1.0 3.8613e-01 1.0 1.05e+07 1.0 3.0e+01 1.4e+03 4.5e+01 14 2 5 2 8 14 2 5 2 8 54 PCGAMGCoarse_AGG 5 1.0 8.1589e-01 1.0 0.00e+00 0.0 8.6e+01 1.1e+04 3.7e+01 30 0 13 37 6 30 0 13 37 7 0 PCGAMGProl_AGG 5 1.0 1.2226e-01 1.0 0.00e+00 0.0 4.8e+01 2.3e+03 7.9e+01 4 0 8 5 14 4 0 8 5 14 0 PCGAMGPOpt_AGG 5 1.0 3.6809e-01 1.0 2.73e+08 1.0 1.6e+02 2.4e+03 2.0e+02 14 49 25 16 36 14 49 25 16 37 1480 GAMG: createProl 5 1.0 1.6962e+00 1.0 2.83e+08 1.0 3.2e+02 4.4e+03 3.7e+02 62 51 51 60 64 62 51 51 60 66 334 Graph 10 1.0 3.8577e-01 1.0 1.05e+07 1.0 3.0e+01 1.4e+03 4.5e+01 14 2 5 2 8 14 2 5 2 8 54 MIS/Agg 5 1.0 3.4274e-02 1.0 0.00e+00 0.0 5.8e+01 1.9e+03 1.4e+01 1 0 9 5 2 1 0 9 5 3 0 SA: col data 5 1.0 2.4279e-02 1.0 0.00e+00 0.0 3.6e+01 2.6e+03 3.4e+01 1 0 6 4 6 1 0 6 4 6 0 SA: frmProl0 5 1.0 9.1539e-02 1.0 0.00e+00 0.0 1.2e+01 1.3e+03 2.5e+01 3 0 2 1 4 3 0 2 1 5 0 SA: smooth 5 1.0 1.5983e-01 1.0 1.40e+07 1.0 4.0e+01 2.1e+03 6.5e+01 6 3 6 4 11 6 3 6 4 12 175 GAMG: partLevel 5 1.0 2.7117e-01 1.0 4.51e+07 1.0 9.4e+01 4.0e+03 1.1e+02 10 8 15 16 20 10 8 15 16 20 332 repartition 1 1.0 5.0492e-04 1.0 0.00e+00 0.0 1.4e+01 3.3e+01 5.3e+01 0 0 2 0 9 0 0 2 0 10 0 Invert-Sort 1 1.0 1.1274e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 1 0 0 0 0 1 0 Move A 1 1.0 1.8661e-04 1.0 0.00e+00 0.0 5.0e+00 7.0e+01 1.5e+01 0 0 1 0 3 0 0 1 0 3 0 Move P 1 1.0 1.2168e-04 1.0 
0.00e+00 0.0 0.0e+00 0.0e+00 1.6e+01 0 0 0 0 3 0 0 0 0 3 0 PCGAMG Squ l00 1 1.0 6.6188e-01 1.0 0.00e+00 0.0 1.0e+01 6.2e+04 1.2e+01 24 0 2 26 2 24 0 2 26 2 0 PCGAMG Gal l00 1 1.0 2.2394e-01 1.0 3.54e+07 1.0 1.6e+01 8.7e+03 1.2e+01 8 6 3 6 2 8 6 3 6 2 316 PCGAMG Opt l00 1 1.0 9.5932e-02 1.0 9.21e+06 1.0 8.0e+00 6.1e+03 1.0e+01 4 2 1 2 2 4 2 1 2 2 192 PCGAMG Gal l01 1 1.0 3.6699e-02 1.0 7.11e+06 1.0 1.6e+01 8.0e+03 1.2e+01 1 1 3 5 2 1 1 3 5 2 385 PCGAMG Opt l01 1 1.0 9.6930e-03 1.0 9.13e+05 1.0 8.0e+00 2.3e+03 1.0e+01 0 0 1 1 2 0 0 1 1 2 188 PCGAMG Gal l02 1 1.0 8.7276e-03 1.0 2.33e+06 1.0 1.6e+01 5.0e+03 1.2e+01 0 0 3 3 2 0 0 3 3 2 534 PCGAMG Opt l02 1 1.0 2.3489e-03 1.0 3.07e+05 1.0 8.0e+00 1.5e+03 1.0e+01 0 0 1 1 2 0 0 1 1 2 259 PCGAMG Gal l03 1 1.0 1.0658e-03 1.0 2.61e+05 1.1 1.6e+01 1.7e+03 1.2e+01 0 0 3 1 2 0 0 3 1 2 478 PCGAMG Opt l03 1 1.0 3.5707e-04 1.0 3.46e+04 1.0 8.0e+00 5.7e+02 1.0e+01 0 0 1 0 2 0 0 1 0 2 193 PCGAMG Gal l04 1 1.0 2.2163e-04 1.0 1.02e+04 1.4 1.6e+01 1.9e+02 1.2e+01 0 0 3 0 2 0 0 3 0 2 80 PCGAMG Opt l04 1 1.0 1.0728e-04 1.0 2.43e+03 1.2 8.0e+00 1.6e+02 1.0e+01 0 0 1 0 2 0 0 1 0 2 42 PCSetUp 2 1.0 1.9752e+00 1.0 3.28e+08 1.0 4.2e+02 4.3e+03 5.1e+02 73 59 66 75 88 73 59 66 75 92 332 PCSetUpOnBlocks 2 1.0 1.5289e-04 1.2 1.29e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1 PCApply 2 1.0 1.7902e-01 1.0 2.06e+08 1.0 2.1e+02 2.3e+03 1.5e+01 7 37 34 20 3 7 37 34 20 3 2300 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Container 10 10 5840 0. Matrix 141 141 611009048 0. Matrix Coarsen 5 5 3160 0. Vector 220 220 158846504 0. Index Set 70 70 88748 0. Star Forest Graph 45 45 53640 0. Krylov Solver 13 13 166952 0. Preconditioner 13 13 13776 0. Viewer 3 2 1696 0. PetscRandom 10 10 6700 0. Distributed Mesh 5 5 25280 0. Discrete System 5 5 4520 0. Weak Form 5 5 3120 0. 
======================================================================================================================== Average time to get PetscTime(): 2.16e-08 Average time for MPI_Barrier(): 3.11e-07 Average time for zero size MPI_Send(): 6.87e-07 #PETSc Option Table entries: -epsilon 100.0 -ksp_atol 1e-9 -ksp_monitor -ksp_rtol 1e-12 -ksp_type cg -ksp_view -log_trace -log_view -mg_levels_ksp_max_it 4 -mg_levels_ksp_type richardson -mg_levels_pc_type jacobi -ne 1011 -pc_gamg_type agg -pc_type gamg -theta 18.0 #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --download-trilinos-configure-arguments="-DTPL_ENABLE_Boost=OFF -DTPL_ENABLE_Matio=OFF" --download-hypre --download-netcdf --download-hdf5 --download-zlib --download-make --download-ml --with-debugging=0 COPTFLAGS="-g -O3" CXXOPTFLAGS="-g -O3" FOPTFLAGS="-g -O3" CUDAOPTFLAGS=-O3 ----------------------------------------- Libraries compiled on 2022-02-23 22:16:38 on grace Machine characteristics: Linux-3.10.0-1160.42.2.el7.x86_64-x86_64-with-centos-7.9.2009-Core Using PETSc directory: /home/fabiod/anisotropy/petsc Using PETSc arch: arch-linux-c-opt ----------------------------------------- Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 ----------------------------------------- Using include paths: -I/home/fabiod/anisotropy/petsc/include -I/home/fabiod/anisotropy/petsc/arch-linux-c-opt/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/home/fabiod/anisotropy/petsc/arch-linux-c-opt/lib -L/home/fabiod/anisotropy/petsc/arch-linux-c-opt/lib -lpetsc -Wl,-rpath,/home/fabiod/anisotropy/petsc/arch-linux-c-opt/lib -L/home/fabiod/anisotropy/petsc/arch-linux-c-opt/lib -Wl,-rpath,/opt/openmpi/lib -L/opt/openmpi/lib -Wl,-rpath,/opt/gcc61/lib/gcc/x86_64-pc-linux-gnu/6.1.0 -L/opt/gcc61/lib/gcc/x86_64-pc-linux-gnu/6.1.0 -Wl,-rpath,/opt/gcc61/lib64 -L/opt/gcc61/lib64 -Wl,-rpath,/opt/gcc61/lib -L/opt/gcc61/lib -lHYPRE -lml -lopenblas -lnetcdf -lhdf5_hl -lhdf5 -lm -lz -lX11 -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lquadmath -lstdc++ -ldl ----------------------------------------- From bsmith at petsc.dev Fri Mar 18 10:47:51 2022 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 18 Mar 2022 11:47:51 -0400 Subject: [petsc-users] Problem running ex54f with GAMG In-Reply-To: <32dffef8-cee8-da32-7992-89e5f1a42392@unipi.it> References: <893eb69d-c3d9-769f-ec54-29090aac07a2@mat.uniroma2.it> <32dffef8-cee8-da32-7992-89e5f1a42392@unipi.it> Message-ID: Run with -ksp_converged_reason to have it print why it has stopped the iteration. 
> On Mar 18, 2022, at 11:44 AM, Fabio Durastante wrote: > > Hi everybody, > > I'm trying to run the rotated anisotropy example ex54f using CG and GAMG as preconditioner, I run it with the command: > > mpirun -np 2 ./ex54f -ne 1011 \ > -theta 18.0 \ > -epsilon 100.0 \ > -pc_type gamg \ > -pc_gamg_type agg \ > -log_view \ > -log_trace \ > -ksp_view \ > -ksp_monitor \ > -ksp_type cg \ > -mg_levels_pc_type jacobi \ > -mg_levels_ksp_type richardson \ > -mg_levels_ksp_max_it 4 \ > -ksp_atol 1e-9 \ > -ksp_rtol 1e-12 > > But the KSP CG seems to stop just after two iterations: > > 0 KSP Residual norm 6.666655711717e-02 > 1 KSP Residual norm 9.859661350927e-03 > > I'm attaching the full log, the problem seems to appear when I modify the value of epsilon, if I leave it to the default (1.0) it prints > > 0 KSP Residual norm 5.862074869050e+00 > 1 KSP Residual norm 5.132711016122e-01 > 2 KSP Residual norm 1.198566629717e-01 > 3 KSP Residual norm 1.992885901625e-02 > 4 KSP Residual norm 4.919780086064e-03 > 5 KSP Residual norm 1.417045143681e-03 > 6 KSP Residual norm 3.559622318760e-04 > 7 KSP Residual norm 9.270786187701e-05 > 8 KSP Residual norm 1.886403709163e-05 > 9 KSP Residual norm 2.940634415714e-06 > 10 KSP Residual norm 5.015043022637e-07 > 11 KSP Residual norm 9.760219712757e-08 > 12 KSP Residual norm 2.320857464659e-08 > 13 KSP Residual norm 4.563772507631e-09 > 14 KSP Residual norm 8.896675476997e-10 > > that is very strange because the case with epsilon 1 should be easier. > > Any help with this would be great. > > Thank you very much, > > Fabio Durastante > From fabio.durastante at unipi.it Fri Mar 18 10:53:41 2022 From: fabio.durastante at unipi.it (Fabio Durastante) Date: Fri, 18 Mar 2022 16:53:41 +0100 Subject: [petsc-users] Problem running ex54f with GAMG In-Reply-To: References: <893eb69d-c3d9-769f-ec54-29090aac07a2@mat.uniroma2.it> <32dffef8-cee8-da32-7992-89e5f1a42392@unipi.it> Message-ID: For the default case: Linear solve converged due to CONVERGED_ATOL iterations 14 for the other it tells Linear solve did not converge due to DIVERGED_INDEFINITE_PC iterations 2 that again seems a bit strange to me, since this should be a symmetric V-cycle built on smoothed aggregation that should be definite (and symmetric). Fabio Il 18/03/22 16:47, Barry Smith ha scritto: > Run with -ksp_converged_reason to have it print why it has stopped the iteration. 
> > >> On Mar 18, 2022, at 11:44 AM, Fabio Durastante wrote: >> >> Hi everybody, >> >> I'm trying to run the rotated anisotropy example ex54f using CG and GAMG as preconditioner, I run it with the command: >> >> mpirun -np 2 ./ex54f -ne 1011 \ >> -theta 18.0 \ >> -epsilon 100.0 \ >> -pc_type gamg \ >> -pc_gamg_type agg \ >> -log_view \ >> -log_trace \ >> -ksp_view \ >> -ksp_monitor \ >> -ksp_type cg \ >> -mg_levels_pc_type jacobi \ >> -mg_levels_ksp_type richardson \ >> -mg_levels_ksp_max_it 4 \ >> -ksp_atol 1e-9 \ >> -ksp_rtol 1e-12 >> >> But the KSP CG seems to stop just after two iterations: >> >> 0 KSP Residual norm 6.666655711717e-02 >> 1 KSP Residual norm 9.859661350927e-03 >> >> I'm attaching the full log, the problem seems to appear when I modify the value of epsilon, if I leave it to the default (1.0) it prints >> >> 0 KSP Residual norm 5.862074869050e+00 >> 1 KSP Residual norm 5.132711016122e-01 >> 2 KSP Residual norm 1.198566629717e-01 >> 3 KSP Residual norm 1.992885901625e-02 >> 4 KSP Residual norm 4.919780086064e-03 >> 5 KSP Residual norm 1.417045143681e-03 >> 6 KSP Residual norm 3.559622318760e-04 >> 7 KSP Residual norm 9.270786187701e-05 >> 8 KSP Residual norm 1.886403709163e-05 >> 9 KSP Residual norm 2.940634415714e-06 >> 10 KSP Residual norm 5.015043022637e-07 >> 11 KSP Residual norm 9.760219712757e-08 >> 12 KSP Residual norm 2.320857464659e-08 >> 13 KSP Residual norm 4.563772507631e-09 >> 14 KSP Residual norm 8.896675476997e-10 >> >> that is very strange because the case with epsilon 1 should be easier. >> >> Any help with this would be great. >> >> Thank you very much, >> >> Fabio Durastante >> From junchao.zhang at gmail.com Fri Mar 18 11:11:46 2022 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 18 Mar 2022 11:11:46 -0500 Subject: [petsc-users] Regarding the status of VecSetValues(Blocked) for GPU vectors In-Reply-To: References: Message-ID: On Fri, Mar 18, 2022 at 10:28 AM Sajid Ali Syed wrote: > Hi Matt/Mark, > > I'm working on a Poisson solver for a distributed PIC code, where the > particles are distributed over MPI ranks rather than the grid. Prior to the > solve, all particles are deposited onto a (DMDA) grid. > > The current prototype I have is that each rank holds a full size DMDA > vector and particles on that rank are deposited into it. Then, the data > from all the local vectors in combined into multiple distributed DMDA > vectors via VecScatters and this is followed by solving the Poisson > equation. The need to have multiple subcomms, each solving the same > equation is due to the fact that the grid size too small to use all the MPI > ranks (beyond the strong scaling limit). The solution is then scattered > back to each MPI rank via VecScatters. > > This first local-to-(multi)global transfer required the use of multiple > VecScatters as there is no one-to-multiple scatter capability in SF. This > works and is already giving a large speedup over the current allreduce > baseline (which transfers more data than is necessary) which is currently > used. > > I was wondering if within each subcommunicator I could directly write to > the DMDA vector via VecSetValues and PETSc would take care of stashing them > on the GPU until I call VecAssemblyBegin. Since this would be from within a > kokkos parallel_for operation, there would be multiple (probably ~1e3) > simultaneous writes that the stashing mechanism would have to support. > Currently, we use Kokkos-ScatterView to do this. > VecSetValues() only supports host data. 
I was wondering to provide a VecSetValues for you to call in Kokkos parallel_for, does it have to be a device function? > > Thank You, > Sajid Ali (he/him) | Research Associate > Scientific Computing Division > Fermi National Accelerator Laboratory > s-sajid-ali.github.io > > ------------------------------ > *From:* Matthew Knepley > *Sent:* Thursday, March 17, 2022 7:25 PM > *To:* Mark Adams > *Cc:* Sajid Ali Syed ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] Regarding the status of > VecSetValues(Blocked) for GPU vectors > > On Thu, Mar 17, 2022 at 8:19 PM Mark Adams wrote: > > LocalToGlobal is a DM thing.. > Sajid, do use DM? > If you need to add off procesor entries then DM could give you a local > vector as Matt said that you can add to for off procesor values and then > you could use the CPU communication in DM. > > > It would be GPU communication, not CPU. > > Matt > > > On Thu, Mar 17, 2022 at 7:19 PM Matthew Knepley wrote: > > On Thu, Mar 17, 2022 at 4:46 PM Sajid Ali Syed wrote: > > Hi PETSc-developers, > > Is it possible to use VecSetValues with distributed-memory CUDA & Kokkos > vectors from the device, i.e. can I call VecSetValues with GPU memory > pointers and expect PETSc to figure out how to stash on the device it until > I call VecAssemblyBegin (at which point PETSc could use GPU-aware MPI to > populate off-process values) ? > > If this is not currently supported, is supporting this on the roadmap? > Thanks in advance! > > > VecSetValues() will fall back to the CPU vector, so I do not think this > will work on device. > > Usually, our assembly computes all values and puts them in a "local" > vector, which you can access explicitly as Mark said. Then > we call LocalToGlobal() to communicate the values, which does work > directly on device using specialized code in VecScatter/PetscSF. > > What are you trying to do? > > THanks, > > Matt > > > Thank You, > Sajid Ali (he/him) | Research Associate > Scientific Computing Division > Fermi National Accelerator Laboratory > s-sajid-ali.github.io > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Mar 18 12:15:26 2022 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 18 Mar 2022 13:15:26 -0400 Subject: [petsc-users] Regarding the status of VecSetValues(Blocked) for GPU vectors In-Reply-To: References: Message-ID: On Fri, Mar 18, 2022 at 11:28 AM Sajid Ali Syed wrote: > Hi Matt/Mark, > > I'm working on a Poisson solver for a distributed PIC code, where the > particles are distributed over MPI ranks rather than the grid. Prior to the > solve, all particles are deposited onto a (DMDA) grid. > > The current prototype I have is that each rank holds a full size DMDA > vector and particles on that rank are deposited into it. Then, the data > from all the local vectors in combined into multiple distributed DMDA > vectors via VecScatters and this is followed by solving the Poisson > equation. 
The need to have multiple subcomms, each solving the same > equation is due to the fact that the grid size too small to use all the MPI > ranks (beyond the strong scaling limit). The solution is then scattered > back to each MPI rank via VecScatters. > > This first local-to-(multi)global transfer required the use of multiple > VecScatters as there is no one-to-multiple scatter capability in SF. This > works and is already giving a large speedup over the current allreduce > baseline (which transfers more data than is necessary) which is currently > used. > > I was wondering if within each subcommunicator I could directly write to > the DMDA vector via VecSetValues and PETSc would take care of stashing them > on the GPU until I call VecAssemblyBegin. Since this would be from within a > kokkos parallel_for operation, there would be multiple (probably ~1e3) > simultaneous writes that the stashing mechanism would have to support. > Currently, we use Kokkos-ScatterView to do this. > Hi Sajid, It turns out that Mark and I are doing exactly this operation for plasma physics. Here is what we currently do: 1) Use DMSwarm to hold the particle data 2) Use a DMPlex as the cellDM for the swarm, which does point location after each particle push 3) Use a conservative projection routine in PETSc to transfer charge to a FEM space while preserving any number of moments (currently we do 0, 1, and 2). This projection is just a KSP solve, which can happen on the GPU, except that the particle data is currently held on the CPU. 4) Solve the Poisson problem (or Landau operator), which can happen completely on the GPU 5) Project the other direction. The biggest improvement we could make here for a GPU workflow is to hold the particle data on the GPU. That is not conceptually hard, but would take some rewriting of the internals, which predate GPUs. Thanks, Matt > Thank You, > Sajid Ali (he/him) | Research Associate > Scientific Computing Division > Fermi National Accelerator Laboratory > s-sajid-ali.github.io > > ------------------------------ > *From:* Matthew Knepley > *Sent:* Thursday, March 17, 2022 7:25 PM > *To:* Mark Adams > *Cc:* Sajid Ali Syed ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] Regarding the status of > VecSetValues(Blocked) for GPU vectors > > On Thu, Mar 17, 2022 at 8:19 PM Mark Adams wrote: > > LocalToGlobal is a DM thing.. > Sajid, do use DM? > If you need to add off procesor entries then DM could give you a local > vector as Matt said that you can add to for off procesor values and then > you could use the CPU communication in DM. > > > It would be GPU communication, not CPU. > > Matt > > > On Thu, Mar 17, 2022 at 7:19 PM Matthew Knepley wrote: > > On Thu, Mar 17, 2022 at 4:46 PM Sajid Ali Syed wrote: > > Hi PETSc-developers, > > Is it possible to use VecSetValues with distributed-memory CUDA & Kokkos > vectors from the device, i.e. can I call VecSetValues with GPU memory > pointers and expect PETSc to figure out how to stash on the device it until > I call VecAssemblyBegin (at which point PETSc could use GPU-aware MPI to > populate off-process values) ? > > If this is not currently supported, is supporting this on the roadmap? > Thanks in advance! > > > VecSetValues() will fall back to the CPU vector, so I do not think this > will work on device. > > Usually, our assembly computes all values and puts them in a "local" > vector, which you can access explicitly as Mark said. 
Then > we call LocalToGlobal() to communicate the values, which does work > directly on device using specialized code in VecScatter/PetscSF. > > What are you trying to do? > > THanks, > > Matt > > > Thank You, > Sajid Ali (he/him) | Research Associate > Scientific Computing Division > Fermi National Accelerator Laboratory > s-sajid-ali.github.io > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Mar 18 13:30:31 2022 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 18 Mar 2022 14:30:31 -0400 Subject: [petsc-users] Fwd: Problem running ex54f with GAMG In-Reply-To: <32dffef8-cee8-da32-7992-89e5f1a42392@unipi.it> References: <893eb69d-c3d9-769f-ec54-29090aac07a2@mat.uniroma2.it> <32dffef8-cee8-da32-7992-89e5f1a42392@unipi.it> Message-ID: On Fri, Mar 18, 2022 at 11:44 AM Fabio Durastante wrote: > Hi everybody, > > I'm trying to run the rotated anisotropy example ex54f using CG and GAMG > as preconditioner, I run it with the command: > > mpirun -np 2 ./ex54f -ne 1011 \ > -theta 18.0 \ > -epsilon 100.0 \ > -pc_type gamg \ > -pc_gamg_type agg \ > -log_view \ > -log_trace \ > -ksp_view \ > -ksp_monitor \ > -ksp_type cg \ > -mg_levels_pc_type jacobi \ > -mg_levels_ksp_type richardson \ > -mg_levels_ksp_max_it 4 \ > -ksp_atol 1e-9 \ > -ksp_rtol 1e-12 > > But the KSP CG seems to stop just after two iterations: > > 0 KSP Residual norm 6.666655711717e-02 > 1 KSP Residual norm 9.859661350927e-03 > > I'm attaching the full log, the problem seems to appear when I modify the > value of epsilon, if I leave it to the default (1.0) it prints > > 0 KSP Residual norm 5.862074869050e+00 > 1 KSP Residual norm 5.132711016122e-01 > 2 KSP Residual norm 1.198566629717e-01 > 3 KSP Residual norm 1.992885901625e-02 > 4 KSP Residual norm 4.919780086064e-03 > 5 KSP Residual norm 1.417045143681e-03 > 6 KSP Residual norm 3.559622318760e-04 > 7 KSP Residual norm 9.270786187701e-05 > 8 KSP Residual norm 1.886403709163e-05 > 9 KSP Residual norm 2.940634415714e-06 > 10 KSP Residual norm 5.015043022637e-07 > 11 KSP Residual norm 9.760219712757e-08 > 12 KSP Residual norm 2.320857464659e-08 > 13 KSP Residual norm 4.563772507631e-09 > 14 KSP Residual norm 8.896675476997e-10 > > that is very strange because the case with epsilon 1 should be easier. > As Barry said, it failed (use -ksp_converged_reason). You are making it pretty hard with 100. The example uses .1 (ie, 10) You are adding -mg_levels_ksp_type richardson which does work with jacobi. Use sor (instead of jacobi) and it should work. The default ksp_type is chebyshev. > Any help with this would be great. > > Thank you very much, > > Fabio Durastante > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Fri Mar 18 14:17:33 2022 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 18 Mar 2022 15:17:33 -0400 Subject: [petsc-users] Problem running ex54f with GAMG In-Reply-To: References: <893eb69d-c3d9-769f-ec54-29090aac07a2@mat.uniroma2.it> <32dffef8-cee8-da32-7992-89e5f1a42392@unipi.it> Message-ID: <57743B5E-5856-4F5A-B819-5935C62D4930@petsc.dev> The GAMG produced an indefinite operator. I don't know if there is a way to detect why this happened or how to stop it. You can try -ksp_type gmres and see how that goes since it can handle an indefinite preconditioner. > On Mar 18, 2022, at 11:53 AM, Fabio Durastante wrote: > > For the default case: > > Linear solve converged due to CONVERGED_ATOL iterations 14 > > for the other it tells > > Linear solve did not converge due to DIVERGED_INDEFINITE_PC iterations 2 > > that again seems a bit strange to me, since this should be a symmetric V-cycle built on smoothed aggregation that should be definite (and symmetric). > > Fabio > > Il 18/03/22 16:47, Barry Smith ha scritto: >> Run with -ksp_converged_reason to have it print why it has stopped the iteration. >> >> >>> On Mar 18, 2022, at 11:44 AM, Fabio Durastante wrote: >>> >>> Hi everybody, >>> >>> I'm trying to run the rotated anisotropy example ex54f using CG and GAMG as preconditioner, I run it with the command: >>> >>> mpirun -np 2 ./ex54f -ne 1011 \ >>> -theta 18.0 \ >>> -epsilon 100.0 \ >>> -pc_type gamg \ >>> -pc_gamg_type agg \ >>> -log_view \ >>> -log_trace \ >>> -ksp_view \ >>> -ksp_monitor \ >>> -ksp_type cg \ >>> -mg_levels_pc_type jacobi \ >>> -mg_levels_ksp_type richardson \ >>> -mg_levels_ksp_max_it 4 \ >>> -ksp_atol 1e-9 \ >>> -ksp_rtol 1e-12 >>> >>> But the KSP CG seems to stop just after two iterations: >>> >>> 0 KSP Residual norm 6.666655711717e-02 >>> 1 KSP Residual norm 9.859661350927e-03 >>> >>> I'm attaching the full log, the problem seems to appear when I modify the value of epsilon, if I leave it to the default (1.0) it prints >>> >>> 0 KSP Residual norm 5.862074869050e+00 >>> 1 KSP Residual norm 5.132711016122e-01 >>> 2 KSP Residual norm 1.198566629717e-01 >>> 3 KSP Residual norm 1.992885901625e-02 >>> 4 KSP Residual norm 4.919780086064e-03 >>> 5 KSP Residual norm 1.417045143681e-03 >>> 6 KSP Residual norm 3.559622318760e-04 >>> 7 KSP Residual norm 9.270786187701e-05 >>> 8 KSP Residual norm 1.886403709163e-05 >>> 9 KSP Residual norm 2.940634415714e-06 >>> 10 KSP Residual norm 5.015043022637e-07 >>> 11 KSP Residual norm 9.760219712757e-08 >>> 12 KSP Residual norm 2.320857464659e-08 >>> 13 KSP Residual norm 4.563772507631e-09 >>> 14 KSP Residual norm 8.896675476997e-10 >>> >>> that is very strange because the case with epsilon 1 should be easier. >>> >>> Any help with this would be great. >>> >>> Thank you very much, >>> >>> Fabio Durastante >>> From mfadams at lbl.gov Fri Mar 18 17:14:40 2022 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 18 Mar 2022 18:14:40 -0400 Subject: [petsc-users] Problem running ex54f with GAMG In-Reply-To: <57743B5E-5856-4F5A-B819-5935C62D4930@petsc.dev> References: <893eb69d-c3d9-769f-ec54-29090aac07a2@mat.uniroma2.it> <32dffef8-cee8-da32-7992-89e5f1a42392@unipi.it> <57743B5E-5856-4F5A-B819-5935C62D4930@petsc.dev> Message-ID: MG is not stable with richardson/jacobi, or at least it is almost not stable. Yea I would guess gmres will go forever because it does not care. I think CG exists when it gets a negative number for beta or whatever and it was probably -eps. That is my guess. 
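For reference, a minimal sketch (not taken from ex54f; A, b and x are assumed to be the assembled matrix and vectors) of the combination suggested above, i.e. CG outside and GAMG with Chebyshev smoothing and SOR instead of Jacobi on the levels. KSPCG bails out with DIVERGED_INDEFINITE_PC when the preconditioned inner product it uses for beta turns negative, which is what the Richardson/Jacobi smoother provokes on this strongly anisotropic case:

  KSP ksp;
  PC  pc;
  ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
  ierr = KSPSetType(ksp,KSPCG);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCGAMG);CHKERRQ(ierr);
  /* Chebyshev is already the GAMG default level KSP; use SOR in place of Jacobi as suggested above */
  ierr = PetscOptionsSetValue(NULL,"-mg_levels_ksp_type","chebyshev");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL,"-mg_levels_pc_type","sor");CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);

On the command line this is the same as replacing -mg_levels_ksp_type richardson -mg_levels_pc_type jacobi in the run above with -mg_levels_ksp_type chebyshev -mg_levels_pc_type sor.
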
On Fri, Mar 18, 2022 at 3:17 PM Barry Smith wrote: > > The GAMG produced an indefinite operator. I don't know if there is a way > to detect why this happened or how to stop it. > > You can try -ksp_type gmres and see how that goes since it can handle an > indefinite preconditioner. > > > > > On Mar 18, 2022, at 11:53 AM, Fabio Durastante < > fabio.durastante at unipi.it> wrote: > > > > For the default case: > > > > Linear solve converged due to CONVERGED_ATOL iterations 14 > > > > for the other it tells > > > > Linear solve did not converge due to DIVERGED_INDEFINITE_PC iterations 2 > > > > that again seems a bit strange to me, since this should be a symmetric > V-cycle built on smoothed aggregation that should be definite (and > symmetric). > > > > Fabio > > > > Il 18/03/22 16:47, Barry Smith ha scritto: > >> Run with -ksp_converged_reason to have it print why it has stopped > the iteration. > >> > >> > >>> On Mar 18, 2022, at 11:44 AM, Fabio Durastante < > fabio.durastante at unipi.it> wrote: > >>> > >>> Hi everybody, > >>> > >>> I'm trying to run the rotated anisotropy example ex54f using CG and > GAMG as preconditioner, I run it with the command: > >>> > >>> mpirun -np 2 ./ex54f -ne 1011 \ > >>> -theta 18.0 \ > >>> -epsilon 100.0 \ > >>> -pc_type gamg \ > >>> -pc_gamg_type agg \ > >>> -log_view \ > >>> -log_trace \ > >>> -ksp_view \ > >>> -ksp_monitor \ > >>> -ksp_type cg \ > >>> -mg_levels_pc_type jacobi \ > >>> -mg_levels_ksp_type richardson \ > >>> -mg_levels_ksp_max_it 4 \ > >>> -ksp_atol 1e-9 \ > >>> -ksp_rtol 1e-12 > >>> > >>> But the KSP CG seems to stop just after two iterations: > >>> > >>> 0 KSP Residual norm 6.666655711717e-02 > >>> 1 KSP Residual norm 9.859661350927e-03 > >>> > >>> I'm attaching the full log, the problem seems to appear when I modify > the value of epsilon, if I leave it to the default (1.0) it prints > >>> > >>> 0 KSP Residual norm 5.862074869050e+00 > >>> 1 KSP Residual norm 5.132711016122e-01 > >>> 2 KSP Residual norm 1.198566629717e-01 > >>> 3 KSP Residual norm 1.992885901625e-02 > >>> 4 KSP Residual norm 4.919780086064e-03 > >>> 5 KSP Residual norm 1.417045143681e-03 > >>> 6 KSP Residual norm 3.559622318760e-04 > >>> 7 KSP Residual norm 9.270786187701e-05 > >>> 8 KSP Residual norm 1.886403709163e-05 > >>> 9 KSP Residual norm 2.940634415714e-06 > >>> 10 KSP Residual norm 5.015043022637e-07 > >>> 11 KSP Residual norm 9.760219712757e-08 > >>> 12 KSP Residual norm 2.320857464659e-08 > >>> 13 KSP Residual norm 4.563772507631e-09 > >>> 14 KSP Residual norm 8.896675476997e-10 > >>> > >>> that is very strange because the case with epsilon 1 should be easier. > >>> > >>> Any help with this would be great. > >>> > >>> Thank you very much, > >>> > >>> Fabio Durastante > >>> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fabio.durastante at unipi.it Mon Mar 21 02:56:38 2022 From: fabio.durastante at unipi.it (Fabio Durastante) Date: Mon, 21 Mar 2022 08:56:38 +0100 Subject: [petsc-users] Problem running ex54f with GAMG In-Reply-To: References: <893eb69d-c3d9-769f-ec54-29090aac07a2@mat.uniroma2.it> <32dffef8-cee8-da32-7992-89e5f1a42392@unipi.it> <57743B5E-5856-4F5A-B819-5935C62D4930@petsc.dev> Message-ID: <977ea5a8-fe36-e529-c39e-cdef3b7c773a@unipi.it> Thank you very much for the fast answers and for the insight. Indeed, I tried changing Richardson with Chebyshev (while keeping fixed all the rest), and the? method did converge in 39 iterations. 
Richards and GMRES instead has a very slow convergence as you were expecting (around one thousand iterations for a four digit relative residual, at that point I terminated it), even if it had come to convergence it would have been a waste for an SPD system. Best, Fabio Il 18/03/22 23:14, Mark Adams ha scritto: > MG is not stable with richardson/jacobi, or at least it is almost not stable. > Yea I would guess gmres will go forever because it does not care. > I think CG exists when it gets a negative number for beta or whatever and it was probably?-eps. > That is my guess. > > On Fri, Mar 18, 2022 at 3:17 PM Barry Smith wrote: > > > ? The GAMG produced an indefinite operator. I don't know if there is a way to detect why this happened or how to stop it. > > ? You can try -ksp_type gmres and see how that goes since it can handle an indefinite preconditioner. > > > > > On Mar 18, 2022, at 11:53 AM, Fabio Durastante wrote: > > > > For the default case: > > > > Linear solve converged due to CONVERGED_ATOL iterations 14 > > > > for the other it tells > > > > Linear solve did not converge due to DIVERGED_INDEFINITE_PC iterations 2 > > > > that again seems a bit strange to me, since this should be a symmetric V-cycle built on smoothed aggregation that should be definite (and symmetric). > > > > Fabio > > > > Il 18/03/22 16:47, Barry Smith ha scritto: > >>? ?Run with -ksp_converged_reason to have it print why it has stopped the iteration. > >> > >> > >>> On Mar 18, 2022, at 11:44 AM, Fabio Durastante wrote: > >>> > >>> Hi everybody, > >>> > >>> I'm trying to run the rotated anisotropy example ex54f using CG and GAMG as preconditioner, I run it with the command: > >>> > >>> mpirun -np 2 ./ex54f -ne 1011 \ > >>> -theta 18.0 \ > >>> -epsilon 100.0 \ > >>> -pc_type gamg \ > >>> -pc_gamg_type agg \ > >>> -log_view \ > >>> -log_trace \ > >>> -ksp_view \ > >>> -ksp_monitor \ > >>> -ksp_type cg \ > >>> -mg_levels_pc_type jacobi \ > >>> -mg_levels_ksp_type richardson \ > >>> -mg_levels_ksp_max_it 4 \ > >>> -ksp_atol 1e-9 \ > >>> -ksp_rtol 1e-12 > >>> > >>> But the KSP CG seems to stop just after two iterations: > >>> > >>>? ?0 KSP Residual norm 6.666655711717e-02 > >>>? ?1 KSP Residual norm 9.859661350927e-03 > >>> > >>> I'm attaching the full log, the problem seems to appear when I modify the value of epsilon, if I leave it to the default (1.0) it prints > >>> > >>>? ?0 KSP Residual norm 5.862074869050e+00 > >>>? ?1 KSP Residual norm 5.132711016122e-01 > >>>? ?2 KSP Residual norm 1.198566629717e-01 > >>>? ?3 KSP Residual norm 1.992885901625e-02 > >>>? ?4 KSP Residual norm 4.919780086064e-03 > >>>? ?5 KSP Residual norm 1.417045143681e-03 > >>>? ?6 KSP Residual norm 3.559622318760e-04 > >>>? ?7 KSP Residual norm 9.270786187701e-05 > >>>? ?8 KSP Residual norm 1.886403709163e-05 > >>>? ?9 KSP Residual norm 2.940634415714e-06 > >>>? 10 KSP Residual norm 5.015043022637e-07 > >>>? 11 KSP Residual norm 9.760219712757e-08 > >>>? 12 KSP Residual norm 2.320857464659e-08 > >>>? 13 KSP Residual norm 4.563772507631e-09 > >>>? 14 KSP Residual norm 8.896675476997e-10 > >>> > >>> that is very strange because the case with epsilon 1 should be easier. > >>> > >>> Any help with this would be great. > >>> > >>> Thank you very much, > >>> > >>> Fabio Durastante > >>> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From marco.cisternino at optimad.it Mon Mar 21 07:17:48 2022 From: marco.cisternino at optimad.it (Marco Cisternino) Date: Mon, 21 Mar 2022 12:17:48 +0000 Subject: [petsc-users] Null space and preconditioners Message-ID: Good morning, I'm observing an unexpected (to me) behaviour of my code. I tried to reduce the problem in a toy code here attached. The toy code archive contains a small main, a matrix and a rhs. The toy code solves the linear system and check the norms and the mean of the solution. The problem into the matrix and the rhs is the finite volume discretization of the pressure equation of an incompressible NS solver. It has been cooked as tiny as possible (16 cells!). It is important to say that it is an elliptic problem with homogeneous Neumann boundary conditions only, for this reason the toy code sets a null space containing the constant. The unexpected (to me) behaviour is evident by launching the code using different preconditioners, using -pc-type I tested using PCNONE ("none"), PCGAMG ("gamg") and PCILU ("ilu"). The default solver is KSPFGMRES. Using the three PC, I get 3 different solutions. It seems to me that they differ in the mean value, but GAMG is impressive. PCNONE gives me the zero mean solution I expected. What about the others? Asking for residuals monitor, the ratio ||r||/||b|| shows convergence for PCNONE and PCILU (~10^-16), but it stalls for PCGAMG (~10^-4). I cannot see why. Am I doing anything wrong or incorrectly thinking about the expected behaviour? Generalizing to larger mesh the behaviour is similar. Thank you for any help. Marco Cisternino -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: toyCode.tar.gz Type: application/x-gzip Size: 1633 bytes Desc: toyCode.tar.gz URL: From FERRANJ2 at my.erau.edu Mon Mar 21 10:22:28 2022 From: FERRANJ2 at my.erau.edu (Ferrand, Jesus A.) Date: Mon, 21 Mar 2022 15:22:28 +0000 Subject: [petsc-users] PetscSection and DMPlexVTKWriteAll in parallel Message-ID: Greetings. I am having trouble exporting a vertex-based solution field to ParaView when I run my PETSc script in parallel (see screenshots). The smoothly changing field is produced by my serial runs whereas the "messed up" one is produced by my parallel runs. This is not a calculation bug, rather, it concerns the vtk output only (the solution field is the same in parallel and serial). I am using DMPlexVTKWriteAll() but will make the switch to hdf5 sometime. Anyways, my suspicion is about PetscSection and how I am setting it up. I call PetscSectionSetChart() where my "pStart" and "pEnd" I collect from DMPlexGetDepthStratum() where "depth" is set to zero (for vertices) and then I call DMSetLocalSection(). After tinkering with DMPlex routines, I realize that DMPlexGetDepthStratum() returns "pStart" and "pEnd" in local numbering when the run is in parallel. Thus, I think that my serial output is correct because in that case local numbering matches the global numbering. So, am I correct in believing that the PetscSectionSetChart() call should be done with global numbering? Also, I noticed that the parallel DMPlex counts ghost vertices towards the "pStart" and "pEnd". So, when I set the chart in the local PetscSection, should I figure out the chart for the owned vertices or can PETSc figure the ghost/owned dilemma when the local PetscSections feature overlapping charts? Sincerely: J.A. 
Ferrand Embry-Riddle Aeronautical University - Daytona Beach FL M.Sc. Aerospace Engineering | May 2022 B.Sc. Aerospace Engineering B.Sc. Computational Mathematics Sigma Gamma Tau Tau Beta Pi Phone: (386)-843-1829 Email(s): ferranj2 at my.erau.edu jesus.ferrand at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot from 2022-03-18 19-18-25.png Type: image/png Size: 63193 bytes Desc: Screenshot from 2022-03-18 19-18-25.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot from 2022-03-18 19-18-11.png Type: image/png Size: 22021 bytes Desc: Screenshot from 2022-03-18 19-18-11.png URL: From mfadams at lbl.gov Mon Mar 21 11:06:04 2022 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 21 Mar 2022 12:06:04 -0400 Subject: [petsc-users] Null space and preconditioners In-Reply-To: References: Message-ID: The solution for Neumann problems can "float away" if the constant is not controlled in some way because floating point errors can introduce it even if your RHS is exactly orthogonal to it. You should use a special coarse grid solver for GAMG but it seems to be working for you. I have lost track of the simply way to have the KSP solver clean the constant out, which is what you want. can someone help Marco? Mark On Mon, Mar 21, 2022 at 8:18 AM Marco Cisternino < marco.cisternino at optimad.it> wrote: > Good morning, > > I?m observing an unexpected (to me) behaviour of my code. > > I tried to reduce the problem in a toy code here attached. > The toy code archive contains a small main, a matrix and a rhs. > > The toy code solves the linear system and check the norms and the mean of > the solution. > > The problem into the matrix and the rhs is the finite volume > discretization of the pressure equation of an incompressible NS solver. > > It has been cooked as tiny as possible (16 cells!). > > It is important to say that it is an elliptic problem with homogeneous > Neumann boundary conditions only, for this reason the toy code sets a null > space containing the constant. > > > > The unexpected (to me) behaviour is evident by launching the code using > different preconditioners, using -pc-type > I tested using PCNONE (?none?), PCGAMG (?gamg?) and PCILU (?ilu?). The > default solver is KSPFGMRES. > > Using the three PC, I get 3 different solutions. It seems to me that they > differ in the mean value, but GAMG is impressive. > > PCNONE gives me the zero mean solution I expected. What about the others? > > > > Asking for residuals monitor, the ratio ||r||/||b|| shows convergence for > PCNONE and PCILU (~10^-16), but it stalls for PCGAMG (~10^-4). > > I cannot see why. Am I doing anything wrong or incorrectly thinking about > the expected behaviour? > > > > Generalizing to larger mesh the behaviour is similar. > > > > Thank you for any help. > > > > Marco Cisternino > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mfadams at lbl.gov Mon Mar 21 11:30:57 2022 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 21 Mar 2022 12:30:57 -0400 Subject: [petsc-users] Null space and preconditioners In-Reply-To: References: Message-ID: And for GAMG you can use: -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg Note if you are using more that one MPI process you can use 'lu' instead of 'jacobi' If GAMG converges fast enough it can solve before the constant creeps in and works without cleaning in the KSP method. On Mon, Mar 21, 2022 at 12:06 PM Mark Adams wrote: > The solution for Neumann problems can "float away" if the constant is not > controlled in some way because floating point errors can introduce it even > if your RHS is exactly orthogonal to it. > > You should use a special coarse grid solver for GAMG but it seems to be > working for you. > > I have lost track of the simply way to have the KSP solver clean the > constant out, which is what you want. > > can someone help Marco? > > Mark > > > > > > On Mon, Mar 21, 2022 at 8:18 AM Marco Cisternino < > marco.cisternino at optimad.it> wrote: > >> Good morning, >> >> I?m observing an unexpected (to me) behaviour of my code. >> >> I tried to reduce the problem in a toy code here attached. >> The toy code archive contains a small main, a matrix and a rhs. >> >> The toy code solves the linear system and check the norms and the mean of >> the solution. >> >> The problem into the matrix and the rhs is the finite volume >> discretization of the pressure equation of an incompressible NS solver. >> >> It has been cooked as tiny as possible (16 cells!). >> >> It is important to say that it is an elliptic problem with homogeneous >> Neumann boundary conditions only, for this reason the toy code sets a null >> space containing the constant. >> >> >> >> The unexpected (to me) behaviour is evident by launching the code using >> different preconditioners, using -pc-type >> I tested using PCNONE (?none?), PCGAMG (?gamg?) and PCILU (?ilu?). The >> default solver is KSPFGMRES. >> >> Using the three PC, I get 3 different solutions. It seems to me that they >> differ in the mean value, but GAMG is impressive. >> >> PCNONE gives me the zero mean solution I expected. What about the others? >> >> >> >> Asking for residuals monitor, the ratio ||r||/||b|| shows convergence for >> PCNONE and PCILU (~10^-16), but it stalls for PCGAMG (~10^-4). >> >> I cannot see why. Am I doing anything wrong or incorrectly thinking about >> the expected behaviour? >> >> >> >> Generalizing to larger mesh the behaviour is similar. >> >> >> >> Thank you for any help. >> >> >> >> Marco Cisternino >> >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marco.cisternino at optimad.it Mon Mar 21 11:41:15 2022 From: marco.cisternino at optimad.it (Marco Cisternino) Date: Mon, 21 Mar 2022 16:41:15 +0000 Subject: [petsc-users] Null space and preconditioners In-Reply-To: References: Message-ID: Thank you, Mark. However, doing this with my toy code mpirun -n 1 ./testpreconditioner -pc_type gamg -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg I get 16 inf elements. Do I miss anything? Thanks again Marco Cisternino From: Mark Adams Sent: luned? 
21 marzo 2022 17:31 To: Marco Cisternino Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Null space and preconditioners And for GAMG you can use: -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg Note if you are using more that one MPI process you can use 'lu' instead of 'jacobi' If GAMG converges fast enough it can solve before the constant creeps in and works without cleaning in the KSP method. On Mon, Mar 21, 2022 at 12:06 PM Mark Adams > wrote: The solution for Neumann problems can "float away" if the constant is not controlled in some way because floating point errors can introduce it even if your RHS is exactly orthogonal to it. You should use a special coarse grid solver for GAMG but it seems to be working for you. I have lost track of the simply way to have the KSP solver clean the constant out, which is what you want. can someone help Marco? Mark On Mon, Mar 21, 2022 at 8:18 AM Marco Cisternino > wrote: Good morning, I?m observing an unexpected (to me) behaviour of my code. I tried to reduce the problem in a toy code here attached. The toy code archive contains a small main, a matrix and a rhs. The toy code solves the linear system and check the norms and the mean of the solution. The problem into the matrix and the rhs is the finite volume discretization of the pressure equation of an incompressible NS solver. It has been cooked as tiny as possible (16 cells!). It is important to say that it is an elliptic problem with homogeneous Neumann boundary conditions only, for this reason the toy code sets a null space containing the constant. The unexpected (to me) behaviour is evident by launching the code using different preconditioners, using -pc-type I tested using PCNONE (?none?), PCGAMG (?gamg?) and PCILU (?ilu?). The default solver is KSPFGMRES. Using the three PC, I get 3 different solutions. It seems to me that they differ in the mean value, but GAMG is impressive. PCNONE gives me the zero mean solution I expected. What about the others? Asking for residuals monitor, the ratio ||r||/||b|| shows convergence for PCNONE and PCILU (~10^-16), but it stalls for PCGAMG (~10^-4). I cannot see why. Am I doing anything wrong or incorrectly thinking about the expected behaviour? Generalizing to larger mesh the behaviour is similar. Thank you for any help. Marco Cisternino -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Mar 21 12:16:14 2022 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 21 Mar 2022 13:16:14 -0400 Subject: [petsc-users] Null space and preconditioners In-Reply-To: References: Message-ID: On Mon, Mar 21, 2022 at 12:06 PM Mark Adams wrote: > The solution for Neumann problems can "float away" if the constant is not > controlled in some way because floating point errors can introduce it even > if your RHS is exactly orthogonal to it. > > You should use a special coarse grid solver for GAMG but it seems to be > working for you. > > I have lost track of the simply way to have the KSP solver clean the > constant out, which is what you want. > > can someone help Marco? > I have not had time to look at the code. 
However, here are two ways we use to fix the pure Neumann solution: 1) Attach a null space to the operator using https://petsc.org/main/docs/manualpages/Mat/MatSetNullSpace.html 2) Use a coarse grid solver that does least-squares -mg_coarse_pc_type svd Thanks, Matt > Mark > > > > > > On Mon, Mar 21, 2022 at 8:18 AM Marco Cisternino < > marco.cisternino at optimad.it> wrote: > >> Good morning, >> >> I?m observing an unexpected (to me) behaviour of my code. >> >> I tried to reduce the problem in a toy code here attached. >> The toy code archive contains a small main, a matrix and a rhs. >> >> The toy code solves the linear system and check the norms and the mean of >> the solution. >> >> The problem into the matrix and the rhs is the finite volume >> discretization of the pressure equation of an incompressible NS solver. >> >> It has been cooked as tiny as possible (16 cells!). >> >> It is important to say that it is an elliptic problem with homogeneous >> Neumann boundary conditions only, for this reason the toy code sets a null >> space containing the constant. >> >> >> >> The unexpected (to me) behaviour is evident by launching the code using >> different preconditioners, using -pc-type >> I tested using PCNONE (?none?), PCGAMG (?gamg?) and PCILU (?ilu?). The >> default solver is KSPFGMRES. >> >> Using the three PC, I get 3 different solutions. It seems to me that they >> differ in the mean value, but GAMG is impressive. >> >> PCNONE gives me the zero mean solution I expected. What about the others? >> >> >> >> Asking for residuals monitor, the ratio ||r||/||b|| shows convergence for >> PCNONE and PCILU (~10^-16), but it stalls for PCGAMG (~10^-4). >> >> I cannot see why. Am I doing anything wrong or incorrectly thinking about >> the expected behaviour? >> >> >> >> Generalizing to larger mesh the behaviour is similar. >> >> >> >> Thank you for any help. >> >> >> >> Marco Cisternino >> >> >> >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From marco.cisternino at optimad.it Mon Mar 21 12:25:34 2022 From: marco.cisternino at optimad.it (Marco Cisternino) Date: Mon, 21 Mar 2022 17:25:34 +0000 Subject: [petsc-users] Null space and preconditioners In-Reply-To: References: Message-ID: Thank you, Matt. 1. I already set the null space and test it in the toy code 2. I tried your suggestion: the norm and the mean of the solution using -mg_coarse_pc_type svd with PCGAMG is much closer to the one of PCNONE (the norm is the same up to the 6th digit, the mean is about 10e-4 with ?svd? PCGAMG and 10e-17 with PCNONE). I?m going to try with the real code and see what happens on larger meshes. Thank you all. Marco Cisternino From: Matthew Knepley Sent: luned? 21 marzo 2022 18:16 To: Mark Adams Cc: Marco Cisternino ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Null space and preconditioners On Mon, Mar 21, 2022 at 12:06 PM Mark Adams > wrote: The solution for Neumann problems can "float away" if the constant is not controlled in some way because floating point errors can introduce it even if your RHS is exactly orthogonal to it. You should use a special coarse grid solver for GAMG but it seems to be working for you. I have lost track of the simply way to have the KSP solver clean the constant out, which is what you want. can someone help Marco? 
I have not had time to look at the code. However, here are two ways we use to fix the pure Neumann solution: 1) Attach a null space to the operator using https://petsc.org/main/docs/manualpages/Mat/MatSetNullSpace.html 2) Use a coarse grid solver that does least-squares -mg_coarse_pc_type svd Thanks, Matt Mark On Mon, Mar 21, 2022 at 8:18 AM Marco Cisternino > wrote: Good morning, I?m observing an unexpected (to me) behaviour of my code. I tried to reduce the problem in a toy code here attached. The toy code archive contains a small main, a matrix and a rhs. The toy code solves the linear system and check the norms and the mean of the solution. The problem into the matrix and the rhs is the finite volume discretization of the pressure equation of an incompressible NS solver. It has been cooked as tiny as possible (16 cells!). It is important to say that it is an elliptic problem with homogeneous Neumann boundary conditions only, for this reason the toy code sets a null space containing the constant. The unexpected (to me) behaviour is evident by launching the code using different preconditioners, using -pc-type I tested using PCNONE (?none?), PCGAMG (?gamg?) and PCILU (?ilu?). The default solver is KSPFGMRES. Using the three PC, I get 3 different solutions. It seems to me that they differ in the mean value, but GAMG is impressive. PCNONE gives me the zero mean solution I expected. What about the others? Asking for residuals monitor, the ratio ||r||/||b|| shows convergence for PCNONE and PCILU (~10^-16), but it stalls for PCGAMG (~10^-4). I cannot see why. Am I doing anything wrong or incorrectly thinking about the expected behaviour? Generalizing to larger mesh the behaviour is similar. Thank you for any help. Marco Cisternino -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Mar 21 12:31:10 2022 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 21 Mar 2022 13:31:10 -0400 Subject: [petsc-users] Null space and preconditioners In-Reply-To: References: Message-ID: On Mon, Mar 21, 2022 at 1:25 PM Marco Cisternino < marco.cisternino at optimad.it> wrote: > Thank you, Matt. > > 1. I already set the null space and test it in the toy code > > If you set this and it is not working, something is wrong. This will remove null space components from each update in the Krylov space. This is done in an example where I check convergence to the exact solution: https://gitlab.com/petsc/petsc/-/blob/main/src/snes/tutorials/ex69.c > 1. > 2. I tried your suggestion: the norm and the mean of the solution > using -mg_coarse_pc_type svd with PCGAMG is much closer to the one of > PCNONE (the norm is the same up to the 6th digit, the mean is about > 10e-4 with ?svd? PCGAMG and 10e-17 with PCNONE). I?m going to try with the > real code and see what happens on larger meshes > > This is not perfect since null space components can be introduced by the rest of the preconditioner, but when I use range-space smoothers and local interpolation it tends to be much better for me. Maybe it is just my problems. Thanks, Matt > Thank you all. > > > > Marco Cisternino > > > > > > *From:* Matthew Knepley > *Sent:* luned? 
21 marzo 2022 18:16 > *To:* Mark Adams > *Cc:* Marco Cisternino ; > petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Null space and preconditioners > > > > On Mon, Mar 21, 2022 at 12:06 PM Mark Adams wrote: > > The solution for Neumann problems can "float away" if the constant is not > controlled in some way because floating point errors can introduce it even > if your RHS is exactly orthogonal to it. > > > > You should use a special coarse grid solver for GAMG but it seems to be > working for you. > > > > I have lost track of the simply way to have the KSP solver clean the > constant out, which is what you want. > > > > can someone help Marco? > > > > I have not had time to look at the code. However, here are two ways we use > to fix the pure Neumann solution: > > > > 1) Attach a null space to the operator using > https://petsc.org/main/docs/manualpages/Mat/MatSetNullSpace.html > > > > 2) Use a coarse grid solver that does least-squares > > > > -mg_coarse_pc_type svd > > > > Thanks, > > > > Matt > > > > Mark > > > > > > > > > > > > On Mon, Mar 21, 2022 at 8:18 AM Marco Cisternino < > marco.cisternino at optimad.it> wrote: > > Good morning, > > I?m observing an unexpected (to me) behaviour of my code. > > I tried to reduce the problem in a toy code here attached. > The toy code archive contains a small main, a matrix and a rhs. > > The toy code solves the linear system and check the norms and the mean of > the solution. > > The problem into the matrix and the rhs is the finite volume > discretization of the pressure equation of an incompressible NS solver. > > It has been cooked as tiny as possible (16 cells!). > > It is important to say that it is an elliptic problem with homogeneous > Neumann boundary conditions only, for this reason the toy code sets a null > space containing the constant. > > > > The unexpected (to me) behaviour is evident by launching the code using > different preconditioners, using -pc-type > I tested using PCNONE (?none?), PCGAMG (?gamg?) and PCILU (?ilu?). The > default solver is KSPFGMRES. > > Using the three PC, I get 3 different solutions. It seems to me that they > differ in the mean value, but GAMG is impressive. > > PCNONE gives me the zero mean solution I expected. What about the others? > > > > Asking for residuals monitor, the ratio ||r||/||b|| shows convergence for > PCNONE and PCILU (~10^-16), but it stalls for PCGAMG (~10^-4). > > I cannot see why. Am I doing anything wrong or incorrectly thinking about > the expected behaviour? > > > > Generalizing to larger mesh the behaviour is similar. > > > > Thank you for any help. > > > > Marco Cisternino > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From marco.cisternino at optimad.it Mon Mar 21 12:51:14 2022 From: marco.cisternino at optimad.it (Marco Cisternino) Date: Mon, 21 Mar 2022 17:51:14 +0000 Subject: [petsc-users] Null space and preconditioners In-Reply-To: References: Message-ID: From: Matthew Knepley Sent: luned? 
21 marzo 2022 18:31 To: Marco Cisternino Cc: Mark Adams ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Null space and preconditioners On Mon, Mar 21, 2022 at 1:25 PM Marco Cisternino > wrote: Thank you, Matt. 1. I already set the null space and test it in the toy code If you set this and it is not working, something is wrong. This will remove null space components from each update in the Krylov space. This is done in an example where I check convergence to the exact solution: https://gitlab.com/petsc/petsc/-/blob/main/src/snes/tutorials/ex69.c Do I have to set a special null space when I use GAMG? The toy code works for PCNONE and PCILU, giving a zero mean solution the first PC and an almost zero mean solution the second one. GAMG floats away, quoting Mark. Looking at what I do: MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, nullptr, &nullspace); MatSetNullSpace(matrix, nullspace); it is more or less what you do at lines 3231-3233 of your reference. Am I wrong? What about lines 3220-3223? What is the difference between nullSpace and nullSpacePres? 1. 2. I tried your suggestion: the norm and the mean of the solution using -mg_coarse_pc_type svd with PCGAMG is much closer to the one of PCNONE (the norm is the same up to the 6th digit, the mean is about 10e-4 with ?svd? PCGAMG and 10e-17 with PCNONE). I?m going to try with the real code and see what happens on larger meshes This is not perfect since null space components can be introduced by the rest of the preconditioner, but when I use range-space smoothers and local interpolation it tends to be much better for me. Maybe it is just my problems. Is there a way to set -mg_coarse_pc_type svd with the API into the code? Thanks, Marco Thanks, Matt Thank you all. Marco Cisternino From: Matthew Knepley > Sent: luned? 21 marzo 2022 18:16 To: Mark Adams > Cc: Marco Cisternino >; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Null space and preconditioners On Mon, Mar 21, 2022 at 12:06 PM Mark Adams > wrote: The solution for Neumann problems can "float away" if the constant is not controlled in some way because floating point errors can introduce it even if your RHS is exactly orthogonal to it. You should use a special coarse grid solver for GAMG but it seems to be working for you. I have lost track of the simply way to have the KSP solver clean the constant out, which is what you want. can someone help Marco? I have not had time to look at the code. However, here are two ways we use to fix the pure Neumann solution: 1) Attach a null space to the operator using https://petsc.org/main/docs/manualpages/Mat/MatSetNullSpace.html 2) Use a coarse grid solver that does least-squares -mg_coarse_pc_type svd Thanks, Matt Mark On Mon, Mar 21, 2022 at 8:18 AM Marco Cisternino > wrote: Good morning, I?m observing an unexpected (to me) behaviour of my code. I tried to reduce the problem in a toy code here attached. The toy code archive contains a small main, a matrix and a rhs. The toy code solves the linear system and check the norms and the mean of the solution. The problem into the matrix and the rhs is the finite volume discretization of the pressure equation of an incompressible NS solver. It has been cooked as tiny as possible (16 cells!). It is important to say that it is an elliptic problem with homogeneous Neumann boundary conditions only, for this reason the toy code sets a null space containing the constant. 
The unexpected (to me) behaviour is evident by launching the code using different preconditioners, using -pc-type I tested using PCNONE (?none?), PCGAMG (?gamg?) and PCILU (?ilu?). The default solver is KSPFGMRES. Using the three PC, I get 3 different solutions. It seems to me that they differ in the mean value, but GAMG is impressive. PCNONE gives me the zero mean solution I expected. What about the others? Asking for residuals monitor, the ratio ||r||/||b|| shows convergence for PCNONE and PCILU (~10^-16), but it stalls for PCGAMG (~10^-4). I cannot see why. Am I doing anything wrong or incorrectly thinking about the expected behaviour? Generalizing to larger mesh the behaviour is similar. Thank you for any help. Marco Cisternino -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Mar 21 13:01:55 2022 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 21 Mar 2022 11:01:55 -0700 Subject: [petsc-users] MatCreateSBAIJ Message-ID: Dear PETSc dev team, The documentation about MatCreateSBAIJ has following "It is recommended that one use the MatCreate (), MatSetType () and/or MatSetFromOptions (), MatXXXXSetPreallocation() paradigm instead of this routine directly. [MatXXXXSetPreallocation() is, for example, MatSeqAIJSetPreallocation ]" I currently call MatCreateSBAIJ directly as follows: MatCreateSBAIJ (with d_nnz and o_nnz) MatSetValues (to add row by row) MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); Two questions: (1) I am wondering whether what I am doing is the most efficient. (2) I try to find out how the matrix vector multiplication is implemented in PETSc for SBAIJ storage. Thanks, Sam -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Mon Mar 21 13:10:48 2022 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 21 Mar 2022 14:10:48 -0400 Subject: [petsc-users] Null space and preconditioners In-Reply-To: References: Message-ID: On Mon, Mar 21, 2022 at 1:51 PM Marco Cisternino < marco.cisternino at optimad.it> wrote: > > > *From:* Matthew Knepley > *Sent:* luned? 21 marzo 2022 18:31 > *To:* Marco Cisternino > *Cc:* Mark Adams ; petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Null space and preconditioners > > > > On Mon, Mar 21, 2022 at 1:25 PM Marco Cisternino < > marco.cisternino at optimad.it> wrote: > > Thank you, Matt. > > 1. I already set the null space and test it in the toy code > > If you set this and it is not working, something is wrong. This will > remove null space components from each update in the Krylov space. > > This is done in an example where I check convergence to the exact > solution: > https://gitlab.com/petsc/petsc/-/blob/main/src/snes/tutorials/ex69.c > > > > Do I have to set a special null space when I use GAMG? > No, GAMG does not look at the NULL space. (GAMG can use a _near_ null space, but that is not needed here_. 
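To make the two notions concrete: MatSetNullSpace() attaches the null space that the Krylov solver uses to project the constant out of the iteration, while MatSetNearNullSpace() only informs GAMG's coarse-space construction and does not affect the Krylov iteration itself. A minimal sketch, reusing the matrix and nullspace names from the snippet quoted in this thread (for a scalar Laplacian the near-null-space call is redundant, since GAMG's default near kernel is already the constant; it is shown only to illustrate the API):

  MatNullSpace nullspace;
  ierr = MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,NULL,&nullspace);CHKERRQ(ierr);
  ierr = MatSetNullSpace(matrix,nullspace);CHKERRQ(ierr);      /* tells the KSP which component to remove */
  ierr = MatSetNearNullSpace(matrix,nullspace);CHKERRQ(ierr);  /* only informs GAMG's aggregation/prolongation */
  ierr = MatNullSpaceDestroy(&nullspace);CHKERRQ(ierr);
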
> The toy code works for PCNONE and PCILU, giving a zero mean solution the > first PC and an almost zero mean solution the second one. > GAMG floats away, quoting Mark. > Humm, if it's drifting off then it sounds like the null space is not getting cleaned by KSP. If you use the svd coarse solver then GAMG should work.' You seem to be using KSPFGMRES. Use -ksp_type cg Maybe try '-info :ksp'. THis will have the KSP print diagnostics. Maybe it will say something about the null space. I don't know what you mean by "I get 16 inf elements". If you have a 16 cell problem the parallel coarse grid solver should work .. Oh well press on. svd is better anyway. Looking at what I do: > MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, nullptr, &nullspace); > MatSetNullSpace(matrix, nullspace); > it is more or less what you do at lines 3231-3233 of your reference. Am I > wrong? > What about lines 3220-3223? What is the difference between nullSpace and > nullSpacePres? > > 1. > 2. I tried your suggestion: the norm and the mean of the solution > using -mg_coarse_pc_type svd with PCGAMG is much closer to the one of > PCNONE (the norm is the same up to the 6th digit, the mean is about > 10e-4 with ?svd? PCGAMG and 10e-17 with PCNONE). I?m going to try with the > real code and see what happens on larger meshes > > This is not perfect since null space components can be introduced by the > rest of the preconditioner, but when I use range-space smoothers and > > local interpolation it tends to be much better for me. Maybe it is just my > problems. > > > > Is there a way to set -mg_coarse_pc_type svd with the API into the code? > There is not an easy way that I know of. You can insert the option in the database with: ierr = PetscOptionsSetValue(NULL,"-mg_coarse_pc_type","svd");CHKERRQ(ierr); > > > Thanks, > > > > Marco > > > > Thanks, > > > > Matt > > > > Thank you all. > > > > Marco Cisternino > > > > > > *From:* Matthew Knepley > *Sent:* luned? 21 marzo 2022 18:16 > *To:* Mark Adams > *Cc:* Marco Cisternino ; > petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Null space and preconditioners > > > > On Mon, Mar 21, 2022 at 12:06 PM Mark Adams wrote: > > The solution for Neumann problems can "float away" if the constant is not > controlled in some way because floating point errors can introduce it even > if your RHS is exactly orthogonal to it. > > > > You should use a special coarse grid solver for GAMG but it seems to be > working for you. > > > > I have lost track of the simply way to have the KSP solver clean the > constant out, which is what you want. > > > > can someone help Marco? > > > > I have not had time to look at the code. However, here are two ways we use > to fix the pure Neumann solution: > > > > 1) Attach a null space to the operator using > https://petsc.org/main/docs/manualpages/Mat/MatSetNullSpace.html > > > > 2) Use a coarse grid solver that does least-squares > > > > -mg_coarse_pc_type svd > > > > Thanks, > > > > Matt > > > > Mark > > > > > > > > > > > > On Mon, Mar 21, 2022 at 8:18 AM Marco Cisternino < > marco.cisternino at optimad.it> wrote: > > Good morning, > > I?m observing an unexpected (to me) behaviour of my code. > > I tried to reduce the problem in a toy code here attached. > The toy code archive contains a small main, a matrix and a rhs. > > The toy code solves the linear system and check the norms and the mean of > the solution. > > The problem into the matrix and the rhs is the finite volume > discretization of the pressure equation of an incompressible NS solver. 
> > It has been cooked as tiny as possible (16 cells!). > > It is important to say that it is an elliptic problem with homogeneous > Neumann boundary conditions only, for this reason the toy code sets a null > space containing the constant. > > > > The unexpected (to me) behaviour is evident by launching the code using > different preconditioners, using -pc-type > I tested using PCNONE (?none?), PCGAMG (?gamg?) and PCILU (?ilu?). The > default solver is KSPFGMRES. > > Using the three PC, I get 3 different solutions. It seems to me that they > differ in the mean value, but GAMG is impressive. > > PCNONE gives me the zero mean solution I expected. What about the others? > > > > Asking for residuals monitor, the ratio ||r||/||b|| shows convergence for > PCNONE and PCILU (~10^-16), but it stalls for PCGAMG (~10^-4). > > I cannot see why. Am I doing anything wrong or incorrectly thinking about > the expected behaviour? > > > > Generalizing to larger mesh the behaviour is similar. > > > > Thank you for any help. > > > > Marco Cisternino > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Mon Mar 21 13:14:30 2022 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 21 Mar 2022 14:14:30 -0400 Subject: [petsc-users] MatCreateSBAIJ In-Reply-To: References: Message-ID: This code looks fine to me and the code is in src/mat/impls/sbaij/seq/sbaij2.c On Mon, Mar 21, 2022 at 2:02 PM Sam Guo wrote: > Dear PETSc dev team, > The documentation about MatCreateSBAIJ has following > "It is recommended that one use the MatCreate > (), > MatSetType > () > and/or MatSetFromOptions > (), > MatXXXXSetPreallocation() paradigm instead of this routine directly. > [MatXXXXSetPreallocation() is, for example, MatSeqAIJSetPreallocation > > ]" > I currently call MatCreateSBAIJ directly as follows: > MatCreateSBAIJ (with d_nnz and o_nnz) > MatSetValues (to add row by row) > MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); > MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); > > Two questions: > (1) I am wondering whether what I am doing is the most efficient. > > (2) I try to find out how the matrix vector multiplication is > implemented in PETSc for SBAIJ storage. > > Thanks, > Sam > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Mar 21 13:27:08 2022 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 21 Mar 2022 11:27:08 -0700 Subject: [petsc-users] MatCreateSBAIJ In-Reply-To: References: Message-ID: Mark, thanks for the quick response. I am more interested in parallel implementation of MatMult for SBAIJ. 
I found following 1094: *PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy)*1095: {1096: Mat_MPISBAIJ *a = (Mat_MPISBAIJ*)A->data;1097: PetscErrorCode ierr;1098: PetscInt mbs=a->mbs,bs=A->rmap->bs;1099: PetscScalar *from;1100: const PetscScalar *x; 1103: /* diagonal part */1104: (*a->A->ops->mult)(a->A,xx,a->slvec1a);1105: VecSet (a->slvec1b,0.0); 1107: /* subdiagonal part */1108: (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b); 1110: /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1120: /* supperdiagonal part */1121: (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy);1122: return(0);1123: } I try to understand the algorithm. Thanks, Sam On Mon, Mar 21, 2022 at 11:14 AM Mark Adams wrote: > This code looks fine to me and the code is > in src/mat/impls/sbaij/seq/sbaij2.c > > On Mon, Mar 21, 2022 at 2:02 PM Sam Guo wrote: > >> Dear PETSc dev team, >> The documentation about MatCreateSBAIJ has following >> "It is recommended that one use the MatCreate >> >> (), MatSetType >> () >> and/or MatSetFromOptions >> (), >> MatXXXXSetPreallocation() paradigm instead of this routine directly. >> [MatXXXXSetPreallocation() is, for example, MatSeqAIJSetPreallocation >> >> ]" >> I currently call MatCreateSBAIJ directly as follows: >> MatCreateSBAIJ (with d_nnz and o_nnz) >> MatSetValues (to add row by row) >> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >> MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); >> >> Two questions: >> (1) I am wondering whether what I am doing is the most efficient. >> >> (2) I try to find out how the matrix vector multiplication is >> implemented in PETSc for SBAIJ storage. >> >> Thanks, >> Sam >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Mar 21 13:35:43 2022 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 21 Mar 2022 11:35:43 -0700 Subject: [petsc-users] MatCreateSBAIJ In-Reply-To: References: Message-ID: I am most interested in how the lower triangular part is redistributed. It seems that SBAJI saves memory but requires more communication than BAIJ. On Mon, Mar 21, 2022 at 11:27 AM Sam Guo wrote: > Mark, thanks for the quick response. I am more interested in parallel > implementation of MatMult for SBAIJ. 
I found following > > 1094: *PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy)*1095: {1096: Mat_MPISBAIJ *a = (Mat_MPISBAIJ*)A->data;1097: PetscErrorCode ierr;1098: PetscInt mbs=a->mbs,bs=A->rmap->bs;1099: PetscScalar *from;1100: const PetscScalar *x; > 1103: /* diagonal part */1104: (*a->A->ops->mult)(a->A,xx,a->slvec1a);1105: VecSet (a->slvec1b,0.0); > 1107: /* subdiagonal part */1108: (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b); > 1110: /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); > 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); > 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1120: /* supperdiagonal part */1121: (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy);1122: return(0);1123: } > > I try to understand the algorithm. > > > Thanks, > > Sam > > > On Mon, Mar 21, 2022 at 11:14 AM Mark Adams wrote: > >> This code looks fine to me and the code is >> in src/mat/impls/sbaij/seq/sbaij2.c >> >> On Mon, Mar 21, 2022 at 2:02 PM Sam Guo wrote: >> >>> Dear PETSc dev team, >>> The documentation about MatCreateSBAIJ has following >>> "It is recommended that one use the MatCreate >>> >>> (), MatSetType >>> () >>> and/or MatSetFromOptions >>> (), >>> MatXXXXSetPreallocation() paradigm instead of this routine directly. >>> [MatXXXXSetPreallocation() is, for example, MatSeqAIJSetPreallocation >>> >>> ]" >>> I currently call MatCreateSBAIJ directly as follows: >>> MatCreateSBAIJ (with d_nnz and o_nnz) >>> MatSetValues (to add row by row) >>> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >>> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >>> MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); >>> >>> Two questions: >>> (1) I am wondering whether what I am doing is the most efficient. >>> >>> (2) I try to find out how the matrix vector multiplication is >>> implemented in PETSc for SBAIJ storage. >>> >>> Thanks, >>> Sam >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Mar 21 13:48:50 2022 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 21 Mar 2022 14:48:50 -0400 Subject: [petsc-users] Null space and preconditioners In-Reply-To: References: Message-ID: Marco, I have confirmed your results. Urgg, it appears we do not have something well documented. The removal of the null space only works for left preconditioned solvers and FGMRES only works with right preconditioning. Here is the reasoning. The Krylov space for left preconditioning is built from [r, BAr, (BA)^2 r, ...] and the solution space is built from this basis. If A has a null space of n then the left preconditioned Krylov methods simply remove n from the "full" Krylov space after applying each B preconditioner and the resulting "reduced" Krylov space has no components in the n directions hence the solution built by GMRES naturally has no component in the n. But with right preconditioning the Krylov space is [s ABs (AB)^2 s, ....] We would need to remove B^-1 n from the Krylov space so that (A B) B^-1 n = 0 In general we don't have any way of applying B^-1 to a vector so we cannot create the appropriate "reduced" Krylov space. 
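In API terms, this left-preconditioned setup amounts to roughly the following sketch (ksp, A and nullspace stand for the toy code's objects):

  ierr = MatSetNullSpace(A,nullspace);CHKERRQ(ierr);   /* constant null space, as in the toy code */
  ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
  ierr = KSPSetType(ksp,KSPGMRES);CHKERRQ(ierr);       /* plain GMRES, not FGMRES */
  ierr = KSPSetPCSide(ksp,PC_LEFT);CHKERRQ(ierr);      /* GMRES's default; FGMRES supports only right preconditioning */

With left preconditioning the null-space projection is applied to each preconditioned residual, so the Krylov basis, and hence the GMRES solution, stays free of the constant component.
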
If I run with GMRES (which defaults to left preconditioner) and the options ./testPreconditioners -pc_type gamg -ksp_type gmres -ksp_monitor_true_residual -ksp_rtol 1.e-12 -ksp_view -mg_coarse_pc_type svd Then it handles the null space correctly and the solution has Solution mean = 4.51028e-17 Is there any reason to use FGMRES instead of GMRES? You just cannot use GMRES as the smoother inside GAMG if you use GMRES on the outside, but for pressure equations you don't want use such a strong smoother anyways. Barry I feel we should add some information to the documentation on the removal of the null space to the user's manual when using right preconditioning and maybe even have an error check in the code so that people don't fall into this trap. But I am not sure exactly what to do. When the A and B are both symmetric I think special stuff happens that doesn't require providing a null space; but I am not sure. > On Mar 21, 2022, at 12:41 PM, Marco Cisternino wrote: > > Thank you, Mark. > However, doing this with my toy code > mpirun -n 1 ./testpreconditioner -pc_type gamg -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg > > I get 16 inf elements. Do I miss anything? > > Thanks again > > Marco Cisternino > > > From: Mark Adams > > Sent: luned? 21 marzo 2022 17:31 > To: Marco Cisternino > > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Null space and preconditioners > > And for GAMG you can use: > > -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg > > Note if you are using more that one MPI process you can use 'lu' instead of 'jacobi' > > If GAMG converges fast enough it can solve before the constant creeps in and works without cleaning in the KSP method. > > On Mon, Mar 21, 2022 at 12:06 PM Mark Adams > wrote: > The solution for Neumann problems can "float away" if the constant is not controlled in some way because floating point errors can introduce it even if your RHS is exactly orthogonal to it. > > You should use a special coarse grid solver for GAMG but it seems to be working for you. > > I have lost track of the simply way to have the KSP solver clean the constant out, which is what you want. > > can someone help Marco? > > Mark > > > > > > On Mon, Mar 21, 2022 at 8:18 AM Marco Cisternino > wrote: > Good morning, > I?m observing an unexpected (to me) behaviour of my code. > I tried to reduce the problem in a toy code here attached. > The toy code archive contains a small main, a matrix and a rhs. > The toy code solves the linear system and check the norms and the mean of the solution. > The problem into the matrix and the rhs is the finite volume discretization of the pressure equation of an incompressible NS solver. > It has been cooked as tiny as possible (16 cells!). > It is important to say that it is an elliptic problem with homogeneous Neumann boundary conditions only, for this reason the toy code sets a null space containing the constant. > > The unexpected (to me) behaviour is evident by launching the code using different preconditioners, using -pc-type > I tested using PCNONE (?none?), PCGAMG (?gamg?) and PCILU (?ilu?). The default solver is KSPFGMRES. > Using the three PC, I get 3 different solutions. It seems to me that they differ in the mean value, but GAMG is impressive. > PCNONE gives me the zero mean solution I expected. What about the others? 
> > Asking for residuals monitor, the ratio ||r||/||b|| shows convergence for PCNONE and PCILU (~10^-16), but it stalls for PCGAMG (~10^-4). > I cannot see why. Am I doing anything wrong or incorrectly thinking about the expected behaviour? > > Generalizing to larger mesh the behaviour is similar. > > Thank you for any help. > > Marco Cisternino -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Mon Mar 21 14:13:54 2022 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 21 Mar 2022 15:13:54 -0400 Subject: [petsc-users] MatCreateSBAIJ In-Reply-To: References: Message-ID: PETSc stores parallel matrices as two serial matrices. One for the diagonal (d or A) block and one for the rest (o or B). I would guess that for symmetric matrices it has a symmetric matrix for the diagonal and a full AIJ matrix for the (upper) off-diagonal. So the multtranspose is applying B symmetrically. This lower off-diagonal and the diagonal block can be done without communication. Then the off processor values are collected, and the upper off-diagonal is applied. On Mon, Mar 21, 2022 at 2:35 PM Sam Guo wrote: > I am most interested in how the lower triangular part is redistributed. It > seems that SBAJI saves memory but requires more communication than BAIJ. > > On Mon, Mar 21, 2022 at 11:27 AM Sam Guo wrote: > >> Mark, thanks for the quick response. I am more interested in parallel >> implementation of MatMult for SBAIJ. I found following >> >> 1094: *PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy)*1095: {1096: Mat_MPISBAIJ *a = (Mat_MPISBAIJ*)A->data;1097: PetscErrorCode ierr;1098: PetscInt mbs=a->mbs,bs=A->rmap->bs;1099: PetscScalar *from;1100: const PetscScalar *x; >> 1103: /* diagonal part */1104: (*a->A->ops->mult)(a->A,xx,a->slvec1a);1105: VecSet (a->slvec1b,0.0); >> 1107: /* subdiagonal part */1108: (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b); >> 1110: /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1120: /* supperdiagonal part */1121: (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy);1122: return(0);1123: } >> >> I try to understand the algorithm. >> >> >> Thanks, >> >> Sam >> >> >> On Mon, Mar 21, 2022 at 11:14 AM Mark Adams wrote: >> >>> This code looks fine to me and the code is >>> in src/mat/impls/sbaij/seq/sbaij2.c >>> >>> On Mon, Mar 21, 2022 at 2:02 PM Sam Guo wrote: >>> >>>> Dear PETSc dev team, >>>> The documentation about MatCreateSBAIJ has following >>>> "It is recommended that one use the MatCreate >>>> >>>> (), MatSetType >>>> () >>>> and/or MatSetFromOptions >>>> (), >>>> MatXXXXSetPreallocation() paradigm instead of this routine directly. >>>> [MatXXXXSetPreallocation() is, for example, MatSeqAIJSetPreallocation >>>> >>>> ]" >>>> I currently call MatCreateSBAIJ directly as follows: >>>> MatCreateSBAIJ (with d_nnz and o_nnz) >>>> MatSetValues (to add row by row) >>>> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >>>> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >>>> MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); >>>> >>>> Two questions: >>>> (1) I am wondering whether what I am doing is the most efficient. >>>> >>>> (2) I try to find out how the matrix vector multiplication is >>>> implemented in PETSc for SBAIJ storage. 
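As a side note on the SBAIJ thread above: the MatCreate()/MatSetType()/MatXXXXSetPreallocation() paradigm that the manual page recommends looks roughly like the self-contained sketch below (not from this thread; the 1-D Laplacian, N and the preallocation numbers are placeholders). Only the upper triangle is supplied, and when run on more than one MPI rank the final MatMult() goes through the MatMult_MPISBAIJ routine quoted above, i.e. the diagonal-block multiply plus the B-transpose/scatter path Mark describes.

  #include <petscmat.h>

  int main(int argc,char **argv)
  {
    Mat            A;
    Vec            x,y;
    PetscInt       i,Istart,Iend,N = 100;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
    ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
    ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,N,N);CHKERRQ(ierr);
    ierr = MatSetType(A,MATSBAIJ);CHKERRQ(ierr);                    /* becomes SEQSBAIJ or MPISBAIJ */
    ierr = MatSeqSBAIJSetPreallocation(A,1,2,NULL);CHKERRQ(ierr);   /* bs=1, 2 upper-triangular entries per row */
    ierr = MatMPISBAIJSetPreallocation(A,1,2,NULL,1,NULL);CHKERRQ(ierr);
    ierr = MatGetOwnershipRange(A,&Istart,&Iend);CHKERRQ(ierr);
    for (i=Istart; i<Iend; i++) {                                   /* set only (i,i) and (i,i+1) */
      ierr = MatSetValue(A,i,i,2.0,INSERT_VALUES);CHKERRQ(ierr);
      if (i < N-1) {ierr = MatSetValue(A,i,i+1,-1.0,INSERT_VALUES);CHKERRQ(ierr);}
    }
    ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatCreateVecs(A,&x,&y);CHKERRQ(ierr);
    ierr = VecSet(x,1.0);CHKERRQ(ierr);
    ierr = MatMult(A,x,y);CHKERRQ(ierr);
    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = VecDestroy(&x);CHKERRQ(ierr);
    ierr = VecDestroy(&y);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }
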
>>>> >>>> Thanks, >>>> Sam >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Mar 21 14:26:07 2022 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 21 Mar 2022 12:26:07 -0700 Subject: [petsc-users] MatCreateSBAIJ In-Reply-To: References: Message-ID: Using following example from the MatCreateSBAIJ documentation 0 1 2 3 4 5 6 7 8 9 10 11 -------------------------- row 3 |. . . d d d o o o o o o row 4 |. . . d d d o o o o o o row 5 |. . . d d d o o o o o o -------------------------- On a processor that owns rows 3, 4 and 5, rows 0-2 info are still needed. Is is that the processor that owns rows 0-2 will apply B symmetrical and then send the result to the processor that owns 3-5? On Mon, Mar 21, 2022 at 12:14 PM Mark Adams wrote: > PETSc stores parallel matrices as two serial matrices. One for the > diagonal (d or A) block and one for the rest (o or B). > I would guess that for symmetric matrices it has a symmetric matrix for > the diagonal and a full AIJ matrix for the (upper) off-diagonal. > So the multtranspose is applying B symmetrically. This lower off-diagonal > and the diagonal block can be done without communication. > Then the off processor values are collected, and the upper off-diagonal is > applied. > > On Mon, Mar 21, 2022 at 2:35 PM Sam Guo wrote: > >> I am most interested in how the lower triangular part is redistributed. >> It seems that SBAJI saves memory but requires more communication than BAIJ. >> >> On Mon, Mar 21, 2022 at 11:27 AM Sam Guo wrote: >> >>> Mark, thanks for the quick response. I am more interested in parallel >>> implementation of MatMult for SBAIJ. I found following >>> >>> 1094: *PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy)*1095: {1096: Mat_MPISBAIJ *a = (Mat_MPISBAIJ*)A->data;1097: PetscErrorCode ierr;1098: PetscInt mbs=a->mbs,bs=A->rmap->bs;1099: PetscScalar *from;1100: const PetscScalar *x; >>> 1103: /* diagonal part */1104: (*a->A->ops->mult)(a->A,xx,a->slvec1a);1105: VecSet (a->slvec1b,0.0); >>> 1107: /* subdiagonal part */1108: (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b); >>> 1110: /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >>> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >>> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1120: /* supperdiagonal part */1121: (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy);1122: return(0);1123: } >>> >>> I try to understand the algorithm. >>> >>> >>> Thanks, >>> >>> Sam >>> >>> >>> On Mon, Mar 21, 2022 at 11:14 AM Mark Adams wrote: >>> >>>> This code looks fine to me and the code is >>>> in src/mat/impls/sbaij/seq/sbaij2.c >>>> >>>> On Mon, Mar 21, 2022 at 2:02 PM Sam Guo wrote: >>>> >>>>> Dear PETSc dev team, >>>>> The documentation about MatCreateSBAIJ has following >>>>> "It is recommended that one use the MatCreate >>>>> >>>>> (), MatSetType >>>>> () >>>>> and/or MatSetFromOptions >>>>> (), >>>>> MatXXXXSetPreallocation() paradigm instead of this routine directly. 
>>>>> [MatXXXXSetPreallocation() is, for example, MatSeqAIJSetPreallocation >>>>> >>>>> ]" >>>>> I currently call MatCreateSBAIJ directly as follows: >>>>> MatCreateSBAIJ (with d_nnz and o_nnz) >>>>> MatSetValues (to add row by row) >>>>> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >>>>> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >>>>> MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); >>>>> >>>>> Two questions: >>>>> (1) I am wondering whether what I am doing is the most efficient. >>>>> >>>>> (2) I try to find out how the matrix vector multiplication is >>>>> implemented in PETSc for SBAIJ storage. >>>>> >>>>> Thanks, >>>>> Sam >>>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Mon Mar 21 14:33:34 2022 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 21 Mar 2022 15:33:34 -0400 Subject: [petsc-users] Null space and preconditioners In-Reply-To: References: Message-ID: (I did suggest CG, he just has a pressure solve, which is a Laplacian, right?) Ugh, this is pretty bad. The logic might be a bit convoluted, but if SetOperator is called after SetFromOptions, as is usual I think, it could check if it has left a left PC, if the operator has a null space. On Mon, Mar 21, 2022 at 2:48 PM Barry Smith wrote: > > Marco, > > I have confirmed your results. > > Urgg, it appears we do not have something well documented. The > removal of the null space only works for left preconditioned solvers and > FGMRES only works with right preconditioning. Here is the reasoning. > > The Krylov space for left preconditioning is built from [r, BAr, > (BA)^2 r, ...] and the solution space is built from this basis. If A has a > null space of n then the left preconditioned Krylov methods simply remove n > from the "full" Krylov space after applying each B preconditioner and the > resulting "reduced" Krylov space has no components in the n directions > hence the solution built by GMRES naturally has no component in the n. > > But with right preconditioning the Krylov space is [s ABs (AB)^2 s, > ....] We would need to remove B^-1 n from the Krylov space so that (A B) > B^-1 n = 0 In general we don't have any way of applying B^-1 to a vector so > we cannot create the appropriate "reduced" Krylov space. > > If I run with GMRES (which defaults to left preconditioner) and the > options ./testPreconditioners -pc_type gamg -ksp_type gmres > -ksp_monitor_true_residual -ksp_rtol 1.e-12 -ksp_view -mg_coarse_pc_type svd > > Then it handles the null space correctly and the solution has Solution > mean = 4.51028e-17 > > Is there any reason to use FGMRES instead of GMRES? You just cannot use > GMRES as the smoother inside GAMG if you use GMRES on the outside, but for > pressure equations you don't want use such a strong smoother anyways. > > Barry > > I feel we should add some information to the documentation on the > removal of the null space to the user's manual when using right > preconditioning and maybe even have an error check in the code so that > people don't fall into this trap. But I am not sure exactly what to do. > When the A and B are both symmetric I think special stuff happens that > doesn't require providing a null space; but I am not sure. > > > > > > On Mar 21, 2022, at 12:41 PM, Marco Cisternino < > marco.cisternino at optimad.it> wrote: > > Thank you, Mark. > However, doing this with my toy code > mpirun -n 1 ./testpreconditioner -pc_type gamg > -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi > -mg_coarse_ksp_type cg > > I get 16 inf elements. 
Do I miss anything? > > Thanks again > > Marco Cisternino > > > *From:* Mark Adams > *Sent:* luned? 21 marzo 2022 17:31 > *To:* Marco Cisternino > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Null space and preconditioners > > And for GAMG you can use: > > -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi > -mg_coarse_ksp_type cg > > Note if you are using more that one MPI process you can use 'lu' instead > of 'jacobi' > > If GAMG converges fast enough it can solve before the constant creeps in > and works without cleaning in the KSP method. > > On Mon, Mar 21, 2022 at 12:06 PM Mark Adams wrote: > > The solution for Neumann problems can "float away" if the constant is not > controlled in some way because floating point errors can introduce it even > if your RHS is exactly orthogonal to it. > > You should use a special coarse grid solver for GAMG but it seems to be > working for you. > > I have lost track of the simply way to have the KSP solver clean the > constant out, which is what you want. > > can someone help Marco? > > Mark > > > > > > On Mon, Mar 21, 2022 at 8:18 AM Marco Cisternino < > marco.cisternino at optimad.it> wrote: > > Good morning, > I?m observing an unexpected (to me) behaviour of my code. > I tried to reduce the problem in a toy code here attached. > The toy code archive contains a small main, a matrix and a rhs. > The toy code solves the linear system and check the norms and the mean of > the solution. > The problem into the matrix and the rhs is the finite volume > discretization of the pressure equation of an incompressible NS solver. > It has been cooked as tiny as possible (16 cells!). > It is important to say that it is an elliptic problem with homogeneous > Neumann boundary conditions only, for this reason the toy code sets a null > space containing the constant. > > The unexpected (to me) behaviour is evident by launching the code using > different preconditioners, using -pc-type > I tested using PCNONE (?none?), PCGAMG (?gamg?) and PCILU (?ilu?). The > default solver is KSPFGMRES. > Using the three PC, I get 3 different solutions. It seems to me that they > differ in the mean value, but GAMG is impressive. > PCNONE gives me the zero mean solution I expected. What about the others? > > Asking for residuals monitor, the ratio ||r||/||b|| shows convergence for > PCNONE and PCILU (~10^-16), but it stalls for PCGAMG (~10^-4). > I cannot see why. Am I doing anything wrong or incorrectly thinking about > the expected behaviour? > > Generalizing to larger mesh the behaviour is similar. > > Thank you for any help. > > Marco Cisternino > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Mar 21 14:42:04 2022 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 21 Mar 2022 15:42:04 -0400 Subject: [petsc-users] Null space and preconditioners In-Reply-To: References: Message-ID: <8FE95274-4641-4722-BD9B-08A40744508E@petsc.dev> > On Mar 21, 2022, at 3:33 PM, Mark Adams wrote: > > (I did suggest CG, he just has a pressure solve, which is a Laplacian, right?) His finite volume scheme is finite difference-ish in that it produces a non-symmetric matrix. The non-symmetric part A-A' is actually very cool looking; one of the coolest matrices I've ever seen :-) > > Ugh, this is pretty bad. The logic might be a bit convoluted, but if SetOperator is called after SetFromOptions, as is usual I think, it could check if it has left a left PC, if the operator has a null space. 
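A minimal sketch (not part of the thread) of how one could quantify the non-symmetric part A-A' mentioned above; ReportSkewPart and the symmetry tolerance are placeholders:

#include <petscmat.h>

/* Hedged sketch: report how far A is from being symmetric. */
PetscErrorCode ReportSkewPart(Mat A)
{
  PetscErrorCode ierr;
  Mat            At;
  PetscReal      nrmA, nrmSkew;
  PetscBool      symm;

  PetscFunctionBeginUser;
  ierr = MatNorm(A, NORM_FROBENIUS, &nrmA);CHKERRQ(ierr);
  ierr = MatTranspose(A, MAT_INITIAL_MATRIX, &At);CHKERRQ(ierr);
  ierr = MatAXPY(At, -1.0, A, DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);  /* At <- A' - A */
  ierr = MatNorm(At, NORM_FROBENIUS, &nrmSkew);CHKERRQ(ierr);
  ierr = MatIsSymmetric(A, 1.e-12, &symm);CHKERRQ(ierr);                 /* tolerance is a placeholder */
  ierr = PetscPrintf(PetscObjectComm((PetscObject)A),
                     "||A' - A||_F / ||A||_F = %g, MatIsSymmetric: %s\n",
                     (double)(nrmSkew/nrmA), symm ? "yes" : "no");CHKERRQ(ierr);
  ierr = MatDestroy(&At);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}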
> > > > On Mon, Mar 21, 2022 at 2:48 PM Barry Smith > wrote: > > Marco, > > I have confirmed your results. > > Urgg, it appears we do not have something well documented. The removal of the null space only works for left preconditioned solvers and FGMRES only works with right preconditioning. Here is the reasoning. > > The Krylov space for left preconditioning is built from [r, BAr, (BA)^2 r, ...] and the solution space is built from this basis. If A has a null space of n then the left preconditioned Krylov methods simply remove n from the "full" Krylov space after applying each B preconditioner and the resulting "reduced" Krylov space has no components in the n directions hence the solution built by GMRES naturally has no component in the n. > > But with right preconditioning the Krylov space is [s ABs (AB)^2 s, ....] We would need to remove B^-1 n from the Krylov space so that (A B) B^-1 n = 0 In general we don't have any way of applying B^-1 to a vector so we cannot create the appropriate "reduced" Krylov space. > > If I run with GMRES (which defaults to left preconditioner) and the options ./testPreconditioners -pc_type gamg -ksp_type gmres -ksp_monitor_true_residual -ksp_rtol 1.e-12 -ksp_view -mg_coarse_pc_type svd > > Then it handles the null space correctly and the solution has Solution mean = 4.51028e-17 > > Is there any reason to use FGMRES instead of GMRES? You just cannot use GMRES as the smoother inside GAMG if you use GMRES on the outside, but for pressure equations you don't want use such a strong smoother anyways. > > Barry > > I feel we should add some information to the documentation on the removal of the null space to the user's manual when using right preconditioning and maybe even have an error check in the code so that people don't fall into this trap. But I am not sure exactly what to do. When the A and B are both symmetric I think special stuff happens that doesn't require providing a null space; but I am not sure. > > > > > >> On Mar 21, 2022, at 12:41 PM, Marco Cisternino > wrote: >> >> Thank you, Mark. >> However, doing this with my toy code >> mpirun -n 1 ./testpreconditioner -pc_type gamg -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg >> >> I get 16 inf elements. Do I miss anything? >> >> Thanks again >> >> Marco Cisternino >> >> >> From: Mark Adams > >> Sent: luned? 21 marzo 2022 17:31 >> To: Marco Cisternino > >> Cc: petsc-users at mcs.anl.gov >> Subject: Re: [petsc-users] Null space and preconditioners >> >> And for GAMG you can use: >> >> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg >> >> Note if you are using more that one MPI process you can use 'lu' instead of 'jacobi' >> >> If GAMG converges fast enough it can solve before the constant creeps in and works without cleaning in the KSP method. >> >> On Mon, Mar 21, 2022 at 12:06 PM Mark Adams > wrote: >> The solution for Neumann problems can "float away" if the constant is not controlled in some way because floating point errors can introduce it even if your RHS is exactly orthogonal to it. >> >> You should use a special coarse grid solver for GAMG but it seems to be working for you. >> >> I have lost track of the simply way to have the KSP solver clean the constant out, which is what you want. >> >> can someone help Marco? >> >> Mark >> >> >> >> >> >> On Mon, Mar 21, 2022 at 8:18 AM Marco Cisternino > wrote: >> Good morning, >> I?m observing an unexpected (to me) behaviour of my code. 
>> I tried to reduce the problem in a toy code here attached. >> The toy code archive contains a small main, a matrix and a rhs. >> The toy code solves the linear system and check the norms and the mean of the solution. >> The problem into the matrix and the rhs is the finite volume discretization of the pressure equation of an incompressible NS solver. >> It has been cooked as tiny as possible (16 cells!). >> It is important to say that it is an elliptic problem with homogeneous Neumann boundary conditions only, for this reason the toy code sets a null space containing the constant. >> >> The unexpected (to me) behaviour is evident by launching the code using different preconditioners, using -pc-type >> I tested using PCNONE (?none?), PCGAMG (?gamg?) and PCILU (?ilu?). The default solver is KSPFGMRES. >> Using the three PC, I get 3 different solutions. It seems to me that they differ in the mean value, but GAMG is impressive. >> PCNONE gives me the zero mean solution I expected. What about the others? >> >> Asking for residuals monitor, the ratio ||r||/||b|| shows convergence for PCNONE and PCILU (~10^-16), but it stalls for PCGAMG (~10^-4). >> I cannot see why. Am I doing anything wrong or incorrectly thinking about the expected behaviour? >> >> Generalizing to larger mesh the behaviour is similar. >> >> Thank you for any help. >> >> Marco Cisternino > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Untitled.png Type: image/png Size: 214150 bytes Desc: not available URL: From bsmith at petsc.dev Mon Mar 21 14:56:03 2022 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 21 Mar 2022 15:56:03 -0400 Subject: [petsc-users] MatCreateSBAIJ In-Reply-To: References: Message-ID: The "trick" is that though "more" communication is needed to complete the product the communication can still be done in a single VecScatter instead of two separate calls to VecScatter. We simply pack both pieces of information that needs to be sent into a single vector. /* copy x into the vec slvec0 */ 1111: <> VecGetArray (a->slvec0,&from); 1112: <> VecGetArrayRead (xx,&x); 1114: <> PetscArraycpy (from,x,bs*mbs); 1115: <> VecRestoreArray (a->slvec0,&from); 1116: <> VecRestoreArrayRead (xx,&x); 1118: <> VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); 1119: <> VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); If you create two symmetric matrices, one with SBAIJ and one with BAIJ and compare the time to do the product you will find that the SBAIJ is not significantly slower but does save memory. > On Mar 21, 2022, at 3:26 PM, Sam Guo wrote: > > Using following example from the MatCreateSBAIJ documentation > 0 1 2 3 4 5 6 7 8 9 10 11 > -------------------------- > row 3 |. . . d d d o o o o o o > row 4 |. . . d d d o o o o o o > row 5 |. . . d d d o o o o o o > -------------------------- > > On a processor that owns rows 3, 4 and 5, rows 0-2 info are still needed. Is is that the processor that owns rows 0-2 will apply B symmetrical and then send the result > to the processor that owns 3-5? > > On Mon, Mar 21, 2022 at 12:14 PM Mark Adams > wrote: > PETSc stores parallel matrices as two serial matrices. One for the diagonal (d or A) block and one for the rest (o or B). > I would guess that for symmetric matrices it has a symmetric matrix for the diagonal and a full AIJ matrix for the (upper) off-diagonal. > So the multtranspose is applying B symmetrically. 
This lower off-diagonal and the diagonal block can be done without communication. > Then the off processor values are collected, and the upper off-diagonal is applied. > > On Mon, Mar 21, 2022 at 2:35 PM Sam Guo > wrote: > I am most interested in how the lower triangular part is redistributed. It seems that SBAJI saves memory but requires more communication than BAIJ. > > On Mon, Mar 21, 2022 at 11:27 AM Sam Guo > wrote: > Mark, thanks for the quick response. I am more interested in parallel implementation of MatMult for SBAIJ. I found following > 1094: <> <>PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy) > 1095: <>{ > 1096: <> Mat_MPISBAIJ *a = (Mat_MPISBAIJ*)A->data; > 1097: <> PetscErrorCode ierr; > 1098: <> PetscInt mbs=a->mbs,bs=A->rmap->bs; > 1099: <> PetscScalar *from; > 1100: <> const PetscScalar *x; > > 1103: <> /* diagonal part */ > 1104: <> (*a->A->ops->mult)(a->A,xx,a->slvec1a); > 1105: <> VecSet (a->slvec1b,0.0); > > 1107: <> /* subdiagonal part */ > 1108: <> (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b); > > 1110: <> /* copy x into the vec slvec0 */ > 1111: <> VecGetArray (a->slvec0,&from); > 1112: <> VecGetArrayRead (xx,&x); > > 1114: <> PetscArraycpy (from,x,bs*mbs); > 1115: <> VecRestoreArray (a->slvec0,&from); > 1116: <> VecRestoreArrayRead (xx,&x); > > 1118: <> VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); > 1119: <> VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); > 1120: <> /* supperdiagonal part */ > 1121: <> (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy); > 1122: <> return(0); > 1123: <>} > I try to understand the algorithm. > > Thanks, > Sam > > On Mon, Mar 21, 2022 at 11:14 AM Mark Adams > wrote: > This code looks fine to me and the code is in src/mat/impls/sbaij/seq/sbaij2.c > > On Mon, Mar 21, 2022 at 2:02 PM Sam Guo > wrote: > Dear PETSc dev team, > The documentation about MatCreateSBAIJ has following > "It is recommended that one use the MatCreate (), MatSetType () and/or MatSetFromOptions (), MatXXXXSetPreallocation() paradigm instead of this routine directly. [MatXXXXSetPreallocation() is, for example, MatSeqAIJSetPreallocation ]" > I currently call MatCreateSBAIJ directly as follows: > MatCreateSBAIJ (with d_nnz and o_nnz) > MatSetValues (to add row by row) > MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); > MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); > > Two questions: > (1) I am wondering whether what I am doing is the most efficient. > > (2) I try to find out how the matrix vector multiplication is implemented in PETSc for SBAIJ storage. > > Thanks, > Sam -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Mar 21 15:36:40 2022 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 21 Mar 2022 13:36:40 -0700 Subject: [petsc-users] MatCreateSBAIJ In-Reply-To: References: Message-ID: Barry, Thanks. Could you elaborate? I try to implement the matrix-vector multiplication for a symmetric matrix using shell matrix. Thanks, Sam On Mon, Mar 21, 2022 at 12:56 PM Barry Smith wrote: > > The "trick" is that though "more" communication is needed to complete > the product the communication can still be done in a single VecScatter > instead of two separate calls to VecScatter. We simply pack both pieces of > information that needs to be sent into a single vector. 
> > /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); > 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); > 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); > > If you create two symmetric matrices, one with SBAIJ and one with BAIJ and > compare the time to do the product you will find that the SBAIJ is not > significantly slower but does save memory. > > > On Mar 21, 2022, at 3:26 PM, Sam Guo wrote: > > Using following example from the MatCreateSBAIJ documentation > > 0 1 2 3 4 5 6 7 8 9 10 11 > -------------------------- > row 3 |. . . d d d o o o o o o > row 4 |. . . d d d o o o o o o > row 5 |. . . d d d o o o o o o > -------------------------- > > > On a processor that owns rows 3, 4 and 5, rows 0-2 info are still needed. Is is that the processor that owns rows 0-2 will apply B symmetrical and then send the result > > to the processor that owns 3-5? > > > On Mon, Mar 21, 2022 at 12:14 PM Mark Adams wrote: > >> PETSc stores parallel matrices as two serial matrices. One for the >> diagonal (d or A) block and one for the rest (o or B). >> I would guess that for symmetric matrices it has a symmetric matrix for >> the diagonal and a full AIJ matrix for the (upper) off-diagonal. >> So the multtranspose is applying B symmetrically. This lower >> off-diagonal and the diagonal block can be done without communication. >> Then the off processor values are collected, and the upper off-diagonal >> is applied. >> >> On Mon, Mar 21, 2022 at 2:35 PM Sam Guo wrote: >> >>> I am most interested in how the lower triangular part is redistributed. >>> It seems that SBAJI saves memory but requires more communication than BAIJ. >>> >>> On Mon, Mar 21, 2022 at 11:27 AM Sam Guo wrote: >>> >>>> Mark, thanks for the quick response. I am more interested in parallel >>>> implementation of MatMult for SBAIJ. I found following >>>> >>>> 1094: *PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy)*1095: {1096: Mat_MPISBAIJ *a = (Mat_MPISBAIJ*)A->data;1097: PetscErrorCode ierr;1098: PetscInt mbs=a->mbs,bs=A->rmap->bs;1099: PetscScalar *from;1100: const PetscScalar *x; >>>> 1103: /* diagonal part */1104: (*a->A->ops->mult)(a->A,xx,a->slvec1a);1105: VecSet (a->slvec1b,0.0); >>>> 1107: /* subdiagonal part */1108: (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b); >>>> 1110: /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >>>> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >>>> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1120: /* supperdiagonal part */1121: (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy);1122: return(0);1123: } >>>> >>>> I try to understand the algorithm. 
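A minimal MATSHELL sketch (not from the thread) for the user-provided symmetric matrix-vector product Sam mentions wanting to implement; UserCtx, UserSymmetricMult, and CreateUserShell are hypothetical names and the actual product is left as a comment:

#include <petscmat.h>

typedef struct {
  void *data;   /* whatever the user product needs (stored triangle, scatters, ...) */
} UserCtx;

/* Hedged sketch: y = A x for a symmetric operator stored by the user. */
static PetscErrorCode UserSymmetricMult(Mat A, Vec x, Vec y)
{
  UserCtx        *ctx;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatShellGetContext(A, &ctx);CHKERRQ(ierr);
  /* ... apply the locally stored (upper triangular) data to x, gather the
     needed off-process entries of x (e.g. with a VecScatter), and add the
     transposed contributions, in the spirit of the quoted MatMult_MPISBAIJ ... */
  PetscFunctionReturn(0);
}

PetscErrorCode CreateUserShell(MPI_Comm comm, PetscInt nlocal, PetscInt N, UserCtx *ctx, Mat *A)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatCreateShell(comm, nlocal, nlocal, N, N, ctx, A);CHKERRQ(ierr);
  ierr = MatShellSetOperation(*A, MATOP_MULT, (void (*)(void))UserSymmetricMult);CHKERRQ(ierr);
  /* For a symmetric operator the transpose product is the same routine. */
  ierr = MatShellSetOperation(*A, MATOP_MULT_TRANSPOSE, (void (*)(void))UserSymmetricMult);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}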
>>>> >>>> >>>> Thanks, >>>> >>>> Sam >>>> >>>> >>>> On Mon, Mar 21, 2022 at 11:14 AM Mark Adams wrote: >>>> >>>>> This code looks fine to me and the code is >>>>> in src/mat/impls/sbaij/seq/sbaij2.c >>>>> >>>>> On Mon, Mar 21, 2022 at 2:02 PM Sam Guo wrote: >>>>> >>>>>> Dear PETSc dev team, >>>>>> The documentation about MatCreateSBAIJ has following >>>>>> "It is recommended that one use the MatCreate >>>>>> >>>>>> (), MatSetType >>>>>> () >>>>>> and/or MatSetFromOptions >>>>>> (), >>>>>> MatXXXXSetPreallocation() paradigm instead of this routine directly. >>>>>> [MatXXXXSetPreallocation() is, for example, MatSeqAIJSetPreallocation >>>>>> >>>>>> ]" >>>>>> I currently call MatCreateSBAIJ directly as follows: >>>>>> MatCreateSBAIJ (with d_nnz and o_nnz) >>>>>> MatSetValues (to add row by row) >>>>>> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >>>>>> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >>>>>> MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); >>>>>> >>>>>> Two questions: >>>>>> (1) I am wondering whether what I am doing is the most efficient. >>>>>> >>>>>> (2) I try to find out how the matrix vector multiplication is >>>>>> implemented in PETSc for SBAIJ storage. >>>>>> >>>>>> Thanks, >>>>>> Sam >>>>>> >>>>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Mon Mar 21 15:41:46 2022 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 21 Mar 2022 16:41:46 -0400 Subject: [petsc-users] Null space and preconditioners In-Reply-To: <8FE95274-4641-4722-BD9B-08A40744508E@petsc.dev> References: <8FE95274-4641-4722-BD9B-08A40744508E@petsc.dev> Message-ID: I'll bet CG will work fine on it. CG is pretty robust and as Phil would say this operator is morally symmetric. On Mon, Mar 21, 2022 at 3:42 PM Barry Smith wrote: > > > On Mar 21, 2022, at 3:33 PM, Mark Adams wrote: > > (I did suggest CG, he just has a pressure solve, which is a Laplacian, > right?) > > > His finite volume scheme is finite difference-ish in that it produces a > non-symmetric matrix. The non-symmetric part A-A' is actually very cool > looking; one of the coolest matrices I've ever seen :-) > > > > Ugh, this is pretty bad. The logic might be a bit convoluted, but > if SetOperator is called after SetFromOptions, as is usual I think, it > could check if it has left a left PC, if the operator has a null space. > > > > On Mon, Mar 21, 2022 at 2:48 PM Barry Smith wrote: > >> >> Marco, >> >> I have confirmed your results. >> >> Urgg, it appears we do not have something well documented. The >> removal of the null space only works for left preconditioned solvers and >> FGMRES only works with right preconditioning. Here is the reasoning. >> >> The Krylov space for left preconditioning is built from [r, BAr, >> (BA)^2 r, ...] and the solution space is built from this basis. If A has a >> null space of n then the left preconditioned Krylov methods simply remove n >> from the "full" Krylov space after applying each B preconditioner and the >> resulting "reduced" Krylov space has no components in the n directions >> hence the solution built by GMRES naturally has no component in the n. >> >> But with right preconditioning the Krylov space is [s ABs (AB)^2 s, >> ....] We would need to remove B^-1 n from the Krylov space so that (A B) >> B^-1 n = 0 In general we don't have any way of applying B^-1 to a vector so >> we cannot create the appropriate "reduced" Krylov space. 
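The argument quoted above, restated as formulas (same notation as the message):

\[
\mathcal{K}^{\mathrm{left}}_m = \operatorname{span}\{\, r,\ BAr,\ (BA)^2 r,\ \dots \,\},
\qquad
\mathcal{K}^{\mathrm{right}}_m = \operatorname{span}\{\, s,\ ABs,\ (AB)^2 s,\ \dots \,\}.
\]

On the left, the component along the null space vector $n$ of $A$ can be projected out after every application of $B$, so the computed solution stays orthogonal to $n$; on the right one would need to remove $B^{-1} n$ so that $(AB)\,B^{-1} n = 0$, and $B^{-1}$ is not available as an operation, hence the "reduced" space cannot be built.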
>> >> If I run with GMRES (which defaults to left preconditioner) and the >> options ./testPreconditioners -pc_type gamg -ksp_type gmres >> -ksp_monitor_true_residual -ksp_rtol 1.e-12 -ksp_view -mg_coarse_pc_type svd >> >> Then it handles the null space correctly and the solution has Solution >> mean = 4.51028e-17 >> >> Is there any reason to use FGMRES instead of GMRES? You just cannot use >> GMRES as the smoother inside GAMG if you use GMRES on the outside, but for >> pressure equations you don't want use such a strong smoother anyways. >> >> Barry >> >> I feel we should add some information to the documentation on the >> removal of the null space to the user's manual when using right >> preconditioning and maybe even have an error check in the code so that >> people don't fall into this trap. But I am not sure exactly what to do. >> When the A and B are both symmetric I think special stuff happens that >> doesn't require providing a null space; but I am not sure. >> >> >> >> >> >> On Mar 21, 2022, at 12:41 PM, Marco Cisternino < >> marco.cisternino at optimad.it> wrote: >> >> Thank you, Mark. >> However, doing this with my toy code >> mpirun -n 1 ./testpreconditioner -pc_type gamg >> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi >> -mg_coarse_ksp_type cg >> >> I get 16 inf elements. Do I miss anything? >> >> Thanks again >> >> Marco Cisternino >> >> >> *From:* Mark Adams >> *Sent:* luned? 21 marzo 2022 17:31 >> *To:* Marco Cisternino >> *Cc:* petsc-users at mcs.anl.gov >> *Subject:* Re: [petsc-users] Null space and preconditioners >> >> And for GAMG you can use: >> >> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi >> -mg_coarse_ksp_type cg >> >> Note if you are using more that one MPI process you can use 'lu' instead >> of 'jacobi' >> >> If GAMG converges fast enough it can solve before the constant creeps in >> and works without cleaning in the KSP method. >> >> On Mon, Mar 21, 2022 at 12:06 PM Mark Adams wrote: >> >> The solution for Neumann problems can "float away" if the constant is not >> controlled in some way because floating point errors can introduce it even >> if your RHS is exactly orthogonal to it. >> >> You should use a special coarse grid solver for GAMG but it seems to be >> working for you. >> >> I have lost track of the simply way to have the KSP solver clean the >> constant out, which is what you want. >> >> can someone help Marco? >> >> Mark >> >> >> >> >> >> On Mon, Mar 21, 2022 at 8:18 AM Marco Cisternino < >> marco.cisternino at optimad.it> wrote: >> >> Good morning, >> I?m observing an unexpected (to me) behaviour of my code. >> I tried to reduce the problem in a toy code here attached. >> The toy code archive contains a small main, a matrix and a rhs. >> The toy code solves the linear system and check the norms and the mean of >> the solution. >> The problem into the matrix and the rhs is the finite volume >> discretization of the pressure equation of an incompressible NS solver. >> It has been cooked as tiny as possible (16 cells!). >> It is important to say that it is an elliptic problem with homogeneous >> Neumann boundary conditions only, for this reason the toy code sets a null >> space containing the constant. >> >> The unexpected (to me) behaviour is evident by launching the code using >> different preconditioners, using -pc-type >> I tested using PCNONE (?none?), PCGAMG (?gamg?) and PCILU (?ilu?). The >> default solver is KSPFGMRES. >> Using the three PC, I get 3 different solutions. 
It seems to me that they >> differ in the mean value, but GAMG is impressive. >> PCNONE gives me the zero mean solution I expected. What about the others? >> >> Asking for residuals monitor, the ratio ||r||/||b|| shows convergence for >> PCNONE and PCILU (~10^-16), but it stalls for PCGAMG (~10^-4). >> I cannot see why. Am I doing anything wrong or incorrectly thinking about >> the expected behaviour? >> >> Generalizing to larger mesh the behaviour is similar. >> >> Thank you for any help. >> >> Marco Cisternino >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Untitled.png Type: image/png Size: 214150 bytes Desc: not available URL: From bsmith at petsc.dev Mon Mar 21 16:48:48 2022 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 21 Mar 2022 17:48:48 -0400 Subject: [petsc-users] MatCreateSBAIJ In-Reply-To: References: Message-ID: > On Mar 21, 2022, at 4:36 PM, Sam Guo wrote: > > Barry, > Thanks. Could you elaborate? I try to implement the matrix-vector multiplication for a symmetric matrix using shell matrix. Consider with three ranks (a) = ( A B D) (x) (b) ( B' C E) (y) (c) ( D' E' F) (w) Only the ones without the ' are stored on the rank. So for example B is stored on rank 0. Rank 0 computes A x and keeps it in a. Rank 1 computes Cy and keeps it in b Rank 2 computes Fw and keeps it in c Rank 0 computes B'x and D'x. It puts the nonzero entries of these values as well as the values of x into slvec0 Rank 1 computes E'y and puts the nonzero entries as well as the values into slvec0 Rank 2 puts the values of we needed by the other ranks into slvec0 Rank 0 does B y_h + D z_h where it gets the y_h and z_h values from slvec1 and adds it to a Rank 1 takes the B'x from slvec1 and adds it to b it then takes the E y_h values where the y_h are pulled from slvec1 and adds them b Rank 2 takes the B'x and E'y from slvec0 and adds them to c. > > Thanks, > Sam > > On Mon, Mar 21, 2022 at 12:56 PM Barry Smith > wrote: > > The "trick" is that though "more" communication is needed to complete the product the communication can still be done in a single VecScatter instead of two separate calls to VecScatter. We simply pack both pieces of information that needs to be sent into a single vector. > > /* copy x into the vec slvec0 */ > 1111: <> VecGetArray (a->slvec0,&from); > 1112: <> VecGetArrayRead (xx,&x); > > 1114: <> PetscArraycpy (from,x,bs*mbs); > 1115: <> VecRestoreArray (a->slvec0,&from); > 1116: <> VecRestoreArrayRead (xx,&x); > > 1118: <> VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); > 1119: <> VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); > If you create two symmetric matrices, one with SBAIJ and one with BAIJ and compare the time to do the product you will find that the SBAIJ is not significantly slower but does save memory. > > >> On Mar 21, 2022, at 3:26 PM, Sam Guo > wrote: >> >> Using following example from the MatCreateSBAIJ documentation >> 0 1 2 3 4 5 6 7 8 9 10 11 >> -------------------------- >> row 3 |. . . d d d o o o o o o >> row 4 |. . . d d d o o o o o o >> row 5 |. . . d d d o o o o o o >> -------------------------- >> >> On a processor that owns rows 3, 4 and 5, rows 0-2 info are still needed. Is is that the processor that owns rows 0-2 will apply B symmetrical and then send the result >> to the processor that owns 3-5? 
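The three-rank block system in Barry's example above, written out as a matrix equation (same notation; the prime denotes the transpose, and only the unprimed blocks are stored locally):

\[
\begin{pmatrix} a \\ b \\ c \end{pmatrix}
=
\begin{pmatrix} A & B & D \\ B' & C & E \\ D' & E' & F \end{pmatrix}
\begin{pmatrix} x \\ y \\ w \end{pmatrix},
\]

with rank 0 storing $A$, $B$, $D$, rank 1 storing $C$, $E$, and rank 2 storing $F$; each primed block is applied by the rank that owns the unprimed block, and the partial results plus the needed entries of the input vector are exchanged through the single scatter on slvec0/slvec1 described above.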
>> >> On Mon, Mar 21, 2022 at 12:14 PM Mark Adams > wrote: >> PETSc stores parallel matrices as two serial matrices. One for the diagonal (d or A) block and one for the rest (o or B). >> I would guess that for symmetric matrices it has a symmetric matrix for the diagonal and a full AIJ matrix for the (upper) off-diagonal. >> So the multtranspose is applying B symmetrically. This lower off-diagonal and the diagonal block can be done without communication. >> Then the off processor values are collected, and the upper off-diagonal is applied. >> >> On Mon, Mar 21, 2022 at 2:35 PM Sam Guo > wrote: >> I am most interested in how the lower triangular part is redistributed. It seems that SBAJI saves memory but requires more communication than BAIJ. >> >> On Mon, Mar 21, 2022 at 11:27 AM Sam Guo > wrote: >> Mark, thanks for the quick response. I am more interested in parallel implementation of MatMult for SBAIJ. I found following >> 1094: <> <>PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy) >> 1095: <>{ >> 1096: <> Mat_MPISBAIJ *a = (Mat_MPISBAIJ*)A->data; >> 1097: <> PetscErrorCode ierr; >> 1098: <> PetscInt mbs=a->mbs,bs=A->rmap->bs; >> 1099: <> PetscScalar *from; >> 1100: <> const PetscScalar *x; >> >> 1103: <> /* diagonal part */ >> 1104: <> (*a->A->ops->mult)(a->A,xx,a->slvec1a); >> 1105: <> VecSet (a->slvec1b,0.0); >> >> 1107: <> /* subdiagonal part */ >> 1108: <> (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b); >> >> 1110: <> /* copy x into the vec slvec0 */ >> 1111: <> VecGetArray (a->slvec0,&from); >> 1112: <> VecGetArrayRead (xx,&x); >> >> 1114: <> PetscArraycpy (from,x,bs*mbs); >> 1115: <> VecRestoreArray (a->slvec0,&from); >> 1116: <> VecRestoreArrayRead (xx,&x); >> >> 1118: <> VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >> 1119: <> VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >> 1120: <> /* supperdiagonal part */ >> 1121: <> (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy); >> 1122: <> return(0); >> 1123: <>} >> I try to understand the algorithm. >> >> Thanks, >> Sam >> >> On Mon, Mar 21, 2022 at 11:14 AM Mark Adams > wrote: >> This code looks fine to me and the code is in src/mat/impls/sbaij/seq/sbaij2.c >> >> On Mon, Mar 21, 2022 at 2:02 PM Sam Guo > wrote: >> Dear PETSc dev team, >> The documentation about MatCreateSBAIJ has following >> "It is recommended that one use the MatCreate (), MatSetType () and/or MatSetFromOptions (), MatXXXXSetPreallocation() paradigm instead of this routine directly. [MatXXXXSetPreallocation() is, for example, MatSeqAIJSetPreallocation ]" >> I currently call MatCreateSBAIJ directly as follows: >> MatCreateSBAIJ (with d_nnz and o_nnz) >> MatSetValues (to add row by row) >> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >> MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); >> >> Two questions: >> (1) I am wondering whether what I am doing is the most efficient. >> >> (2) I try to find out how the matrix vector multiplication is implemented in PETSc for SBAIJ storage. >> >> Thanks, >> Sam > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Mar 21 17:07:57 2022 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 21 Mar 2022 18:07:57 -0400 Subject: [petsc-users] Null space and preconditioners In-Reply-To: References: Message-ID: I have made a merge request https://gitlab.com/petsc/petsc/-/merge_requests/5002 with an attempt to improve the documentation. 
It turns out that sometimes it works well also with right preconditioning (but generically it should not) so I did not put in error checking to prevent such usage. It turns out a bunch of test examples with null spaces are run with right preconditioning. Barry > On Mar 21, 2022, at 1:51 PM, Marco Cisternino wrote: > > > From: Matthew Knepley > > Sent: luned? 21 marzo 2022 18:31 > To: Marco Cisternino > > Cc: Mark Adams >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Null space and preconditioners > > On Mon, Mar 21, 2022 at 1:25 PM Marco Cisternino > wrote: > Thank you, Matt. > I already set the null space and test it in the toy code > If you set this and it is not working, something is wrong. This will remove null space components from each update in the Krylov space. > This is done in an example where I check convergence to the exact solution: https://gitlab.com/petsc/petsc/-/blob/main/src/snes/tutorials/ex69.c > > Do I have to set a special null space when I use GAMG? > The toy code works for PCNONE and PCILU, giving a zero mean solution the first PC and an almost zero mean solution the second one. > GAMG floats away, quoting Mark. > Looking at what I do: > MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, nullptr, &nullspace); > MatSetNullSpace(matrix, nullspace); > it is more or less what you do at lines 3231-3233 of your reference. Am I wrong? > What about lines 3220-3223? What is the difference between nullSpace and nullSpacePres? > > > I tried your suggestion: the norm and the mean of the solution using -mg_coarse_pc_type svd with PCGAMG is much closer to the one of PCNONE (the norm is the same up to the 6th digit, the mean is about 10e-4 with ?svd? PCGAMG and 10e-17 with PCNONE). I?m going to try with the real code and see what happens on larger meshes > This is not perfect since null space components can be introduced by the rest of the preconditioner, but when I use range-space smoothers and > local interpolation it tends to be much better for me. Maybe it is just my problems. > > Is there a way to set -mg_coarse_pc_type svd with the API into the code? > > Thanks, > > Marco > > Thanks, > > Matt > > Thank you all. > > Marco Cisternino > > > From: Matthew Knepley > > Sent: luned? 21 marzo 2022 18:16 > To: Mark Adams > > Cc: Marco Cisternino >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Null space and preconditioners > > On Mon, Mar 21, 2022 at 12:06 PM Mark Adams > wrote: > The solution for Neumann problems can "float away" if the constant is not controlled in some way because floating point errors can introduce it even if your RHS is exactly orthogonal to it. > > You should use a special coarse grid solver for GAMG but it seems to be working for you. > > I have lost track of the simply way to have the KSP solver clean the constant out, which is what you want. > > can someone help Marco? > > I have not had time to look at the code. However, here are two ways we use to fix the pure Neumann solution: > > 1) Attach a null space to the operator using https://petsc.org/main/docs/manualpages/Mat/MatSetNullSpace.html > > 2) Use a coarse grid solver that does least-squares > > -mg_coarse_pc_type svd > > Thanks, > > Matt > > Mark > > > > > > On Mon, Mar 21, 2022 at 8:18 AM Marco Cisternino > wrote: > Good morning, > I?m observing an unexpected (to me) behaviour of my code. > I tried to reduce the problem in a toy code here attached. > The toy code archive contains a small main, a matrix and a rhs. 
> The toy code solves the linear system and check the norms and the mean of the solution. > The problem into the matrix and the rhs is the finite volume discretization of the pressure equation of an incompressible NS solver. > It has been cooked as tiny as possible (16 cells!). > It is important to say that it is an elliptic problem with homogeneous Neumann boundary conditions only, for this reason the toy code sets a null space containing the constant. > > The unexpected (to me) behaviour is evident by launching the code using different preconditioners, using -pc-type > I tested using PCNONE (?none?), PCGAMG (?gamg?) and PCILU (?ilu?). The default solver is KSPFGMRES. > Using the three PC, I get 3 different solutions. It seems to me that they differ in the mean value, but GAMG is impressive. > PCNONE gives me the zero mean solution I expected. What about the others? > > Asking for residuals monitor, the ratio ||r||/||b|| shows convergence for PCNONE and PCILU (~10^-16), but it stalls for PCGAMG (~10^-4). > I cannot see why. Am I doing anything wrong or incorrectly thinking about the expected behaviour? > > Generalizing to larger mesh the behaviour is similar. > > Thank you for any help. > > Marco Cisternino > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Mar 21 17:22:57 2022 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 21 Mar 2022 18:22:57 -0400 Subject: [petsc-users] PetscSection and DMPlexVTKWriteAll in parallel In-Reply-To: References: Message-ID: On Mon, Mar 21, 2022 at 11:22 AM Ferrand, Jesus A. wrote: > Greetings. > > I am having trouble exporting a vertex-based solution field to ParaView > when I run my PETSc script in parallel (see screenshots). The smoothly > changing field is produced by my serial runs whereas the "messed up" one is > produced by my parallel runs. This is not a calculation bug, rather, it > concerns the vtk output only (the solution field is the same in parallel > and serial). I am using DMPlexVTKWriteAll() but will make the switch to > hdf5 sometime. > For output, I would suggest keeping it as simple as possible inside your code. For example, I would use Ierr = DMViewFromOptions(dm, NULL, "-dm_view");CHKERRQ(ierr); ierr = VecViewFromOptions(sol, NULL, "-sol_view");CHKERRQ(ierr); as the only output in my program (until you need something more sophisticated). Then you could use -sol_view vtk:sol.vtu for output, or -dm_view hdf5:sol.h5 -sol_view hdf5:sol.h5::append to get HDF5 output. For HDF5 you run ${PETSC_DIR}/lib/bin/petsc_gen_xdmf.py sol.h5 to get sol.xmf which can be loaded in Paraview. > Anyways, my suspicion is about PetscSection and how I am setting it up. I > call PetscSectionSetChart() where my "pStart" and "pEnd" I collect from > DMPlexGetDepthStratum() where "depth" is set to zero (for vertices) and > then I call DMSetLocalSection(). After tinkering with DMPlex routines, I > realize that DMPlexGetDepthStratum() returns "pStart" and "pEnd" in local > numbering when the run is in parallel. 
Thus, I think that my serial output > is correct because in that case local numbering matches the global > numbering. > > So, am I correct in believing that the PetscSectionSetChart() call should > be done with global numbering? > No, the reason it is DMSetLocalSection() is that the section is explicitly local. Also, even the global section uses local numbering for the points (global point numbering is never used inside Plex so that it does not inhibit scalability). > Also, I noticed that the parallel DMPlex counts ghost vertices towards the > "pStart" and "pEnd". So, when I set the chart in the local PetscSection, > should I figure out the chart for the owned vertices or can PETSc figure > the ghost/owned dilemma when the local PetscSections feature overlapping > charts? > You do not have to. PETSc will do that automatically. Thanks, Matt > Sincerely: > > *J.A. Ferrand* > > Embry-Riddle Aeronautical University - Daytona Beach FL > > M.Sc. Aerospace Engineering | May 2022 > > B.Sc. Aerospace Engineering > > B.Sc. Computational Mathematics > > > > Sigma Gamma Tau > > Tau Beta Pi > > > > *Phone:* (386)-843-1829 > > *Email(s):* ferranj2 at my.erau.edu > > jesus.ferrand at gmail.com > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From marco.cisternino at optimad.it Tue Mar 22 08:55:09 2022 From: marco.cisternino at optimad.it (Marco Cisternino) Date: Tue, 22 Mar 2022 13:55:09 +0000 Subject: [petsc-users] Null space and preconditioners In-Reply-To: References: Message-ID: Thank you Barry! No, no reason for FGMRES (some old tests showed shorter wall-times relative to GMRES), I?m going to use GMRES. I tried GMRES with GAMG using PCSVD on the coarser level on real cases, larger than the toy case, I cannot get machine epsilon mean value of the solution, as in the toy case but about 10e-5, which is much better than before. The rest of GAMG is on default configuration. The mesh is much more complicated being an octree with immersed geometries. I get the same behaviour using GMRES+ASM+ILU as well. I cannot really get why the solution is not at zero mean value. I?m using a rtol~1.0e-8 but I get the same mean value using rtol~1.o-6, I tried 1.0-12 too, but no improvement in the constant, definitely tiny but not zero. Thanks. Marco Cisternino From: Barry Smith Sent: luned? 21 marzo 2022 19:49 To: Marco Cisternino Cc: Mark Adams ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Null space and preconditioners Marco, I have confirmed your results. Urgg, it appears we do not have something well documented. The removal of the null space only works for left preconditioned solvers and FGMRES only works with right preconditioning. Here is the reasoning. The Krylov space for left preconditioning is built from [r, BAr, (BA)^2 r, ...] and the solution space is built from this basis. If A has a null space of n then the left preconditioned Krylov methods simply remove n from the "full" Krylov space after applying each B preconditioner and the resulting "reduced" Krylov space has no components in the n directions hence the solution built by GMRES naturally has no component in the n. But with right preconditioning the Krylov space is [s ABs (AB)^2 s, ....] 
We would need to remove B^-1 n from the Krylov space so that (A B) B^-1 n = 0 In general we don't have any way of applying B^-1 to a vector so we cannot create the appropriate "reduced" Krylov space. If I run with GMRES (which defaults to left preconditioner) and the options ./testPreconditioners -pc_type gamg -ksp_type gmres -ksp_monitor_true_residual -ksp_rtol 1.e-12 -ksp_view -mg_coarse_pc_type svd Then it handles the null space correctly and the solution has Solution mean = 4.51028e-17 Is there any reason to use FGMRES instead of GMRES? You just cannot use GMRES as the smoother inside GAMG if you use GMRES on the outside, but for pressure equations you don't want use such a strong smoother anyways. Barry I feel we should add some information to the documentation on the removal of the null space to the user's manual when using right preconditioning and maybe even have an error check in the code so that people don't fall into this trap. But I am not sure exactly what to do. When the A and B are both symmetric I think special stuff happens that doesn't require providing a null space; but I am not sure. On Mar 21, 2022, at 12:41 PM, Marco Cisternino > wrote: Thank you, Mark. However, doing this with my toy code mpirun -n 1 ./testpreconditioner -pc_type gamg -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg I get 16 inf elements. Do I miss anything? Thanks again Marco Cisternino From: Mark Adams > Sent: luned? 21 marzo 2022 17:31 To: Marco Cisternino > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Null space and preconditioners And for GAMG you can use: -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg Note if you are using more that one MPI process you can use 'lu' instead of 'jacobi' If GAMG converges fast enough it can solve before the constant creeps in and works without cleaning in the KSP method. On Mon, Mar 21, 2022 at 12:06 PM Mark Adams > wrote: The solution for Neumann problems can "float away" if the constant is not controlled in some way because floating point errors can introduce it even if your RHS is exactly orthogonal to it. You should use a special coarse grid solver for GAMG but it seems to be working for you. I have lost track of the simply way to have the KSP solver clean the constant out, which is what you want. can someone help Marco? Mark On Mon, Mar 21, 2022 at 8:18 AM Marco Cisternino > wrote: Good morning, I?m observing an unexpected (to me) behaviour of my code. I tried to reduce the problem in a toy code here attached. The toy code archive contains a small main, a matrix and a rhs. The toy code solves the linear system and check the norms and the mean of the solution. The problem into the matrix and the rhs is the finite volume discretization of the pressure equation of an incompressible NS solver. It has been cooked as tiny as possible (16 cells!). It is important to say that it is an elliptic problem with homogeneous Neumann boundary conditions only, for this reason the toy code sets a null space containing the constant. The unexpected (to me) behaviour is evident by launching the code using different preconditioners, using -pc-type I tested using PCNONE (?none?), PCGAMG (?gamg?) and PCILU (?ilu?). The default solver is KSPFGMRES. Using the three PC, I get 3 different solutions. It seems to me that they differ in the mean value, but GAMG is impressive. PCNONE gives me the zero mean solution I expected. What about the others? 
Asking for residuals monitor, the ratio ||r||/||b|| shows convergence for PCNONE and PCILU (~10^-16), but it stalls for PCGAMG (~10^-4). I cannot see why. Am I doing anything wrong or incorrectly thinking about the expected behaviour? Generalizing to larger mesh the behaviour is similar. Thank you for any help. Marco Cisternino -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Mar 22 09:22:05 2022 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 22 Mar 2022 10:22:05 -0400 Subject: [petsc-users] Null space and preconditioners In-Reply-To: References: Message-ID: On Tue, Mar 22, 2022 at 9:55 AM Marco Cisternino < marco.cisternino at optimad.it> wrote: > Thank you Barry! > No, no reason for FGMRES (some old tests showed shorter wall-times > relative to GMRES), I?m going to use GMRES. > I tried GMRES with GAMG using PCSVD on the coarser level on real cases, > larger than the toy case, I cannot get machine epsilon mean value of the > solution, as in the toy case but about 10e-5, which is much better than > before. The rest of GAMG is on default configuration. The mesh is much more > complicated being an octree with immersed geometries. I get the same > behaviour using GMRES+ASM+ILU as well. > This does not seem possible. Setting the null space will remove the mean value. Something is not configured correctly here. Can you show -ksp_view for this solve? Thanks, Matt > I cannot really get why the solution is not at zero mean value. I?m using > a rtol~1.0e-8 but I get the same mean value using rtol~1.o-6, I tried > 1.0-12 too, but no improvement in the constant, definitely tiny but not > zero. > > Thanks. > > > > Marco Cisternino > > > > *From:* Barry Smith > *Sent:* luned? 21 marzo 2022 19:49 > *To:* Marco Cisternino > *Cc:* Mark Adams ; petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Null space and preconditioners > > > > > > Marco, > > > > I have confirmed your results. > > > > Urgg, it appears we do not have something well documented. The > removal of the null space only works for left preconditioned solvers and > FGMRES only works with right preconditioning. Here is the reasoning. > > > > The Krylov space for left preconditioning is built from [r, BAr, > (BA)^2 r, ...] and the solution space is built from this basis. If A has a > null space of n then the left preconditioned Krylov methods simply remove n > from the "full" Krylov space after applying each B preconditioner and the > resulting "reduced" Krylov space has no components in the n directions > hence the solution built by GMRES naturally has no component in the n. > > > > But with right preconditioning the Krylov space is [s ABs (AB)^2 s, > ....] We would need to remove B^-1 n from the Krylov space so that (A B) > B^-1 n = 0 In general we don't have any way of applying B^-1 to a vector so > we cannot create the appropriate "reduced" Krylov space. > > > > If I run with GMRES (which defaults to left preconditioner) and the > options ./testPreconditioners -pc_type gamg -ksp_type gmres > -ksp_monitor_true_residual -ksp_rtol 1.e-12 -ksp_view -mg_coarse_pc_type svd > > > > Then it handles the null space correctly and the solution has Solution > mean = 4.51028e-17 > > > > Is there any reason to use FGMRES instead of GMRES? You just cannot use > GMRES as the smoother inside GAMG if you use GMRES on the outside, but for > pressure equations you don't want use such a strong smoother anyways. 
> > > > Barry > > > > I feel we should add some information to the documentation on the > removal of the null space to the user's manual when using right > preconditioning and maybe even have an error check in the code so that > people don't fall into this trap. But I am not sure exactly what to do. > When the A and B are both symmetric I think special stuff happens that > doesn't require providing a null space; but I am not sure. > > > > > > > > > > > > On Mar 21, 2022, at 12:41 PM, Marco Cisternino < > marco.cisternino at optimad.it> wrote: > > > > Thank you, Mark. > However, doing this with my toy code > > mpirun -n 1 ./testpreconditioner -pc_type gamg > -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi > -mg_coarse_ksp_type cg > > > > I get 16 inf elements. Do I miss anything? > > > > Thanks again > > > > Marco Cisternino > > > > > > *From:* Mark Adams > *Sent:* luned? 21 marzo 2022 17:31 > *To:* Marco Cisternino > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Null space and preconditioners > > > > And for GAMG you can use: > > > > -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi > -mg_coarse_ksp_type cg > > > > Note if you are using more that one MPI process you can use 'lu' instead > of 'jacobi' > > > > If GAMG converges fast enough it can solve before the constant creeps in > and works without cleaning in the KSP method. > > > > On Mon, Mar 21, 2022 at 12:06 PM Mark Adams wrote: > > The solution for Neumann problems can "float away" if the constant is not > controlled in some way because floating point errors can introduce it even > if your RHS is exactly orthogonal to it. > > > > You should use a special coarse grid solver for GAMG but it seems to be > working for you. > > > > I have lost track of the simply way to have the KSP solver clean the > constant out, which is what you want. > > > > can someone help Marco? > > > > Mark > > > > > > > > > > > > On Mon, Mar 21, 2022 at 8:18 AM Marco Cisternino < > marco.cisternino at optimad.it> wrote: > > Good morning, > > I?m observing an unexpected (to me) behaviour of my code. > > I tried to reduce the problem in a toy code here attached. > The toy code archive contains a small main, a matrix and a rhs. > > The toy code solves the linear system and check the norms and the mean of > the solution. > > The problem into the matrix and the rhs is the finite volume > discretization of the pressure equation of an incompressible NS solver. > > It has been cooked as tiny as possible (16 cells!). > > It is important to say that it is an elliptic problem with homogeneous > Neumann boundary conditions only, for this reason the toy code sets a null > space containing the constant. > > > > The unexpected (to me) behaviour is evident by launching the code using > different preconditioners, using -pc-type > I tested using PCNONE (?none?), PCGAMG (?gamg?) and PCILU (?ilu?). The > default solver is KSPFGMRES. > > Using the three PC, I get 3 different solutions. It seems to me that they > differ in the mean value, but GAMG is impressive. > > PCNONE gives me the zero mean solution I expected. What about the others? > > > > Asking for residuals monitor, the ratio ||r||/||b|| shows convergence for > PCNONE and PCILU (~10^-16), but it stalls for PCGAMG (~10^-4). > > I cannot see why. Am I doing anything wrong or incorrectly thinking about > the expected behaviour? > > > > Generalizing to larger mesh the behaviour is similar. > > > > Thank you for any help. 
> > > > Marco Cisternino > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Mar 22 09:26:40 2022 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 22 Mar 2022 10:26:40 -0400 Subject: [petsc-users] Null space and preconditioners In-Reply-To: References: Message-ID: > On Mar 22, 2022, at 9:55 AM, Marco Cisternino wrote: > > Thank you Barry! > No, no reason for FGMRES (some old tests showed shorter wall-times relative to GMRES), I?m going to use GMRES. > I tried GMRES with GAMG using PCSVD on the coarser level on real cases, larger than the toy case, I cannot get machine epsilon mean value of the solution, as in the toy case but about 10e-5, which is much better than before. The rest of GAMG is on default configuration. The mesh is much more complicated being an octree with immersed geometries. I get the same behaviour using GMRES+ASM+ILU as well. > I cannot really get why the solution is not at zero mean value. I?m using a rtol~1.0e-8 but I get the same mean value using rtol~1.o-6, I tried 1.0-12 too, but no improvement in the constant, definitely tiny but not zero. > Thanks. Are you using GMRES in the smoother or Chebyshev? Recommend using Chebyshev. I have no explanation why the average is not much closer to zero. Here is one thing to try. Use KSPMonitorSet() to provide a monitor that calls KSPBuildSolution() and then computes the average and prints it. Is the average always around 1e-5 from the first iteration or does it start close to 1e-12 and get larger with more iterations? > > Marco Cisternino > > From: Barry Smith > > Sent: luned? 21 marzo 2022 19:49 > To: Marco Cisternino > > Cc: Mark Adams >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Null space and preconditioners > > > Marco, > > I have confirmed your results. > > Urgg, it appears we do not have something well documented. The removal of the null space only works for left preconditioned solvers and FGMRES only works with right preconditioning. Here is the reasoning. > > The Krylov space for left preconditioning is built from [r, BAr, (BA)^2 r, ...] and the solution space is built from this basis. If A has a null space of n then the left preconditioned Krylov methods simply remove n from the "full" Krylov space after applying each B preconditioner and the resulting "reduced" Krylov space has no components in the n directions hence the solution built by GMRES naturally has no component in the n. > > But with right preconditioning the Krylov space is [s ABs (AB)^2 s, ....] We would need to remove B^-1 n from the Krylov space so that (A B) B^-1 n = 0 In general we don't have any way of applying B^-1 to a vector so we cannot create the appropriate "reduced" Krylov space. > > If I run with GMRES (which defaults to left preconditioner) and the options ./testPreconditioners -pc_type gamg -ksp_type gmres -ksp_monitor_true_residual -ksp_rtol 1.e-12 -ksp_view -mg_coarse_pc_type svd > > Then it handles the null space correctly and the solution has Solution mean = 4.51028e-17 > > Is there any reason to use FGMRES instead of GMRES? You just cannot use GMRES as the smoother inside GAMG if you use GMRES on the outside, but for pressure equations you don't want use such a strong smoother anyways. 
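A minimal sketch of the monitor suggested above, registered with KSPMonitorSet(ksp,MonitorSolutionMean,NULL,NULL); it prints the arithmetic mean of the current iterate at every iteration. The function name is a placeholder, not something from the original code.

PetscErrorCode MonitorSolutionMean(KSP ksp,PetscInt it,PetscReal rnorm,void *ctx)
{
  Vec            x;
  PetscScalar    sum;
  PetscInt       N;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPBuildSolution(ksp,NULL,&x);CHKERRQ(ierr);  /* current iterate; owned by the KSP, do not destroy */
  ierr = VecSum(x,&sum);CHKERRQ(ierr);
  ierr = VecGetSize(x,&N);CHKERRQ(ierr);
  ierr = PetscPrintf(PetscObjectComm((PetscObject)ksp),"%3D KSP residual norm %14.12e  solution mean %g\n",it,(double)rnorm,(double)(PetscRealPart(sum)/N));CHKERRQ(ierr);
  PetscFunctionReturn(0);
}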
> > Barry > > I feel we should add some information to the documentation on the removal of the null space to the user's manual when using right preconditioning and maybe even have an error check in the code so that people don't fall into this trap. But I am not sure exactly what to do. When the A and B are both symmetric I think special stuff happens that doesn't require providing a null space; but I am not sure. > > > > > > > On Mar 21, 2022, at 12:41 PM, Marco Cisternino > wrote: > > Thank you, Mark. > However, doing this with my toy code > mpirun -n 1 ./testpreconditioner -pc_type gamg -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg > > I get 16 inf elements. Do I miss anything? > > Thanks again > > Marco Cisternino > > > From: Mark Adams > > Sent: luned? 21 marzo 2022 17:31 > To: Marco Cisternino > > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Null space and preconditioners > > And for GAMG you can use: > > -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg > > Note if you are using more that one MPI process you can use 'lu' instead of 'jacobi' > > If GAMG converges fast enough it can solve before the constant creeps in and works without cleaning in the KSP method. > > On Mon, Mar 21, 2022 at 12:06 PM Mark Adams > wrote: > The solution for Neumann problems can "float away" if the constant is not controlled in some way because floating point errors can introduce it even if your RHS is exactly orthogonal to it. > > You should use a special coarse grid solver for GAMG but it seems to be working for you. > > I have lost track of the simply way to have the KSP solver clean the constant out, which is what you want. > > can someone help Marco? > > Mark > > > > > > On Mon, Mar 21, 2022 at 8:18 AM Marco Cisternino > wrote: > Good morning, > I?m observing an unexpected (to me) behaviour of my code. > I tried to reduce the problem in a toy code here attached. > The toy code archive contains a small main, a matrix and a rhs. > The toy code solves the linear system and check the norms and the mean of the solution. > The problem into the matrix and the rhs is the finite volume discretization of the pressure equation of an incompressible NS solver. > It has been cooked as tiny as possible (16 cells!). > It is important to say that it is an elliptic problem with homogeneous Neumann boundary conditions only, for this reason the toy code sets a null space containing the constant. > > The unexpected (to me) behaviour is evident by launching the code using different preconditioners, using -pc-type > I tested using PCNONE (?none?), PCGAMG (?gamg?) and PCILU (?ilu?). The default solver is KSPFGMRES. > Using the three PC, I get 3 different solutions. It seems to me that they differ in the mean value, but GAMG is impressive. > PCNONE gives me the zero mean solution I expected. What about the others? > > Asking for residuals monitor, the ratio ||r||/||b|| shows convergence for PCNONE and PCILU (~10^-16), but it stalls for PCGAMG (~10^-4). > I cannot see why. Am I doing anything wrong or incorrectly thinking about the expected behaviour? > > Generalizing to larger mesh the behaviour is similar. > > Thank you for any help. > > Marco Cisternino -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From marco.cisternino at optimad.it Tue Mar 22 09:30:23 2022 From: marco.cisternino at optimad.it (Marco Cisternino) Date: Tue, 22 Mar 2022 14:30:23 +0000 Subject: [petsc-users] Null space and preconditioners In-Reply-To: References: Message-ID: Chebyshev, it should be the default, isn?t it? I checked for it in the toy code. I?m preparing a lanuch using -ksp_view to check for the real case, I will prepare a monitor for computing solution average if ksp_view is going to say nothing relevant. Thanks Marco Cisternino From: Barry Smith Sent: marted? 22 marzo 2022 15:27 To: Marco Cisternino Cc: Mark Adams ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Null space and preconditioners On Mar 22, 2022, at 9:55 AM, Marco Cisternino > wrote: Thank you Barry! No, no reason for FGMRES (some old tests showed shorter wall-times relative to GMRES), I?m going to use GMRES. I tried GMRES with GAMG using PCSVD on the coarser level on real cases, larger than the toy case, I cannot get machine epsilon mean value of the solution, as in the toy case but about 10e-5, which is much better than before. The rest of GAMG is on default configuration. The mesh is much more complicated being an octree with immersed geometries. I get the same behaviour using GMRES+ASM+ILU as well. I cannot really get why the solution is not at zero mean value. I?m using a rtol~1.0e-8 but I get the same mean value using rtol~1.o-6, I tried 1.0-12 too, but no improvement in the constant, definitely tiny but not zero. Thanks. Are you using GMRES in the smoother or Chebyshev? Recommend using Chebyshev. I have no explanation why the average is not much closer to zero. Here is one thing to try. Use KSPMonitorSet() to provide a monitor that calls KSPBuildSolution() and then computes the average and prints it. Is the average always around 1e-5 from the first iteration or does it start close to 1e-12 and get larger with more iterations? Marco Cisternino From: Barry Smith > Sent: luned? 21 marzo 2022 19:49 To: Marco Cisternino > Cc: Mark Adams >; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Null space and preconditioners Marco, I have confirmed your results. Urgg, it appears we do not have something well documented. The removal of the null space only works for left preconditioned solvers and FGMRES only works with right preconditioning. Here is the reasoning. The Krylov space for left preconditioning is built from [r, BAr, (BA)^2 r, ...] and the solution space is built from this basis. If A has a null space of n then the left preconditioned Krylov methods simply remove n from the "full" Krylov space after applying each B preconditioner and the resulting "reduced" Krylov space has no components in the n directions hence the solution built by GMRES naturally has no component in the n. But with right preconditioning the Krylov space is [s ABs (AB)^2 s, ....] We would need to remove B^-1 n from the Krylov space so that (A B) B^-1 n = 0 In general we don't have any way of applying B^-1 to a vector so we cannot create the appropriate "reduced" Krylov space. If I run with GMRES (which defaults to left preconditioner) and the options ./testPreconditioners -pc_type gamg -ksp_type gmres -ksp_monitor_true_residual -ksp_rtol 1.e-12 -ksp_view -mg_coarse_pc_type svd Then it handles the null space correctly and the solution has Solution mean = 4.51028e-17 Is there any reason to use FGMRES instead of GMRES? 
You just cannot use GMRES as the smoother inside GAMG if you use GMRES on the outside, but for pressure equations you don't want use such a strong smoother anyways. Barry I feel we should add some information to the documentation on the removal of the null space to the user's manual when using right preconditioning and maybe even have an error check in the code so that people don't fall into this trap. But I am not sure exactly what to do. When the A and B are both symmetric I think special stuff happens that doesn't require providing a null space; but I am not sure. On Mar 21, 2022, at 12:41 PM, Marco Cisternino > wrote: Thank you, Mark. However, doing this with my toy code mpirun -n 1 ./testpreconditioner -pc_type gamg -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg I get 16 inf elements. Do I miss anything? Thanks again Marco Cisternino From: Mark Adams > Sent: luned? 21 marzo 2022 17:31 To: Marco Cisternino > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Null space and preconditioners And for GAMG you can use: -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg Note if you are using more that one MPI process you can use 'lu' instead of 'jacobi' If GAMG converges fast enough it can solve before the constant creeps in and works without cleaning in the KSP method. On Mon, Mar 21, 2022 at 12:06 PM Mark Adams > wrote: The solution for Neumann problems can "float away" if the constant is not controlled in some way because floating point errors can introduce it even if your RHS is exactly orthogonal to it. You should use a special coarse grid solver for GAMG but it seems to be working for you. I have lost track of the simply way to have the KSP solver clean the constant out, which is what you want. can someone help Marco? Mark On Mon, Mar 21, 2022 at 8:18 AM Marco Cisternino > wrote: Good morning, I?m observing an unexpected (to me) behaviour of my code. I tried to reduce the problem in a toy code here attached. The toy code archive contains a small main, a matrix and a rhs. The toy code solves the linear system and check the norms and the mean of the solution. The problem into the matrix and the rhs is the finite volume discretization of the pressure equation of an incompressible NS solver. It has been cooked as tiny as possible (16 cells!). It is important to say that it is an elliptic problem with homogeneous Neumann boundary conditions only, for this reason the toy code sets a null space containing the constant. The unexpected (to me) behaviour is evident by launching the code using different preconditioners, using -pc-type I tested using PCNONE (?none?), PCGAMG (?gamg?) and PCILU (?ilu?). The default solver is KSPFGMRES. Using the three PC, I get 3 different solutions. It seems to me that they differ in the mean value, but GAMG is impressive. PCNONE gives me the zero mean solution I expected. What about the others? Asking for residuals monitor, the ratio ||r||/||b|| shows convergence for PCNONE and PCILU (~10^-16), but it stalls for PCGAMG (~10^-4). I cannot see why. Am I doing anything wrong or incorrectly thinking about the expected behaviour? Generalizing to larger mesh the behaviour is similar. Thank you for any help. Marco Cisternino -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From marco.cisternino at optimad.it Tue Mar 22 12:44:48 2022 From: marco.cisternino at optimad.it (Marco Cisternino) Date: Tue, 22 Mar 2022 17:44:48 +0000 Subject: [petsc-users] Null space and preconditioners In-Reply-To: References: Message-ID: Thank you, Matt. I rechecked and the mean value of the solution is 1.0-15!! Clumsily, I was checking an integral average and on an octree mesh it is not exactly the same. Thanks again! Marco Cisternino From: Matthew Knepley Sent: marted? 22 marzo 2022 15:22 To: Marco Cisternino Cc: Barry Smith ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Null space and preconditioners On Tue, Mar 22, 2022 at 9:55 AM Marco Cisternino > wrote: Thank you Barry! No, no reason for FGMRES (some old tests showed shorter wall-times relative to GMRES), I?m going to use GMRES. I tried GMRES with GAMG using PCSVD on the coarser level on real cases, larger than the toy case, I cannot get machine epsilon mean value of the solution, as in the toy case but about 10e-5, which is much better than before. The rest of GAMG is on default configuration. The mesh is much more complicated being an octree with immersed geometries. I get the same behaviour using GMRES+ASM+ILU as well. This does not seem possible. Setting the null space will remove the mean value. Something is not configured correctly here. Can you show -ksp_view for this solve? Thanks, Matt I cannot really get why the solution is not at zero mean value. I?m using a rtol~1.0e-8 but I get the same mean value using rtol~1.o-6, I tried 1.0-12 too, but no improvement in the constant, definitely tiny but not zero. Thanks. Marco Cisternino From: Barry Smith > Sent: luned? 21 marzo 2022 19:49 To: Marco Cisternino > Cc: Mark Adams >; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Null space and preconditioners Marco, I have confirmed your results. Urgg, it appears we do not have something well documented. The removal of the null space only works for left preconditioned solvers and FGMRES only works with right preconditioning. Here is the reasoning. The Krylov space for left preconditioning is built from [r, BAr, (BA)^2 r, ...] and the solution space is built from this basis. If A has a null space of n then the left preconditioned Krylov methods simply remove n from the "full" Krylov space after applying each B preconditioner and the resulting "reduced" Krylov space has no components in the n directions hence the solution built by GMRES naturally has no component in the n. But with right preconditioning the Krylov space is [s ABs (AB)^2 s, ....] We would need to remove B^-1 n from the Krylov space so that (A B) B^-1 n = 0 In general we don't have any way of applying B^-1 to a vector so we cannot create the appropriate "reduced" Krylov space. If I run with GMRES (which defaults to left preconditioner) and the options ./testPreconditioners -pc_type gamg -ksp_type gmres -ksp_monitor_true_residual -ksp_rtol 1.e-12 -ksp_view -mg_coarse_pc_type svd Then it handles the null space correctly and the solution has Solution mean = 4.51028e-17 Is there any reason to use FGMRES instead of GMRES? You just cannot use GMRES as the smoother inside GAMG if you use GMRES on the outside, but for pressure equations you don't want use such a strong smoother anyways. Barry I feel we should add some information to the documentation on the removal of the null space to the user's manual when using right preconditioning and maybe even have an error check in the code so that people don't fall into this trap. 
But I am not sure exactly what to do. When the A and B are both symmetric I think special stuff happens that doesn't require providing a null space; but I am not sure. On Mar 21, 2022, at 12:41 PM, Marco Cisternino > wrote: Thank you, Mark. However, doing this with my toy code mpirun -n 1 ./testpreconditioner -pc_type gamg -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg I get 16 inf elements. Do I miss anything? Thanks again Marco Cisternino From: Mark Adams > Sent: luned? 21 marzo 2022 17:31 To: Marco Cisternino > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Null space and preconditioners And for GAMG you can use: -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg Note if you are using more that one MPI process you can use 'lu' instead of 'jacobi' If GAMG converges fast enough it can solve before the constant creeps in and works without cleaning in the KSP method. On Mon, Mar 21, 2022 at 12:06 PM Mark Adams > wrote: The solution for Neumann problems can "float away" if the constant is not controlled in some way because floating point errors can introduce it even if your RHS is exactly orthogonal to it. You should use a special coarse grid solver for GAMG but it seems to be working for you. I have lost track of the simply way to have the KSP solver clean the constant out, which is what you want. can someone help Marco? Mark On Mon, Mar 21, 2022 at 8:18 AM Marco Cisternino > wrote: Good morning, I?m observing an unexpected (to me) behaviour of my code. I tried to reduce the problem in a toy code here attached. The toy code archive contains a small main, a matrix and a rhs. The toy code solves the linear system and check the norms and the mean of the solution. The problem into the matrix and the rhs is the finite volume discretization of the pressure equation of an incompressible NS solver. It has been cooked as tiny as possible (16 cells!). It is important to say that it is an elliptic problem with homogeneous Neumann boundary conditions only, for this reason the toy code sets a null space containing the constant. The unexpected (to me) behaviour is evident by launching the code using different preconditioners, using -pc-type I tested using PCNONE (?none?), PCGAMG (?gamg?) and PCILU (?ilu?). The default solver is KSPFGMRES. Using the three PC, I get 3 different solutions. It seems to me that they differ in the mean value, but GAMG is impressive. PCNONE gives me the zero mean solution I expected. What about the others? Asking for residuals monitor, the ratio ||r||/||b|| shows convergence for PCNONE and PCILU (~10^-16), but it stalls for PCGAMG (~10^-4). I cannot see why. Am I doing anything wrong or incorrectly thinking about the expected behaviour? Generalizing to larger mesh the behaviour is similar. Thank you for any help. Marco Cisternino -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Mar 22 12:48:59 2022 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 22 Mar 2022 13:48:59 -0400 Subject: [petsc-users] Null space and preconditioners In-Reply-To: References: Message-ID: On Tue, Mar 22, 2022 at 1:44 PM Marco Cisternino < marco.cisternino at optimad.it> wrote: > Thank you, Matt. 
> > I rechecked and the mean value of the solution is 1.0-15!! > Clumsily, I was checking an integral average and on an octree mesh it is > not exactly the same. > > > > Thanks again! > Great! I am happy everything is working. Matt > Marco Cisternino > > > > > > *From:* Matthew Knepley > *Sent:* marted? 22 marzo 2022 15:22 > *To:* Marco Cisternino > *Cc:* Barry Smith ; petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Null space and preconditioners > > > > On Tue, Mar 22, 2022 at 9:55 AM Marco Cisternino < > marco.cisternino at optimad.it> wrote: > > Thank you Barry! > No, no reason for FGMRES (some old tests showed shorter wall-times > relative to GMRES), I?m going to use GMRES. > I tried GMRES with GAMG using PCSVD on the coarser level on real cases, > larger than the toy case, I cannot get machine epsilon mean value of the > solution, as in the toy case but about 10e-5, which is much better than > before. The rest of GAMG is on default configuration. The mesh is much more > complicated being an octree with immersed geometries. I get the same > behaviour using GMRES+ASM+ILU as well. > > > > This does not seem possible. Setting the null space will remove the mean > value. Something is not configured correctly here. Can you show -ksp_view > > for this solve? > > > > Thanks, > > > > Matt > > > > I cannot really get why the solution is not at zero mean value. I?m using > a rtol~1.0e-8 but I get the same mean value using rtol~1.o-6, I tried > 1.0-12 too, but no improvement in the constant, definitely tiny but not > zero. > > Thanks. > > > > Marco Cisternino > > > > *From:* Barry Smith > *Sent:* luned? 21 marzo 2022 19:49 > *To:* Marco Cisternino > *Cc:* Mark Adams ; petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Null space and preconditioners > > > > > > Marco, > > > > I have confirmed your results. > > > > Urgg, it appears we do not have something well documented. The > removal of the null space only works for left preconditioned solvers and > FGMRES only works with right preconditioning. Here is the reasoning. > > > > The Krylov space for left preconditioning is built from [r, BAr, > (BA)^2 r, ...] and the solution space is built from this basis. If A has a > null space of n then the left preconditioned Krylov methods simply remove n > from the "full" Krylov space after applying each B preconditioner and the > resulting "reduced" Krylov space has no components in the n directions > hence the solution built by GMRES naturally has no component in the n. > > > > But with right preconditioning the Krylov space is [s ABs (AB)^2 s, > ....] We would need to remove B^-1 n from the Krylov space so that (A B) > B^-1 n = 0 In general we don't have any way of applying B^-1 to a vector so > we cannot create the appropriate "reduced" Krylov space. > > > > If I run with GMRES (which defaults to left preconditioner) and the > options ./testPreconditioners -pc_type gamg -ksp_type gmres > -ksp_monitor_true_residual -ksp_rtol 1.e-12 -ksp_view -mg_coarse_pc_type svd > > > > Then it handles the null space correctly and the solution has Solution > mean = 4.51028e-17 > > > > Is there any reason to use FGMRES instead of GMRES? You just cannot use > GMRES as the smoother inside GAMG if you use GMRES on the outside, but for > pressure equations you don't want use such a strong smoother anyways. 
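The distinction that caused the confusion can be checked directly: the null space of constants removes the arithmetic mean of the discrete solution, which on a non-uniform octree mesh is not the same as the volume-weighted (integral) average. A minimal sketch, assuming x is the solution and vol is a Vec holding the cell volumes (both names are illustrative):

PetscScalar    sum,vint,vtot;
PetscInt       N;
PetscErrorCode ierr;

ierr = VecSum(x,&sum);CHKERRQ(ierr);
ierr = VecGetSize(x,&N);CHKERRQ(ierr);
ierr = VecDot(x,vol,&vint);CHKERRQ(ierr);  /* integral of x over the domain */
ierr = VecSum(vol,&vtot);CHKERRQ(ierr);    /* total volume */
ierr = PetscPrintf(PETSC_COMM_WORLD,"arithmetic mean %g  integral average %g\n",(double)(PetscRealPart(sum)/N),(double)PetscRealPart(vint/vtot));CHKERRQ(ierr);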
> > > > Barry > > > > I feel we should add some information to the documentation on the > removal of the null space to the user's manual when using right > preconditioning and maybe even have an error check in the code so that > people don't fall into this trap. But I am not sure exactly what to do. > When the A and B are both symmetric I think special stuff happens that > doesn't require providing a null space; but I am not sure. > > > > > > > > > > > > On Mar 21, 2022, at 12:41 PM, Marco Cisternino < > marco.cisternino at optimad.it> wrote: > > > > Thank you, Mark. > However, doing this with my toy code > > mpirun -n 1 ./testpreconditioner -pc_type gamg > -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi > -mg_coarse_ksp_type cg > > > > I get 16 inf elements. Do I miss anything? > > > > Thanks again > > > > Marco Cisternino > > > > > > *From:* Mark Adams > *Sent:* luned? 21 marzo 2022 17:31 > *To:* Marco Cisternino > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Null space and preconditioners > > > > And for GAMG you can use: > > > > -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi > -mg_coarse_ksp_type cg > > > > Note if you are using more that one MPI process you can use 'lu' instead > of 'jacobi' > > > > If GAMG converges fast enough it can solve before the constant creeps in > and works without cleaning in the KSP method. > > > > On Mon, Mar 21, 2022 at 12:06 PM Mark Adams wrote: > > The solution for Neumann problems can "float away" if the constant is not > controlled in some way because floating point errors can introduce it even > if your RHS is exactly orthogonal to it. > > > > You should use a special coarse grid solver for GAMG but it seems to be > working for you. > > > > I have lost track of the simply way to have the KSP solver clean the > constant out, which is what you want. > > > > can someone help Marco? > > > > Mark > > > > > > > > > > > > On Mon, Mar 21, 2022 at 8:18 AM Marco Cisternino < > marco.cisternino at optimad.it> wrote: > > Good morning, > > I?m observing an unexpected (to me) behaviour of my code. > > I tried to reduce the problem in a toy code here attached. > The toy code archive contains a small main, a matrix and a rhs. > > The toy code solves the linear system and check the norms and the mean of > the solution. > > The problem into the matrix and the rhs is the finite volume > discretization of the pressure equation of an incompressible NS solver. > > It has been cooked as tiny as possible (16 cells!). > > It is important to say that it is an elliptic problem with homogeneous > Neumann boundary conditions only, for this reason the toy code sets a null > space containing the constant. > > > > The unexpected (to me) behaviour is evident by launching the code using > different preconditioners, using -pc-type > I tested using PCNONE (?none?), PCGAMG (?gamg?) and PCILU (?ilu?). The > default solver is KSPFGMRES. > > Using the three PC, I get 3 different solutions. It seems to me that they > differ in the mean value, but GAMG is impressive. > > PCNONE gives me the zero mean solution I expected. What about the others? > > > > Asking for residuals monitor, the ratio ||r||/||b|| shows convergence for > PCNONE and PCILU (~10^-16), but it stalls for PCGAMG (~10^-4). > > I cannot see why. Am I doing anything wrong or incorrectly thinking about > the expected behaviour? > > > > Generalizing to larger mesh the behaviour is similar. > > > > Thank you for any help. 
> > > > Marco Cisternino > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Tue Mar 22 14:53:37 2022 From: sam.guo at cd-adapco.com (Sam Guo) Date: Tue, 22 Mar 2022 12:53:37 -0700 Subject: [petsc-users] MatCreateSBAIJ In-Reply-To: References: Message-ID: Barry, Thanks for the illustration. Is there an easy way to mimic the implementation using shell matrix? I have been studying how the sMvctx is created and it seems pretty involved. Thanks, Sam On Mon, Mar 21, 2022 at 2:48 PM Barry Smith wrote: > > > On Mar 21, 2022, at 4:36 PM, Sam Guo wrote: > > Barry, > Thanks. Could you elaborate? I try to implement the matrix-vector > multiplication for a symmetric matrix using shell matrix. > > > Consider with three ranks > > (a) = ( A B D) (x) > (b) ( B' C E) (y) > (c) ( D' E' F) (w) > > Only the ones without the ' are stored on the rank. So for example B > is stored on rank 0. > > Rank 0 computes A x and keeps it in a. Rank 1 computes Cy and keeps > it in b Rank 2 computes Fw and keeps it in c > > Rank 0 computes B'x and D'x. It puts the nonzero entries of these > values as well as the values of x into slvec0 > > Rank 1 computes E'y and puts the nonzero entries as well as the > values into slvec0 > > Rank 2 puts the values of we needed by the other ranks into slvec0 > > Rank 0 does B y_h + D z_h where it gets the y_h and z_h values from > slvec1 and adds it to a > > Rank 1 takes the B'x from slvec1 and adds it to b it then takes the > E y_h values where the y_h are pulled from slvec1 and adds them b > > Rank 2 takes the B'x and E'y from slvec0 and adds them to c. > > > > Thanks, > Sam > > On Mon, Mar 21, 2022 at 12:56 PM Barry Smith wrote: > >> >> The "trick" is that though "more" communication is needed to complete >> the product the communication can still be done in a single VecScatter >> instead of two separate calls to VecScatter. We simply pack both pieces of >> information that needs to be sent into a single vector. >> >> /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >> >> If you create two symmetric matrices, one with SBAIJ and one with BAIJ >> and compare the time to do the product you will find that the SBAIJ is not >> significantly slower but does save memory. >> >> >> On Mar 21, 2022, at 3:26 PM, Sam Guo wrote: >> >> Using following example from the MatCreateSBAIJ documentation >> >> 0 1 2 3 4 5 6 7 8 9 10 11 >> -------------------------- >> row 3 |. . . d d d o o o o o o >> row 4 |. . . d d d o o o o o o >> row 5 |. . . d d d o o o o o o >> -------------------------- >> >> >> On a processor that owns rows 3, 4 and 5, rows 0-2 info are still needed. 
Is is that the processor that owns rows 0-2 will apply B symmetrical and then send the result >> >> to the processor that owns 3-5? >> >> >> On Mon, Mar 21, 2022 at 12:14 PM Mark Adams wrote: >> >>> PETSc stores parallel matrices as two serial matrices. One for the >>> diagonal (d or A) block and one for the rest (o or B). >>> I would guess that for symmetric matrices it has a symmetric matrix for >>> the diagonal and a full AIJ matrix for the (upper) off-diagonal. >>> So the multtranspose is applying B symmetrically. This lower >>> off-diagonal and the diagonal block can be done without communication. >>> Then the off processor values are collected, and the upper off-diagonal >>> is applied. >>> >>> On Mon, Mar 21, 2022 at 2:35 PM Sam Guo wrote: >>> >>>> I am most interested in how the lower triangular part is redistributed. >>>> It seems that SBAJI saves memory but requires more communication than BAIJ. >>>> >>>> On Mon, Mar 21, 2022 at 11:27 AM Sam Guo wrote: >>>> >>>>> Mark, thanks for the quick response. I am more interested in parallel >>>>> implementation of MatMult for SBAIJ. I found following >>>>> >>>>> 1094: *PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy)*1095: {1096: Mat_MPISBAIJ *a = (Mat_MPISBAIJ*)A->data;1097: PetscErrorCode ierr;1098: PetscInt mbs=a->mbs,bs=A->rmap->bs;1099: PetscScalar *from;1100: const PetscScalar *x; >>>>> 1103: /* diagonal part */1104: (*a->A->ops->mult)(a->A,xx,a->slvec1a);1105: VecSet (a->slvec1b,0.0); >>>>> 1107: /* subdiagonal part */1108: (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b); >>>>> 1110: /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >>>>> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >>>>> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1120: /* supperdiagonal part */1121: (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy);1122: return(0);1123: } >>>>> >>>>> I try to understand the algorithm. >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Sam >>>>> >>>>> >>>>> On Mon, Mar 21, 2022 at 11:14 AM Mark Adams wrote: >>>>> >>>>>> This code looks fine to me and the code is >>>>>> in src/mat/impls/sbaij/seq/sbaij2.c >>>>>> >>>>>> On Mon, Mar 21, 2022 at 2:02 PM Sam Guo >>>>>> wrote: >>>>>> >>>>>>> Dear PETSc dev team, >>>>>>> The documentation about MatCreateSBAIJ has following >>>>>>> "It is recommended that one use the MatCreate >>>>>>> >>>>>>> (), MatSetType >>>>>>> () >>>>>>> and/or MatSetFromOptions >>>>>>> (), >>>>>>> MatXXXXSetPreallocation() paradigm instead of this routine directly. >>>>>>> [MatXXXXSetPreallocation() is, for example, >>>>>>> MatSeqAIJSetPreallocation >>>>>>> >>>>>>> ]" >>>>>>> I currently call MatCreateSBAIJ directly as follows: >>>>>>> MatCreateSBAIJ (with d_nnz and o_nnz) >>>>>>> MatSetValues (to add row by row) >>>>>>> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >>>>>>> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >>>>>>> MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); >>>>>>> >>>>>>> Two questions: >>>>>>> (1) I am wondering whether what I am doing is the most efficient. >>>>>>> >>>>>>> (2) I try to find out how the matrix vector multiplication is >>>>>>> implemented in PETSc for SBAIJ storage. >>>>>>> >>>>>>> Thanks, >>>>>>> Sam >>>>>>> >>>>>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sam.guo at cd-adapco.com Tue Mar 22 14:58:43 2022 From: sam.guo at cd-adapco.com (Sam Guo) Date: Tue, 22 Mar 2022 12:58:43 -0700 Subject: [petsc-users] MatCreateSBAIJ In-Reply-To: References: Message-ID: The reason I want to use shell matrix is to reduce memory footprint. If I create a PETSc matrix and use MUMPS, I understand PETSc will create another copy of the matrix for MUMPS. Is there any way to avoid the extra copy of MUMPS? On Tue, Mar 22, 2022 at 12:53 PM Sam Guo wrote: > Barry, > Thanks for the illustration. Is there an easy way to mimic the > implementation using shell matrix? I have been studying how the sMvctx is > created and it seems pretty involved. > > Thanks, > Sam > > On Mon, Mar 21, 2022 at 2:48 PM Barry Smith wrote: > >> >> >> On Mar 21, 2022, at 4:36 PM, Sam Guo wrote: >> >> Barry, >> Thanks. Could you elaborate? I try to implement the matrix-vector >> multiplication for a symmetric matrix using shell matrix. >> >> >> Consider with three ranks >> >> (a) = ( A B D) (x) >> (b) ( B' C E) (y) >> (c) ( D' E' F) (w) >> >> Only the ones without the ' are stored on the rank. So for example >> B is stored on rank 0. >> >> Rank 0 computes A x and keeps it in a. Rank 1 computes Cy and >> keeps it in b Rank 2 computes Fw and keeps it in c >> >> Rank 0 computes B'x and D'x. It puts the nonzero entries of these >> values as well as the values of x into slvec0 >> >> Rank 1 computes E'y and puts the nonzero entries as well as the >> values into slvec0 >> >> Rank 2 puts the values of we needed by the other ranks into slvec0 >> >> Rank 0 does B y_h + D z_h where it gets the y_h and z_h values >> from slvec1 and adds it to a >> >> Rank 1 takes the B'x from slvec1 and adds it to b it then takes >> the E y_h values where the y_h are pulled from slvec1 and adds them b >> >> Rank 2 takes the B'x and E'y from slvec0 and adds them to c. >> >> >> >> Thanks, >> Sam >> >> On Mon, Mar 21, 2022 at 12:56 PM Barry Smith wrote: >> >>> >>> The "trick" is that though "more" communication is needed to complete >>> the product the communication can still be done in a single VecScatter >>> instead of two separate calls to VecScatter. We simply pack both pieces of >>> information that needs to be sent into a single vector. >>> >>> /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >>> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >>> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >>> >>> If you create two symmetric matrices, one with SBAIJ and one with BAIJ >>> and compare the time to do the product you will find that the SBAIJ is not >>> significantly slower but does save memory. >>> >>> >>> On Mar 21, 2022, at 3:26 PM, Sam Guo wrote: >>> >>> Using following example from the MatCreateSBAIJ documentation >>> >>> 0 1 2 3 4 5 6 7 8 9 10 11 >>> -------------------------- >>> row 3 |. . . d d d o o o o o o >>> row 4 |. . . d d d o o o o o o >>> row 5 |. . . d d d o o o o o o >>> -------------------------- >>> >>> >>> On a processor that owns rows 3, 4 and 5, rows 0-2 info are still needed. Is is that the processor that owns rows 0-2 will apply B symmetrical and then send the result >>> >>> to the processor that owns 3-5? >>> >>> >>> On Mon, Mar 21, 2022 at 12:14 PM Mark Adams wrote: >>> >>>> PETSc stores parallel matrices as two serial matrices. 
One for the >>>> diagonal (d or A) block and one for the rest (o or B). >>>> I would guess that for symmetric matrices it has a symmetric matrix for >>>> the diagonal and a full AIJ matrix for the (upper) off-diagonal. >>>> So the multtranspose is applying B symmetrically. This lower >>>> off-diagonal and the diagonal block can be done without communication. >>>> Then the off processor values are collected, and the upper off-diagonal >>>> is applied. >>>> >>>> On Mon, Mar 21, 2022 at 2:35 PM Sam Guo wrote: >>>> >>>>> I am most interested in how the lower triangular part is >>>>> redistributed. It seems that SBAJI saves memory but requires more >>>>> communication than BAIJ. >>>>> >>>>> On Mon, Mar 21, 2022 at 11:27 AM Sam Guo >>>>> wrote: >>>>> >>>>>> Mark, thanks for the quick response. I am more interested in parallel >>>>>> implementation of MatMult for SBAIJ. I found following >>>>>> >>>>>> 1094: *PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy)*1095: {1096: Mat_MPISBAIJ *a = (Mat_MPISBAIJ*)A->data;1097: PetscErrorCode ierr;1098: PetscInt mbs=a->mbs,bs=A->rmap->bs;1099: PetscScalar *from;1100: const PetscScalar *x; >>>>>> 1103: /* diagonal part */1104: (*a->A->ops->mult)(a->A,xx,a->slvec1a);1105: VecSet (a->slvec1b,0.0); >>>>>> 1107: /* subdiagonal part */1108: (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b); >>>>>> 1110: /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >>>>>> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >>>>>> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1120: /* supperdiagonal part */1121: (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy);1122: return(0);1123: } >>>>>> >>>>>> I try to understand the algorithm. >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Sam >>>>>> >>>>>> >>>>>> On Mon, Mar 21, 2022 at 11:14 AM Mark Adams wrote: >>>>>> >>>>>>> This code looks fine to me and the code is >>>>>>> in src/mat/impls/sbaij/seq/sbaij2.c >>>>>>> >>>>>>> On Mon, Mar 21, 2022 at 2:02 PM Sam Guo >>>>>>> wrote: >>>>>>> >>>>>>>> Dear PETSc dev team, >>>>>>>> The documentation about MatCreateSBAIJ has following >>>>>>>> "It is recommended that one use the MatCreate >>>>>>>> >>>>>>>> (), MatSetType >>>>>>>> () >>>>>>>> and/or MatSetFromOptions >>>>>>>> (), >>>>>>>> MatXXXXSetPreallocation() paradigm instead of this routine directly. >>>>>>>> [MatXXXXSetPreallocation() is, for example, >>>>>>>> MatSeqAIJSetPreallocation >>>>>>>> >>>>>>>> ]" >>>>>>>> I currently call MatCreateSBAIJ directly as follows: >>>>>>>> MatCreateSBAIJ (with d_nnz and o_nnz) >>>>>>>> MatSetValues (to add row by row) >>>>>>>> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >>>>>>>> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >>>>>>>> MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); >>>>>>>> >>>>>>>> Two questions: >>>>>>>> (1) I am wondering whether what I am doing is the >>>>>>>> most efficient. >>>>>>>> >>>>>>>> (2) I try to find out how the matrix vector multiplication is >>>>>>>> implemented in PETSc for SBAIJ storage. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Sam >>>>>>>> >>>>>>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Tue Mar 22 15:10:19 2022 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 22 Mar 2022 16:10:19 -0400 Subject: [petsc-users] MatCreateSBAIJ In-Reply-To: References: Message-ID: <195BAAA7-B506-4FFE-83C2-E7874633B0EA@petsc.dev> Sam, MUMPS is a direct solver, as such, it requires much more memory than the original matrix (stored as a PETSc matrix) to store the factored matrix. The savings you will get by not having a PETSc copy of the matrix and a MUMPS copy of the matrix at the same time is unlikely to be significant. Do you have memory footprint measurements indicating that not having the PETSc copy of the matrix in memory will allow you to run measurably larger simulations? Barry > On Mar 22, 2022, at 3:58 PM, Sam Guo wrote: > > The reason I want to use shell matrix is to reduce memory footprint. If I create a PETSc matrix and use MUMPS, I understand PETSc will create another copy of the matrix for MUMPS. Is there any way to avoid the extra copy of MUMPS? > > On Tue, Mar 22, 2022 at 12:53 PM Sam Guo > wrote: > Barry, > Thanks for the illustration. Is there an easy way to mimic the implementation using shell matrix? I have been studying how the sMvctx is created and it seems pretty involved. > > Thanks, > Sam > > On Mon, Mar 21, 2022 at 2:48 PM Barry Smith > wrote: > > >> On Mar 21, 2022, at 4:36 PM, Sam Guo > wrote: >> >> Barry, >> Thanks. Could you elaborate? I try to implement the matrix-vector multiplication for a symmetric matrix using shell matrix. > > Consider with three ranks > > (a) = ( A B D) (x) > (b) ( B' C E) (y) > (c) ( D' E' F) (w) > > Only the ones without the ' are stored on the rank. So for example B is stored on rank 0. > > Rank 0 computes A x and keeps it in a. Rank 1 computes Cy and keeps it in b Rank 2 computes Fw and keeps it in c > > Rank 0 computes B'x and D'x. It puts the nonzero entries of these values as well as the values of x into slvec0 > > Rank 1 computes E'y and puts the nonzero entries as well as the values into slvec0 > > Rank 2 puts the values of we needed by the other ranks into slvec0 > > Rank 0 does B y_h + D z_h where it gets the y_h and z_h values from slvec1 and adds it to a > > Rank 1 takes the B'x from slvec1 and adds it to b it then takes the E y_h values where the y_h are pulled from slvec1 and adds them b > > Rank 2 takes the B'x and E'y from slvec0 and adds them to c. > > >> >> Thanks, >> Sam >> >> On Mon, Mar 21, 2022 at 12:56 PM Barry Smith > wrote: >> >> The "trick" is that though "more" communication is needed to complete the product the communication can still be done in a single VecScatter instead of two separate calls to VecScatter. We simply pack both pieces of information that needs to be sent into a single vector. >> >> /* copy x into the vec slvec0 */ >> 1111: <> VecGetArray (a->slvec0,&from); >> 1112: <> VecGetArrayRead (xx,&x); >> >> 1114: <> PetscArraycpy (from,x,bs*mbs); >> 1115: <> VecRestoreArray (a->slvec0,&from); >> 1116: <> VecRestoreArrayRead (xx,&x); >> >> 1118: <> VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >> 1119: <> VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >> If you create two symmetric matrices, one with SBAIJ and one with BAIJ and compare the time to do the product you will find that the SBAIJ is not significantly slower but does save memory. 
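On the original question about the recommended paradigm: a sketch of the MatCreate()/MatSetType()/MatXXXXSetPreallocation() sequence for SBAIJ. It costs the same as calling MatCreateSBAIJ() directly and keeps the assembly loop unchanged; bs, m, n, M, N, d_nnz, o_nnz are assumed to have the same meaning as in MatCreateSBAIJ().

Mat            A;
PetscErrorCode ierr;

ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
ierr = MatSetSizes(A,m,n,M,N);CHKERRQ(ierr);
ierr = MatSetType(A,MATSBAIJ);CHKERRQ(ierr);
ierr = MatSetBlockSize(A,bs);CHKERRQ(ierr);
ierr = MatSetFromOptions(A);CHKERRQ(ierr);
ierr = MatSeqSBAIJSetPreallocation(A,bs,0,d_nnz);CHKERRQ(ierr);          /* only the call matching the actual */
ierr = MatMPISBAIJSetPreallocation(A,bs,0,d_nnz,0,o_nnz);CHKERRQ(ierr);  /* (Seq or MPI) type takes effect     */
/* then MatSetValues() row by row (upper triangle only for SBAIJ) and MatAssemblyBegin/End as before */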
>> >> >>> On Mar 21, 2022, at 3:26 PM, Sam Guo > wrote: >>> >>> Using following example from the MatCreateSBAIJ documentation >>> 0 1 2 3 4 5 6 7 8 9 10 11 >>> -------------------------- >>> row 3 |. . . d d d o o o o o o >>> row 4 |. . . d d d o o o o o o >>> row 5 |. . . d d d o o o o o o >>> -------------------------- >>> >>> On a processor that owns rows 3, 4 and 5, rows 0-2 info are still needed. Is is that the processor that owns rows 0-2 will apply B symmetrical and then send the result >>> to the processor that owns 3-5? >>> >>> On Mon, Mar 21, 2022 at 12:14 PM Mark Adams > wrote: >>> PETSc stores parallel matrices as two serial matrices. One for the diagonal (d or A) block and one for the rest (o or B). >>> I would guess that for symmetric matrices it has a symmetric matrix for the diagonal and a full AIJ matrix for the (upper) off-diagonal. >>> So the multtranspose is applying B symmetrically. This lower off-diagonal and the diagonal block can be done without communication. >>> Then the off processor values are collected, and the upper off-diagonal is applied. >>> >>> On Mon, Mar 21, 2022 at 2:35 PM Sam Guo > wrote: >>> I am most interested in how the lower triangular part is redistributed. It seems that SBAJI saves memory but requires more communication than BAIJ. >>> >>> On Mon, Mar 21, 2022 at 11:27 AM Sam Guo > wrote: >>> Mark, thanks for the quick response. I am more interested in parallel implementation of MatMult for SBAIJ. I found following >>> 1094: <> <>PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy) >>> 1095: <>{ >>> 1096: <> Mat_MPISBAIJ *a = (Mat_MPISBAIJ*)A->data; >>> 1097: <> PetscErrorCode ierr; >>> 1098: <> PetscInt mbs=a->mbs,bs=A->rmap->bs; >>> 1099: <> PetscScalar *from; >>> 1100: <> const PetscScalar *x; >>> >>> 1103: <> /* diagonal part */ >>> 1104: <> (*a->A->ops->mult)(a->A,xx,a->slvec1a); >>> 1105: <> VecSet (a->slvec1b,0.0); >>> >>> 1107: <> /* subdiagonal part */ >>> 1108: <> (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b); >>> >>> 1110: <> /* copy x into the vec slvec0 */ >>> 1111: <> VecGetArray (a->slvec0,&from); >>> 1112: <> VecGetArrayRead (xx,&x); >>> >>> 1114: <> PetscArraycpy (from,x,bs*mbs); >>> 1115: <> VecRestoreArray (a->slvec0,&from); >>> 1116: <> VecRestoreArrayRead (xx,&x); >>> >>> 1118: <> VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >>> 1119: <> VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >>> 1120: <> /* supperdiagonal part */ >>> 1121: <> (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy); >>> 1122: <> return(0); >>> 1123: <>} >>> I try to understand the algorithm. >>> >>> Thanks, >>> Sam >>> >>> On Mon, Mar 21, 2022 at 11:14 AM Mark Adams > wrote: >>> This code looks fine to me and the code is in src/mat/impls/sbaij/seq/sbaij2.c >>> >>> On Mon, Mar 21, 2022 at 2:02 PM Sam Guo > wrote: >>> Dear PETSc dev team, >>> The documentation about MatCreateSBAIJ has following >>> "It is recommended that one use the MatCreate (), MatSetType () and/or MatSetFromOptions (), MatXXXXSetPreallocation() paradigm instead of this routine directly. [MatXXXXSetPreallocation() is, for example, MatSeqAIJSetPreallocation ]" >>> I currently call MatCreateSBAIJ directly as follows: >>> MatCreateSBAIJ (with d_nnz and o_nnz) >>> MatSetValues (to add row by row) >>> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >>> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >>> MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); >>> >>> Two questions: >>> (1) I am wondering whether what I am doing is the most efficient. 
>>> >>> (2) I try to find out how the matrix vector multiplication is implemented in PETSc for SBAIJ storage. >>> >>> Thanks, >>> Sam >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Tue Mar 22 15:16:41 2022 From: sam.guo at cd-adapco.com (Sam Guo) Date: Tue, 22 Mar 2022 13:16:41 -0700 Subject: [petsc-users] MatCreateSBAIJ In-Reply-To: <195BAAA7-B506-4FFE-83C2-E7874633B0EA@petsc.dev> References: <195BAAA7-B506-4FFE-83C2-E7874633B0EA@petsc.dev> Message-ID: Here is one memory comparison (memory in MB) np=1np=2np=4np=8np=16 shell 1614 1720 1874 1673 1248 PETSc(using full matrix) 2108 2260 2364 2215 1734 PETSc(using symmetric matrix) 1750 2100 2189 2094 1727Those are the total water mark memory added. On Tue, Mar 22, 2022 at 1:10 PM Barry Smith wrote: > > Sam, > > MUMPS is a direct solver, as such, it requires much more memory than the > original matrix (stored as a PETSc matrix) to store the factored matrix. > The savings you will get by not having a PETSc copy of the matrix and a > MUMPS copy of the matrix at the same time is unlikely to be significant. Do > you have memory footprint measurements indicating that not having the PETSc > copy of the matrix in memory will allow you to run measurably larger > simulations? > > Barry > > > > > On Mar 22, 2022, at 3:58 PM, Sam Guo wrote: > > The reason I want to use shell matrix is to reduce memory footprint. If I > create a PETSc matrix and use MUMPS, I understand PETSc will create another > copy of the matrix for MUMPS. Is there any way to avoid the extra copy of > MUMPS? > > On Tue, Mar 22, 2022 at 12:53 PM Sam Guo wrote: > >> Barry, >> Thanks for the illustration. Is there an easy way to mimic the >> implementation using shell matrix? I have been studying how the sMvctx is >> created and it seems pretty involved. >> >> Thanks, >> Sam >> >> On Mon, Mar 21, 2022 at 2:48 PM Barry Smith wrote: >> >>> >>> >>> On Mar 21, 2022, at 4:36 PM, Sam Guo wrote: >>> >>> Barry, >>> Thanks. Could you elaborate? I try to implement the matrix-vector >>> multiplication for a symmetric matrix using shell matrix. >>> >>> >>> Consider with three ranks >>> >>> (a) = ( A B D) (x) >>> (b) ( B' C E) (y) >>> (c) ( D' E' F) (w) >>> >>> Only the ones without the ' are stored on the rank. So for example >>> B is stored on rank 0. >>> >>> Rank 0 computes A x and keeps it in a. Rank 1 computes Cy and >>> keeps it in b Rank 2 computes Fw and keeps it in c >>> >>> Rank 0 computes B'x and D'x. It puts the nonzero entries of these >>> values as well as the values of x into slvec0 >>> >>> Rank 1 computes E'y and puts the nonzero entries as well as the >>> values into slvec0 >>> >>> Rank 2 puts the values of we needed by the other ranks into slvec0 >>> >>> Rank 0 does B y_h + D z_h where it gets the y_h and z_h values >>> from slvec1 and adds it to a >>> >>> Rank 1 takes the B'x from slvec1 and adds it to b it then takes >>> the E y_h values where the y_h are pulled from slvec1 and adds them b >>> >>> Rank 2 takes the B'x and E'y from slvec0 and adds them to c. >>> >>> >>> >>> Thanks, >>> Sam >>> >>> On Mon, Mar 21, 2022 at 12:56 PM Barry Smith wrote: >>> >>>> >>>> The "trick" is that though "more" communication is needed to complete >>>> the product the communication can still be done in a single VecScatter >>>> instead of two separate calls to VecScatter. We simply pack both pieces of >>>> information that needs to be sent into a single vector. 
>>>> >>>> /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >>>> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >>>> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >>>> >>>> If you create two symmetric matrices, one with SBAIJ and one with BAIJ >>>> and compare the time to do the product you will find that the SBAIJ is not >>>> significantly slower but does save memory. >>>> >>>> >>>> On Mar 21, 2022, at 3:26 PM, Sam Guo wrote: >>>> >>>> Using following example from the MatCreateSBAIJ documentation >>>> >>>> 0 1 2 3 4 5 6 7 8 9 10 11 >>>> -------------------------- >>>> row 3 |. . . d d d o o o o o o >>>> row 4 |. . . d d d o o o o o o >>>> row 5 |. . . d d d o o o o o o >>>> -------------------------- >>>> >>>> >>>> On a processor that owns rows 3, 4 and 5, rows 0-2 info are still needed. Is is that the processor that owns rows 0-2 will apply B symmetrical and then send the result >>>> >>>> to the processor that owns 3-5? >>>> >>>> >>>> On Mon, Mar 21, 2022 at 12:14 PM Mark Adams wrote: >>>> >>>>> PETSc stores parallel matrices as two serial matrices. One for the >>>>> diagonal (d or A) block and one for the rest (o or B). >>>>> I would guess that for symmetric matrices it has a symmetric matrix >>>>> for the diagonal and a full AIJ matrix for the (upper) off-diagonal. >>>>> So the multtranspose is applying B symmetrically. This lower >>>>> off-diagonal and the diagonal block can be done without communication. >>>>> Then the off processor values are collected, and the upper >>>>> off-diagonal is applied. >>>>> >>>>> On Mon, Mar 21, 2022 at 2:35 PM Sam Guo wrote: >>>>> >>>>>> I am most interested in how the lower triangular part is >>>>>> redistributed. It seems that SBAJI saves memory but requires more >>>>>> communication than BAIJ. >>>>>> >>>>>> On Mon, Mar 21, 2022 at 11:27 AM Sam Guo >>>>>> wrote: >>>>>> >>>>>>> Mark, thanks for the quick response. I am more interested in >>>>>>> parallel implementation of MatMult for SBAIJ. I found following >>>>>>> >>>>>>> 1094: *PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy)*1095: {1096: Mat_MPISBAIJ *a = (Mat_MPISBAIJ*)A->data;1097: PetscErrorCode ierr;1098: PetscInt mbs=a->mbs,bs=A->rmap->bs;1099: PetscScalar *from;1100: const PetscScalar *x; >>>>>>> 1103: /* diagonal part */1104: (*a->A->ops->mult)(a->A,xx,a->slvec1a);1105: VecSet (a->slvec1b,0.0); >>>>>>> 1107: /* subdiagonal part */1108: (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b); >>>>>>> 1110: /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >>>>>>> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >>>>>>> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1120: /* supperdiagonal part */1121: (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy);1122: return(0);1123: } >>>>>>> >>>>>>> I try to understand the algorithm. 
>>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Sam >>>>>>> >>>>>>> >>>>>>> On Mon, Mar 21, 2022 at 11:14 AM Mark Adams wrote: >>>>>>> >>>>>>>> This code looks fine to me and the code is >>>>>>>> in src/mat/impls/sbaij/seq/sbaij2.c >>>>>>>> >>>>>>>> On Mon, Mar 21, 2022 at 2:02 PM Sam Guo >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Dear PETSc dev team, >>>>>>>>> The documentation about MatCreateSBAIJ has following >>>>>>>>> "It is recommended that one use the MatCreate >>>>>>>>> >>>>>>>>> (), MatSetType >>>>>>>>> () >>>>>>>>> and/or MatSetFromOptions >>>>>>>>> (), >>>>>>>>> MatXXXXSetPreallocation() paradigm instead of this routine directly. >>>>>>>>> [MatXXXXSetPreallocation() is, for example, >>>>>>>>> MatSeqAIJSetPreallocation >>>>>>>>> >>>>>>>>> ]" >>>>>>>>> I currently call MatCreateSBAIJ directly as follows: >>>>>>>>> MatCreateSBAIJ (with d_nnz and o_nnz) >>>>>>>>> MatSetValues (to add row by row) >>>>>>>>> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >>>>>>>>> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >>>>>>>>> MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); >>>>>>>>> >>>>>>>>> Two questions: >>>>>>>>> (1) I am wondering whether what I am doing is the >>>>>>>>> most efficient. >>>>>>>>> >>>>>>>>> (2) I try to find out how the matrix vector multiplication is >>>>>>>>> implemented in PETSc for SBAIJ storage. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Sam >>>>>>>>> >>>>>>>> >>>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Tue Mar 22 15:20:41 2022 From: sam.guo at cd-adapco.com (Sam Guo) Date: Tue, 22 Mar 2022 13:20:41 -0700 Subject: [petsc-users] MatCreateSBAIJ In-Reply-To: References: <195BAAA7-B506-4FFE-83C2-E7874633B0EA@petsc.dev> Message-ID: I am using SLEPc to solve a generalized eigenvalue problem. On Tue, Mar 22, 2022 at 1:16 PM Sam Guo wrote: > Here is one memory comparison (memory in MB) > np=1np=2np=4np=8np=16 > shell 1614 1720 1874 1673 1248 > PETSc(using full matrix) 2108 2260 2364 2215 1734 > PETSc(using symmetric matrix) 1750 2100 2189 2094 1727Those are the total > water mark memory added. > > On Tue, Mar 22, 2022 at 1:10 PM Barry Smith wrote: > >> >> Sam, >> >> MUMPS is a direct solver, as such, it requires much more memory than >> the original matrix (stored as a PETSc matrix) to store the factored >> matrix. The savings you will get by not having a PETSc copy of the matrix >> and a MUMPS copy of the matrix at the same time is unlikely to be >> significant. Do you have memory footprint measurements indicating that not >> having the PETSc copy of the matrix in memory will allow you to run >> measurably larger simulations? >> >> Barry >> >> >> >> >> On Mar 22, 2022, at 3:58 PM, Sam Guo wrote: >> >> The reason I want to use shell matrix is to reduce memory footprint. If I >> create a PETSc matrix and use MUMPS, I understand PETSc will create another >> copy of the matrix for MUMPS. Is there any way to avoid the extra copy of >> MUMPS? >> >> On Tue, Mar 22, 2022 at 12:53 PM Sam Guo wrote: >> >>> Barry, >>> Thanks for the illustration. Is there an easy way to mimic the >>> implementation using shell matrix? I have been studying how the sMvctx is >>> created and it seems pretty involved. >>> >>> Thanks, >>> Sam >>> >>> On Mon, Mar 21, 2022 at 2:48 PM Barry Smith wrote: >>> >>>> >>>> >>>> On Mar 21, 2022, at 4:36 PM, Sam Guo wrote: >>>> >>>> Barry, >>>> Thanks. Could you elaborate? I try to implement the matrix-vector >>>> multiplication for a symmetric matrix using shell matrix. 
>>>> >>>> >>>> Consider with three ranks >>>> >>>> (a) = ( A B D) (x) >>>> (b) ( B' C E) (y) >>>> (c) ( D' E' F) (w) >>>> >>>> Only the ones without the ' are stored on the rank. So for >>>> example B is stored on rank 0. >>>> >>>> Rank 0 computes A x and keeps it in a. Rank 1 computes Cy and >>>> keeps it in b Rank 2 computes Fw and keeps it in c >>>> >>>> Rank 0 computes B'x and D'x. It puts the nonzero entries of >>>> these values as well as the values of x into slvec0 >>>> >>>> Rank 1 computes E'y and puts the nonzero entries as well as the >>>> values into slvec0 >>>> >>>> Rank 2 puts the values of we needed by the other ranks into >>>> slvec0 >>>> >>>> Rank 0 does B y_h + D z_h where it gets the y_h and z_h values >>>> from slvec1 and adds it to a >>>> >>>> Rank 1 takes the B'x from slvec1 and adds it to b it then takes >>>> the E y_h values where the y_h are pulled from slvec1 and adds them b >>>> >>>> Rank 2 takes the B'x and E'y from slvec0 and adds them to c. >>>> >>>> >>>> >>>> Thanks, >>>> Sam >>>> >>>> On Mon, Mar 21, 2022 at 12:56 PM Barry Smith wrote: >>>> >>>>> >>>>> The "trick" is that though "more" communication is needed to >>>>> complete the product the communication can still be done in a single >>>>> VecScatter instead of two separate calls to VecScatter. We simply pack both >>>>> pieces of information that needs to be sent into a single vector. >>>>> >>>>> /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >>>>> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >>>>> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >>>>> >>>>> If you create two symmetric matrices, one with SBAIJ and one with BAIJ >>>>> and compare the time to do the product you will find that the SBAIJ is not >>>>> significantly slower but does save memory. >>>>> >>>>> >>>>> On Mar 21, 2022, at 3:26 PM, Sam Guo wrote: >>>>> >>>>> Using following example from the MatCreateSBAIJ documentation >>>>> >>>>> 0 1 2 3 4 5 6 7 8 9 10 11 >>>>> -------------------------- >>>>> row 3 |. . . d d d o o o o o o >>>>> row 4 |. . . d d d o o o o o o >>>>> row 5 |. . . d d d o o o o o o >>>>> -------------------------- >>>>> >>>>> >>>>> On a processor that owns rows 3, 4 and 5, rows 0-2 info are still needed. Is is that the processor that owns rows 0-2 will apply B symmetrical and then send the result >>>>> >>>>> to the processor that owns 3-5? >>>>> >>>>> >>>>> On Mon, Mar 21, 2022 at 12:14 PM Mark Adams wrote: >>>>> >>>>>> PETSc stores parallel matrices as two serial matrices. One for the >>>>>> diagonal (d or A) block and one for the rest (o or B). >>>>>> I would guess that for symmetric matrices it has a symmetric matrix >>>>>> for the diagonal and a full AIJ matrix for the (upper) off-diagonal. >>>>>> So the multtranspose is applying B symmetrically. This lower >>>>>> off-diagonal and the diagonal block can be done without communication. >>>>>> Then the off processor values are collected, and the upper >>>>>> off-diagonal is applied. >>>>>> >>>>>> On Mon, Mar 21, 2022 at 2:35 PM Sam Guo >>>>>> wrote: >>>>>> >>>>>>> I am most interested in how the lower triangular part is >>>>>>> redistributed. It seems that SBAJI saves memory but requires more >>>>>>> communication than BAIJ. 
>>>>>>> >>>>>>> On Mon, Mar 21, 2022 at 11:27 AM Sam Guo >>>>>>> wrote: >>>>>>> >>>>>>>> Mark, thanks for the quick response. I am more interested in >>>>>>>> parallel implementation of MatMult for SBAIJ. I found following >>>>>>>> >>>>>>>> 1094: *PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy)*1095: {1096: Mat_MPISBAIJ *a = (Mat_MPISBAIJ*)A->data;1097: PetscErrorCode ierr;1098: PetscInt mbs=a->mbs,bs=A->rmap->bs;1099: PetscScalar *from;1100: const PetscScalar *x; >>>>>>>> 1103: /* diagonal part */1104: (*a->A->ops->mult)(a->A,xx,a->slvec1a);1105: VecSet (a->slvec1b,0.0); >>>>>>>> 1107: /* subdiagonal part */1108: (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b); >>>>>>>> 1110: /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >>>>>>>> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >>>>>>>> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1120: /* supperdiagonal part */1121: (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy);1122: return(0);1123: } >>>>>>>> >>>>>>>> I try to understand the algorithm. >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Sam >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Mar 21, 2022 at 11:14 AM Mark Adams >>>>>>>> wrote: >>>>>>>> >>>>>>>>> This code looks fine to me and the code is >>>>>>>>> in src/mat/impls/sbaij/seq/sbaij2.c >>>>>>>>> >>>>>>>>> On Mon, Mar 21, 2022 at 2:02 PM Sam Guo >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Dear PETSc dev team, >>>>>>>>>> The documentation about MatCreateSBAIJ has following >>>>>>>>>> "It is recommended that one use the MatCreate >>>>>>>>>> >>>>>>>>>> (), MatSetType >>>>>>>>>> () >>>>>>>>>> and/or MatSetFromOptions >>>>>>>>>> (), >>>>>>>>>> MatXXXXSetPreallocation() paradigm instead of this routine directly. >>>>>>>>>> [MatXXXXSetPreallocation() is, for example, >>>>>>>>>> MatSeqAIJSetPreallocation >>>>>>>>>> >>>>>>>>>> ]" >>>>>>>>>> I currently call MatCreateSBAIJ directly as follows: >>>>>>>>>> MatCreateSBAIJ (with d_nnz and o_nnz) >>>>>>>>>> MatSetValues (to add row by row) >>>>>>>>>> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >>>>>>>>>> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >>>>>>>>>> MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); >>>>>>>>>> >>>>>>>>>> Two questions: >>>>>>>>>> (1) I am wondering whether what I am doing is the >>>>>>>>>> most efficient. >>>>>>>>>> >>>>>>>>>> (2) I try to find out how the matrix vector multiplication is >>>>>>>>>> implemented in PETSc for SBAIJ storage. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Sam >>>>>>>>>> >>>>>>>>> >>>>> >>>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Mar 22 15:21:01 2022 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 22 Mar 2022 16:21:01 -0400 Subject: [petsc-users] MatCreateSBAIJ In-Reply-To: References: <195BAAA7-B506-4FFE-83C2-E7874633B0EA@petsc.dev> Message-ID: On Tue, Mar 22, 2022 at 4:16 PM Sam Guo wrote: > Here is one memory comparison (memory in MB) > np=1np=2np=4np=8np=16 > shell 1614 1720 1874 1673 1248 > PETSc(using full matrix) 2108 2260 2364 2215 1734 > PETSc(using symmetric matrix) 1750 2100 2189 2094 1727Those are the total > water mark memory added. > You should be able to directly read off the memory in the PETSc Mat structures (it is at the end of the log). With a tool like massif you could also directly measure the MUMPS memory. 
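For instance (a rough sketch from memory of the options and routine names, so double-check them against your PETSc version), the totals can be requested on the command line

  ./your_app -log_view -memory_view

or queried directly from the code:

  PetscLogDouble rss,mal;
  PetscMemorySetGetMaximumUsage();   /* call once, right after PetscInitialize() */
  /* ... assemble the matrix, factor with MUMPS, solve ... */
  PetscMemoryGetMaximumUsage(&rss);  /* process high-water mark (resident set) */
  PetscMallocGetMaximumUsage(&mal);  /* high-water mark of PetscMalloc()'d memory */
  PetscPrintf(PETSC_COMM_WORLD,"rss %g  petsc malloc %g\n",rss,mal);

Whatever MUMPS allocates internally for the factors does not go through PetscMalloc(), so it should show up in the resident-set numbers (and in massif) but not in the PetscMalloc() totals; that difference is a quick way to see how much of the footprint is really the factorization rather than the PETSc copy of the matrix.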
Thanks, Matt > On Tue, Mar 22, 2022 at 1:10 PM Barry Smith wrote: > >> >> Sam, >> >> MUMPS is a direct solver, as such, it requires much more memory than >> the original matrix (stored as a PETSc matrix) to store the factored >> matrix. The savings you will get by not having a PETSc copy of the matrix >> and a MUMPS copy of the matrix at the same time is unlikely to be >> significant. Do you have memory footprint measurements indicating that not >> having the PETSc copy of the matrix in memory will allow you to run >> measurably larger simulations? >> >> Barry >> >> >> >> >> On Mar 22, 2022, at 3:58 PM, Sam Guo wrote: >> >> The reason I want to use shell matrix is to reduce memory footprint. If I >> create a PETSc matrix and use MUMPS, I understand PETSc will create another >> copy of the matrix for MUMPS. Is there any way to avoid the extra copy of >> MUMPS? >> >> On Tue, Mar 22, 2022 at 12:53 PM Sam Guo wrote: >> >>> Barry, >>> Thanks for the illustration. Is there an easy way to mimic the >>> implementation using shell matrix? I have been studying how the sMvctx is >>> created and it seems pretty involved. >>> >>> Thanks, >>> Sam >>> >>> On Mon, Mar 21, 2022 at 2:48 PM Barry Smith wrote: >>> >>>> >>>> >>>> On Mar 21, 2022, at 4:36 PM, Sam Guo wrote: >>>> >>>> Barry, >>>> Thanks. Could you elaborate? I try to implement the matrix-vector >>>> multiplication for a symmetric matrix using shell matrix. >>>> >>>> >>>> Consider with three ranks >>>> >>>> (a) = ( A B D) (x) >>>> (b) ( B' C E) (y) >>>> (c) ( D' E' F) (w) >>>> >>>> Only the ones without the ' are stored on the rank. So for >>>> example B is stored on rank 0. >>>> >>>> Rank 0 computes A x and keeps it in a. Rank 1 computes Cy and >>>> keeps it in b Rank 2 computes Fw and keeps it in c >>>> >>>> Rank 0 computes B'x and D'x. It puts the nonzero entries of >>>> these values as well as the values of x into slvec0 >>>> >>>> Rank 1 computes E'y and puts the nonzero entries as well as the >>>> values into slvec0 >>>> >>>> Rank 2 puts the values of we needed by the other ranks into >>>> slvec0 >>>> >>>> Rank 0 does B y_h + D z_h where it gets the y_h and z_h values >>>> from slvec1 and adds it to a >>>> >>>> Rank 1 takes the B'x from slvec1 and adds it to b it then takes >>>> the E y_h values where the y_h are pulled from slvec1 and adds them b >>>> >>>> Rank 2 takes the B'x and E'y from slvec0 and adds them to c. >>>> >>>> >>>> >>>> Thanks, >>>> Sam >>>> >>>> On Mon, Mar 21, 2022 at 12:56 PM Barry Smith wrote: >>>> >>>>> >>>>> The "trick" is that though "more" communication is needed to >>>>> complete the product the communication can still be done in a single >>>>> VecScatter instead of two separate calls to VecScatter. We simply pack both >>>>> pieces of information that needs to be sent into a single vector. >>>>> >>>>> /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >>>>> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >>>>> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >>>>> >>>>> If you create two symmetric matrices, one with SBAIJ and one with BAIJ >>>>> and compare the time to do the product you will find that the SBAIJ is not >>>>> significantly slower but does save memory. 
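A minimal sketch of the comparison described in the quoted paragraph above (my own illustration, not code from the PETSc tree; m, x, y and the nnz arrays are placeholders, and it uses the MatCreate()/MatSetType()/preallocation path that the MatCreateSBAIJ manual page recommends):

  Mat A;
  MatCreate(PETSC_COMM_WORLD,&A);
  MatSetSizes(A,m,m,PETSC_DETERMINE,PETSC_DETERMINE);
  MatSetType(A,MATSBAIJ);                   /* switch to MATBAIJ (or -mat_type baij) for the comparison */
  MatSetFromOptions(A);
  /* preallocation: only the call matching the actual type takes effect, the others are ignored;
     note the counts differ, since SBAIJ counts only the upper-triangular entries per row */
  MatSeqSBAIJSetPreallocation(A,1,0,d_nnz);
  MatMPISBAIJSetPreallocation(A,1,0,d_nnz,0,o_nnz);
  MatSeqBAIJSetPreallocation(A,1,0,d_nnz_full);
  MatMPIBAIJSetPreallocation(A,1,0,d_nnz_full,0,o_nnz_full);
  /* MatSetValues() loop; for SBAIJ insert only the upper-triangular entries */
  MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);
  for (i=0; i<100; i++) MatMult(A,x,y);     /* compare the MatMult time and the Mat memory in -log_view */

Running the same driver once with each type (or once with -mat_type sbaij and once with -mat_type baij) gives the timing and memory comparison directly from the log summary.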
>>>>> >>>>> >>>>> On Mar 21, 2022, at 3:26 PM, Sam Guo wrote: >>>>> >>>>> Using following example from the MatCreateSBAIJ documentation >>>>> >>>>> 0 1 2 3 4 5 6 7 8 9 10 11 >>>>> -------------------------- >>>>> row 3 |. . . d d d o o o o o o >>>>> row 4 |. . . d d d o o o o o o >>>>> row 5 |. . . d d d o o o o o o >>>>> -------------------------- >>>>> >>>>> >>>>> On a processor that owns rows 3, 4 and 5, rows 0-2 info are still needed. Is is that the processor that owns rows 0-2 will apply B symmetrical and then send the result >>>>> >>>>> to the processor that owns 3-5? >>>>> >>>>> >>>>> On Mon, Mar 21, 2022 at 12:14 PM Mark Adams wrote: >>>>> >>>>>> PETSc stores parallel matrices as two serial matrices. One for the >>>>>> diagonal (d or A) block and one for the rest (o or B). >>>>>> I would guess that for symmetric matrices it has a symmetric matrix >>>>>> for the diagonal and a full AIJ matrix for the (upper) off-diagonal. >>>>>> So the multtranspose is applying B symmetrically. This lower >>>>>> off-diagonal and the diagonal block can be done without communication. >>>>>> Then the off processor values are collected, and the upper >>>>>> off-diagonal is applied. >>>>>> >>>>>> On Mon, Mar 21, 2022 at 2:35 PM Sam Guo >>>>>> wrote: >>>>>> >>>>>>> I am most interested in how the lower triangular part is >>>>>>> redistributed. It seems that SBAJI saves memory but requires more >>>>>>> communication than BAIJ. >>>>>>> >>>>>>> On Mon, Mar 21, 2022 at 11:27 AM Sam Guo >>>>>>> wrote: >>>>>>> >>>>>>>> Mark, thanks for the quick response. I am more interested in >>>>>>>> parallel implementation of MatMult for SBAIJ. I found following >>>>>>>> >>>>>>>> 1094: *PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy)*1095: {1096: Mat_MPISBAIJ *a = (Mat_MPISBAIJ*)A->data;1097: PetscErrorCode ierr;1098: PetscInt mbs=a->mbs,bs=A->rmap->bs;1099: PetscScalar *from;1100: const PetscScalar *x; >>>>>>>> 1103: /* diagonal part */1104: (*a->A->ops->mult)(a->A,xx,a->slvec1a);1105: VecSet (a->slvec1b,0.0); >>>>>>>> 1107: /* subdiagonal part */1108: (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b); >>>>>>>> 1110: /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >>>>>>>> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >>>>>>>> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1120: /* supperdiagonal part */1121: (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy);1122: return(0);1123: } >>>>>>>> >>>>>>>> I try to understand the algorithm. >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Sam >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Mar 21, 2022 at 11:14 AM Mark Adams >>>>>>>> wrote: >>>>>>>> >>>>>>>>> This code looks fine to me and the code is >>>>>>>>> in src/mat/impls/sbaij/seq/sbaij2.c >>>>>>>>> >>>>>>>>> On Mon, Mar 21, 2022 at 2:02 PM Sam Guo >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Dear PETSc dev team, >>>>>>>>>> The documentation about MatCreateSBAIJ has following >>>>>>>>>> "It is recommended that one use the MatCreate >>>>>>>>>> >>>>>>>>>> (), MatSetType >>>>>>>>>> () >>>>>>>>>> and/or MatSetFromOptions >>>>>>>>>> (), >>>>>>>>>> MatXXXXSetPreallocation() paradigm instead of this routine directly. 
>>>>>>>>>> [MatXXXXSetPreallocation() is, for example, >>>>>>>>>> MatSeqAIJSetPreallocation >>>>>>>>>> >>>>>>>>>> ]" >>>>>>>>>> I currently call MatCreateSBAIJ directly as follows: >>>>>>>>>> MatCreateSBAIJ (with d_nnz and o_nnz) >>>>>>>>>> MatSetValues (to add row by row) >>>>>>>>>> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >>>>>>>>>> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >>>>>>>>>> MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); >>>>>>>>>> >>>>>>>>>> Two questions: >>>>>>>>>> (1) I am wondering whether what I am doing is the >>>>>>>>>> most efficient. >>>>>>>>>> >>>>>>>>>> (2) I try to find out how the matrix vector multiplication is >>>>>>>>>> implemented in PETSc for SBAIJ storage. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Sam >>>>>>>>>> >>>>>>>>> >>>>> >>>> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Tue Mar 22 15:26:30 2022 From: sam.guo at cd-adapco.com (Sam Guo) Date: Tue, 22 Mar 2022 13:26:30 -0700 Subject: [petsc-users] MatCreateSBAIJ In-Reply-To: References: <195BAAA7-B506-4FFE-83C2-E7874633B0EA@petsc.dev> Message-ID: The matrix only requires 175MB (upper triangular part). Not sure where the other extra memory comes from for np > 1. On Tue, Mar 22, 2022 at 1:21 PM Matthew Knepley wrote: > On Tue, Mar 22, 2022 at 4:16 PM Sam Guo wrote: > >> Here is one memory comparison (memory in MB) >> np=1np=2np=4np=8np=16 >> shell 1614 1720 1874 1673 1248 >> PETSc(using full matrix) 2108 2260 2364 2215 1734 >> PETSc(using symmetric matrix) 1750 2100 2189 2094 1727Those are the >> total water mark memory added. >> > > You should be able to directly read off the memory in the PETSc Mat > structures (it is at the end of the log). > With a tool like massif you could also directly measure the MUMPS memory. > > Thanks, > > Matt > > >> On Tue, Mar 22, 2022 at 1:10 PM Barry Smith wrote: >> >>> >>> Sam, >>> >>> MUMPS is a direct solver, as such, it requires much more memory than >>> the original matrix (stored as a PETSc matrix) to store the factored >>> matrix. The savings you will get by not having a PETSc copy of the matrix >>> and a MUMPS copy of the matrix at the same time is unlikely to be >>> significant. Do you have memory footprint measurements indicating that not >>> having the PETSc copy of the matrix in memory will allow you to run >>> measurably larger simulations? >>> >>> Barry >>> >>> >>> >>> >>> On Mar 22, 2022, at 3:58 PM, Sam Guo wrote: >>> >>> The reason I want to use shell matrix is to reduce memory footprint. If >>> I create a PETSc matrix and use MUMPS, I understand PETSc will create >>> another copy of the matrix for MUMPS. Is there any way to avoid the extra >>> copy of MUMPS? >>> >>> On Tue, Mar 22, 2022 at 12:53 PM Sam Guo wrote: >>> >>>> Barry, >>>> Thanks for the illustration. Is there an easy way to mimic the >>>> implementation using shell matrix? I have been studying how the sMvctx is >>>> created and it seems pretty involved. >>>> >>>> Thanks, >>>> Sam >>>> >>>> On Mon, Mar 21, 2022 at 2:48 PM Barry Smith wrote: >>>> >>>>> >>>>> >>>>> On Mar 21, 2022, at 4:36 PM, Sam Guo wrote: >>>>> >>>>> Barry, >>>>> Thanks. Could you elaborate? I try to implement the matrix-vector >>>>> multiplication for a symmetric matrix using shell matrix. 
>>>>> >>>>> >>>>> Consider with three ranks >>>>> >>>>> (a) = ( A B D) (x) >>>>> (b) ( B' C E) (y) >>>>> (c) ( D' E' F) (w) >>>>> >>>>> Only the ones without the ' are stored on the rank. So for >>>>> example B is stored on rank 0. >>>>> >>>>> Rank 0 computes A x and keeps it in a. Rank 1 computes Cy and >>>>> keeps it in b Rank 2 computes Fw and keeps it in c >>>>> >>>>> Rank 0 computes B'x and D'x. It puts the nonzero entries of >>>>> these values as well as the values of x into slvec0 >>>>> >>>>> Rank 1 computes E'y and puts the nonzero entries as well as the >>>>> values into slvec0 >>>>> >>>>> Rank 2 puts the values of we needed by the other ranks into >>>>> slvec0 >>>>> >>>>> Rank 0 does B y_h + D z_h where it gets the y_h and z_h values >>>>> from slvec1 and adds it to a >>>>> >>>>> Rank 1 takes the B'x from slvec1 and adds it to b it then takes >>>>> the E y_h values where the y_h are pulled from slvec1 and adds them b >>>>> >>>>> Rank 2 takes the B'x and E'y from slvec0 and adds them to c. >>>>> >>>>> >>>>> >>>>> Thanks, >>>>> Sam >>>>> >>>>> On Mon, Mar 21, 2022 at 12:56 PM Barry Smith wrote: >>>>> >>>>>> >>>>>> The "trick" is that though "more" communication is needed to >>>>>> complete the product the communication can still be done in a single >>>>>> VecScatter instead of two separate calls to VecScatter. We simply pack both >>>>>> pieces of information that needs to be sent into a single vector. >>>>>> >>>>>> /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >>>>>> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >>>>>> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >>>>>> >>>>>> If you create two symmetric matrices, one with SBAIJ and one with >>>>>> BAIJ and compare the time to do the product you will find that the SBAIJ is >>>>>> not significantly slower but does save memory. >>>>>> >>>>>> >>>>>> On Mar 21, 2022, at 3:26 PM, Sam Guo wrote: >>>>>> >>>>>> Using following example from the MatCreateSBAIJ documentation >>>>>> >>>>>> 0 1 2 3 4 5 6 7 8 9 10 11 >>>>>> -------------------------- >>>>>> row 3 |. . . d d d o o o o o o >>>>>> row 4 |. . . d d d o o o o o o >>>>>> row 5 |. . . d d d o o o o o o >>>>>> -------------------------- >>>>>> >>>>>> >>>>>> On a processor that owns rows 3, 4 and 5, rows 0-2 info are still needed. Is is that the processor that owns rows 0-2 will apply B symmetrical and then send the result >>>>>> >>>>>> to the processor that owns 3-5? >>>>>> >>>>>> >>>>>> On Mon, Mar 21, 2022 at 12:14 PM Mark Adams wrote: >>>>>> >>>>>>> PETSc stores parallel matrices as two serial matrices. One for the >>>>>>> diagonal (d or A) block and one for the rest (o or B). >>>>>>> I would guess that for symmetric matrices it has a symmetric matrix >>>>>>> for the diagonal and a full AIJ matrix for the (upper) off-diagonal. >>>>>>> So the multtranspose is applying B symmetrically. This lower >>>>>>> off-diagonal and the diagonal block can be done without communication. >>>>>>> Then the off processor values are collected, and the upper >>>>>>> off-diagonal is applied. >>>>>>> >>>>>>> On Mon, Mar 21, 2022 at 2:35 PM Sam Guo >>>>>>> wrote: >>>>>>> >>>>>>>> I am most interested in how the lower triangular part is >>>>>>>> redistributed. It seems that SBAJI saves memory but requires more >>>>>>>> communication than BAIJ. 
>>>>>>>> >>>>>>>> On Mon, Mar 21, 2022 at 11:27 AM Sam Guo >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Mark, thanks for the quick response. I am more interested in >>>>>>>>> parallel implementation of MatMult for SBAIJ. I found following >>>>>>>>> >>>>>>>>> 1094: *PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy)*1095: {1096: Mat_MPISBAIJ *a = (Mat_MPISBAIJ*)A->data;1097: PetscErrorCode ierr;1098: PetscInt mbs=a->mbs,bs=A->rmap->bs;1099: PetscScalar *from;1100: const PetscScalar *x; >>>>>>>>> 1103: /* diagonal part */1104: (*a->A->ops->mult)(a->A,xx,a->slvec1a);1105: VecSet (a->slvec1b,0.0); >>>>>>>>> 1107: /* subdiagonal part */1108: (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b); >>>>>>>>> 1110: /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >>>>>>>>> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >>>>>>>>> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1120: /* supperdiagonal part */1121: (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy);1122: return(0);1123: } >>>>>>>>> >>>>>>>>> I try to understand the algorithm. >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Sam >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Mar 21, 2022 at 11:14 AM Mark Adams >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> This code looks fine to me and the code is >>>>>>>>>> in src/mat/impls/sbaij/seq/sbaij2.c >>>>>>>>>> >>>>>>>>>> On Mon, Mar 21, 2022 at 2:02 PM Sam Guo >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Dear PETSc dev team, >>>>>>>>>>> The documentation about MatCreateSBAIJ has following >>>>>>>>>>> "It is recommended that one use the MatCreate >>>>>>>>>>> >>>>>>>>>>> (), MatSetType >>>>>>>>>>> () >>>>>>>>>>> and/or MatSetFromOptions >>>>>>>>>>> (), >>>>>>>>>>> MatXXXXSetPreallocation() paradigm instead of this routine directly. >>>>>>>>>>> [MatXXXXSetPreallocation() is, for example, >>>>>>>>>>> MatSeqAIJSetPreallocation >>>>>>>>>>> >>>>>>>>>>> ]" >>>>>>>>>>> I currently call MatCreateSBAIJ directly as follows: >>>>>>>>>>> MatCreateSBAIJ (with d_nnz and o_nnz) >>>>>>>>>>> MatSetValues (to add row by row) >>>>>>>>>>> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >>>>>>>>>>> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >>>>>>>>>>> MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); >>>>>>>>>>> >>>>>>>>>>> Two questions: >>>>>>>>>>> (1) I am wondering whether what I am doing is the >>>>>>>>>>> most efficient. >>>>>>>>>>> >>>>>>>>>>> (2) I try to find out how the matrix vector multiplication is >>>>>>>>>>> implemented in PETSc for SBAIJ storage. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Sam >>>>>>>>>>> >>>>>>>>>> >>>>>> >>>>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Mar 22 15:30:37 2022 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 22 Mar 2022 16:30:37 -0400 Subject: [petsc-users] MatCreateSBAIJ In-Reply-To: References: <195BAAA7-B506-4FFE-83C2-E7874633B0EA@petsc.dev> Message-ID: On Tue, Mar 22, 2022 at 4:26 PM Sam Guo wrote: > The matrix only requires 175MB (upper triangular part). Not sure where the > other extra memory comes from for np > 1 > This is likely MUMPS. 
However, we could be sure by using a heap profiler, like massif. Thanks, Matt > On Tue, Mar 22, 2022 at 1:21 PM Matthew Knepley wrote: > >> On Tue, Mar 22, 2022 at 4:16 PM Sam Guo wrote: >> >>> Here is one memory comparison (memory in MB) >>> np=1np=2np=4np=8np=16 >>> shell 1614 1720 1874 1673 1248 >>> PETSc(using full matrix) 2108 2260 2364 2215 1734 >>> PETSc(using symmetric matrix) 1750 2100 2189 2094 1727Those are the >>> total water mark memory added. >>> >> >> You should be able to directly read off the memory in the PETSc Mat >> structures (it is at the end of the log). >> With a tool like massif you could also directly measure the MUMPS memory. >> >> Thanks, >> >> Matt >> >> >>> On Tue, Mar 22, 2022 at 1:10 PM Barry Smith wrote: >>> >>>> >>>> Sam, >>>> >>>> MUMPS is a direct solver, as such, it requires much more memory than >>>> the original matrix (stored as a PETSc matrix) to store the factored >>>> matrix. The savings you will get by not having a PETSc copy of the matrix >>>> and a MUMPS copy of the matrix at the same time is unlikely to be >>>> significant. Do you have memory footprint measurements indicating that not >>>> having the PETSc copy of the matrix in memory will allow you to run >>>> measurably larger simulations? >>>> >>>> Barry >>>> >>>> >>>> >>>> >>>> On Mar 22, 2022, at 3:58 PM, Sam Guo wrote: >>>> >>>> The reason I want to use shell matrix is to reduce memory footprint. If >>>> I create a PETSc matrix and use MUMPS, I understand PETSc will create >>>> another copy of the matrix for MUMPS. Is there any way to avoid the extra >>>> copy of MUMPS? >>>> >>>> On Tue, Mar 22, 2022 at 12:53 PM Sam Guo wrote: >>>> >>>>> Barry, >>>>> Thanks for the illustration. Is there an easy way to mimic the >>>>> implementation using shell matrix? I have been studying how the sMvctx is >>>>> created and it seems pretty involved. >>>>> >>>>> Thanks, >>>>> Sam >>>>> >>>>> On Mon, Mar 21, 2022 at 2:48 PM Barry Smith wrote: >>>>> >>>>>> >>>>>> >>>>>> On Mar 21, 2022, at 4:36 PM, Sam Guo wrote: >>>>>> >>>>>> Barry, >>>>>> Thanks. Could you elaborate? I try to implement the matrix-vector >>>>>> multiplication for a symmetric matrix using shell matrix. >>>>>> >>>>>> >>>>>> Consider with three ranks >>>>>> >>>>>> (a) = ( A B D) (x) >>>>>> (b) ( B' C E) (y) >>>>>> (c) ( D' E' F) (w) >>>>>> >>>>>> Only the ones without the ' are stored on the rank. So for >>>>>> example B is stored on rank 0. >>>>>> >>>>>> Rank 0 computes A x and keeps it in a. Rank 1 computes Cy and >>>>>> keeps it in b Rank 2 computes Fw and keeps it in c >>>>>> >>>>>> Rank 0 computes B'x and D'x. It puts the nonzero entries of >>>>>> these values as well as the values of x into slvec0 >>>>>> >>>>>> Rank 1 computes E'y and puts the nonzero entries as well as >>>>>> the values into slvec0 >>>>>> >>>>>> Rank 2 puts the values of we needed by the other ranks into >>>>>> slvec0 >>>>>> >>>>>> Rank 0 does B y_h + D z_h where it gets the y_h and z_h values >>>>>> from slvec1 and adds it to a >>>>>> >>>>>> Rank 1 takes the B'x from slvec1 and adds it to b it then >>>>>> takes the E y_h values where the y_h are pulled from slvec1 and adds them b >>>>>> >>>>>> Rank 2 takes the B'x and E'y from slvec0 and adds them to c. 
>>>>>> >>>>>> >>>>>> >>>>>> Thanks, >>>>>> Sam >>>>>> >>>>>> On Mon, Mar 21, 2022 at 12:56 PM Barry Smith >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> The "trick" is that though "more" communication is needed to >>>>>>> complete the product the communication can still be done in a single >>>>>>> VecScatter instead of two separate calls to VecScatter. We simply pack both >>>>>>> pieces of information that needs to be sent into a single vector. >>>>>>> >>>>>>> /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >>>>>>> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >>>>>>> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >>>>>>> >>>>>>> If you create two symmetric matrices, one with SBAIJ and one with >>>>>>> BAIJ and compare the time to do the product you will find that the SBAIJ is >>>>>>> not significantly slower but does save memory. >>>>>>> >>>>>>> >>>>>>> On Mar 21, 2022, at 3:26 PM, Sam Guo wrote: >>>>>>> >>>>>>> Using following example from the MatCreateSBAIJ documentation >>>>>>> >>>>>>> 0 1 2 3 4 5 6 7 8 9 10 11 >>>>>>> -------------------------- >>>>>>> row 3 |. . . d d d o o o o o o >>>>>>> row 4 |. . . d d d o o o o o o >>>>>>> row 5 |. . . d d d o o o o o o >>>>>>> -------------------------- >>>>>>> >>>>>>> >>>>>>> On a processor that owns rows 3, 4 and 5, rows 0-2 info are still needed. Is is that the processor that owns rows 0-2 will apply B symmetrical and then send the result >>>>>>> >>>>>>> to the processor that owns 3-5? >>>>>>> >>>>>>> >>>>>>> On Mon, Mar 21, 2022 at 12:14 PM Mark Adams wrote: >>>>>>> >>>>>>>> PETSc stores parallel matrices as two serial matrices. One for the >>>>>>>> diagonal (d or A) block and one for the rest (o or B). >>>>>>>> I would guess that for symmetric matrices it has a symmetric matrix >>>>>>>> for the diagonal and a full AIJ matrix for the (upper) off-diagonal. >>>>>>>> So the multtranspose is applying B symmetrically. This lower >>>>>>>> off-diagonal and the diagonal block can be done without communication. >>>>>>>> Then the off processor values are collected, and the upper >>>>>>>> off-diagonal is applied. >>>>>>>> >>>>>>>> On Mon, Mar 21, 2022 at 2:35 PM Sam Guo >>>>>>>> wrote: >>>>>>>> >>>>>>>>> I am most interested in how the lower triangular part is >>>>>>>>> redistributed. It seems that SBAJI saves memory but requires more >>>>>>>>> communication than BAIJ. >>>>>>>>> >>>>>>>>> On Mon, Mar 21, 2022 at 11:27 AM Sam Guo >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Mark, thanks for the quick response. I am more interested in >>>>>>>>>> parallel implementation of MatMult for SBAIJ. 
I found following >>>>>>>>>> >>>>>>>>>> 1094: *PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy)*1095: {1096: Mat_MPISBAIJ *a = (Mat_MPISBAIJ*)A->data;1097: PetscErrorCode ierr;1098: PetscInt mbs=a->mbs,bs=A->rmap->bs;1099: PetscScalar *from;1100: const PetscScalar *x; >>>>>>>>>> 1103: /* diagonal part */1104: (*a->A->ops->mult)(a->A,xx,a->slvec1a);1105: VecSet (a->slvec1b,0.0); >>>>>>>>>> 1107: /* subdiagonal part */1108: (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b); >>>>>>>>>> 1110: /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >>>>>>>>>> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >>>>>>>>>> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1120: /* supperdiagonal part */1121: (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy);1122: return(0);1123: } >>>>>>>>>> >>>>>>>>>> I try to understand the algorithm. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Sam >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, Mar 21, 2022 at 11:14 AM Mark Adams >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> This code looks fine to me and the code is >>>>>>>>>>> in src/mat/impls/sbaij/seq/sbaij2.c >>>>>>>>>>> >>>>>>>>>>> On Mon, Mar 21, 2022 at 2:02 PM Sam Guo >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Dear PETSc dev team, >>>>>>>>>>>> The documentation about MatCreateSBAIJ has following >>>>>>>>>>>> "It is recommended that one use the MatCreate >>>>>>>>>>>> >>>>>>>>>>>> (), MatSetType >>>>>>>>>>>> () >>>>>>>>>>>> and/or MatSetFromOptions >>>>>>>>>>>> (), >>>>>>>>>>>> MatXXXXSetPreallocation() paradigm instead of this routine directly. >>>>>>>>>>>> [MatXXXXSetPreallocation() is, for example, >>>>>>>>>>>> MatSeqAIJSetPreallocation >>>>>>>>>>>> >>>>>>>>>>>> ]" >>>>>>>>>>>> I currently call MatCreateSBAIJ directly as follows: >>>>>>>>>>>> MatCreateSBAIJ (with d_nnz and o_nnz) >>>>>>>>>>>> MatSetValues (to add row by row) >>>>>>>>>>>> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >>>>>>>>>>>> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >>>>>>>>>>>> MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); >>>>>>>>>>>> >>>>>>>>>>>> Two questions: >>>>>>>>>>>> (1) I am wondering whether what I am doing is the >>>>>>>>>>>> most efficient. >>>>>>>>>>>> >>>>>>>>>>>> (2) I try to find out how the matrix >>>>>>>>>>>> vector multiplication is implemented in PETSc for SBAIJ storage. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Sam >>>>>>>>>>>> >>>>>>>>>>> >>>>>>> >>>>>> >>>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Tue Mar 22 15:34:49 2022 From: sam.guo at cd-adapco.com (Sam Guo) Date: Tue, 22 Mar 2022 13:34:49 -0700 Subject: [petsc-users] MatCreateSBAIJ In-Reply-To: References: <195BAAA7-B506-4FFE-83C2-E7874633B0EA@petsc.dev> Message-ID: Hi Matt, I refer to the extra memory comparing shell vs PETSc. 
I am using the same MUMPS for both shell and PETSc matrix and expect the memory increase by MUMPS to be the same for both. Thanks, Sam On Tue, Mar 22, 2022 at 1:30 PM Matthew Knepley wrote: > On Tue, Mar 22, 2022 at 4:26 PM Sam Guo wrote: > >> The matrix only requires 175MB (upper triangular part). Not sure where >> the other extra memory comes from for np > 1 >> > > This is likely MUMPS. However, we could be sure by using a heap profiler, > like massif. > > Thanks, > > Matt > > >> On Tue, Mar 22, 2022 at 1:21 PM Matthew Knepley >> wrote: >> >>> On Tue, Mar 22, 2022 at 4:16 PM Sam Guo wrote: >>> >>>> Here is one memory comparison (memory in MB) >>>> np=1np=2np=4np=8np=16 >>>> shell 1614 1720 1874 1673 1248 >>>> PETSc(using full matrix) 2108 2260 2364 2215 1734 >>>> PETSc(using symmetric matrix) 1750 2100 2189 2094 1727Those are the >>>> total water mark memory added. >>>> >>> >>> You should be able to directly read off the memory in the PETSc Mat >>> structures (it is at the end of the log). >>> With a tool like massif you could also directly measure the MUMPS memory. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> On Tue, Mar 22, 2022 at 1:10 PM Barry Smith wrote: >>>> >>>>> >>>>> Sam, >>>>> >>>>> MUMPS is a direct solver, as such, it requires much more memory than >>>>> the original matrix (stored as a PETSc matrix) to store the factored >>>>> matrix. The savings you will get by not having a PETSc copy of the matrix >>>>> and a MUMPS copy of the matrix at the same time is unlikely to be >>>>> significant. Do you have memory footprint measurements indicating that not >>>>> having the PETSc copy of the matrix in memory will allow you to run >>>>> measurably larger simulations? >>>>> >>>>> Barry >>>>> >>>>> >>>>> >>>>> >>>>> On Mar 22, 2022, at 3:58 PM, Sam Guo wrote: >>>>> >>>>> The reason I want to use shell matrix is to reduce memory footprint. >>>>> If I create a PETSc matrix and use MUMPS, I understand PETSc will create >>>>> another copy of the matrix for MUMPS. Is there any way to avoid the extra >>>>> copy of MUMPS? >>>>> >>>>> On Tue, Mar 22, 2022 at 12:53 PM Sam Guo >>>>> wrote: >>>>> >>>>>> Barry, >>>>>> Thanks for the illustration. Is there an easy way to mimic the >>>>>> implementation using shell matrix? I have been studying how the sMvctx is >>>>>> created and it seems pretty involved. >>>>>> >>>>>> Thanks, >>>>>> Sam >>>>>> >>>>>> On Mon, Mar 21, 2022 at 2:48 PM Barry Smith wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Mar 21, 2022, at 4:36 PM, Sam Guo wrote: >>>>>>> >>>>>>> Barry, >>>>>>> Thanks. Could you elaborate? I try to implement the matrix-vector >>>>>>> multiplication for a symmetric matrix using shell matrix. >>>>>>> >>>>>>> >>>>>>> Consider with three ranks >>>>>>> >>>>>>> (a) = ( A B D) (x) >>>>>>> (b) ( B' C E) (y) >>>>>>> (c) ( D' E' F) (w) >>>>>>> >>>>>>> Only the ones without the ' are stored on the rank. So for >>>>>>> example B is stored on rank 0. >>>>>>> >>>>>>> Rank 0 computes A x and keeps it in a. Rank 1 computes Cy and >>>>>>> keeps it in b Rank 2 computes Fw and keeps it in c >>>>>>> >>>>>>> Rank 0 computes B'x and D'x. 
It puts the nonzero entries of >>>>>>> these values as well as the values of x into slvec0 >>>>>>> >>>>>>> Rank 1 computes E'y and puts the nonzero entries as well as >>>>>>> the values into slvec0 >>>>>>> >>>>>>> Rank 2 puts the values of we needed by the other ranks into >>>>>>> slvec0 >>>>>>> >>>>>>> Rank 0 does B y_h + D z_h where it gets the y_h and z_h >>>>>>> values from slvec1 and adds it to a >>>>>>> >>>>>>> Rank 1 takes the B'x from slvec1 and adds it to b it then >>>>>>> takes the E y_h values where the y_h are pulled from slvec1 and adds them b >>>>>>> >>>>>>> Rank 2 takes the B'x and E'y from slvec0 and adds them to c. >>>>>>> >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Sam >>>>>>> >>>>>>> On Mon, Mar 21, 2022 at 12:56 PM Barry Smith >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> The "trick" is that though "more" communication is needed to >>>>>>>> complete the product the communication can still be done in a single >>>>>>>> VecScatter instead of two separate calls to VecScatter. We simply pack both >>>>>>>> pieces of information that needs to be sent into a single vector. >>>>>>>> >>>>>>>> /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >>>>>>>> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >>>>>>>> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >>>>>>>> >>>>>>>> If you create two symmetric matrices, one with SBAIJ and one with >>>>>>>> BAIJ and compare the time to do the product you will find that the SBAIJ is >>>>>>>> not significantly slower but does save memory. >>>>>>>> >>>>>>>> >>>>>>>> On Mar 21, 2022, at 3:26 PM, Sam Guo wrote: >>>>>>>> >>>>>>>> Using following example from the MatCreateSBAIJ documentation >>>>>>>> >>>>>>>> 0 1 2 3 4 5 6 7 8 9 10 11 >>>>>>>> -------------------------- >>>>>>>> row 3 |. . . d d d o o o o o o >>>>>>>> row 4 |. . . d d d o o o o o o >>>>>>>> row 5 |. . . d d d o o o o o o >>>>>>>> -------------------------- >>>>>>>> >>>>>>>> >>>>>>>> On a processor that owns rows 3, 4 and 5, rows 0-2 info are still needed. Is is that the processor that owns rows 0-2 will apply B symmetrical and then send the result >>>>>>>> >>>>>>>> to the processor that owns 3-5? >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Mar 21, 2022 at 12:14 PM Mark Adams >>>>>>>> wrote: >>>>>>>> >>>>>>>>> PETSc stores parallel matrices as two serial matrices. One for the >>>>>>>>> diagonal (d or A) block and one for the rest (o or B). >>>>>>>>> I would guess that for symmetric matrices it has a symmetric >>>>>>>>> matrix for the diagonal and a full AIJ matrix for the (upper) off-diagonal. >>>>>>>>> So the multtranspose is applying B symmetrically. This lower >>>>>>>>> off-diagonal and the diagonal block can be done without communication. >>>>>>>>> Then the off processor values are collected, and the upper >>>>>>>>> off-diagonal is applied. >>>>>>>>> >>>>>>>>> On Mon, Mar 21, 2022 at 2:35 PM Sam Guo >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I am most interested in how the lower triangular part is >>>>>>>>>> redistributed. It seems that SBAJI saves memory but requires more >>>>>>>>>> communication than BAIJ. >>>>>>>>>> >>>>>>>>>> On Mon, Mar 21, 2022 at 11:27 AM Sam Guo >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Mark, thanks for the quick response. I am more interested in >>>>>>>>>>> parallel implementation of MatMult for SBAIJ. 
I found following >>>>>>>>>>> >>>>>>>>>>> 1094: *PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy)*1095: {1096: Mat_MPISBAIJ *a = (Mat_MPISBAIJ*)A->data;1097: PetscErrorCode ierr;1098: PetscInt mbs=a->mbs,bs=A->rmap->bs;1099: PetscScalar *from;1100: const PetscScalar *x; >>>>>>>>>>> 1103: /* diagonal part */1104: (*a->A->ops->mult)(a->A,xx,a->slvec1a);1105: VecSet (a->slvec1b,0.0); >>>>>>>>>>> 1107: /* subdiagonal part */1108: (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b); >>>>>>>>>>> 1110: /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >>>>>>>>>>> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >>>>>>>>>>> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1120: /* supperdiagonal part */1121: (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy);1122: return(0);1123: } >>>>>>>>>>> >>>>>>>>>>> I try to understand the algorithm. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Sam >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Mon, Mar 21, 2022 at 11:14 AM Mark Adams >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> This code looks fine to me and the code is >>>>>>>>>>>> in src/mat/impls/sbaij/seq/sbaij2.c >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Mar 21, 2022 at 2:02 PM Sam Guo >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Dear PETSc dev team, >>>>>>>>>>>>> The documentation about MatCreateSBAIJ has following >>>>>>>>>>>>> "It is recommended that one use the MatCreate >>>>>>>>>>>>> >>>>>>>>>>>>> (), MatSetType >>>>>>>>>>>>> () >>>>>>>>>>>>> and/or MatSetFromOptions >>>>>>>>>>>>> (), >>>>>>>>>>>>> MatXXXXSetPreallocation() paradigm instead of this routine directly. >>>>>>>>>>>>> [MatXXXXSetPreallocation() is, for example, >>>>>>>>>>>>> MatSeqAIJSetPreallocation >>>>>>>>>>>>> >>>>>>>>>>>>> ]" >>>>>>>>>>>>> I currently call MatCreateSBAIJ directly as follows: >>>>>>>>>>>>> MatCreateSBAIJ (with d_nnz and o_nnz) >>>>>>>>>>>>> MatSetValues (to add row by row) >>>>>>>>>>>>> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >>>>>>>>>>>>> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >>>>>>>>>>>>> MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); >>>>>>>>>>>>> >>>>>>>>>>>>> Two questions: >>>>>>>>>>>>> (1) I am wondering whether what I am doing is the >>>>>>>>>>>>> most efficient. >>>>>>>>>>>>> >>>>>>>>>>>>> (2) I try to find out how the matrix >>>>>>>>>>>>> vector multiplication is implemented in PETSc for SBAIJ storage. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Sam >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>> >>>>>>> >>>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Mar 22 16:11:08 2022 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 22 Mar 2022 17:11:08 -0400 Subject: [petsc-users] MatCreateSBAIJ In-Reply-To: References: <195BAAA7-B506-4FFE-83C2-E7874633B0EA@petsc.dev> Message-ID: I don't understand the numbers in the table. 
Is this memory summed over all the ranks or the maximum over the ranks? Is this for the same problem size on different np or are you increasing the problem size with more ranks? Are you storing and factoring the matrix on each rank or is the solution of a single linear system done in parallel? > On Mar 22, 2022, at 4:16 PM, Sam Guo wrote: > > Here is one memory comparison (memory in MB) > np=1 np=2 np=4 np=8 np=16 > shell 1614 1720 1874 1673 1248 > PETSc(using full matrix) 2108 2260 2364 2215 1734 > PETSc(using symmetric matrix) 1750 2100 2189 2094 1727 > Those are the total water mark memory added. > > On Tue, Mar 22, 2022 at 1:10 PM Barry Smith > wrote: > > Sam, > > MUMPS is a direct solver, as such, it requires much more memory than the original matrix (stored as a PETSc matrix) to store the factored matrix. The savings you will get by not having a PETSc copy of the matrix and a MUMPS copy of the matrix at the same time is unlikely to be significant. Do you have memory footprint measurements indicating that not having the PETSc copy of the matrix in memory will allow you to run measurably larger simulations? > > Barry > > > > >> On Mar 22, 2022, at 3:58 PM, Sam Guo > wrote: >> >> The reason I want to use shell matrix is to reduce memory footprint. If I create a PETSc matrix and use MUMPS, I understand PETSc will create another copy of the matrix for MUMPS. Is there any way to avoid the extra copy of MUMPS? >> >> On Tue, Mar 22, 2022 at 12:53 PM Sam Guo > wrote: >> Barry, >> Thanks for the illustration. Is there an easy way to mimic the implementation using shell matrix? I have been studying how the sMvctx is created and it seems pretty involved. >> >> Thanks, >> Sam >> >> On Mon, Mar 21, 2022 at 2:48 PM Barry Smith > wrote: >> >> >>> On Mar 21, 2022, at 4:36 PM, Sam Guo > wrote: >>> >>> Barry, >>> Thanks. Could you elaborate? I try to implement the matrix-vector multiplication for a symmetric matrix using shell matrix. >> >> Consider with three ranks >> >> (a) = ( A B D) (x) >> (b) ( B' C E) (y) >> (c) ( D' E' F) (w) >> >> Only the ones without the ' are stored on the rank. So for example B is stored on rank 0. >> >> Rank 0 computes A x and keeps it in a. Rank 1 computes Cy and keeps it in b Rank 2 computes Fw and keeps it in c >> >> Rank 0 computes B'x and D'x. It puts the nonzero entries of these values as well as the values of x into slvec0 >> >> Rank 1 computes E'y and puts the nonzero entries as well as the values into slvec0 >> >> Rank 2 puts the values of we needed by the other ranks into slvec0 >> >> Rank 0 does B y_h + D z_h where it gets the y_h and z_h values from slvec1 and adds it to a >> >> Rank 1 takes the B'x from slvec1 and adds it to b it then takes the E y_h values where the y_h are pulled from slvec1 and adds them b >> >> Rank 2 takes the B'x and E'y from slvec0 and adds them to c. >> >> >>> >>> Thanks, >>> Sam >>> >>> On Mon, Mar 21, 2022 at 12:56 PM Barry Smith > wrote: >>> >>> The "trick" is that though "more" communication is needed to complete the product the communication can still be done in a single VecScatter instead of two separate calls to VecScatter. We simply pack both pieces of information that needs to be sent into a single vector. 
>>> >>> /* copy x into the vec slvec0 */ >>> 1111: <> VecGetArray (a->slvec0,&from); >>> 1112: <> VecGetArrayRead (xx,&x); >>> >>> 1114: <> PetscArraycpy (from,x,bs*mbs); >>> 1115: <> VecRestoreArray (a->slvec0,&from); >>> 1116: <> VecRestoreArrayRead (xx,&x); >>> >>> 1118: <> VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >>> 1119: <> VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >>> If you create two symmetric matrices, one with SBAIJ and one with BAIJ and compare the time to do the product you will find that the SBAIJ is not significantly slower but does save memory. >>> >>> >>>> On Mar 21, 2022, at 3:26 PM, Sam Guo > wrote: >>>> >>>> Using following example from the MatCreateSBAIJ documentation >>>> 0 1 2 3 4 5 6 7 8 9 10 11 >>>> -------------------------- >>>> row 3 |. . . d d d o o o o o o >>>> row 4 |. . . d d d o o o o o o >>>> row 5 |. . . d d d o o o o o o >>>> -------------------------- >>>> >>>> On a processor that owns rows 3, 4 and 5, rows 0-2 info are still needed. Is is that the processor that owns rows 0-2 will apply B symmetrical and then send the result >>>> to the processor that owns 3-5? >>>> >>>> On Mon, Mar 21, 2022 at 12:14 PM Mark Adams > wrote: >>>> PETSc stores parallel matrices as two serial matrices. One for the diagonal (d or A) block and one for the rest (o or B). >>>> I would guess that for symmetric matrices it has a symmetric matrix for the diagonal and a full AIJ matrix for the (upper) off-diagonal. >>>> So the multtranspose is applying B symmetrically. This lower off-diagonal and the diagonal block can be done without communication. >>>> Then the off processor values are collected, and the upper off-diagonal is applied. >>>> >>>> On Mon, Mar 21, 2022 at 2:35 PM Sam Guo > wrote: >>>> I am most interested in how the lower triangular part is redistributed. It seems that SBAJI saves memory but requires more communication than BAIJ. >>>> >>>> On Mon, Mar 21, 2022 at 11:27 AM Sam Guo > wrote: >>>> Mark, thanks for the quick response. I am more interested in parallel implementation of MatMult for SBAIJ. I found following >>>> 1094: <> <>PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy) >>>> 1095: <>{ >>>> 1096: <> Mat_MPISBAIJ *a = (Mat_MPISBAIJ*)A->data; >>>> 1097: <> PetscErrorCode ierr; >>>> 1098: <> PetscInt mbs=a->mbs,bs=A->rmap->bs; >>>> 1099: <> PetscScalar *from; >>>> 1100: <> const PetscScalar *x; >>>> >>>> 1103: <> /* diagonal part */ >>>> 1104: <> (*a->A->ops->mult)(a->A,xx,a->slvec1a); >>>> 1105: <> VecSet (a->slvec1b,0.0); >>>> >>>> 1107: <> /* subdiagonal part */ >>>> 1108: <> (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b); >>>> >>>> 1110: <> /* copy x into the vec slvec0 */ >>>> 1111: <> VecGetArray (a->slvec0,&from); >>>> 1112: <> VecGetArrayRead (xx,&x); >>>> >>>> 1114: <> PetscArraycpy (from,x,bs*mbs); >>>> 1115: <> VecRestoreArray (a->slvec0,&from); >>>> 1116: <> VecRestoreArrayRead (xx,&x); >>>> >>>> 1118: <> VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >>>> 1119: <> VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >>>> 1120: <> /* supperdiagonal part */ >>>> 1121: <> (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy); >>>> 1122: <> return(0); >>>> 1123: <>} >>>> I try to understand the algorithm. 
>>>> >>>> Thanks, >>>> Sam >>>> >>>> On Mon, Mar 21, 2022 at 11:14 AM Mark Adams > wrote: >>>> This code looks fine to me and the code is in src/mat/impls/sbaij/seq/sbaij2.c >>>> >>>> On Mon, Mar 21, 2022 at 2:02 PM Sam Guo > wrote: >>>> Dear PETSc dev team, >>>> The documentation about MatCreateSBAIJ has following >>>> "It is recommended that one use the MatCreate (), MatSetType () and/or MatSetFromOptions (), MatXXXXSetPreallocation() paradigm instead of this routine directly. [MatXXXXSetPreallocation() is, for example, MatSeqAIJSetPreallocation ]" >>>> I currently call MatCreateSBAIJ directly as follows: >>>> MatCreateSBAIJ (with d_nnz and o_nnz) >>>> MatSetValues (to add row by row) >>>> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >>>> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >>>> MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); >>>> >>>> Two questions: >>>> (1) I am wondering whether what I am doing is the most efficient. >>>> >>>> (2) I try to find out how the matrix vector multiplication is implemented in PETSc for SBAIJ storage. >>>> >>>> Thanks, >>>> Sam >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Tue Mar 22 16:24:19 2022 From: sam.guo at cd-adapco.com (Sam Guo) Date: Tue, 22 Mar 2022 14:24:19 -0700 Subject: [petsc-users] MatCreateSBAIJ In-Reply-To: References: <195BAAA7-B506-4FFE-83C2-E7874633B0EA@petsc.dev> Message-ID: Hi Barry, This is the total memory summed over all the ranks. Same problem size on different np. I call MUMPS in parallel with distributed input and centralized rhs. Thanks, Sam On Tue, Mar 22, 2022 at 2:11 PM Barry Smith wrote: > > I don't understand the numbers in the table. > > Is this memory summed over all the ranks or the maximum over the ranks? > > Is this for the same problem size on different np or are you increasing > the problem size with more ranks? > > Are you storing and factoring the matrix on each rank or is the > solution of a single linear system done in parallel? > > > > > > > > > On Mar 22, 2022, at 4:16 PM, Sam Guo wrote: > > Here is one memory comparison (memory in MB) > np=1np=2np=4np=8np=16 > shell 1614 1720 1874 1673 1248 > PETSc(using full matrix) 2108 2260 2364 2215 1734 > PETSc(using symmetric matrix) 1750 2100 2189 2094 1727Those are the total > water mark memory added. > > On Tue, Mar 22, 2022 at 1:10 PM Barry Smith wrote: > >> >> Sam, >> >> MUMPS is a direct solver, as such, it requires much more memory than >> the original matrix (stored as a PETSc matrix) to store the factored >> matrix. The savings you will get by not having a PETSc copy of the matrix >> and a MUMPS copy of the matrix at the same time is unlikely to be >> significant. Do you have memory footprint measurements indicating that not >> having the PETSc copy of the matrix in memory will allow you to run >> measurably larger simulations? >> >> Barry >> >> >> >> >> On Mar 22, 2022, at 3:58 PM, Sam Guo wrote: >> >> The reason I want to use shell matrix is to reduce memory footprint. If I >> create a PETSc matrix and use MUMPS, I understand PETSc will create another >> copy of the matrix for MUMPS. Is there any way to avoid the extra copy of >> MUMPS? >> >> On Tue, Mar 22, 2022 at 12:53 PM Sam Guo wrote: >> >>> Barry, >>> Thanks for the illustration. Is there an easy way to mimic the >>> implementation using shell matrix? I have been studying how the sMvctx is >>> created and it seems pretty involved. 
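For what it is worth, the shell side of the question quoted just above is only the standard MATSHELL skeleton (a sketch with made-up names MyCtx and MyMult; the involved part is filling in MyMult with the packed scatter that MatMult_MPISBAIJ uses):

  typedef struct {          /* whatever the multiply needs */
    Mat        Ad,Ao;       /* local diagonal and off-diagonal blocks */
    Vec        slvec0,slvec1;
    VecScatter scat;
  } MyCtx;

  static PetscErrorCode MyMult(Mat S,Vec x,Vec y)
  {
    MyCtx *ctx;
    MatShellGetContext(S,&ctx);
    /* diagonal-block product, transpose product of the off-diagonal block,
       one packed scatter, then the off-diagonal multadd, as in MatMult_MPISBAIJ */
    return 0;
  }

  Mat S;
  MatCreateShell(PETSC_COMM_WORLD,m,m,M,M,&myctx,&S);
  MatShellSetOperation(S,MATOP_MULT,(void (*)(void))MyMult);
  /* hand S to SLEPc/KSP exactly like an assembled matrix */

The skeleton itself is easy; reproducing the sMvctx scatter setup inside the context is where the real work (and most of the memory saving, if any) would be.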
>>> >>> Thanks, >>> Sam >>> >>> On Mon, Mar 21, 2022 at 2:48 PM Barry Smith wrote: >>> >>>> >>>> >>>> On Mar 21, 2022, at 4:36 PM, Sam Guo wrote: >>>> >>>> Barry, >>>> Thanks. Could you elaborate? I try to implement the matrix-vector >>>> multiplication for a symmetric matrix using shell matrix. >>>> >>>> >>>> Consider with three ranks >>>> >>>> (a) = ( A B D) (x) >>>> (b) ( B' C E) (y) >>>> (c) ( D' E' F) (w) >>>> >>>> Only the ones without the ' are stored on the rank. So for >>>> example B is stored on rank 0. >>>> >>>> Rank 0 computes A x and keeps it in a. Rank 1 computes Cy and >>>> keeps it in b Rank 2 computes Fw and keeps it in c >>>> >>>> Rank 0 computes B'x and D'x. It puts the nonzero entries of >>>> these values as well as the values of x into slvec0 >>>> >>>> Rank 1 computes E'y and puts the nonzero entries as well as the >>>> values into slvec0 >>>> >>>> Rank 2 puts the values of we needed by the other ranks into >>>> slvec0 >>>> >>>> Rank 0 does B y_h + D z_h where it gets the y_h and z_h values >>>> from slvec1 and adds it to a >>>> >>>> Rank 1 takes the B'x from slvec1 and adds it to b it then takes >>>> the E y_h values where the y_h are pulled from slvec1 and adds them b >>>> >>>> Rank 2 takes the B'x and E'y from slvec0 and adds them to c. >>>> >>>> >>>> >>>> Thanks, >>>> Sam >>>> >>>> On Mon, Mar 21, 2022 at 12:56 PM Barry Smith wrote: >>>> >>>>> >>>>> The "trick" is that though "more" communication is needed to >>>>> complete the product the communication can still be done in a single >>>>> VecScatter instead of two separate calls to VecScatter. We simply pack both >>>>> pieces of information that needs to be sent into a single vector. >>>>> >>>>> /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >>>>> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >>>>> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >>>>> >>>>> If you create two symmetric matrices, one with SBAIJ and one with BAIJ >>>>> and compare the time to do the product you will find that the SBAIJ is not >>>>> significantly slower but does save memory. >>>>> >>>>> >>>>> On Mar 21, 2022, at 3:26 PM, Sam Guo wrote: >>>>> >>>>> Using following example from the MatCreateSBAIJ documentation >>>>> >>>>> 0 1 2 3 4 5 6 7 8 9 10 11 >>>>> -------------------------- >>>>> row 3 |. . . d d d o o o o o o >>>>> row 4 |. . . d d d o o o o o o >>>>> row 5 |. . . d d d o o o o o o >>>>> -------------------------- >>>>> >>>>> >>>>> On a processor that owns rows 3, 4 and 5, rows 0-2 info are still needed. Is is that the processor that owns rows 0-2 will apply B symmetrical and then send the result >>>>> >>>>> to the processor that owns 3-5? >>>>> >>>>> >>>>> On Mon, Mar 21, 2022 at 12:14 PM Mark Adams wrote: >>>>> >>>>>> PETSc stores parallel matrices as two serial matrices. One for the >>>>>> diagonal (d or A) block and one for the rest (o or B). >>>>>> I would guess that for symmetric matrices it has a symmetric matrix >>>>>> for the diagonal and a full AIJ matrix for the (upper) off-diagonal. >>>>>> So the multtranspose is applying B symmetrically. This lower >>>>>> off-diagonal and the diagonal block can be done without communication. >>>>>> Then the off processor values are collected, and the upper >>>>>> off-diagonal is applied. 
>>>>>> >>>>>> On Mon, Mar 21, 2022 at 2:35 PM Sam Guo >>>>>> wrote: >>>>>> >>>>>>> I am most interested in how the lower triangular part is >>>>>>> redistributed. It seems that SBAJI saves memory but requires more >>>>>>> communication than BAIJ. >>>>>>> >>>>>>> On Mon, Mar 21, 2022 at 11:27 AM Sam Guo >>>>>>> wrote: >>>>>>> >>>>>>>> Mark, thanks for the quick response. I am more interested in >>>>>>>> parallel implementation of MatMult for SBAIJ. I found following >>>>>>>> >>>>>>>> 1094: *PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy)*1095: {1096: Mat_MPISBAIJ *a = (Mat_MPISBAIJ*)A->data;1097: PetscErrorCode ierr;1098: PetscInt mbs=a->mbs,bs=A->rmap->bs;1099: PetscScalar *from;1100: const PetscScalar *x; >>>>>>>> 1103: /* diagonal part */1104: (*a->A->ops->mult)(a->A,xx,a->slvec1a);1105: VecSet (a->slvec1b,0.0); >>>>>>>> 1107: /* subdiagonal part */1108: (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b); >>>>>>>> 1110: /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >>>>>>>> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >>>>>>>> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1120: /* supperdiagonal part */1121: (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy);1122: return(0);1123: } >>>>>>>> >>>>>>>> I try to understand the algorithm. >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Sam >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Mar 21, 2022 at 11:14 AM Mark Adams >>>>>>>> wrote: >>>>>>>> >>>>>>>>> This code looks fine to me and the code is >>>>>>>>> in src/mat/impls/sbaij/seq/sbaij2.c >>>>>>>>> >>>>>>>>> On Mon, Mar 21, 2022 at 2:02 PM Sam Guo >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Dear PETSc dev team, >>>>>>>>>> The documentation about MatCreateSBAIJ has following >>>>>>>>>> "It is recommended that one use the MatCreate >>>>>>>>>> >>>>>>>>>> (), MatSetType >>>>>>>>>> () >>>>>>>>>> and/or MatSetFromOptions >>>>>>>>>> (), >>>>>>>>>> MatXXXXSetPreallocation() paradigm instead of this routine directly. >>>>>>>>>> [MatXXXXSetPreallocation() is, for example, >>>>>>>>>> MatSeqAIJSetPreallocation >>>>>>>>>> >>>>>>>>>> ]" >>>>>>>>>> I currently call MatCreateSBAIJ directly as follows: >>>>>>>>>> MatCreateSBAIJ (with d_nnz and o_nnz) >>>>>>>>>> MatSetValues (to add row by row) >>>>>>>>>> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >>>>>>>>>> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >>>>>>>>>> MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); >>>>>>>>>> >>>>>>>>>> Two questions: >>>>>>>>>> (1) I am wondering whether what I am doing is the >>>>>>>>>> most efficient. >>>>>>>>>> >>>>>>>>>> (2) I try to find out how the matrix vector multiplication is >>>>>>>>>> implemented in PETSc for SBAIJ storage. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Sam >>>>>>>>>> >>>>>>>>> >>>>> >>>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joauma.marichal at uclouvain.be Wed Mar 23 10:07:17 2022 From: joauma.marichal at uclouvain.be (Joauma Marichal) Date: Wed, 23 Mar 2022 15:07:17 +0000 Subject: [petsc-users] DMSwarm In-Reply-To: References: Message-ID: Hello, I sent an email last week about an issue I had with DMSwarm but did not get an answer yet. If there is any other information needed or anything I could try to solve it, I would be happy to do them... Thanks a lot for your help. 
Best regards, Joauma ________________________________ From: Joauma Marichal Sent: Friday, March 18, 2022 4:02 PM To: petsc-users at mcs.anl.gov Subject: DMSwarm Hello, I am writing to you as I am trying to implement a Lagrangian Particle Tracking method to my eulerian solver that relies on a 3D collocated DMDA. I have been using examples to develop a first basic code. The latter creates particles on rank 0 with random coordinates on the whole domain and then migrates them to the rank corresponding to these coordinates. Unfortunately, as I migrate I am loosing some particles. I came to understand that when I create a DMDA with 6 grid points in each 3 directions and then set coordinates in between 0 and 1 using ,DMDASetUniformCoordinates and running on 2 processors, I obtain the following coordinates values on each proc: [Proc 0] X = 0.000000 0.200000 0.400000 0.600000 0.800000 1.000000 [Proc 0] Y = 0.000000 0.200000 0.400000 0.600000 0.800000 1.000000 [Proc 0] Z = 0.000000 0.200000 0.400000 [Proc 1] X = 0.000000 0.200000 0.400000 0.600000 0.800000 1.000000 [Proc 1] Y = 0.000000 0.200000 0.400000 0.600000 0.800000 1.000000 [Proc 1] Z = 0.600000 0.800000 1.000000 . Furthermore, it appears that the particles that I am losing are (in the case of 2 processors) located in between z = 0.4 and z = 0.6. How can this be avoided? I attach my code to this email (I run it using mpirun -np 2 ./cobpor). Furthermore, my actual code relies on a collocated 3D DMDA, however the DMDASetUniformCoordinates seems to be working for staggered grids only... How would you advice to deal with particles in this case? Thanks a lot for your help. Best regards, Joauma -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Mar 23 12:51:40 2022 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 23 Mar 2022 13:51:40 -0400 Subject: [petsc-users] DMSwarm In-Reply-To: References: Message-ID: On Fri, Mar 18, 2022 at 11:03 AM Joauma Marichal < joauma.marichal at uclouvain.be> wrote: > Hello, > > I am writing to you as I am trying to implement a Lagrangian Particle > Tracking method to my eulerian solver that relies on a 3D collocated DMDA. > > I have been using examples to develop a first basic code. The latter > creates particles on rank 0 with random coordinates on the whole domain and > then migrates them to the rank corresponding to these coordinates. > Unfortunately, as I migrate I am loosing some particles. I came to > understand that when I create a DMDA with 6 grid points in each 3 > directions and then set coordinates in between 0 and 1 using > ,DMDASetUniformCoordinates and running on 2 processors, I obtain the > following coordinates values on each proc: > [Proc 0] X = 0.000000 0.200000 0.400000 0.600000 0.800000 1.000000 > [Proc 0] Y = 0.000000 0.200000 0.400000 0.600000 0.800000 1.000000 > [Proc 0] Z = 0.000000 0.200000 0.400000 > [Proc 1] X = 0.000000 0.200000 0.400000 0.600000 0.800000 1.000000 > [Proc 1] Y = 0.000000 0.200000 0.400000 0.600000 0.800000 1.000000 > [Proc 1] Z = 0.600000 0.800000 1.000000 . > I am not super familiar with DAs, but it looks like you have a non-overlapping set of points and Swarm is cell centered and it is getting confused. > Furthermore, it appears that the particles that I am losing are (in the > case of 2 processors) located in between z = 0.4 and z = 0.6. How can this > be avoided? > Yea, I can see that. Swarm is only tested with Plex meshes. It looks like the abstraction for DAs does not work. 
Swarm has been adding support for regular grids in Plex (not DA) and that is the direction being developed You should start with one of the Swarm examples. src/dm/impls/swarm/tutorials/ex1.c is probably as good a place to start. We recommend using command line arguments for specifying the grid size. See the example tests in the comment at the end of the test file. (you can do this with code but it is cumbersome and we are moving toward command line arguments. If you really need to you can add the command line args to the database in the code.) Sorry for the delay, Mark > > I attach my code to this email (I run it using mpirun -np 2 ./cobpor). > > Furthermore, my actual code relies on a collocated 3D DMDA, however the > DMDASetUniformCoordinates seems to be working for staggered grids only... > How would you advice to deal with particles in this case? > > Thanks a lot for your help. > > Best regards, > Joauma > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Mar 23 12:51:44 2022 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 23 Mar 2022 13:51:44 -0400 Subject: [petsc-users] DMSwarm In-Reply-To: References: Message-ID: On Wed, Mar 23, 2022 at 11:09 AM Joauma Marichal < joauma.marichal at uclouvain.be> wrote: > Hello, > > I sent an email last week about an issue I had with DMSwarm but did not > get an answer yet. If there is any other information needed or anything I > could try to solve it, I would be happy to do them... > I got a chance to run the code. I believe this undercovered a bug in our implementation of point location with DMDA. I will make an Issue. Your example runs correctly for me if you replace DM_BOUNDARY_GHOSTED with DM_BOUNDARY_NONE in the DMDACreate3d. Can you try that? Thanks, Matt > Thanks a lot for your help. > > Best regards, > Joauma > > ------------------------------ > *From:* Joauma Marichal > *Sent:* Friday, March 18, 2022 4:02 PM > *To:* petsc-users at mcs.anl.gov > *Subject:* DMSwarm > > Hello, > > I am writing to you as I am trying to implement a Lagrangian Particle > Tracking method to my eulerian solver that relies on a 3D collocated DMDA. > > I have been using examples to develop a first basic code. The latter > creates particles on rank 0 with random coordinates on the whole domain and > then migrates them to the rank corresponding to these coordinates. > Unfortunately, as I migrate I am loosing some particles. I came to > understand that when I create a DMDA with 6 grid points in each 3 > directions and then set coordinates in between 0 and 1 using > ,DMDASetUniformCoordinates and running on 2 processors, I obtain the > following coordinates values on each proc: > [Proc 0] X = 0.000000 0.200000 0.400000 0.600000 0.800000 1.000000 > [Proc 0] Y = 0.000000 0.200000 0.400000 0.600000 0.800000 1.000000 > [Proc 0] Z = 0.000000 0.200000 0.400000 > [Proc 1] X = 0.000000 0.200000 0.400000 0.600000 0.800000 1.000000 > [Proc 1] Y = 0.000000 0.200000 0.400000 0.600000 0.800000 1.000000 > [Proc 1] Z = 0.600000 0.800000 1.000000 . > Furthermore, it appears that the particles that I am losing are (in the > case of 2 processors) located in between z = 0.4 and z = 0.6. How can this > be avoided? > I attach my code to this email (I run it using mpirun -np 2 ./cobpor). > > Furthermore, my actual code relies on a collocated 3D DMDA, however the > DMDASetUniformCoordinates seems to be working for staggered grids only... > How would you advice to deal with particles in this case? 
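Following the ex1.c suggestion above, a minimal Plex-based scaffold looks roughly like this (a sketch only: option names such as -dm_plex_dim and -dm_plex_box_faces, and the exact call sequence, depend on the PETSc version, so check ex1.c in your source tree before relying on it):

#include <petscdmplex.h>
#include <petscdmswarm.h>

int main(int argc,char **argv)
{
  DM             dm,sw;
  PetscInt       dim;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
  /* box mesh controlled from the command line, e.g.
     -dm_plex_dim 3 -dm_plex_box_faces 6,6,6 -dm_plex_simplex 0 */
  ierr = DMCreate(PETSC_COMM_WORLD,&dm);CHKERRQ(ierr);
  ierr = DMSetType(dm,DMPLEX);CHKERRQ(ierr);
  ierr = DMSetFromOptions(dm);CHKERRQ(ierr);

  ierr = DMCreate(PETSC_COMM_WORLD,&sw);CHKERRQ(ierr);
  ierr = DMSetType(sw,DMSWARM);CHKERRQ(ierr);
  ierr = DMGetDimension(dm,&dim);CHKERRQ(ierr);
  ierr = DMSetDimension(sw,dim);CHKERRQ(ierr);
  ierr = DMSwarmSetType(sw,DMSWARM_PIC);CHKERRQ(ierr);
  ierr = DMSwarmSetCellDM(sw,dm);CHKERRQ(ierr);
  ierr = DMSwarmFinalizeFieldRegister(sw);CHKERRQ(ierr);
  ierr = DMSwarmInsertPointsUsingCellDM(sw,DMSWARMPIC_LAYOUT_GAUSS,2);CHKERRQ(ierr);
  ierr = DMSwarmMigrate(sw,PETSC_TRUE);CHKERRQ(ierr); /* move particles to their owning ranks */

  ierr = DMDestroy(&sw);CHKERRQ(ierr);
  ierr = DMDestroy(&dm);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}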
> > Thanks a lot for your help. > > Best regards, > Joauma > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Wed Mar 23 15:51:11 2022 From: sam.guo at cd-adapco.com (Sam Guo) Date: Wed, 23 Mar 2022 13:51:11 -0700 Subject: [petsc-users] MatCreateSBAIJ In-Reply-To: References: <195BAAA7-B506-4FFE-83C2-E7874633B0EA@petsc.dev> Message-ID: Hi Barry, I try to understand your example. Why does Rank 0 put the values of x into slvec0? x is needed on rank 0. Other ranks need B'x and D'x. Is it because we need slvec0 to be the same size as (x,y,z) on all ranks? Thanks, Sam On Tue, Mar 22, 2022 at 2:24 PM Sam Guo wrote: > Hi Barry, > This is the total memory summed over all the ranks. > Same problem size on different np. > I call MUMPS in parallel with distributed input and centralized rhs. > > Thanks, > Sam > > On Tue, Mar 22, 2022 at 2:11 PM Barry Smith wrote: > >> >> I don't understand the numbers in the table. >> >> Is this memory summed over all the ranks or the maximum over the ranks? >> >> Is this for the same problem size on different np or are you increasing >> the problem size with more ranks? >> >> Are you storing and factoring the matrix on each rank or is the >> solution of a single linear system done in parallel? >> >> >> >> >> >> >> >> >> On Mar 22, 2022, at 4:16 PM, Sam Guo wrote: >> >> Here is one memory comparison (memory in MB) >> np=1np=2np=4np=8np=16 >> shell 1614 1720 1874 1673 1248 >> PETSc(using full matrix) 2108 2260 2364 2215 1734 >> PETSc(using symmetric matrix) 1750 2100 2189 2094 1727Those are the >> total water mark memory added. >> >> On Tue, Mar 22, 2022 at 1:10 PM Barry Smith wrote: >> >>> >>> Sam, >>> >>> MUMPS is a direct solver, as such, it requires much more memory than >>> the original matrix (stored as a PETSc matrix) to store the factored >>> matrix. The savings you will get by not having a PETSc copy of the matrix >>> and a MUMPS copy of the matrix at the same time is unlikely to be >>> significant. Do you have memory footprint measurements indicating that not >>> having the PETSc copy of the matrix in memory will allow you to run >>> measurably larger simulations? >>> >>> Barry >>> >>> >>> >>> >>> On Mar 22, 2022, at 3:58 PM, Sam Guo wrote: >>> >>> The reason I want to use shell matrix is to reduce memory footprint. If >>> I create a PETSc matrix and use MUMPS, I understand PETSc will create >>> another copy of the matrix for MUMPS. Is there any way to avoid the extra >>> copy of MUMPS? >>> >>> On Tue, Mar 22, 2022 at 12:53 PM Sam Guo wrote: >>> >>>> Barry, >>>> Thanks for the illustration. Is there an easy way to mimic the >>>> implementation using shell matrix? I have been studying how the sMvctx is >>>> created and it seems pretty involved. >>>> >>>> Thanks, >>>> Sam >>>> >>>> On Mon, Mar 21, 2022 at 2:48 PM Barry Smith wrote: >>>> >>>>> >>>>> >>>>> On Mar 21, 2022, at 4:36 PM, Sam Guo wrote: >>>>> >>>>> Barry, >>>>> Thanks. Could you elaborate? I try to implement the matrix-vector >>>>> multiplication for a symmetric matrix using shell matrix. >>>>> >>>>> >>>>> Consider with three ranks >>>>> >>>>> (a) = ( A B D) (x) >>>>> (b) ( B' C E) (y) >>>>> (c) ( D' E' F) (w) >>>>> >>>>> Only the ones without the ' are stored on the rank. So for >>>>> example B is stored on rank 0. 
>>>>> >>>>> Rank 0 computes A x and keeps it in a. Rank 1 computes Cy and >>>>> keeps it in b Rank 2 computes Fw and keeps it in c >>>>> >>>>> Rank 0 computes B'x and D'x. It puts the nonzero entries of >>>>> these values as well as the values of x into slvec0 >>>>> >>>>> Rank 1 computes E'y and puts the nonzero entries as well as the >>>>> values into slvec0 >>>>> >>>>> Rank 2 puts the values of we needed by the other ranks into >>>>> slvec0 >>>>> >>>>> Rank 0 does B y_h + D z_h where it gets the y_h and z_h values >>>>> from slvec1 and adds it to a >>>>> >>>>> Rank 1 takes the B'x from slvec1 and adds it to b it then takes >>>>> the E y_h values where the y_h are pulled from slvec1 and adds them b >>>>> >>>>> Rank 2 takes the B'x and E'y from slvec0 and adds them to c. >>>>> >>>>> >>>>> >>>>> Thanks, >>>>> Sam >>>>> >>>>> On Mon, Mar 21, 2022 at 12:56 PM Barry Smith wrote: >>>>> >>>>>> >>>>>> The "trick" is that though "more" communication is needed to >>>>>> complete the product the communication can still be done in a single >>>>>> VecScatter instead of two separate calls to VecScatter. We simply pack both >>>>>> pieces of information that needs to be sent into a single vector. >>>>>> >>>>>> /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >>>>>> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >>>>>> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >>>>>> >>>>>> If you create two symmetric matrices, one with SBAIJ and one with >>>>>> BAIJ and compare the time to do the product you will find that the SBAIJ is >>>>>> not significantly slower but does save memory. >>>>>> >>>>>> >>>>>> On Mar 21, 2022, at 3:26 PM, Sam Guo wrote: >>>>>> >>>>>> Using following example from the MatCreateSBAIJ documentation >>>>>> >>>>>> 0 1 2 3 4 5 6 7 8 9 10 11 >>>>>> -------------------------- >>>>>> row 3 |. . . d d d o o o o o o >>>>>> row 4 |. . . d d d o o o o o o >>>>>> row 5 |. . . d d d o o o o o o >>>>>> -------------------------- >>>>>> >>>>>> >>>>>> On a processor that owns rows 3, 4 and 5, rows 0-2 info are still needed. Is is that the processor that owns rows 0-2 will apply B symmetrical and then send the result >>>>>> >>>>>> to the processor that owns 3-5? >>>>>> >>>>>> >>>>>> On Mon, Mar 21, 2022 at 12:14 PM Mark Adams wrote: >>>>>> >>>>>>> PETSc stores parallel matrices as two serial matrices. One for the >>>>>>> diagonal (d or A) block and one for the rest (o or B). >>>>>>> I would guess that for symmetric matrices it has a symmetric matrix >>>>>>> for the diagonal and a full AIJ matrix for the (upper) off-diagonal. >>>>>>> So the multtranspose is applying B symmetrically. This lower >>>>>>> off-diagonal and the diagonal block can be done without communication. >>>>>>> Then the off processor values are collected, and the upper >>>>>>> off-diagonal is applied. >>>>>>> >>>>>>> On Mon, Mar 21, 2022 at 2:35 PM Sam Guo >>>>>>> wrote: >>>>>>> >>>>>>>> I am most interested in how the lower triangular part is >>>>>>>> redistributed. It seems that SBAJI saves memory but requires more >>>>>>>> communication than BAIJ. >>>>>>>> >>>>>>>> On Mon, Mar 21, 2022 at 11:27 AM Sam Guo >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Mark, thanks for the quick response. I am more interested in >>>>>>>>> parallel implementation of MatMult for SBAIJ. 
I found following >>>>>>>>> >>>>>>>>> 1094: *PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy)*1095: {1096: Mat_MPISBAIJ *a = (Mat_MPISBAIJ*)A->data;1097: PetscErrorCode ierr;1098: PetscInt mbs=a->mbs,bs=A->rmap->bs;1099: PetscScalar *from;1100: const PetscScalar *x; >>>>>>>>> 1103: /* diagonal part */1104: (*a->A->ops->mult)(a->A,xx,a->slvec1a);1105: VecSet (a->slvec1b,0.0); >>>>>>>>> 1107: /* subdiagonal part */1108: (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b); >>>>>>>>> 1110: /* copy x into the vec slvec0 */1111: VecGetArray (a->slvec0,&from);1112: VecGetArrayRead (xx,&x); >>>>>>>>> 1114: PetscArraycpy (from,x,bs*mbs);1115: VecRestoreArray (a->slvec0,&from);1116: VecRestoreArrayRead (xx,&x); >>>>>>>>> 1118: VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1119: VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD );1120: /* supperdiagonal part */1121: (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy);1122: return(0);1123: } >>>>>>>>> >>>>>>>>> I try to understand the algorithm. >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Sam >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Mar 21, 2022 at 11:14 AM Mark Adams >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> This code looks fine to me and the code is >>>>>>>>>> in src/mat/impls/sbaij/seq/sbaij2.c >>>>>>>>>> >>>>>>>>>> On Mon, Mar 21, 2022 at 2:02 PM Sam Guo >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Dear PETSc dev team, >>>>>>>>>>> The documentation about MatCreateSBAIJ has following >>>>>>>>>>> "It is recommended that one use the MatCreate >>>>>>>>>>> >>>>>>>>>>> (), MatSetType >>>>>>>>>>> () >>>>>>>>>>> and/or MatSetFromOptions >>>>>>>>>>> (), >>>>>>>>>>> MatXXXXSetPreallocation() paradigm instead of this routine directly. >>>>>>>>>>> [MatXXXXSetPreallocation() is, for example, >>>>>>>>>>> MatSeqAIJSetPreallocation >>>>>>>>>>> >>>>>>>>>>> ]" >>>>>>>>>>> I currently call MatCreateSBAIJ directly as follows: >>>>>>>>>>> MatCreateSBAIJ (with d_nnz and o_nnz) >>>>>>>>>>> MatSetValues (to add row by row) >>>>>>>>>>> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >>>>>>>>>>> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >>>>>>>>>>> MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); >>>>>>>>>>> >>>>>>>>>>> Two questions: >>>>>>>>>>> (1) I am wondering whether what I am doing is the >>>>>>>>>>> most efficient. >>>>>>>>>>> >>>>>>>>>>> (2) I try to find out how the matrix vector multiplication is >>>>>>>>>>> implemented in PETSc for SBAIJ storage. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Sam >>>>>>>>>>> >>>>>>>>>> >>>>>> >>>>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Mar 23 15:58:26 2022 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 23 Mar 2022 16:58:26 -0400 Subject: [petsc-users] MatCreateSBAIJ In-Reply-To: References: <195BAAA7-B506-4FFE-83C2-E7874633B0EA@petsc.dev> Message-ID: My mistake, yes rank 0 does not need to put any values of x into slvec0. rank 1 and 2 need to put parts of y and w in because they will be needed by the other ranks. > On Mar 23, 2022, at 4:51 PM, Sam Guo wrote: > > Hi Barry, > I try to understand your example. Why does Rank 0 put the values of x into slvec0? x is needed on rank 0. Other ranks need B'x and D'x. Is it because we need slvec0 to be the same size as (x,y,z) on all ranks? > > Thanks, > Sam > > > On Tue, Mar 22, 2022 at 2:24 PM Sam Guo > wrote: > Hi Barry, > This is the total memory summed over all the ranks. > Same problem size on different np. 
> I call MUMPS in parallel with distributed input and centralized rhs. > > Thanks, > Sam > > On Tue, Mar 22, 2022 at 2:11 PM Barry Smith > wrote: > > I don't understand the numbers in the table. > > Is this memory summed over all the ranks or the maximum over the ranks? > > Is this for the same problem size on different np or are you increasing the problem size with more ranks? > > Are you storing and factoring the matrix on each rank or is the solution of a single linear system done in parallel? > > > > > > > > >> On Mar 22, 2022, at 4:16 PM, Sam Guo > wrote: >> >> Here is one memory comparison (memory in MB) >> np=1 np=2 np=4 np=8 np=16 >> shell 1614 1720 1874 1673 1248 >> PETSc(using full matrix) 2108 2260 2364 2215 1734 >> PETSc(using symmetric matrix) 1750 2100 2189 2094 1727 >> Those are the total water mark memory added. >> >> On Tue, Mar 22, 2022 at 1:10 PM Barry Smith > wrote: >> >> Sam, >> >> MUMPS is a direct solver, as such, it requires much more memory than the original matrix (stored as a PETSc matrix) to store the factored matrix. The savings you will get by not having a PETSc copy of the matrix and a MUMPS copy of the matrix at the same time is unlikely to be significant. Do you have memory footprint measurements indicating that not having the PETSc copy of the matrix in memory will allow you to run measurably larger simulations? >> >> Barry >> >> >> >> >>> On Mar 22, 2022, at 3:58 PM, Sam Guo > wrote: >>> >>> The reason I want to use shell matrix is to reduce memory footprint. If I create a PETSc matrix and use MUMPS, I understand PETSc will create another copy of the matrix for MUMPS. Is there any way to avoid the extra copy of MUMPS? >>> >>> On Tue, Mar 22, 2022 at 12:53 PM Sam Guo > wrote: >>> Barry, >>> Thanks for the illustration. Is there an easy way to mimic the implementation using shell matrix? I have been studying how the sMvctx is created and it seems pretty involved. >>> >>> Thanks, >>> Sam >>> >>> On Mon, Mar 21, 2022 at 2:48 PM Barry Smith > wrote: >>> >>> >>>> On Mar 21, 2022, at 4:36 PM, Sam Guo > wrote: >>>> >>>> Barry, >>>> Thanks. Could you elaborate? I try to implement the matrix-vector multiplication for a symmetric matrix using shell matrix. >>> >>> Consider with three ranks >>> >>> (a) = ( A B D) (x) >>> (b) ( B' C E) (y) >>> (c) ( D' E' F) (w) >>> >>> Only the ones without the ' are stored on the rank. So for example B is stored on rank 0. >>> >>> Rank 0 computes A x and keeps it in a. Rank 1 computes Cy and keeps it in b Rank 2 computes Fw and keeps it in c >>> >>> Rank 0 computes B'x and D'x. It puts the nonzero entries of these values as well as the values of x into slvec0 >>> >>> Rank 1 computes E'y and puts the nonzero entries as well as the values into slvec0 >>> >>> Rank 2 puts the values of we needed by the other ranks into slvec0 >>> >>> Rank 0 does B y_h + D z_h where it gets the y_h and z_h values from slvec1 and adds it to a >>> >>> Rank 1 takes the B'x from slvec1 and adds it to b it then takes the E y_h values where the y_h are pulled from slvec1 and adds them b >>> >>> Rank 2 takes the B'x and E'y from slvec0 and adds them to c. >>> >>> >>>> >>>> Thanks, >>>> Sam >>>> >>>> On Mon, Mar 21, 2022 at 12:56 PM Barry Smith > wrote: >>>> >>>> The "trick" is that though "more" communication is needed to complete the product the communication can still be done in a single VecScatter instead of two separate calls to VecScatter. We simply pack both pieces of information that needs to be sent into a single vector. 
>>>> >>>> /* copy x into the vec slvec0 */ >>>> 1111: <> VecGetArray (a->slvec0,&from); >>>> 1112: <> VecGetArrayRead (xx,&x); >>>> >>>> 1114: <> PetscArraycpy (from,x,bs*mbs); >>>> 1115: <> VecRestoreArray (a->slvec0,&from); >>>> 1116: <> VecRestoreArrayRead (xx,&x); >>>> >>>> 1118: <> VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >>>> 1119: <> VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >>>> If you create two symmetric matrices, one with SBAIJ and one with BAIJ and compare the time to do the product you will find that the SBAIJ is not significantly slower but does save memory. >>>> >>>> >>>>> On Mar 21, 2022, at 3:26 PM, Sam Guo > wrote: >>>>> >>>>> Using following example from the MatCreateSBAIJ documentation >>>>> 0 1 2 3 4 5 6 7 8 9 10 11 >>>>> -------------------------- >>>>> row 3 |. . . d d d o o o o o o >>>>> row 4 |. . . d d d o o o o o o >>>>> row 5 |. . . d d d o o o o o o >>>>> -------------------------- >>>>> >>>>> On a processor that owns rows 3, 4 and 5, rows 0-2 info are still needed. Is is that the processor that owns rows 0-2 will apply B symmetrical and then send the result >>>>> to the processor that owns 3-5? >>>>> >>>>> On Mon, Mar 21, 2022 at 12:14 PM Mark Adams > wrote: >>>>> PETSc stores parallel matrices as two serial matrices. One for the diagonal (d or A) block and one for the rest (o or B). >>>>> I would guess that for symmetric matrices it has a symmetric matrix for the diagonal and a full AIJ matrix for the (upper) off-diagonal. >>>>> So the multtranspose is applying B symmetrically. This lower off-diagonal and the diagonal block can be done without communication. >>>>> Then the off processor values are collected, and the upper off-diagonal is applied. >>>>> >>>>> On Mon, Mar 21, 2022 at 2:35 PM Sam Guo > wrote: >>>>> I am most interested in how the lower triangular part is redistributed. It seems that SBAJI saves memory but requires more communication than BAIJ. >>>>> >>>>> On Mon, Mar 21, 2022 at 11:27 AM Sam Guo > wrote: >>>>> Mark, thanks for the quick response. I am more interested in parallel implementation of MatMult for SBAIJ. I found following >>>>> 1094: <> <>PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy) >>>>> 1095: <>{ >>>>> 1096: <> Mat_MPISBAIJ *a = (Mat_MPISBAIJ*)A->data; >>>>> 1097: <> PetscErrorCode ierr; >>>>> 1098: <> PetscInt mbs=a->mbs,bs=A->rmap->bs; >>>>> 1099: <> PetscScalar *from; >>>>> 1100: <> const PetscScalar *x; >>>>> >>>>> 1103: <> /* diagonal part */ >>>>> 1104: <> (*a->A->ops->mult)(a->A,xx,a->slvec1a); >>>>> 1105: <> VecSet (a->slvec1b,0.0); >>>>> >>>>> 1107: <> /* subdiagonal part */ >>>>> 1108: <> (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b); >>>>> >>>>> 1110: <> /* copy x into the vec slvec0 */ >>>>> 1111: <> VecGetArray (a->slvec0,&from); >>>>> 1112: <> VecGetArrayRead (xx,&x); >>>>> >>>>> 1114: <> PetscArraycpy (from,x,bs*mbs); >>>>> 1115: <> VecRestoreArray (a->slvec0,&from); >>>>> 1116: <> VecRestoreArrayRead (xx,&x); >>>>> >>>>> 1118: <> VecScatterBegin (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >>>>> 1119: <> VecScatterEnd (a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES ,SCATTER_FORWARD ); >>>>> 1120: <> /* supperdiagonal part */ >>>>> 1121: <> (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy); >>>>> 1122: <> return(0); >>>>> 1123: <>} >>>>> I try to understand the algorithm. 
>>>>> >>>>> Thanks, >>>>> Sam >>>>> >>>>> On Mon, Mar 21, 2022 at 11:14 AM Mark Adams > wrote: >>>>> This code looks fine to me and the code is in src/mat/impls/sbaij/seq/sbaij2.c >>>>> >>>>> On Mon, Mar 21, 2022 at 2:02 PM Sam Guo > wrote: >>>>> Dear PETSc dev team, >>>>> The documentation about MatCreateSBAIJ has following >>>>> "It is recommended that one use the MatCreate (), MatSetType () and/or MatSetFromOptions (), MatXXXXSetPreallocation() paradigm instead of this routine directly. [MatXXXXSetPreallocation() is, for example, MatSeqAIJSetPreallocation ]" >>>>> I currently call MatCreateSBAIJ directly as follows: >>>>> MatCreateSBAIJ (with d_nnz and o_nnz) >>>>> MatSetValues (to add row by row) >>>>> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >>>>> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >>>>> MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); >>>>> >>>>> Two questions: >>>>> (1) I am wondering whether what I am doing is the most efficient. >>>>> >>>>> (2) I try to find out how the matrix vector multiplication is implemented in PETSc for SBAIJ storage. >>>>> >>>>> Thanks, >>>>> Sam >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre.seize at onera.fr Fri Mar 25 06:07:00 2022 From: pierre.seize at onera.fr (Pierre Seize) Date: Fri, 25 Mar 2022 12:07:00 +0100 Subject: [petsc-users] Question on MATMFFD_WP Message-ID: <477b39b9-6c54-2ad0-4c50-4d56dd529b9f@onera.fr> Hello PETSc team and users, I have a question regarding MATMFFD_WP : the documentation states that 1) || U || does not change between linear iterations and 2) in GMRES || a || = 1 except at restart. In src/mat/impls/mffd/wp.c, in MatMFFDCompute_WP, I see that the computation of || U || is inside an if statement, which I guess corresponds to what is stated in the documentation, but the computation of || a || is done every time. Does this mean that || a || is computed at each GMRES iteration, even when we know it's 1 ? I was checking this to see how you handle the case of right preconditioning : then in GMRES it is no longer true than || a || == 1, i think. Thank you in advance for your replies. Pierre Seize From dave.mayhem23 at gmail.com Fri Mar 25 07:23:01 2022 From: dave.mayhem23 at gmail.com (Dave May) Date: Fri, 25 Mar 2022 13:23:01 +0100 Subject: [petsc-users] DMSwarm In-Reply-To: References: Message-ID: Hi, On Wed 23. Mar 2022 at 18:52, Matthew Knepley wrote: > On Wed, Mar 23, 2022 at 11:09 AM Joauma Marichal < > joauma.marichal at uclouvain.be> wrote: > >> Hello, >> >> I sent an email last week about an issue I had with DMSwarm but did not >> get an answer yet. If there is any other information needed or anything I >> could try to solve it, I would be happy to do them... >> > > I got a chance to run the code. I believe this undercovered a bug in our > implementation of point location with DMDA. I will make an Issue. > > Your example runs correctly for me if you replace DM_BOUNDARY_GHOSTED with > DM_BOUNDARY_NONE in the DMDACreate3d. > Can you try that? > The PIC support in place between DMSwarm and DMDA only works when the DA points define the vertices of a set of quads / hexes AND if the mesh is uniform, ie you defined the coordinates using SetUniformCoordinates. The point location routine is very simple. There is no way for the DA infrastructure to know what the points in the DA physically represent (Ie vertices, cell centroid or face centroids). The DA just defines a set of logically order points which can be indexed in an i,j,k manner. 
So if you are using the DA to represent cell centered data then the point location routine will give incorrect results. Also, the coordinates from SetUniformCoordinates won?t give you what you expect either if the x0,x1 you provide define the start,end coordinates of the physical boundary, but you interpret the DA points to be cell centers. There are several options you can pursue. 1/ Make an independent DMDA which represents the vertices of your mesh. Use this DA with your DMSwarm. 2/ Provide your own point location routine for your collocated DA representation. Thanks, Dave > Thanks, > > Matt > > >> Thanks a lot for your help. >> >> Best regards, >> Joauma >> >> ------------------------------ >> *From:* Joauma Marichal >> *Sent:* Friday, March 18, 2022 4:02 PM >> *To:* petsc-users at mcs.anl.gov >> *Subject:* DMSwarm >> >> Hello, >> >> I am writing to you as I am trying to implement a Lagrangian Particle >> Tracking method to my eulerian solver that relies on a 3D collocated DMDA. >> >> I have been using examples to develop a first basic code. The latter >> creates particles on rank 0 with random coordinates on the whole domain and >> then migrates them to the rank corresponding to these coordinates. >> Unfortunately, as I migrate I am loosing some particles. I came to >> understand that when I create a DMDA with 6 grid points in each 3 >> directions and then set coordinates in between 0 and 1 using >> ,DMDASetUniformCoordinates and running on 2 processors, I obtain the >> following coordinates values on each proc: >> [Proc 0] X = 0.000000 0.200000 0.400000 0.600000 0.800000 1.000000 >> [Proc 0] Y = 0.000000 0.200000 0.400000 0.600000 0.800000 1.000000 >> [Proc 0] Z = 0.000000 0.200000 0.400000 >> [Proc 1] X = 0.000000 0.200000 0.400000 0.600000 0.800000 1.000000 >> [Proc 1] Y = 0.000000 0.200000 0.400000 0.600000 0.800000 1.000000 >> [Proc 1] Z = 0.600000 0.800000 1.000000 . >> Furthermore, it appears that the particles that I am losing are (in the >> case of 2 processors) located in between z = 0.4 and z = 0.6. How can this >> be avoided? >> I attach my code to this email (I run it using mpirun -np 2 ./cobpor). >> >> Furthermore, my actual code relies on a collocated 3D DMDA, however the >> DMDASetUniformCoordinates seems to be working for staggered grids >> only... How would you advice to deal with particles in this case? >> >> Thanks a lot for your help. >> >> Best regards, >> Joauma >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Mar 25 09:48:06 2022 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 25 Mar 2022 10:48:06 -0400 Subject: [petsc-users] Question on MATMFFD_WP In-Reply-To: <477b39b9-6c54-2ad0-4c50-4d56dd529b9f@onera.fr> References: <477b39b9-6c54-2ad0-4c50-4d56dd529b9f@onera.fr> Message-ID: This uses a PETSc "trick". When a norm is computed on a vector it is "stashed" in the object and retrieved quickly if requested again. https://petsc.org/main/src/vec/vec/interface/rvector.c.html#VecNorm Because PETSc controls write access to the vector it always knows if the current stashed value is valid or invalid so can avoid unneeded recompilations. 
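As a small illustration of that stashing behaviour (a sketch with a made-up function name, not code taken from the library):

#include <petscvec.h>

static PetscErrorCode norm_stash_demo(Vec u,PetscScalar alpha)
{
  PetscReal      nrm;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = VecNorm(u,NORM_2,&nrm);CHKERRQ(ierr); /* computed, then stashed on the object       */
  ierr = VecNorm(u,NORM_2,&nrm);CHKERRQ(ierr); /* served from the stash: no global reduction */
  ierr = VecScale(u,alpha);CHKERRQ(ierr);      /* stash rescaled by |alpha|, stays valid     */
  ierr = VecNorm(u,NORM_2,&nrm);CHKERRQ(ierr); /* still answered from the stash              */
  /* any write access, e.g. VecGetArray()/VecSetValues(), invalidates the stash,
     so the next VecNorm() recomputes it */
  PetscFunctionReturn(0);
}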
When GMRES normalizes the vector, its known norm which is stashed is suitably scaled in the VecScale routine https://petsc.org/main/src/vec/vec/interface/rvector.c.html#VecScale and thus available the the MATMFFD_WP at no cost. With right conditioned GMRES or other KSP methods since the norm of a is generally not known the call to VecNorm triggers is computation; but with left preconditioned GMRES is available "for free" in the stash. Barry > On Mar 25, 2022, at 7:07 AM, Pierre Seize wrote: > > Hello PETSc team and users, > > I have a question regarding MATMFFD_WP : the documentation states that 1) || U || does not change between linear iterations and 2) in GMRES || a || = 1 except at restart. > > In src/mat/impls/mffd/wp.c, in MatMFFDCompute_WP, I see that the computation of || U || is inside an if statement, which I guess corresponds to what is stated in the documentation, but the computation of || a || is done every time. > > Does this mean that || a || is computed at each GMRES iteration, even when we know it's 1 ? > > I was checking this to see how you handle the case of right preconditioning : then in GMRES it is no longer true than || a || == 1, i think. > > > Thank you in advance for your replies. > > > Pierre Seize > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bruce.Palmer at pnnl.gov Fri Mar 25 10:26:45 2022 From: Bruce.Palmer at pnnl.gov (Palmer, Bruce J) Date: Fri, 25 Mar 2022 15:26:45 +0000 Subject: [petsc-users] Configuring with CMake In-Reply-To: <87ee3utxpi.fsf@jedbrown.org> References: <1D44E11E-7219-4260-8B58-0274EE1BA3A6@pnnl.gov> <66DE95E7-67B2-4DEF-9C76-8DFF24647480@petsc.dev> <3649bdda-ac31-10d8-7951-4bbacfa6c05e@mcs.anl.gov> <249144c-4765-c4d9-529-e3891c5caabf@mcs.anl.gov> <241B5C7C-14E4-479E-A843-AA392AC1F17D@pnnl.gov> <87ee3utxpi.fsf@jedbrown.org> Message-ID: <1C5EF826-B032-44F6-890C-99AD6DDBBDBB@pnnl.gov> I didn't get around to writing a reproducer, but I did send an extensive complaint to the CMake user group. I haven't heard anything back yet. I finally got this to work by adding this to the CMakeLists.txt file if (NOT BUILD_SHARED_LIBS) target_link_libraries(gridpack_math PUBLIC ${PETSC_STATIC_LIBRARIES} ) target_link_options(gridpack_math PUBLIC ${PETSC_STATIC_LDFLAGS}) endif() I had to add PETSC_STATIC_LIBRARIES and PETSC_STATIC_LDFLAGS to get everything to compile. Bruce ?On 2/22/22, 1:10 PM, "Jed Brown" wrote: Check twice before you click! This email originated from outside PNNL. It would be good to report a reduced test case upstream. They may not fix it, but a lot of things related to static libraries don't work without coaxing and they'll never get fixed if people who use CMake with static libraries don't make their voices heard. "Palmer, Bruce J via petsc-users" writes: > Argh, I'm an idiot. I can't write a proper print statement in CMake. > > The PETSC_STATIC_LDFLAGS variable is showing all the libraries so probably all I need to do is substitute that for PETSC_LDFLAGS in the GridPACK CMake build (once I find it) when the build is static. > > On 2/22/22, 10:39 AM, "Satish Balay" wrote: > > You can run 'pkg-config --static --libs PETSC_DIR/PETSC_ARCH/lib/pkgconfig/petsc.pc' to verify if pkg-config is able to obtain 'Libs.private' values. 
> > And then you would need help from someone who can debug cmake - on why PETSC_STATIC set by cmake does not reflect this value [as it should - per the FindPkgConfig doc] > > [sorry - I don't understand cmake - or how one would debug cmake issues] > > Satish > > On Tue, 22 Feb 2022, Palmer, Bruce J via petsc-users wrote: > > > The static versions of the variables exist (PETSC_STATIC), but they appear to have the same values as the non-static variables. > > > > As I mentioned, I'm a complete novice at pkgconfig, but it looks like if you could add the contents of Libs.private to the link line, you'd be in business. Any idea how to access this information from CMake? > > > > Bruce > > > > On 2/22/22, 10:22 AM, "Satish Balay" wrote: > > > > https://gcc02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcmake.org%2Fcmake%2Fhelp%2Flatest%2Fmodule%2FFindPkgConfig.html&data=04%7C01%7CBruce.Palmer%40pnnl.gov%7C65eb4c4214a04fae212408d9f647ca0b%7Cd6faa5f90ae240338c0130048a38deeb%7C0%7C0%7C637811610469013883%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=PRqeyGFzP%2B4nJr7ugFirs4wUmB0PR6fstQJ31cPzdPM%3D&reserved=0 > > > > >>> > > Two sets of values exist: One for the common case ( = ) and another for the information pkg-config provides when called with the --static option ( = _STATIC). > > <<< > > > > So perhaps CMAKE is already setting the _STATIC variant of (for PETSC_LIB or equivalent) variable that's currently used)? > > > > Satish > > > > On Tue, 22 Feb 2022, Palmer, Bruce J via petsc-users wrote: > > > > > The contents of the petsc.pc file are listed below. It looks good to me. The Libs.private variable seems to include the -lf2clapack and -lf2cblas libraries. I don't know how this info gets propagated up the build chain. 
> > > > > > Bruce > > > > > > prefix=/pic/projects/gridpack/software/petsc-3.16.3/linux-openmpi-gnu-cxx-complex-opt > > > exec_prefix=${prefix} > > > includedir=${prefix}/include > > > libdir=${prefix}/lib > > > ccompiler=mpicc > > > cflags_extra=-fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O > > > cflags_dep=-MMD -MP > > > ldflag_rpath=-Wl,-rpath, > > > cxxcompiler=mpicxx > > > cxxflags_extra=-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O -std=gnu++11 > > > fcompiler=mpif90 > > > fflags_extra=-Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O > > > > > > Name: PETSc > > > Description: Library to solve ODEs and algebraic equations > > > Version: 3.16.3 > > > Cflags: -I${includedir} -I/pic/projects/gridpack/software/petsc-3.16.3/include > > > Libs: -L${libdir} -lpetsc > > > Libs.private: -L/pic/projects/gridpack/software/petsc-3.16.3/linux-openmpi-gnu-cxx-complex-opt/lib -L/share/apps/openmpi/3.0.1/gcc/6.1.0/lib -L/qfs/projects/ops/rh6/gcc/6.1.0/lib/gcc/x86_64-pc-linux-gnu/6.1.0 -L/qfs/projects/ops/rh6/gcc/6.1.0/lib/gcc -L/qfs/projects/ops/rh6/gcc/6.1.0/lib64 -L/qfs/projects/ops/rh6/gcc/6.1.0/lib -lspqr -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lrt -lsuperlu -lsuperlu_dist -lf2clapack -lf2cblas -lparmetis -lmetis -lm -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lquadmath -lstdc++ -ldl > > > > > > On 2/22/22, 8:39 AM, "Satish Balay" wrote: > > > > > > The relevant pkg-config commands are: > > > > > > balay at sb /home/balay/petsc (release=) > > > $ pkg-config --libs arch-linux-c-debug/lib/pkgconfig/petsc.pc > > > -L/home/balay/petsc/arch-linux-c-debug/lib -lpetsc > > > balay at sb /home/balay/petsc (release=) > > > $ pkg-config --shared --libs arch-linux-c-debug/lib/pkgconfig/petsc.pc > > > -L/home/balay/petsc/arch-linux-c-debug/lib -lpetsc > > > balay at sb /home/balay/petsc (release=) > > > $ pkg-config --static --libs arch-linux-c-debug/lib/pkgconfig/petsc.pc > > > -L/home/balay/petsc/arch-linux-c-debug/lib -lpetsc -L/home/balay/soft/mpich-3.4.2/lib -L/usr/lib/gcc/x86_64-redhat-linux/11 -llapack -lblas -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl > > > > > > > > > And more example usages in share/petsc/Makefile.user > > > > > > Satish > > > > > > > > > On Tue, 22 Feb 2022, Barry Smith wrote: > > > > > > > Bruce, > > > > > > > > Can you please send the PkgConfig calls that you make to get the PETSc values? And then exactly what PETSc PkgConfig returns. > > > > > > > > Thanks > > > > > > > > Barry > > > > > > > > > > > > > On Feb 22, 2022, at 11:03 AM, Palmer, Bruce J via petsc-users wrote: > > > > > > > > > > Hi, > > > > > > > > > > We recently switched the CMake configuration on our GridPACK application to use the PkgConfig utility instead of Jeb Brown?s FindPETSc.cmake module. This seems to work on a number of platforms but it is failing to link on others. It appears that the build cannot find the LAPACK and BLAS libraries. 
The PETSc library I?m linking to (v3.16.3) was configured with -download-f2cblaslapack so it should have these libraries, but when I try and link one of the test applications in GridPACK I get the errors > > > > > > > > > > /share/apps/gcc/6.1.0/bin/g++ -pthread -g -rdynamic CMakeFiles/greetings.dir/test/greetings.cpp.o -o greetings -Wl,-rpath,/qfs/projects/ops/rh6/openmpi/3.0.1/gcc/6.1.0/lib ../math/libgridpack_math.a libgridpack_parallel.a ../timer/libgridpack_timer.a ../environment/libgridpack_environment.a ../math/libgridpack_math.a /pic/projects/gridpack/software/boost_1_65_0/lib/libboost_mpi.a /pic/projects/gridpack/software/boost_1_65_0/lib/libboost_serialization.a /pic/projects/gridpack/software/boost_1_65_0/lib/libboost_random.a /pic/projects/gridpack/software/boost_1_65_0/lib/libboost_filesystem.a /pic/projects/gridpack/software/boost_1_65_0/lib/libboost_system.a /pic/projects/gridpack/software/ga-5.7/build_pr/lib/libga++.a /pic/projects/gridpack/software/ga-5.7/build_pr/lib/libga.a /pic/projects/gridpack/software/ga-5.7/build_pr/lib/libarmci.a -lrt /usr/lib64/librt.so /usr/lib64/libdl.so /qfs/projects/ops/rh6/openmpi/3.0.1/gcc/6.1.0/lib/libmpi.so ../timer/libgridpack_timer.a > libgri > > dpack_ > > > parallel.a ../configuration/libgridpack_configuration.a /pic/projects/gridpack/software/petsc-3.16.3/linux-openmpi-gnu-cxx-complex-opt/lib/libpetsc.a > > > > > /pic/projects/gridpack/software/petsc-3.16.3/linux-openmpi-gnu-cxx-complex-opt/lib/libpetsc.a(zstart.o): In function `petscinitializef_': > > > > > /pic/projects/gridpack/software/petsc-3.16.3/src/sys/objects/ftn-custom/zstart.c:280: undefined reference to `mpi_init_' > > > > > /pic/projects/gridpack/software/petsc-3.16.3/linux-openmpi-gnu-cxx-complex-opt/lib/libpetsc.a(baijfact.o): In function `MatSolve_SeqBAIJ_N': > > > > > /pic/projects/gridpack/software/petsc-3.16.3/src/mat/impls/baij/seq/baijfact.c:1462: undefined reference to `zgemv_' > > > > > /pic/projects/gridpack/software/petsc-3.16.3/src/mat/impls/baij/seq/baijfact.c:1475: undefined reference to `zgemv_' > > > > > /pic/projects/gridpack/software/petsc-3.16.3/src/mat/impls/baij/seq/baijfact.c:1478: undefined reference to `zgemv_' > > > > > /pic/projects/gridpack/software/petsc-3.16.3/linux-openmpi-gnu-cxx-complex-opt/lib/libpetsc.a(baijfact.o): In function `MatSolve_SeqBAIJ_N_NaturalOrdering': > > > > > /pic/projects/gridpack/software/petsc-3.16.3/src/mat/impls/baij/seq/baijfact.c:1407: undefined reference to `zgemv_' > > > > > /pic/projects/gridpack/software/petsc-3.16.3/src/mat/impls/baij/seq/baijfact.c:1420: undefined reference to `zgemv_' > > > > > /pic/projects/gridpack/software/petsc-3.16.3/linux-openmpi-gnu-cxx-complex-opt/lib/libpetsc.a(baijfact.o):/pic/projects/gridpack/software/petsc-3.16.3/sr > > > > > > > > > > I suspect that the reason it worked for others and not for me is that they had viable blas and lapack libraries in their path and I don?t. Is there anything special you need to do to make sure that the build is pointed at the libraries that get created with the -download-f2cblaslapack option? > > > > > > > > > > Bruce Palmer > > > > > > > > > > > > > > > > > > From jxiong at anl.gov Fri Mar 25 13:14:03 2022 From: jxiong at anl.gov (Xiong, Jing) Date: Fri, 25 Mar 2022 18:14:03 +0000 Subject: [petsc-users] Question about how to solve the DAE step by step using PETSC4PY. Message-ID: Good afternoon, Thanks for all your help. I got a question about how to solve the DAE step by step using PETSC4PY. 
I got two DAE systems, let's say f1(x1, u1) and f2(x2, u2). The simplified algorithm I need to implement is as follows: Step 1: solve f1 for 1 step. Step 2: use part of x1 as input for f2, which means u2 = x1[:n]. Step 3: solve f2 for one step with u2 = x1[:n]. Step 4: use part of x2 as input for f1, which means u1 = x2[:m]. I'm not able to find any examples of how to use PETSC4PY in such scenarios. If using the "scikits.odes.dae" package, it is like: daesolver.init_step(timeInit, XInit, XpInit) daesolver.step(time) Please let me know if there are any examples I can refer to. Thank you. Best, Jing -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Sun Mar 27 21:19:35 2022 From: hongzhang at anl.gov (Zhang, Hong) Date: Mon, 28 Mar 2022 02:19:35 +0000 Subject: [petsc-users] Question about how to solve the DAE step by step using PETSC4PY. In-Reply-To: References: Message-ID: <97119E7E-36CB-4169-995C-98CCAD6E1F6F@anl.gov> On Mar 25, 2022, at 1:14 PM, Xiong, Jing via petsc-users > wrote: Good afternoon, Thanks for all your help. I got a question about how to solve the DAE step by step using PETSC4PY. I got two DAE systems, let's say f1(x1, u1) and f2(x2, u2). The simplified algorithm I need to implement is as follows: Step 1: solve f1 for 1 step. Step 2: use part of x1 as input for f2, which means u2 = x1[:n]. Step 3: solve f2 for one step with u2 = x1[:n]. Step 4: use part of x2 as input for f1, which means u1 = x2[:m]. I'm not able to find any examples of how to use PETSC4PY in such scenarios. If using the "scikits.odes.dae" package, it is like: daesolver.init_step(timeInit, XInit, XpInit) daesolver.step(time) Jing, You can certainly do the same thing in petsc4py with ts.setMaxTime(t1) ts.solve() // integrate over time interval [0,t1] ts.setTime(t1) ts.setMaxTime(t2) ts.solve() // integrate over time interval [t1,t2] ... There are also APIs that allow you to control the step size (ts.setTimeStep) and the final step behavior (ts.setExactFinalTime). The C example src/ts/tutorials/power_grid/stability_9bus/ex9busdmnetwork.c might be helpful to you. Most of the C examples can be reproduced in python with similar APIs. Hong Please let me know if there are any examples I can refer to. Thank you. Best, Jing -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexlindsay239 at gmail.com Mon Mar 28 15:42:27 2022 From: alexlindsay239 at gmail.com (Alexander Lindsay) Date: Mon, 28 Mar 2022 13:42:27 -0700 Subject: [petsc-users] Failure to configure superlu_dist Message-ID: Attached is my configure.log. Error is: Could NOT find MPI_C (missing: MPI_C_HEADER_DIR) (found version "4.0") What's interesting is that cmake does successfully find MPI_CXX and MPI_Fortran albeit in a place I'd rather it not find it (a gcc mpi build located at $HOME/mpich/installed/lib whereas mpicc -show yields what it should at $HOME/mpich/installed-clang/lib). My same configure line worked well with PETSc 3.15.x. I noticed that these arguments are empty: -DMPI_C_INCLUDE_PATH:STRING="" -DMPI_C_HEADER_DIR:STRING="" -DMPI_C_LIBRARIES:STRING="" If I fill those with the correct values and manually cmake superlu_dist, then configure is successful -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: configure.log Type: text/x-log Size: 2634419 bytes Desc: not available URL: From bsmith at petsc.dev Mon Mar 28 16:03:37 2022 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 28 Mar 2022 17:03:37 -0400 Subject: [petsc-users] Failure to configure superlu_dist In-Reply-To: References: Message-ID: <1D1FE35D-83E9-488D-9E6B-02BA1A27025C@petsc.dev> Could you please try with the main branch of PETSc? We've seen similar problems that have at least partially been dealt with in the main branch. Barry > On Mar 28, 2022, at 4:42 PM, Alexander Lindsay wrote: > > Attached is my configure.log. Error is: > > Could NOT find MPI_C (missing: MPI_C_HEADER_DIR) (found version "4.0") > > What's interesting is that cmake does successfully find MPI_CXX and MPI_Fortran albeit in a place I'd rather it not find it (a gcc mpi build located at $HOME/mpich/installed/lib whereas mpicc -show yields what it should at $HOME/mpich/installed-clang/lib). > > My same configure line worked well with PETSc 3.15.x. I noticed that these arguments are empty: > > -DMPI_C_INCLUDE_PATH:STRING="" -DMPI_C_HEADER_DIR:STRING="" -DMPI_C_LIBRARIES:STRING="" > > If I fill those with the correct values and manually cmake superlu_dist, then configure is successful > From balay at mcs.anl.gov Mon Mar 28 16:13:00 2022 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 28 Mar 2022 16:13:00 -0500 (CDT) Subject: [petsc-users] Failure to configure superlu_dist In-Reply-To: <1D1FE35D-83E9-488D-9E6B-02BA1A27025C@petsc.dev> References: <1D1FE35D-83E9-488D-9E6B-02BA1A27025C@petsc.dev> Message-ID: <98c45426-f7aa-76eb-ed82-2c962995d432@mcs.anl.gov> I think moose is bound now to a petsc-3.16.5+ snapshot >>>>>> -- Could NOT find MPI_C (missing: MPI_C_HEADER_DIR) (found version "4.0") -- Found MPI_CXX: /home/lindad/mpich/installed/lib/libmpicxx.so (found version "4.0") -- Found MPI_Fortran: /home/lindad/mpich/installed/lib/libmpifort.so (found version "3.1") -- Configuring incomplete, errors occurred! <<<< This is strange. libmpifort is a different version than libmpicxx.so ? If specifying these additional options (manually) get a functional superlu_dist build - you can also specify them to petsc configure via --download-superlu_dist-cmake-arguments=string [but yeah - good to know if this issue persists with petsc/main] Satish On Mon, 28 Mar 2022, Barry Smith wrote: > > Could you please try with the main branch of PETSc? > > We've seen similar problems that have at least partially been dealt with in the main branch. > > Barry > > > > > On Mar 28, 2022, at 4:42 PM, Alexander Lindsay wrote: > > > > Attached is my configure.log. Error is: > > > > Could NOT find MPI_C (missing: MPI_C_HEADER_DIR) (found version "4.0") > > > > What's interesting is that cmake does successfully find MPI_CXX and MPI_Fortran albeit in a place I'd rather it not find it (a gcc mpi build located at $HOME/mpich/installed/lib whereas mpicc -show yields what it should at $HOME/mpich/installed-clang/lib). > > > > My same configure line worked well with PETSc 3.15.x. 
I noticed that these arguments are empty: > > > > -DMPI_C_INCLUDE_PATH:STRING="" -DMPI_C_HEADER_DIR:STRING="" -DMPI_C_LIBRARIES:STRING="" > > > > If I fill those with the correct values and manually cmake superlu_dist, then configure is successful > > > From alexlindsay239 at gmail.com Mon Mar 28 16:33:50 2022 From: alexlindsay239 at gmail.com (Alexander Lindsay) Date: Mon, 28 Mar 2022 14:33:50 -0700 Subject: [petsc-users] Failure to configure superlu_dist In-Reply-To: <98c45426-f7aa-76eb-ed82-2c962995d432@mcs.anl.gov> References: <1D1FE35D-83E9-488D-9E6B-02BA1A27025C@petsc.dev> <98c45426-f7aa-76eb-ed82-2c962995d432@mcs.anl.gov> Message-ID: Ok, nothing to see here ... This was user error. I had MPI_ROOT set to a different MPI install than that corresponding to the mpi in my PATH. On Mon, Mar 28, 2022 at 2:13 PM Satish Balay wrote: > I think moose is bound now to a petsc-3.16.5+ snapshot > > >>>>>> > -- Could NOT find MPI_C (missing: MPI_C_HEADER_DIR) (found version "4.0") > -- Found MPI_CXX: /home/lindad/mpich/installed/lib/libmpicxx.so (found > version "4.0") > -- Found MPI_Fortran: /home/lindad/mpich/installed/lib/libmpifort.so > (found version "3.1") > -- Configuring incomplete, errors occurred! > <<<< > > This is strange. libmpifort is a different version than libmpicxx.so ? > > If specifying these additional options (manually) get a functional > superlu_dist build - you can also specify them to petsc configure via > > --download-superlu_dist-cmake-arguments=string > > [but yeah - good to know if this issue persists with petsc/main] > > Satish > > > On Mon, 28 Mar 2022, Barry Smith wrote: > > > > > Could you please try with the main branch of PETSc? > > > > We've seen similar problems that have at least partially been dealt > with in the main branch. > > > > Barry > > > > > > > > > On Mar 28, 2022, at 4:42 PM, Alexander Lindsay < > alexlindsay239 at gmail.com> wrote: > > > > > > Attached is my configure.log. Error is: > > > > > > Could NOT find MPI_C (missing: MPI_C_HEADER_DIR) (found version "4.0") > > > > > > What's interesting is that cmake does successfully find MPI_CXX and > MPI_Fortran albeit in a place I'd rather it not find it (a gcc mpi build > located at $HOME/mpich/installed/lib whereas mpicc -show yields what it > should at $HOME/mpich/installed-clang/lib). > > > > > > My same configure line worked well with PETSc 3.15.x. I noticed that > these arguments are empty: > > > > > > -DMPI_C_INCLUDE_PATH:STRING="" -DMPI_C_HEADER_DIR:STRING="" > -DMPI_C_LIBRARIES:STRING="" > > > > > > If I fill those with the correct values and manually cmake > superlu_dist, then configure is successful > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Mar 28 16:44:08 2022 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 28 Mar 2022 16:44:08 -0500 (CDT) Subject: [petsc-users] Failure to configure superlu_dist In-Reply-To: References: <1D1FE35D-83E9-488D-9E6B-02BA1A27025C@petsc.dev> <98c45426-f7aa-76eb-ed82-2c962995d432@mcs.anl.gov> Message-ID: <52677c1-a8b0-aa3f-a4b8-f2430131727@mcs.anl.gov> Glad you were able to figure this out. So having an incompatible MPI_ROOT env variable set can break builds [breaks cmake or mpif90?] Satish On Mon, 28 Mar 2022, Alexander Lindsay wrote: > Ok, nothing to see here ... This was user error. I had MPI_ROOT set to a > different MPI install than that corresponding to the mpi in my PATH. 
> > On Mon, Mar 28, 2022 at 2:13 PM Satish Balay wrote: > > > I think moose is bound now to a petsc-3.16.5+ snapshot > > > > >>>>>> > > -- Could NOT find MPI_C (missing: MPI_C_HEADER_DIR) (found version "4.0") > > -- Found MPI_CXX: /home/lindad/mpich/installed/lib/libmpicxx.so (found > > version "4.0") > > -- Found MPI_Fortran: /home/lindad/mpich/installed/lib/libmpifort.so > > (found version "3.1") > > -- Configuring incomplete, errors occurred! > > <<<< > > > > This is strange. libmpifort is a different version than libmpicxx.so ? > > > > If specifying these additional options (manually) get a functional > > superlu_dist build - you can also specify them to petsc configure via > > > > --download-superlu_dist-cmake-arguments=string > > > > [but yeah - good to know if this issue persists with petsc/main] > > > > Satish > > > > > > On Mon, 28 Mar 2022, Barry Smith wrote: > > > > > > > > Could you please try with the main branch of PETSc? > > > > > > We've seen similar problems that have at least partially been dealt > > with in the main branch. > > > > > > Barry > > > > > > > > > > > > > On Mar 28, 2022, at 4:42 PM, Alexander Lindsay < > > alexlindsay239 at gmail.com> wrote: > > > > > > > > Attached is my configure.log. Error is: > > > > > > > > Could NOT find MPI_C (missing: MPI_C_HEADER_DIR) (found version "4.0") > > > > > > > > What's interesting is that cmake does successfully find MPI_CXX and > > MPI_Fortran albeit in a place I'd rather it not find it (a gcc mpi build > > located at $HOME/mpich/installed/lib whereas mpicc -show yields what it > > should at $HOME/mpich/installed-clang/lib). > > > > > > > > My same configure line worked well with PETSc 3.15.x. I noticed that > > these arguments are empty: > > > > > > > > -DMPI_C_INCLUDE_PATH:STRING="" -DMPI_C_HEADER_DIR:STRING="" > > -DMPI_C_LIBRARIES:STRING="" > > > > > > > > If I fill those with the correct values and manually cmake > > superlu_dist, then configure is successful > > > > > > > > > > > > From alexlindsay239 at gmail.com Mon Mar 28 16:47:01 2022 From: alexlindsay239 at gmail.com (Alexander Lindsay) Date: Mon, 28 Mar 2022 14:47:01 -0700 Subject: [petsc-users] Failure to configure superlu_dist In-Reply-To: <52677c1-a8b0-aa3f-a4b8-f2430131727@mcs.anl.gov> References: <1D1FE35D-83E9-488D-9E6B-02BA1A27025C@petsc.dev> <98c45426-f7aa-76eb-ed82-2c962995d432@mcs.anl.gov> <52677c1-a8b0-aa3f-a4b8-f2430131727@mcs.anl.gov> Message-ID: Pretty sure it's cmake. Seems like a similar issue here: https://gitlab.kitware.com/cmake/cmake/-/issues/21723 On Mon, Mar 28, 2022 at 2:44 PM Satish Balay wrote: > Glad you were able to figure this out. > > So having an incompatible MPI_ROOT env variable set can break builds > [breaks cmake or mpif90?] > > Satish > > On Mon, 28 Mar 2022, Alexander Lindsay wrote: > > > Ok, nothing to see here ... This was user error. I had MPI_ROOT set to a > > different MPI install than that corresponding to the mpi in my PATH. > > > > On Mon, Mar 28, 2022 at 2:13 PM Satish Balay wrote: > > > > > I think moose is bound now to a petsc-3.16.5+ snapshot > > > > > > >>>>>> > > > -- Could NOT find MPI_C (missing: MPI_C_HEADER_DIR) (found version > "4.0") > > > -- Found MPI_CXX: /home/lindad/mpich/installed/lib/libmpicxx.so (found > > > version "4.0") > > > -- Found MPI_Fortran: /home/lindad/mpich/installed/lib/libmpifort.so > > > (found version "3.1") > > > -- Configuring incomplete, errors occurred! > > > <<<< > > > > > > This is strange. libmpifort is a different version than libmpicxx.so ? 
> > > > > > If specifying these additional options (manually) get a functional > > > superlu_dist build - you can also specify them to petsc configure via > > > > > > --download-superlu_dist-cmake-arguments=string > > > > > > [but yeah - good to know if this issue persists with petsc/main] > > > > > > Satish > > > > > > > > > On Mon, 28 Mar 2022, Barry Smith wrote: > > > > > > > > > > > Could you please try with the main branch of PETSc? > > > > > > > > We've seen similar problems that have at least partially been dealt > > > with in the main branch. > > > > > > > > Barry > > > > > > > > > > > > > > > > > On Mar 28, 2022, at 4:42 PM, Alexander Lindsay < > > > alexlindsay239 at gmail.com> wrote: > > > > > > > > > > Attached is my configure.log. Error is: > > > > > > > > > > Could NOT find MPI_C (missing: MPI_C_HEADER_DIR) (found version > "4.0") > > > > > > > > > > What's interesting is that cmake does successfully find MPI_CXX and > > > MPI_Fortran albeit in a place I'd rather it not find it (a gcc mpi > build > > > located at $HOME/mpich/installed/lib whereas mpicc -show yields what it > > > should at $HOME/mpich/installed-clang/lib). > > > > > > > > > > My same configure line worked well with PETSc 3.15.x. I noticed > that > > > these arguments are empty: > > > > > > > > > > -DMPI_C_INCLUDE_PATH:STRING="" -DMPI_C_HEADER_DIR:STRING="" > > > -DMPI_C_LIBRARIES:STRING="" > > > > > > > > > > If I fill those with the correct values and manually cmake > > > superlu_dist, then configure is successful > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aph at email.arizona.edu Tue Mar 29 01:07:01 2022 From: aph at email.arizona.edu (Anthony Paul Haas) Date: Mon, 28 Mar 2022 23:07:01 -0700 Subject: [petsc-users] Mumps and PTScotch Message-ID: Hello, Is PTScotch required when using Petsc with Mumps to solve a (complex) linear system of equations (direct solution)? The page https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html mentions to use the following options for the configuration (see below) but the option "--download-ptscotch" results in the error message "Unable to configure with given options" *From https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html * ./configure --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch Thanks, Anthony -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Tue Mar 29 01:17:00 2022 From: pierre at joliv.et (Pierre Jolivet) Date: Tue, 29 Mar 2022 08:17:00 +0200 Subject: [petsc-users] Mumps and PTScotch In-Reply-To: References: Message-ID: <8E398BC0-CBE2-4F0C-A730-7DF10F50EFEF@joliv.et> Hello Anthony, > On 29 Mar 2022, at 8:07 AM, Anthony Paul Haas wrote: > > Hello, > > Is PTScotch required when using Petsc with Mumps to solve a (complex) linear system of equations (direct solution)? SCOTCH is not required when using MUMPS, it?s an optional dependency (odeps), see https://gitlab.com/petsc/petsc/-/blob/c9b04d1e75cd0716fe0f943e5da74654588dc53a/config/BuildSystem/config/packages/MUMPS.py#L49 . 
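For example, the configure line from that manual page with --download-ptscotch simply dropped, i.e.

./configure --download-mumps --download-scalapack --download-parmetis --download-metis

should still give you a working MUMPS build. Treat this only as a sketch: keep whatever other options your installation needs, e.g. --with-scalar-type=complex if your systems are complex.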
> The page https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html mentions to use the following options for the configuration (see below) but the option "--download-ptscotch" results in the error message "Unable to configure with given options" Please send configure.log at petsc-maint at mcs.anl.gov Thanks, Pierre > From https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html > ./configure --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch > > > Thanks, > > Anthony > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From roland.richter at ntnu.no Thu Mar 31 04:41:09 2022 From: roland.richter at ntnu.no (Roland Richter) Date: Thu, 31 Mar 2022 11:41:09 +0200 Subject: [petsc-users] Memory leak when combining PETSc-based vectors and boost::odeint Message-ID: <7ced777e-1b9a-0fb4-3db8-78b14612fe2b@ntnu.no> Hei, For a project I wanted to combine boost::odeint for timestepping and PETSc-based vectors and matrices for calculating the right hand side. As comparison for both timing and correctness I set up an armadillo-based right hand side (with the main-function being in *main.cpp*, and the test code in *test_timestepping_clean.cpp*) In theory, the code works fine, but I have some issues with cleaning up afterwards in my struct /Petsc_RHS_state_clean/. My initial intention was to set up all involved matrices and vectors within the constructor, and free the memory in the destructor. To avoid freeing vectors I have not used I initially set them to /PETSC_NULL/, and check if this value has been changed before calling /VecDestroy()./ However, when doing that I get the following error: [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run ? [0]PETSC ERROR: to get more information on the crash. [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- If I comment out that code in ~Petsc_RHS_state_clean(), the program runs, but will use ~17 GByte of RAM during runtime. As the memory is not used immediately in full, but rather increases during running, I assume a memory leak somewhere. Where does it come from, and how can I avoid it? Thanks! Regards, Roland Richter -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: main.cpp Type: text/x-c++src Size: 601 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_timestepping_clean.cpp Type: text/x-c++src Size: 9613 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: test_timestepping_clean.hpp Type: text/x-c++hdr Size: 483 bytes Desc: not available URL: From knepley at gmail.com Thu Mar 31 05:14:52 2022 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 31 Mar 2022 06:14:52 -0400 Subject: [petsc-users] Memory leak when combining PETSc-based vectors and boost::odeint In-Reply-To: <7ced777e-1b9a-0fb4-3db8-78b14612fe2b@ntnu.no> References: <7ced777e-1b9a-0fb4-3db8-78b14612fe2b@ntnu.no> Message-ID: On Thu, Mar 31, 2022 at 5:58 AM Roland Richter wrote: > Hei, > > For a project I wanted to combine boost::odeint for timestepping and > PETSc-based vectors and matrices for calculating the right hand side. As > comparison for both timing and correctness I set up an armadillo-based > right hand side (with the main-function being in *main.cpp*, and the test > code in *test_timestepping_clean.cpp*) > > In theory, the code works fine, but I have some issues with cleaning up > afterwards in my struct *Petsc_RHS_state_clean*. My initial intention was > to set up all involved matrices and vectors within the constructor, and > free the memory in the destructor. To avoid freeing vectors I have not used > I initially set them to *PETSC_NULL*, and check if this value has been > changed before calling *VecDestroy().* > You do not need to check. Destroy() functions already check for NULL. > However, when doing that I get the following error: > > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and > run > [0]PETSC ERROR: to get more information on the crash. > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > If I comment out that code in ~Petsc_RHS_state_clean(), the program runs, > but will use ~17 GByte of RAM during runtime. As the memory is not used > immediately in full, but rather increases during running, I assume a memory > leak somewhere. Where does it come from, and how can I avoid it? > It must be that your constructor is called multiple times without calling your destructor. I cannot understand this code in order to see where that happens, but you should just be able to run in the debugger and put a break point at the creation and destruction calls. Thanks, Matt > Thanks! > > Regards, > > Roland Richter > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From roland.richter at ntnu.no Thu Mar 31 08:01:21 2022 From: roland.richter at ntnu.no (Roland Richter) Date: Thu, 31 Mar 2022 15:01:21 +0200 Subject: [petsc-users] Memory leak when combining PETSc-based vectors and boost::odeint In-Reply-To: References: <7ced777e-1b9a-0fb4-3db8-78b14612fe2b@ntnu.no> Message-ID: Hei, Thanks for the idea! I added a simple std::cout for both constructor and destructor, and found out that my destructor is called multiple times, while the constructor is called only once. 
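For reference, the check in the destructor looks roughly like this (only a sketch; the member names are illustrative, not the actual ones from test_timestepping_clean.cpp):

#include <petscmat.h>

struct Petsc_RHS_state_clean {
    Vec state_vec = PETSC_NULL;   // created in the constructor, PETSC_NULL until then
    Mat rhs_mat   = PETSC_NULL;

    ~Petsc_RHS_state_clean () {
        // only destroy what was actually created
        if (state_vec != PETSC_NULL)
            VecDestroy (&state_vec);   // VecDestroy() also resets state_vec to NULL
        if (rhs_mat != PETSC_NULL)
            MatDestroy (&rhs_mat);
    }
};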
This could explain the error (double free), but I do not know why segfault is thrown even though I explicitly check if the vector has been used. Are there explanations for that? Regards, Roland Richter Am 31.03.22 um 12:14 schrieb Matthew Knepley: > On Thu, Mar 31, 2022 at 5:58 AM Roland Richter > wrote: > > Hei, > > For a project I wanted to combine boost::odeint for timestepping > and PETSc-based vectors and matrices for calculating the right > hand side. As comparison for both timing and correctness I set up > an armadillo-based right hand side (with the main-function being > in *main.cpp*, and the test code in *test_timestepping_clean.cpp*) > > In theory, the code works fine, but I have some issues with > cleaning up afterwards in my struct /Petsc_RHS_state_clean/. My > initial intention was to set up all involved matrices and vectors > within the constructor, and free the memory in the destructor. To > avoid freeing vectors I have not used I initially set them to > /PETSC_NULL/, and check if this value has been changed before > calling /VecDestroy()./ > > You do not need to check. Destroy() functions already check for NULL. > > However, when doing that I get the following error: > > [0]PETSC ERROR: > ------------------------------------------------------------------------ > > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation > Violation, probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple > Mac OS X to find memory corruption errors > [0]PETSC ERROR: configure using --with-debugging=yes, recompile, > link, and run ? > [0]PETSC ERROR: to get more information on the crash. > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > If I comment out that code in ~Petsc_RHS_state_clean(), the > program runs, but will use ~17 GByte of RAM during runtime. As the > memory is not used immediately in full, but rather increases > during running, I assume a memory leak somewhere. Where does it > come from, and how can I avoid it? > > It must be that your constructor is called multiple times without > calling your destructor. I cannot understand this code in order > to see where that happens, but you should just be able?to run in the > debugger and put a break point at the creation and > destruction calls. > > ? Thanks, > > ? ? ? Matt > > Thanks! > > Regards, > > Roland Richter > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Mar 31 08:35:28 2022 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 31 Mar 2022 09:35:28 -0400 Subject: [petsc-users] Memory leak when combining PETSc-based vectors and boost::odeint In-Reply-To: References: <7ced777e-1b9a-0fb4-3db8-78b14612fe2b@ntnu.no> Message-ID: On Thu, Mar 31, 2022 at 9:01 AM Roland Richter wrote: > Hei, > > Thanks for the idea! I added a simple std::cout for both constructor and > destructor, and found out that my destructor is called multiple times, > while the constructor is called only once. 
This could explain the error > (double free), but I do not know why segfault is thrown even though I > explicitly check if the vector has been used. Are there explanations for > that? > Run with -start_in_debugger and get the stack trace when it faults. Right now, I have no idea where it is faulting. Thanks, Matt > Regards, > > Roland Richter > Am 31.03.22 um 12:14 schrieb Matthew Knepley: > > On Thu, Mar 31, 2022 at 5:58 AM Roland Richter > wrote: > >> Hei, >> >> For a project I wanted to combine boost::odeint for timestepping and >> PETSc-based vectors and matrices for calculating the right hand side. As >> comparison for both timing and correctness I set up an armadillo-based >> right hand side (with the main-function being in *main.cpp*, and the >> test code in *test_timestepping_clean.cpp*) >> >> In theory, the code works fine, but I have some issues with cleaning up >> afterwards in my struct *Petsc_RHS_state_clean*. My initial intention >> was to set up all involved matrices and vectors within the constructor, and >> free the memory in the destructor. To avoid freeing vectors I have not used >> I initially set them to *PETSC_NULL*, and check if this value has been >> changed before calling *VecDestroy().* >> > You do not need to check. Destroy() functions already check for NULL. > >> However, when doing that I get the following error: >> >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, >> probably memory access out of range >> [0]PETSC ERROR: Try option -start_in_debugger or >> -on_error_attach_debugger >> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind >> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS >> X to find memory corruption errors >> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, >> and run >> [0]PETSC ERROR: to get more information on the crash. >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> >> If I comment out that code in ~Petsc_RHS_state_clean(), the program runs, >> but will use ~17 GByte of RAM during runtime. As the memory is not used >> immediately in full, but rather increases during running, I assume a memory >> leak somewhere. Where does it come from, and how can I avoid it? >> > It must be that your constructor is called multiple times without calling > your destructor. I cannot understand this code in order > to see where that happens, but you should just be able to run in the > debugger and put a break point at the creation and > destruction calls. > > Thanks, > > Matt > >> Thanks! >> >> Regards, >> >> Roland Richter >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From roland.richter at ntnu.no Thu Mar 31 08:46:56 2022 From: roland.richter at ntnu.no (Roland Richter) Date: Thu, 31 Mar 2022 15:46:56 +0200 Subject: [petsc-users] Memory leak when combining PETSc-based vectors and boost::odeint In-Reply-To: References: <7ced777e-1b9a-0fb4-3db8-78b14612fe2b@ntnu.no> Message-ID: The backtrace is #0 ?0x00007fffeec4ba97in VecGetSize_Seq() from /opt/petsc/lib/libpetsc.so.3.016 #1 ?0x00007fffeec78f5ain VecGetSize() from /opt/petsc/lib/libpetsc.so.3.016 #2 ?0x0000000000410b73in test_ts_arma_with_pure_petsc_preconfigured_clean(unsigned long, unsigned long, arma::Col > const&, arma::Col >&, double, double, double) [clone .constprop.0]() #3 ?0x0000000000414384in test_RK4_solvers_clean(unsigned long, unsigned long, unsigned long, bool) [clone .constprop.0]() #4 ?0x0000000000405c6cin main() Regards, Roland Richter Am 31.03.22 um 15:35 schrieb Matthew Knepley: > On Thu, Mar 31, 2022 at 9:01 AM Roland Richter > wrote: > > Hei, > > Thanks for the idea! I added a simple std::cout for both > constructor and destructor, and found out that my destructor is > called multiple times, while the constructor is called only once. > This could explain the error (double free), but I do not know why > segfault is thrown even though I explicitly check if the vector > has been used. Are there explanations for that? > > Run with -start_in_debugger and get the stack trace when it faults. > Right now, I have no idea where it is faulting. > > ? Thanks, > > ? ? Matt > ? > > Regards, > > Roland Richter > > Am 31.03.22 um 12:14 schrieb Matthew Knepley: >> On Thu, Mar 31, 2022 at 5:58 AM Roland Richter >> wrote: >> >> Hei, >> >> For a project I wanted to combine boost::odeint for >> timestepping and PETSc-based vectors and matrices for >> calculating the right hand side. As comparison for both >> timing and correctness I set up an armadillo-based right hand >> side (with the main-function being in *main.cpp*, and the >> test code in *test_timestepping_clean.cpp*) >> >> In theory, the code works fine, but I have some issues with >> cleaning up afterwards in my struct /Petsc_RHS_state_clean/. >> My initial intention was to set up all involved matrices and >> vectors within the constructor, and free the memory in the >> destructor. To avoid freeing vectors I have not used I >> initially set them to /PETSC_NULL/, and check if this value >> has been changed before calling /VecDestroy()./ >> >> You do not need to check. Destroy() functions already check for NULL. >> >> However, when doing that I get the following error: >> >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> >> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation >> Violation, probably memory access out of range >> [0]PETSC ERROR: Try option -start_in_debugger or >> -on_error_attach_debugger >> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind >> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and >> Apple Mac OS X to find memory corruption errors >> [0]PETSC ERROR: configure using --with-debugging=yes, >> recompile, link, and run ? >> [0]PETSC ERROR: to get more information on the crash. >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> >> If I comment out that code in ~Petsc_RHS_state_clean(), the >> program runs, but will use ~17 GByte of RAM during runtime. >> As the memory is not used immediately in full, but rather >> increases during running, I assume a memory leak somewhere. 
>> Where does it come from, and how can I avoid it? >> >> It must be that your constructor is called multiple times without >> calling your destructor. I cannot understand this code in order >> to see where that happens, but you should just be able?to run in >> the debugger and put a break point at the creation and >> destruction calls. >> >> ? Thanks, >> >> ? ? ? Matt >> >> Thanks! >> >> Regards, >> >> Roland Richter >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to >> which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Mar 31 08:50:02 2022 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 31 Mar 2022 09:50:02 -0400 Subject: [petsc-users] Memory leak when combining PETSc-based vectors and boost::odeint In-Reply-To: References: <7ced777e-1b9a-0fb4-3db8-78b14612fe2b@ntnu.no> Message-ID: On Thu, Mar 31, 2022 at 9:47 AM Roland Richter wrote: > The backtrace is > > #0 0x00007fffeec4ba97 in VecGetSize_Seq () from > /opt/petsc/lib/libpetsc.so.3.016 > #1 0x00007fffeec78f5a in VecGetSize () from > /opt/petsc/lib/libpetsc.so.3.016 > #2 0x0000000000410b73 in test_ts_arma_with_pure_petsc_preconfigured_clean(unsigned > long, unsigned long, arma::Col e> > const&, arma::Col >&, double, double, double) > [clone .constprop.0] () > #3 0x0000000000414384 in test_RK4_solvers_clean(unsigned long, unsigned > long, unsigned long, bool) [clone .constprop.0] () > #4 0x0000000000405c6c in main () > > It looks like you are passing an invalid vector. If you compiled in debug mode, it would tell you. I would run in debug until my code was running like I expect, then switch to optimized. You can do that by using two different PETSC_ARCH configures, and switch at runtime with that variable. Thanks, Matt > Regards, > Roland Richter > > Am 31.03.22 um 15:35 schrieb Matthew Knepley: > > On Thu, Mar 31, 2022 at 9:01 AM Roland Richter > wrote: > >> Hei, >> >> Thanks for the idea! I added a simple std::cout for both constructor and >> destructor, and found out that my destructor is called multiple times, >> while the constructor is called only once. This could explain the error >> (double free), but I do not know why segfault is thrown even though I >> explicitly check if the vector has been used. Are there explanations for >> that? >> > Run with -start_in_debugger and get the stack trace when it faults. Right > now, I have no idea where it is faulting. > > Thanks, > > Matt > > >> Regards, >> >> Roland Richter >> Am 31.03.22 um 12:14 schrieb Matthew Knepley: >> >> On Thu, Mar 31, 2022 at 5:58 AM Roland Richter >> wrote: >> >>> Hei, >>> >>> For a project I wanted to combine boost::odeint for timestepping and >>> PETSc-based vectors and matrices for calculating the right hand side. As >>> comparison for both timing and correctness I set up an armadillo-based >>> right hand side (with the main-function being in *main.cpp*, and the >>> test code in *test_timestepping_clean.cpp*) >>> >>> In theory, the code works fine, but I have some issues with cleaning up >>> afterwards in my struct *Petsc_RHS_state_clean*. 
My initial intention >>> was to set up all involved matrices and vectors within the constructor, and >>> free the memory in the destructor. To avoid freeing vectors I have not used >>> I initially set them to *PETSC_NULL*, and check if this value has been >>> changed before calling *VecDestroy().* >>> >> You do not need to check. Destroy() functions already check for NULL. >> >>> However, when doing that I get the following error: >>> >>> [0]PETSC ERROR: >>> ------------------------------------------------------------------------ >>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, >>> probably memory access out of range >>> [0]PETSC ERROR: Try option -start_in_debugger or >>> -on_error_attach_debugger >>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind >>> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac >>> OS X to find memory corruption errors >>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, >>> and run >>> [0]PETSC ERROR: to get more information on the crash. >>> [0]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> >>> If I comment out that code in ~Petsc_RHS_state_clean(), the program >>> runs, but will use ~17 GByte of RAM during runtime. As the memory is not >>> used immediately in full, but rather increases during running, I assume a >>> memory leak somewhere. Where does it come from, and how can I avoid it? >>> >> It must be that your constructor is called multiple times without calling >> your destructor. I cannot understand this code in order >> to see where that happens, but you should just be able to run in the >> debugger and put a break point at the creation and >> destruction calls. >> >> Thanks, >> >> Matt >> >>> Thanks! >>> >>> Regards, >>> >>> Roland Richter >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From medane.tchakorom at univ-fcomte.fr Thu Mar 31 09:10:54 2022 From: medane.tchakorom at univ-fcomte.fr (Medane TCHAKOROM) Date: Thu, 31 Mar 2022 16:10:54 +0200 Subject: [petsc-users] MatMult method Message-ID: <3ae23132-ff34-a0be-a3b4-b8e33a8c320a@univ-fcomte.fr> Hello, I got one issue with MatMult method that I do not understand. Whenever I multiply Matrix A by vector b (as shown below), the printed result show a value with an exponent that is far away form the expected result. So I did the same Matrix-vector product in python and get the expected answer (shown below). Can someone please explain what is going on, cause I need the exact results in order to compute the norm. Thanks *Matrix A* Mat Object: 1 MPI processes ? type: seqaij row 0: (0, 4.)? (1, -1.)? (4, -1.) row 1: (0, -1.)? (1, 4.)? (2, -1.)? (5, -1.) row 2: (1, -1.)? (2, 4.)? (3, -1.)? (6, -1.) row 3: (2, -1.)? (3, 4.)? 
(7, -1.) row 4: (0, -1.)? (4, 4.)? (5, -1.)? (8, -1.) row 5: (1, -1.)? (4, -1.)? (5, 4.)? (6, -1.)? (9, -1.) row 6: (2, -1.)? (5, -1.)? (6, 4.)? (7, -1.)? (10, -1.) row 7: (3, -1.)? (6, -1.)? (7, 4.)? (11, -1.) *Vector b* Vec Object: 1 MPI processes ? type: seq 0.998617 0.997763 0.997763 0.998617 0.996705 0.994672 0.994672 0.996705 0.993529 0.989549 0.989549 0.993529 0.997285 0.995611 0.995611 0.997285 *PETSc Results with : MatMult (A,x, some_vector_to_store_the_result)* Vec Object: 1 MPI processes ? type: seq 2. 1. 1. 2. 1. 0. *1.11022e-16* 1. *Python result:* [ 2. 0.999999 0.999999 2. 0.990577 -0.015165 -0.015165 0.990577] -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Mar 31 09:44:07 2022 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 31 Mar 2022 10:44:07 -0400 Subject: [petsc-users] MatMult method In-Reply-To: <3ae23132-ff34-a0be-a3b4-b8e33a8c320a@univ-fcomte.fr> References: <3ae23132-ff34-a0be-a3b4-b8e33a8c320a@univ-fcomte.fr> Message-ID: On Thu, Mar 31, 2022 at 10:11 AM Medane TCHAKOROM < medane.tchakorom at univ-fcomte.fr> wrote: > Hello, > > I got one issue with MatMult method that I do not understand. > > Whenever I multiply Matrix A by vector b (as shown below), the printed > result > > show a value with an exponent that is far away form the expected result. > So I did the same > > Matrix-vector product in python and get the expected answer (shown below). > > Can someone please explain what is going on, cause I need the exact > results in order to compute the norm. > I did the multiply by hand. Your Python results are wrong for rows 5 and 6, but the PETSc results are correct. Thanks, Matt > Thanks > > > *Matrix A* > > Mat Object: 1 MPI processes > type: seqaij > row 0: (0, 4.) (1, -1.) (4, -1.) > row 1: (0, -1.) (1, 4.) (2, -1.) (5, -1.) > row 2: (1, -1.) (2, 4.) (3, -1.) (6, -1.) > row 3: (2, -1.) (3, 4.) (7, -1.) > row 4: (0, -1.) (4, 4.) (5, -1.) (8, -1.) > row 5: (1, -1.) (4, -1.) (5, 4.) (6, -1.) (9, -1.) > row 6: (2, -1.) (5, -1.) (6, 4.) (7, -1.) (10, -1.) > row 7: (3, -1.) (6, -1.) (7, 4.) (11, -1.) > > > *Vector b* > > > Vec Object: 1 MPI processes > type: seq > 0.998617 > 0.997763 > 0.997763 > 0.998617 > 0.996705 > 0.994672 > 0.994672 > 0.996705 > 0.993529 > 0.989549 > 0.989549 > 0.993529 > 0.997285 > 0.995611 > 0.995611 > 0.997285 > > > *PETSc Results with : MatMult (A,x, some_vector_to_store_the_result)* > > Vec Object: 1 MPI processes > type: seq > 2. > 1. > 1. > 2. > 1. > 0. > *1.11022e-16* > 1. > > > > *Python result:* > > [ 2. 0.999999 0.999999 2. 0.990577 -0.015165 -0.015165 0.990577] > > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Mar 31 16:59:14 2022 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 31 Mar 2022 17:59:14 -0400 Subject: [petsc-users] PETSc 3.17 release Message-ID: We are pleased to announce the release of PETSc version 3.17.0 at https://petsc.org/release/download/ A list of the major changes and updates can be found at https://petsc.org/release/docs/changes/317 The final update to petsc-3.16 i.e petsc-3.16.6 is also available We recommend upgrading to PETSc 3.17.0 soon. 
As always, please report problems to petsc-maint at mcs.anl.gov and ask questions at petsc-users at mcs.anl.gov

This release includes contributions from

Alp Dener
Barry Smith
Blaise Bourdin
Connor Ward
Daniel Finn
Dave May
David Wells
dr-robertk
Fande Kong
Francesco Ballarin
Getnet
Grzegorz Mazur
Heeho Park
Hofer-Julian
Hong Zhang
Jacob Faibussowitsch
Jed Brown
Jeremy L Thompson
Joe Pusztay
Joe Wallwork
Johann Rudi
Jørgen Dokken
Jose Roman
Junchao Zhang
Koki Sagiyama
Lawrence Mitchell
Lisandro Dalcin
Mark Adams
Martin Diehl
Matthew Knepley
Matt McGurn
Mr. Hong Zhang
Nicolas Barnafi
Nicolas Barral
Pablo Brubeck
Patrick Sanan
Paul Bartholomew
Pierre Jolivet
Rey Koki
Richard Tran Mills
Romain Beucher
Satish Balay
Scott Kruger
Sebastian Grimberg
Stefano Zampini
Toby Isaac
Vaclav Hapla
Xiangmin Jiao
Zongze Yang

and bug reports/patches/proposed improvements received from

Adem Candas
Adina Püsök
Dave May
David Horák
David Trebotich
"Deij-van Rijswijk, Menno"
edgar at openmail.cc
Elias Karabelas
Fabio Rossi
Fande Kong
Getnet Betrie
Giovane Avancini
Gregory Walton
"Hammond, Glenn E"
Jacob Faibussowitsch
Jed Brown
Jin Chen
Jose E. Roman
Junchao Zhang
Karthikeyan Chockalingam - STFC UKRI
"Kaus, Boris"
liang at geoazur.unice.fr
Lisandro Dalcin
Marco Cisternino
Marius Buerkle
Mark Adams
Milan Pelletier
Paul Bauman
Pierre Jolivet
Pierre Seize
Qin C
Satish Balay
Sharan Roongta
Stefano Zampini
Thomas Vasileiou
Victor Eijkhout
"Williams, Timothy J."
Xiaoye S. Li

As always, thanks for your support,

   Barry
-------------- next part -------------- An HTML attachment was scrubbed... URL: