From tom.alex.mondragon at gmail.com Sat Feb 1 01:09:18 2020 From: tom.alex.mondragon at gmail.com (Tomas Mondragon) Date: Sat, 1 Feb 2020 01:09:18 -0600 Subject: [petsc-users] Running moose/scripts/update_and_rebuild_petsc.sh on HPC In-Reply-To: References: <277eb13a-0590-4b1a-a089-09a7c35efd83@googlegroups.com> <641b2c64-0e88-47e7-b33a-e2528287095a@googlegroups.com> <2bf174ba-a994-45f8-a661-454458a6ffa3@googlegroups.com> <0a8315b7-185e-44c9-b1d3-d3b8f52939d4@googlegroups.com> <2c9e5abd-bd4f-4b95-b2ea-8aa6a993d5fb@googlegroups.com> <0b4c29ac-2261-404a-84f6-5e8e28e1c51f@googlegroups.com> <095881e4-592d-427a-ad84-6cbe5fb8fe2e@googlegroups.com> <1a976f38-4944-425f-af72-f5ce7ce3ac85@googlegroups.com> Message-ID: Thanks, that does sound useful! On Fri, Jan 31, 2020, 6:23 PM Smith, Barry F. wrote: > > You might find this option useful. > > --with-packages-download-dir= > Skip network download of package tarballs and locate them in > specified dir. If not found in dir, print package URL - so it can be > obtained manually. > > > This generates a list of URLs to download so you don't need to look > through the xxx.py files for that information. Conceivably a script could > gather this information from the run of configure and get the tarballs for > you. > > Barry > > > > > On Jan 31, 2020, at 11:58 AM, Tomas Mondragon < > tom.alex.mondragon at gmail.com> wrote: > > > > Hypre problem resolved. PETSc commit 05f86fb made in August 05, 2019 > added the line 'self.installwithbatch = 0' to the __init__ method of the > Configure class in the file > petsc/config/BuildSystem/config/packages/hypre.py to fix a bug with hypre > installation on Cray KNL systems. Since the machine I was installing os was > an SGI system, I decided to try switching to 'self.installwithbatch = 1' > and it worked! The configure script was finally able to run to completion. > > > > Perhaps there can be a Cray flag for configure that can control this, > since it is only Cray's that have this problem with Hypre? > > > > For my benefit when I have to do this again - > > To get moose/petsc/scripts/update_and_rebuild_petsc.sh to run on an SGI > system as a batch job, I had to: > > > > Make sure the git (gnu version) module was loaded > > git clone moose > > cd to the petsc directory and git clone the petsc submodule, but make > sure to pull the latest commit. The commit that the moose repo refers to is > outdated. > > cd back to the moose directory, git add petsc and git commit so that the > newest petsc commit gets used by the update script. otherwise the old > commit will be used. > > download the tarballs for fblaspack, hypre, metis, mumps, parmetis, > scalapack, (PT)scotch, slepc, and superLU_dist. The URLS are in the > __init__ methods of the relevant files > inmost/petsc/config/BuildSystem/config/packages/ > > alter moose/scripts/update_and_rebuild_petsc.sh script so that it is a > working PBS batch job. 
Be sure to module swap to the gcc compiler and > module load git (gnu version) and alter the ./configure command arguments > > adding > > --with-cudac=0 > > --with-batch=1 > > changing > > --download-=/path/to/thirdparty/package/tarball > > If the supercomputer is not a Cray KNL system, change line 26 of > moose/petsc/config/BuildSystem/config/packages/hypre.py from > 'self.installwithbath = 0' to 'self.installwithbatch = 1', otherwise, > install hypre on its own and use --with-hypre-dir=/path/to/hypre in the > ./configure command > > > > On Fri, Jan 31, 2020 at 10:06 AM Tomas Mondragon < > tom.alex.mondragon at gmail.com> wrote: > > Thanks for the change to base.py. Pulling the commit, confirm was able > to skip over lgrind and c2html. I did have a problem with Parmetis, but > that was because I was using an old ParMetis commit accidentally. Fixed by > downloading the right commit of ParMetis. > > > > My current problem is with Hypre. Apparently --download-hypre cannot be > used with --with-batch=1 even if the download URL is on the local machine. > The configuration.log that resulted is attached for anyone who may be > interested. > > > > -- > > You received this message because you are subscribed to a topic in the > Google Groups "moose-users" group. > > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/moose-users/2xZsBpG-DtY/unsubscribe. > > To unsubscribe from this group and all its topics, send an email to > moose-users+unsubscribe at googlegroups.com. > > To view this discussion on the web visit > https://groups.google.com/d/msgid/moose-users/a34fa09e-a4f5-4225-8933-34eb36759260%40googlegroups.com > . > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dmitry.melnichuk at geosteertech.com Mon Feb 3 09:38:57 2020 From: dmitry.melnichuk at geosteertech.com (=?utf-8?B?0JTQvNC40YLRgNC40Lkg0JzQtdC70YzQvdC40YfRg9C6?=) Date: Mon, 03 Feb 2020 18:38:57 +0300 Subject: [petsc-users] Triple increasing of allocated memory during KSPSolve calling(GMRES preconditioned by ASM) Message-ID: <9444561580744337@myt2-b8bf7a4d4ebc.qloud-c.yandex.net> An HTML attachment was scrubbed... URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 3 10:34:47 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 3 Feb 2020 11:34:47 -0500 Subject: [petsc-users] Triple increasing of allocated memory during KSPSolve calling(GMRES preconditioned by ASM) In-Reply-To: <9444561580744337@myt2-b8bf7a4d4ebc.qloud-c.yandex.net> References: <9444561580744337@myt2-b8bf7a4d4ebc.qloud-c.yandex.net> Message-ID: On Mon, Feb 3, 2020 at 10:38 AM ??????? ????????? < dmitry.melnichuk at geosteertech.com> wrote: > Hello all! > > Now I am faced with a problem associated with the memory allocation when > calling of KSPSolve . > > GMRES preconditioned by ASM for solving linear algebra system (obtained by > the finite element spatial discretisation of Biot poroelasticity model) was > chosen. > According to the output value of PetscMallocGetCurrentUsage subroutine 176 > MB for matrix and RHS vector storage is required (before KSPSolve calling). > But during solving linear algebra system 543 MB of RAM is required (during > KSPSolve calling). > Thus, the amount of allocated memory after preconditioning stage increased > three times. This kind of behaviour is critically for 3D models with > several millions of cells. 
> 1) In order to know anything, we have to see the output of -ksp_view, although I see you used an overlap of 2 2) The overlap increases the size of submatrices beyond that of the original matrix. Suppose that you used LU for the sub-preconditioner. You would need at least 2x memory (with ILU(0)) since the matrix dominates memory usage. Moreover, you have overlap and you might have fill-in depending on the solver. 3) The massif tool from valgrind is a good fine-grained way to look at memory allocation Thanks, Matt Is there a way to decrease amout of allocated memory? > Is that an expected behaviour for GMRES-ASM combination? > > As I remember, using previous version of PETSc didn't demonstrate so > significante memory increasing. > > ... > Vec :: Vec_F, Vec_U > Mat :: Mat_K > ... > ... > call MatAssemblyBegin(Mat_M,Mat_Final_Assembly,ierr) > call MatAssemblyEnd(Mat_M,Mat_Final_Assembly,ierr) > .... > call VecAssemblyBegin(Vec_F_mod,ierr) > call VecAssemblyEnd(Vec_F_mod,ierr) > ... > ... > call PetscMallocGetCurrentUsage(mem, ierr) > print *,"Memory used: ",mem > ... > ... > call KSPSetType(Krylov,KSPGMRES,ierr) > call KSPGetPC(Krylov,PreCon,ierr) > call PCSetType(PreCon,PCASM,ierr) > call KSPSetFromOptions(Krylov,ierr) > ... > call KSPSolve(Krylov,Vec_F,Vec_U,ierr) > ... > ... > options = "-pc_asm_overlap 2 -pc_asm_type basic -ksp_monitor > -ksp_converged_reason" > > > Kind regards, > Dmitry Melnichuk > Matrix.dat (265288024) > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashour.msc at gmail.com Sun Feb 2 11:24:42 2020 From: ashour.msc at gmail.com (Mohammed Ashour) Date: Sun, 2 Feb 2020 18:24:42 +0100 Subject: [petsc-users] Flagging the solver to restart Message-ID: Dear All, I'm solving a constraint phase-field problem using PetIGA. This question i'm having is more relevant to PETSc, so I'm posting here. I have an algorithm involving iterating on the solution vector until certain criteria are met before moving forward for the next time step. The sequence inside the TSSolve is to call TSMonitor first, to print a user-defined set of values and the move to solve at TSStep and then call TSPostEvaluate. So I'm using the TSMonitor to update some variables at time n , those variables are used the in the residual and jacobian calculations at time n+1, and then solving and then check if those criteria are met or not in a function assigned to TS via TSSetPostEvaluate, if the criteria are met, it'll move forward, if not, it'll engaged the routine TSRollBack(), which based on my understanding is the proper way yo flag the solver to recalculate n+1. My question is, is this the proper way to do it? what is the difference between TSRollBack and TSRestart? Thanks a lot -- -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Feb 3 10:55:36 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Mon, 3 Feb 2020 16:55:36 +0000 Subject: [petsc-users] Triple increasing of allocated memory during KSPSolve calling(GMRES preconditioned by ASM) In-Reply-To: References: <9444561580744337@myt2-b8bf7a4d4ebc.qloud-c.yandex.net> Message-ID: <86DE2602-DCBD-47C4-819C-223F9CF9A503@mcs.anl.gov> GMRES also can by default require about 35 work vectors if it reaches the full restart. 
You can set a smaller restart with -ksp_gmres_restart 15 for example but this can also hurt the convergence of GMRES dramatically. People sometimes use the KSPBCGS algorithm since it does not require all the restart vectors but it can also converge more slowly. Depending on how much memory the sparse matrices use relative to the vectors the vector memory may matter or not. If you are using a recent version of PETSc you can run with -log_view -log_view_memory and it will show on the right side of the columns how much memory is being allocated for each of the operations in various ways. Barry > On Feb 3, 2020, at 10:34 AM, Matthew Knepley wrote: > > On Mon, Feb 3, 2020 at 10:38 AM ??????? ????????? wrote: > Hello all! > > Now I am faced with a problem associated with the memory allocation when calling of KSPSolve . > > GMRES preconditioned by ASM for solving linear algebra system (obtained by the finite element spatial discretisation of Biot poroelasticity model) was chosen. > According to the output value of PetscMallocGetCurrentUsage subroutine 176 MB for matrix and RHS vector storage is required (before KSPSolve calling). > But during solving linear algebra system 543 MB of RAM is required (during KSPSolve calling). > Thus, the amount of allocated memory after preconditioning stage increased three times. This kind of behaviour is critically for 3D models with several millions of cells. > > 1) In order to know anything, we have to see the output of -ksp_view, although I see you used an overlap of 2 > > 2) The overlap increases the size of submatrices beyond that of the original matrix. Suppose that you used LU for the sub-preconditioner. > You would need at least 2x memory (with ILU(0)) since the matrix dominates memory usage. Moreover, you have overlap > and you might have fill-in depending on the solver. > > 3) The massif tool from valgrind is a good fine-grained way to look at memory allocation > > Thanks, > > Matt > > Is there a way to decrease amout of allocated memory? > Is that an expected behaviour for GMRES-ASM combination? > > As I remember, using previous version of PETSc didn't demonstrate so significante memory increasing. > > ... > Vec :: Vec_F, Vec_U > Mat :: Mat_K > ... > ... > call MatAssemblyBegin(Mat_M,Mat_Final_Assembly,ierr) > call MatAssemblyEnd(Mat_M,Mat_Final_Assembly,ierr) > .... > call VecAssemblyBegin(Vec_F_mod,ierr) > call VecAssemblyEnd(Vec_F_mod,ierr) > ... > ... > call PetscMallocGetCurrentUsage(mem, ierr) > print *,"Memory used: ",mem > ... > ... > call KSPSetType(Krylov,KSPGMRES,ierr) > call KSPGetPC(Krylov,PreCon,ierr) > call PCSetType(PreCon,PCASM,ierr) > call KSPSetFromOptions(Krylov,ierr) > ... > call KSPSolve(Krylov,Vec_F,Vec_U,ierr) > ... > ... > options = "-pc_asm_overlap 2 -pc_asm_type basic -ksp_monitor -ksp_converged_reason" > > > Kind regards, > Dmitry Melnichuk > Matrix.dat (265288024) > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From dmitry.melnichuk at geosteertech.com Tue Feb 4 06:04:58 2020 From: dmitry.melnichuk at geosteertech.com (=?utf-8?B?0JTQvNC40YLRgNC40Lkg0JzQtdC70YzQvdC40YfRg9C6?=) Date: Tue, 04 Feb 2020 15:04:58 +0300 Subject: [petsc-users] Triple increasing of allocated memory during KSPSolve calling(GMRES preconditioned by ASM) In-Reply-To: <86DE2602-DCBD-47C4-819C-223F9CF9A503@mcs.anl.gov> References: <9444561580744337@myt2-b8bf7a4d4ebc.qloud-c.yandex.net> <86DE2602-DCBD-47C4-819C-223F9CF9A503@mcs.anl.gov> Message-ID: <14855941580817898@vla4-d1c3bcedfacb.qloud-c.yandex.net> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Logs_26K_GMRES-ASM-log_view-log_view_memory-malloc_dump_32bit Type: application/octet-stream Size: 68273 bytes Desc: not available URL: From bsmith at mcs.anl.gov Tue Feb 4 10:04:45 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Tue, 4 Feb 2020 16:04:45 +0000 Subject: [petsc-users] Triple increasing of allocated memory during KSPSolve calling(GMRES preconditioned by ASM) In-Reply-To: <14855941580817898@vla4-d1c3bcedfacb.qloud-c.yandex.net> References: <9444561580744337@myt2-b8bf7a4d4ebc.qloud-c.yandex.net> <86DE2602-DCBD-47C4-819C-223F9CF9A503@mcs.anl.gov> <14855941580817898@vla4-d1c3bcedfacb.qloud-c.yandex.net> Message-ID: <72B6812D-4131-41A6-9A30-878D2F9058D8@mcs.anl.gov> Please run with the option -ksp_view so we know the exact solver options you are using. From the lines MatCreateSubMats 1 1.0 1.9397e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.1066e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatIncreaseOvrlp 1 1.0 3.0324e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 and the fact you have three matrices I would guess you are using the additive Schwarz preconditioner (ASM) with ILU(0) on the blocks. (which converges the same as ILU on one process but does use much more memory). Note: your code is still built with 32 bit integers. I would guess the basic matrix formed plus the vectors in this example could take ~200 MB. It is the two matrices in the additive Schwarz that is taking the additional memory. What kind of PDEs are you solving and what kind of formulation? ASM plus ILU is the "work mans" type preconditioner, relatively robust but not particularly fast for convergence. Depending on your problem you might be able to do much better convergence wise by using a PCFIELDSPLIT and a PCGAMG on one of the splits. In your own run you see the ILU is chugging along rather slowly to the solution. With your current solvers you can use the option -sub_pc_factor_in_place which will shave off one of the matrices memories. Please try that. Avoiding the ASM you can avoid both extra matrices but at the cost of even slower convergence. Use, for example -pc_type sor The petroleum industry also has a variety of "custom" preconditioners/solvers for particular models and formulations that can beat the convergence of general purpose solvers; and require less memory. Some of these can be implemented or simulated with PETSc. Some of these are implemented in the commercial petroleum simulation codes and it can be difficult to get a handle on exactly what they do because of proprietary issues. I think I have an old text on these approaches in my office, there may be modern books that discuss these. Barry > On Feb 4, 2020, at 6:04 AM, ??????? ????????? 
wrote: > > Hello again! > Thank you very much for your replies! > Log is attached. > > 1. The main problem now is following. To solve the matrix that is attached to my previous e-mail PETSc consumes ~550 MB. > I know definitely that there are commercial softwares in petroleum industry (e.g., Schlumberger Petrel) that solve the same initial problem consuming only ~200 MB. > Moreover, I am sure that when I used 32-bit PETSc (GMRES + ASM) a year ago, it also consumed ~200 MB for this matrix. > > So, my question is: do you have any advice on how to decrease the amount of RAM consumed for such matrix from 550 MB to 200 MB? Maybe some specific preconditioner or other ways? > > I will be very grateful for any thoughts! > > 2. The second problem is more particular. > According to resource manager in Windows 10, Fortran solver based on PETSc consumes 548 MB RAM while solving the system of linear equations. > As I understand it form logs, it is required 459 MB and 52 MB for matrix and vector storage respectively. After summing of all objects for which memory is allocated we get only 517 MB. > > Thank you again for your time! Have a nice day. > > Kind regards, > Dmitry > > > 03.02.2020, 19:55, "Smith, Barry F." : > > GMRES also can by default require about 35 work vectors if it reaches the full restart. You can set a smaller restart with -ksp_gmres_restart 15 for example but this can also hurt the convergence of GMRES dramatically. People sometimes use the KSPBCGS algorithm since it does not require all the restart vectors but it can also converge more slowly. > > Depending on how much memory the sparse matrices use relative to the vectors the vector memory may matter or not. > > If you are using a recent version of PETSc you can run with -log_view -log_view_memory and it will show on the right side of the columns how much memory is being allocated for each of the operations in various ways. > > Barry > > > > On Feb 3, 2020, at 10:34 AM, Matthew Knepley wrote: > > On Mon, Feb 3, 2020 at 10:38 AM ??????? ????????? wrote: > Hello all! > > Now I am faced with a problem associated with the memory allocation when calling of KSPSolve . > > GMRES preconditioned by ASM for solving linear algebra system (obtained by the finite element spatial discretisation of Biot poroelasticity model) was chosen. > According to the output value of PetscMallocGetCurrentUsage subroutine 176 MB for matrix and RHS vector storage is required (before KSPSolve calling). > But during solving linear algebra system 543 MB of RAM is required (during KSPSolve calling). > Thus, the amount of allocated memory after preconditioning stage increased three times. This kind of behaviour is critically for 3D models with several millions of cells. > > 1) In order to know anything, we have to see the output of -ksp_view, although I see you used an overlap of 2 > > 2) The overlap increases the size of submatrices beyond that of the original matrix. Suppose that you used LU for the sub-preconditioner. > You would need at least 2x memory (with ILU(0)) since the matrix dominates memory usage. Moreover, you have overlap > and you might have fill-in depending on the solver. > > 3) The massif tool from valgrind is a good fine-grained way to look at memory allocation > > Thanks, > > Matt > > Is there a way to decrease amout of allocated memory? > Is that an expected behaviour for GMRES-ASM combination? > > As I remember, using previous version of PETSc didn't demonstrate so significante memory increasing. > > ... 
> Vec :: Vec_F, Vec_U > Mat :: Mat_K > ... > ... > call MatAssemblyBegin(Mat_M,Mat_Final_Assembly,ierr) > call MatAssemblyEnd(Mat_M,Mat_Final_Assembly,ierr) > .... > call VecAssemblyBegin(Vec_F_mod,ierr) > call VecAssemblyEnd(Vec_F_mod,ierr) > ... > ... > call PetscMallocGetCurrentUsage(mem, ierr) > print *,"Memory used: ",mem > ... > ... > call KSPSetType(Krylov,KSPGMRES,ierr) > call KSPGetPC(Krylov,PreCon,ierr) > call PCSetType(PreCon,PCASM,ierr) > call KSPSetFromOptions(Krylov,ierr) > ... > call KSPSolve(Krylov,Vec_F,Vec_U,ierr) > ... > ... > options = "-pc_asm_overlap 2 -pc_asm_type basic -ksp_monitor -ksp_converged_reason" > > > Kind regards, > Dmitry Melnichuk > Matrix.dat (265288024) > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > From hongzhang at anl.gov Tue Feb 4 10:32:45 2020 From: hongzhang at anl.gov (Zhang, Hong) Date: Tue, 4 Feb 2020 16:32:45 +0000 Subject: [petsc-users] Flagging the solver to restart In-Reply-To: References: Message-ID: <4246EF6F-6B7A-4202-806B-6D334E5B9427@anl.gov> > On Feb 2, 2020, at 11:24 AM, Mohammed Ashour wrote: > > Dear All, > I'm solving a constraint phase-field problem using PetIGA. This question i'm having is more relevant to PETSc, so I'm posting here. > > I have an algorithm involving iterating on the solution vector until certain criteria are met before moving forward for the next time step. The sequence inside the TSSolve is to call TSMonitor first, to print a user-defined set of values and the move to solve at TSStep and then call TSPostEvaluate. > > So I'm using the TSMonitor to update some variables at time n , those variables are used the in the residual and jacobian calculations at time n+1, and then solving and then check if those criteria are met or not in a function assigned to TS via TSSetPostEvaluate, if the criteria are met, it'll move forward, if not, it'll engaged the routine TSRollBack(), which based on my understanding is the proper way yo flag the solver to recalculate n+1. My question is, is this the proper way to do it? what is the difference between TSRollBack and TSRestart? You are right that TSRollBack() recalculates the current time step. But I would not suggest to use TSPostEvaluate in your case. Presumably you are not using the PETSc adaptor (e.g. via -ts_adapt_type none) and want to control the stepsize yourself. You can check the criteria in TSPostStep, call TSRollBack() if the criteria are not met and update the variables accordingly. The variables can also be updated in TSPreStep(), but TSMonitor should not be used since it is designed for read-only operations. TSRestart may be needed when you are using non-self-starting integration methods such as multiple step methods and FSAL RK methods (-ts_rk_type <3bs,5dp,5bs,6vr,7vr,8vr>). These methods rely on solutions or residuals from previous time steps, thus need a flag to hard restart the time integration whenever discontinuity is introduced (e.g. a parameter in the RHS function is changed). So TSRestart sets the flag to tell the integrator to treat the next time step like the first time step in a time integration. Hong (Mr.) 
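For reference, a minimal C sketch of the pattern described above: check a user-defined criterion in a TSPostStep callback and roll the step back if it fails. Only TSSetPostStep, TSGetApplicationContext, TSGetSolution and TSRollBack are actual PETSc calls; the routine CriteriaMet and the AppCtx fields are hypothetical placeholders for the user's own convergence test.

#include <petscts.h>

typedef struct {
  PetscReal tol;   /* tolerance used by the user's convergence criterion (placeholder) */
} AppCtx;

/* hypothetical user routine: evaluates the convergence criteria on the current solution */
extern PetscErrorCode CriteriaMet(Vec u,AppCtx *ctx,PetscBool *ok);

static PetscErrorCode PostStep(TS ts)
{
  AppCtx         *ctx;
  Vec            u;
  PetscBool      ok;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = TSGetApplicationContext(ts,&ctx);CHKERRQ(ierr);
  ierr = TSGetSolution(ts,&u);CHKERRQ(ierr);
  ierr = CriteriaMet(u,ctx,&ok);CHKERRQ(ierr);
  if (!ok) {
    /* discard the step just taken; the next TSStep() recomputes it from the
       restored state, after the lagged variables have been updated (e.g. in a TSPreStep callback) */
    ierr = TSRollBack(ts);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}

/* registration, after TSCreate():
     ierr = TSSetApplicationContext(ts,&ctx);CHKERRQ(ierr);
     ierr = TSSetPostStep(ts,PostStep);CHKERRQ(ierr);        */
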
> Thanks a lot > > -- From balay at mcs.anl.gov Tue Feb 4 12:10:41 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 4 Feb 2020 12:10:41 -0600 Subject: [petsc-users] petsc-3.12.4.tar.gz now available Message-ID: Dear PETSc users, The patch release petsc-3.12.4 is now available for download, with change list at 'PETSc-3.12 Changelog' http://www.mcs.anl.gov/petsc/download/index.html Satish From dong-hao at outlook.com Tue Feb 4 12:41:43 2020 From: dong-hao at outlook.com (Hao DONG) Date: Tue, 4 Feb 2020 18:41:43 +0000 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? Message-ID: Dear all, I have a few questions about the implementation of diagonal ILU PC in PETSc. I want to solve a very simple system with KSP (in parallel), the nature of the system (finite difference time-harmonic Maxwell) is probably not important to the question itself. Long story short, I just need to solve a set of equations of Ax = b with a block diagonal system matrix, like (not sure if the mono font works): |X | A =| Y | | Z| Note that A is not really block-diagonal, it?s just a multi-diagonal matrix determined by the FD mesh, where most elements are close to diagonal. So instead of a full ILU decomposition, a D-ILU is a good approximation as a preconditioner, and the number of blocks should not matter too much: |Lx | |Ux | L = | Ly | and U = | Uy | | Lz| | Uz| Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in my sequential Fortran/Matlab code. So like most users, I am looking for a parallel implement with this problem in PETSc. After looking through the manual, it seems that the most straightforward way to do it is through PCBJACOBI. Not sure I understand it right, I just setup a 3-block PCJACOBI and give each of the block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little block of fortran code would look like: ... call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & & isubs,ierr) call PCBJacobiSetLocalBlocks(pc_local, Nsub, & & isubs(istart:iend),ierr) ! set up the block jacobi structure call KSPSetup(ksp_local,ierr) ! allocate sub ksps allocate(ksp_sub(Nsub)) call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & & ksp_sub,ierr) do i=1,Nsub call KSPGetPC(ksp_sub(i),pc_sub,ierr) !ILU preconditioner call PCSetType(pc_sub,ptype,ierr) call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] end do call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) ? I understand that the parallel performance may not be comparable, so I first set up a one-process test (with MPIAij, but all the rows are local since there is only one process). The system is solved without any problem (identical results within error). But the performance is actually a lot worse (code built without debugging flags in performance tests) than my own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), which is hard to believe. I suspect the difference is from the PC as the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to the same relative tol. This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks instead of 3). 
While my Fortran/Matlab codes see minimal performance difference (<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a performance decrease of more than 50% in sequential test. So my implementation IS somewhat different in PETSc. Do I miss something in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how PCBJACOBI works in PETSc? If this is not the right way to implement a block diagonal ILU as (parallel) PC, please kindly point me to the right direction. I searched through the mail list to find some answers, only to find a couple of similar questions... An example would be nice. On the other hand, does PETSc support a simple way to use explicit L/U matrix as a preconditioner? I can import the D-ILU matrix (I already converted my A matrix into Mat) constructed in my Fortran code to make a better comparison. Or do I have to construct my own PC using PCshell? If so, is there a good tutorial/example to learn about how to use PCSHELL (in a more advanced sense, like how to setup pc side and transpose)? Thanks in advance, Hao -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Tue Feb 4 13:16:04 2020 From: hzhang at mcs.anl.gov (hzhang at mcs.anl.gov) Date: Tue, 4 Feb 2020 13:16:04 -0600 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? In-Reply-To: References: Message-ID: Hao: I would suggest to use a parallel sparse direct solver, e.g., superlu_dist or mumps. These solvers can take advantage of your sparse data structure. Once it works, then you may play with other preconditioners, such as bjacobi + lu/ilu. See https://www.mcs.anl.gov/petsc/miscellaneous/external.html Hong Dear all, > > > I have a few questions about the implementation of diagonal ILU PC in > PETSc. I want to solve a very simple system with KSP (in parallel), the > nature of the system (finite difference time-harmonic Maxwell) is probably > not important to the question itself. Long story short, I just need to > solve a set of equations of Ax = b with a block diagonal system matrix, > like (not sure if the mono font works): > > |X | > A =| Y | > | Z| > > Note that A is not really block-diagonal, it?s just a multi-diagonal > matrix determined by the FD mesh, where most elements are close to > diagonal. So instead of a full ILU decomposition, a D-ILU is a good > approximation as a preconditioner, and the number of blocks should not > matter too much: > > |Lx | |Ux | > L = | Ly | and U = | Uy | > | Lz| | Uz| > > Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N > blocks) is quite efficient with Krylov subspace methods like BiCGstab or > QMR in my sequential Fortran/Matlab code. > > So like most users, I am looking for a parallel implement with this > problem in PETSc. After looking through the manual, it seems that the > most straightforward way to do it is through PCBJACOBI. Not sure I > understand it right, I just setup a 3-block PCJACOBI and give each of the > block a KSP with PCILU. Is this supposed to be equivalent to my > D-ILU preconditioner? My little block of fortran code would look like: > ... > * call* PCBJacobiSetTotalBlocks(pc_local,Ntotal, & > & isubs,ierr) > *call* PCBJacobiSetLocalBlocks(pc_local, Nsub, & > & isubs(istart:iend),ierr) > ! set up the block jacobi structure > *call* KSPSetup(ksp_local,ierr) > ! 
allocate sub ksps > allocate(ksp_sub(Nsub)) > *call* PCBJacobiGetSubKSP(pc_local,Nsub,istart, & > & ksp_sub,ierr) > do i=1,Nsub > *call* KSPGetPC(ksp_sub(i),pc_sub,ierr) > !ILU preconditioner > *call* PCSetType(pc_sub,ptype,ierr) > *call* PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here > *call* KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] > end do > *call* KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & > & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) > ? > > I understand that the parallel performance may not be comparable, so I > first set up a one-process test (with MPIAij, but all the rows are local > since there is only one process). The system is solved without any > problem (identical results within error). But the performance is actually a > lot worse (code built without debugging flags in performance tests) than my > own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse > matrix format), which is hard to believe. I suspect the difference is from > the PC as the PETSc version took much more BiCGstab iterations (60-ish vs > 100-ish) to converge to the same relative tol. > > This is further confirmed when I change the setup of D-ILU (using 6 or 9 > blocks instead of 3). While my Fortran/Matlab codes see minimal performance > difference (<5%) when I play with the D-ILU setup, increasing the number of > D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a > performance decrease of more than 50% in sequential test. So my > implementation IS somewhat different in PETSc. Do I miss something in the > PCBJACOBI setup? Or do I have some fundamental misunderstanding of how > PCBJACOBI works in PETSc? > > If this is not the right way to implement a block diagonal ILU as > (parallel) PC, please kindly point me to the right direction. I searched > through the mail list to find some answers, only to find a couple of > similar questions... An example would be nice. > > On the other hand, does PETSc support a simple way to use explicit L/U > matrix as a preconditioner? I can import the D-ILU matrix (I already > converted my A matrix into Mat) constructed in my Fortran code to make a > better comparison. Or do I have to construct my own PC using PCshell? If > so, is there a good tutorial/example to learn about how to use PCSHELL (in > a more advanced sense, like how to setup pc side and transpose)? > > Thanks in advance, > > Hao > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Feb 4 14:27:09 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Tue, 4 Feb 2020 20:27:09 +0000 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? In-Reply-To: References: Message-ID: <264F91C4-8558-4850-9B4B-ABE4123C2A2C@anl.gov> > On Feb 4, 2020, at 12:41 PM, Hao DONG wrote: > > Dear all, > > > I have a few questions about the implementation of diagonal ILU PC in PETSc. I want to solve a very simple system with KSP (in parallel), the nature of the system (finite difference time-harmonic Maxwell) is probably not important to the question itself. Long story short, I just need to solve a set of equations of Ax = b with a block diagonal system matrix, like (not sure if the mono font works): > > |X | > A =| Y | > | Z| > > Note that A is not really block-diagonal, it?s just a multi-diagonal matrix determined by the FD mesh, where most elements are close to diagonal. 
So instead of a full ILU decomposition, a D-ILU is a good approximation as a preconditioner, and the number of blocks should not matter too much: > > |Lx | |Ux | > L = | Ly | and U = | Uy | > | Lz| | Uz| > > Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in my sequential Fortran/Matlab code. > > So like most users, I am looking for a parallel implement with this problem in PETSc. After looking through the manual, it seems that the most straightforward way to do it is through PCBJACOBI. Not sure I understand it right, I just setup a 3-block PCJACOBI and give each of the block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little block of fortran code would look like: > ... > call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & > & isubs,ierr) > call PCBJacobiSetLocalBlocks(pc_local, Nsub, & > & isubs(istart:iend),ierr) > ! set up the block jacobi structure > call KSPSetup(ksp_local,ierr) > ! allocate sub ksps > allocate(ksp_sub(Nsub)) > call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & > & ksp_sub,ierr) > do i=1,Nsub > call KSPGetPC(ksp_sub(i),pc_sub,ierr) > !ILU preconditioner > call PCSetType(pc_sub,ptype,ierr) > call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here > call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] > end do > call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & > & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) > ? This code looks essentially right. You should call with -ksp_view to see exactly what PETSc is using for a solver. > > I understand that the parallel performance may not be comparable, so I first set up a one-process test (with MPIAij, but all the rows are local since there is only one process). The system is solved without any problem (identical results within error). But the performance is actually a lot worse (code built without debugging flags in performance tests) than my own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), which is hard to believe. I suspect the difference is from the PC as the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to the same relative tol. PETSc uses GMRES by default with a restart of 30 and left preconditioning. It could be different exact choices in the solver (which is why -ksp_view is so useful) can explain the differences in the runs between your code and PETSc's > > This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks instead of 3). While my Fortran/Matlab codes see minimal performance difference (<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a performance decrease of more than 50% in sequential test. This is odd, the more blocks the smaller each block so the quicker the ILU set up should be. You can run various cases with -log_view and send us the output to see what is happening at each part of the computation in time. > So my implementation IS somewhat different in PETSc. Do I miss something in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how PCBJACOBI works in PETSc? Probably not. > > If this is not the right way to implement a block diagonal ILU as (parallel) PC, please kindly point me to the right direction. I searched through the mail list to find some answers, only to find a couple of similar questions... An example would be nice. 
You approach seems fundamentally right but I cannot be sure of possible bugs. > > On the other hand, does PETSc support a simple way to use explicit L/U matrix as a preconditioner? I can import the D-ILU matrix (I already converted my A matrix into Mat) constructed in my Fortran code to make a better comparison. Or do I have to construct my own PC using PCshell? If so, is there a good tutorial/example to learn about how to use PCSHELL (in a more advanced sense, like how to setup pc side and transpose)? Not sure what you mean by explicit L/U matrix as a preconditioner. As Hong said, yes you can use a parallel LU from MUMPS or SuperLU_DIST or Pastix as the solver. You do not need any shell code. You simply need to set the PCType to lu You can also set all this options from the command line and don't need to write the code specifically. So call KSPSetFromOptions() and then for example -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu (this last one is applied to each block so you could use -pc_type lu and it would use lu on each block.) -ksp_type_none -pc_type lu -pc_factor_mat_solver_type mumps (do parallel LU with mumps) By not hardwiring in the code and just using options you can test out different cases much quicker Use -ksp_view to make sure that is using the solver the way you expect. Barry Barry > > Thanks in advance, > > Hao From aan2 at princeton.edu Tue Feb 4 17:07:36 2020 From: aan2 at princeton.edu (Olek Niewiarowski) Date: Tue, 4 Feb 2020 23:07:36 +0000 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? Message-ID: Hello, I am a FEniCS user but new to petsc4py. I am trying to modify the KSP solver through the SNES object to implement the Sherman-Morrison formula (e.g. http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html ). I am solving a nonlinear system of the form [K(u)?kaaT]?u=?F(u). Here the jacobian matrix K is modified by the term kaaT, where k is a scalar. Notably, K is sparse, while the term kaaT results in a full matrix. 
This problem can be solved efficiently using the Sherman-Morrison formula : [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) I have managed to successfully implement this at the linear solve level (by modifying the KSP solver) inside a custom Newton solver in python by following an incomplete tutorial at https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner : * while (norm(delU) > alpha): # while not converged * * self.update_F() # call to method to update r.h.s form * self.update_K() # call to update the jacobian form * K = assemble(self.K) # assemble the jacobian matrix * F = assemble(self.F) # assemble the r.h.s vector * a = assemble(self.a_form) # assemble the a_form (see Sherman Morrison formula) * * for bc in self.mem.bc: # apply boundary conditions * bc.apply(K, F) * bc.apply(K, a) * * B = PETSc.Mat().create() * * # Assemble the bilinear form that defines A and get the concrete * # PETSc matrix * A = as_backend_type(K).mat() # get the PETSc objects for K and a * u = as_backend_type(a).vec() * * # Build the matrix "context" # see firedrake docs * Bctx = MatrixFreeB(A, u, u, self.k) * * # Set up B * # B is the same size as A * B.setSizes(*A.getSizes()) * B.setType(B.Type.PYTHON) * B.setPythonContext(Bctx) * B.setUp() * * * ksp = PETSc.KSP().create() # create the KSP linear solver object * ksp.setOperators(B) * ksp.setUp() * pc = ksp.pc * pc.setType(pc.Type.PYTHON) * pc.setPythonContext(MatrixFreePC()) * ksp.setFromOptions() * * solution = delU # the incremental displacement at this iteration * * b = as_backend_type(-F).vec() * delu = solution.vector().vec() * * ksp.solve(b, delu) * * self.mem.u.vector().axpy(0.25, self.delU.vector()) # poor man's linesearch * counter += 1 Here is the corresponding petsc4py code adapted from the firedrake docs: 1. class MatrixFreePC(object): 2. 3. def setUp(self, pc): 4. B, P = pc.getOperators() 5. # extract the MatrixFreeB object from B 6. ctx = B.getPythonContext() 7. self.A = ctx.A 8. self.u = ctx.u 9. self.v = ctx.v 10. self.k = ctx.k 11. # Here we build the PC object that uses the concrete, 12. # assembled matrix A. We will use this to apply the action 13. # of A^{-1} 14. self.pc = PETSc.PC().create() 15. self.pc.setOptionsPrefix("mf_") 16. self.pc.setOperators(self.A) 17. self.pc.setFromOptions() 18. # Since u and v do not change, we can build the denominator 19. # and the action of A^{-1} on u only once, in the setup 20. # phase. 21. tmp = self.A.createVecLeft() 22. self.pc.apply(self.u, tmp) 23. self._Ainvu = tmp 24. self._denom = 1 + self.k*self.v.dot(self._Ainvu) 25. 26. def apply(self, pc, x, y): 27. # y <- A^{-1}x 28. self.pc.apply(x, y) 29. # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u) 30. alpha = (self.k*self.v.dot(y)) / self._denom 31. # y <- y - alpha * A^{-1}u 32. y.axpy(-alpha, self._Ainvu) 33. 34. 35. class MatrixFreeB(object): 36. 37. def __init__(self, A, u, v, k): 38. self.A = A 39. self.u = u 40. self.v = v 41. self.k = k 42. 43. def mult(self, mat, x, y): 44. # y <- A x 45. self.A.mult(x, y) 46. 47. # alpha <- v^T x 48. alpha = self.v.dot(x) 49. 50. # y <- y + alpha*u 51. y.axpy(alpha, self.u) However, this approach is not efficient as it requires many iterations due to the Newton step being fixed, so I would like to implement it using SNES and use line search. Unfortunately, I have not been able to find any documentation/tutorial on how to do so. 
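For orientation, the SNES-level version of the KSP wiring above looks roughly like the following C-API sketch; petsc4py exposes the same SNES/KSP/PC objects. This is only a sketch under assumptions: FormFunction/FormJacobian and the two PCShell routines are hypothetical user callbacks standing in for the FEniCS assembly and the Sherman-Morrison apply shown earlier, B is assumed to be a shell matrix whose MatMult applies (K - k a a^T) x, and r, u, B, K and the application context "user" are assumed to be created/assembled elsewhere. Newton with a backtracking line search (SNESNEWTONLS) is the PETSc default, so no extra call is needed for the line search itself.

#include <petscsnes.h>

/* hypothetical user callbacks, standing in for the assembly code above */
extern PetscErrorCode FormFunction(SNES,Vec,Vec,void*);      /* assembles the residual */
extern PetscErrorCode FormJacobian(SNES,Vec,Mat,Mat,void*);  /* re-assembles K and a, refreshes B's context */
extern PetscErrorCode SMWSetUp(PC);                          /* precomputes K^{-1}a and the denominator */
extern PetscErrorCode SMWApply(PC,Vec,Vec);                  /* Sherman-Morrison apply, as in MatrixFreePC */

  SNES           snes;
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;
  /* assumed already set up: Vec r,u; Mat B (shell operator), K (assembled sparse matrix); user context */

  ierr = SNESCreate(PETSC_COMM_WORLD,&snes);CHKERRQ(ierr);
  ierr = SNESSetFunction(snes,r,FormFunction,&user);CHKERRQ(ierr);
  ierr = SNESSetJacobian(snes,B,K,FormJacobian,&user);CHKERRQ(ierr); /* B used by the Krylov method, K by the preconditioner */
  ierr = SNESGetKSP(snes,&ksp);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCSHELL);CHKERRQ(ierr);
  ierr = PCShellSetSetUp(pc,SMWSetUp);CHKERRQ(ierr);
  ierr = PCShellSetApply(pc,SMWApply);CHKERRQ(ierr);
  ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);   /* e.g. -snes_monitor -snes_linesearch_type bt */
  ierr = SNESSolve(snes,NULL,u);CHKERRQ(ierr);
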
Provided I have the FEniCS forms for F, K, and a, I'd like to do something along the lines of: solver = PETScSNESSolver() # the FEniCS SNES wrapper snes = solver.snes() # the petsc4py SNES object ## ?? ksp = snes.getKSP() # set ksp option similar to above solver.solve() I would be very grateful if anyone could could help or point me to a reference or demo that does something similar (or maybe a completely different way of solving the problem!). Many thanks in advance! Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From sajidsyed2021 at u.northwestern.edu Tue Feb 4 18:30:46 2020 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Tue, 4 Feb 2020 18:30:46 -0600 Subject: [petsc-users] Required structure and attrs for MatLoad from hdf5 Message-ID: Hi PETSc-developers, The example src/mat/examples/tutorials/ex10.c shows how one would read a matrix from a hdf5 file. Since MatView isn?t implemented for hdf5_mat format, how is the hdf5 file (to be used to run ex10) generated ? I tried reading from an hdf5 file but I saw an error stating object 'jc' doesn't exist and thus would like to know how I should store a sparse matrix in an hdf5 file so that MatLoad works. PS: I?m guessing that MATLAB stores the matrix in the format that PETSc expects (group/dset/attrs) but I?m creating this from Python. If the recommended approach is to transfer numpy arrays to PETSc matrices via petsc4py, I?d switch to that instead of directly creating hdf5 files. Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Feb 4 19:56:34 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 5 Feb 2020 01:56:34 +0000 Subject: [petsc-users] Required structure and attrs for MatLoad from hdf5 In-Reply-To: References: Message-ID: <0B9AE74A-7239-4CCF-8A64-A4CFB46B1370@anl.gov> I think this is a Python-Matlab question, not specifically related to PETSc in any way. Googling python matrix hdf5 matlab there are mentions of h5py library that can be used to write out sparse matrices in Matlab HDF5 format. Which could presumably be read by PETSc. PETSc can also read in the non-transposed version which presumably can also be written out with h5py https://www.loc.gov/preservation/digital/formats/fdd/fdd000440.shtml gives some indication that an indication of whether the transpose is stored in the file might exist, if so and it is used by Matlab we wouldn't need the special matlab format flag. Barry > On Feb 4, 2020, at 6:30 PM, Sajid Ali wrote: > > Hi PETSc-developers, > > The example src/mat/examples/tutorials/ex10.c shows how one would read a matrix from a hdf5 file. Since MatView isn?t implemented for hdf5_mat format, how is the hdf5 file (to be used to run ex10) generated ? > > I tried reading from an hdf5 file but I saw an error stating object 'jc' doesn't exist and thus would like to know how I should store a sparse matrix in an hdf5 file so that MatLoad works. > > PS: I?m guessing that MATLAB stores the matrix in the format that PETSc expects (group/dset/attrs) but I?m creating this from Python. If the recommended approach is to transfer numpy arrays to PETSc matrices via petsc4py, I?d switch to that instead of directly creating hdf5 files. 
> > Thank You, > Sajid Ali | PhD Candidate > Applied Physics > Northwestern University > s-sajid-ali.github.io > From bsmith at mcs.anl.gov Wed Feb 5 00:35:58 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 5 Feb 2020 06:35:58 +0000 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? In-Reply-To: References: Message-ID: I am not sure of everything in your email but it sounds like you want to use a "Picard" iteration to solve [K(u)?kaaT]?u=?F(u). That is solve A(u^{n}) (u^{n+1} - u^{n}) = F(u^{n}) - A(u^{n})u^{n} where A(u) = K(u) - kaaT PETSc provides code to this with SNESSetPicard() (see the manual pages) I don't know if Petsc4py has bindings for this. Adding missing python bindings is not terribly difficult and you may be able to do it yourself if this is the approach you want. Barry > On Feb 4, 2020, at 5:07 PM, Olek Niewiarowski wrote: > > Hello, > I am a FEniCS user but new to petsc4py. I am trying to modify the KSP solver through the SNES object to implement the Sherman-Morrison formula(e.g. http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html ). I am solving a nonlinear system of the form [K(u)?kaaT]?u=?F(u). Here the jacobian matrix K is modified by the term kaaT, where k is a scalar. Notably, K is sparse, while the term kaaT results in a full matrix. This problem can be solved efficiently using the Sherman-Morrison formula : > > [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) > I have managed to successfully implement this at the linear solve level (by modifying the KSP solver) inside a custom Newton solver in python by following an incomplete tutorial at https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner : > ? while (norm(delU) > alpha): # while not converged > ? > ? self.update_F() # call to method to update r.h.s form > ? self.update_K() # call to update the jacobian form > ? K = assemble(self.K) # assemble the jacobian matrix > ? F = assemble(self.F) # assemble the r.h.s vector > ? a = assemble(self.a_form) # assemble the a_form (see Sherman Morrison formula) > ? > ? for bc in self.mem.bc: # apply boundary conditions > ? bc.apply(K, F) > ? bc.apply(K, a) > ? > ? B = PETSc.Mat().create() > ? > ? # Assemble the bilinear form that defines A and get the concrete > ? # PETSc matrix > ? A = as_backend_type(K).mat() # get the PETSc objects for K and a > ? u = as_backend_type(a).vec() > ? > ? # Build the matrix "context" # see firedrake docs > ? Bctx = MatrixFreeB(A, u, u, self.k) > ? > ? # Set up B > ? # B is the same size as A > ? B.setSizes(*A.getSizes()) > ? B.setType(B.Type.PYTHON) > ? B.setPythonContext(Bctx) > ? B.setUp() > ? > ? > ? ksp = PETSc.KSP().create() # create the KSP linear solver object > ? ksp.setOperators(B) > ? ksp.setUp() > ? pc = ksp.pc > ? pc.setType(pc.Type.PYTHON) > ? pc.setPythonContext(MatrixFreePC()) > ? ksp.setFromOptions() > ? > ? solution = delU # the incremental displacement at this iteration > ? > ? b = as_backend_type(-F).vec() > ? delu = solution.vector().vec() > ? > ? ksp.solve(b, delu) > > ? self.mem.u.vector().axpy(0.25, self.delU.vector()) # poor man's linesearch > ? counter += 1 > Here is the corresponding petsc4py code adapted from the firedrake docs: > > ? class MatrixFreePC(object): > ? > ? def setUp(self, pc): > ? B, P = pc.getOperators() > ? # extract the MatrixFreeB object from B > ? ctx = B.getPythonContext() > ? self.A = ctx.A > ? self.u = ctx.u > ? self.v = ctx.v > ? self.k = ctx.k > ? 
# Here we build the PC object that uses the concrete, > ? # assembled matrix A. We will use this to apply the action > ? # of A^{-1} > ? self.pc = PETSc.PC().create() > ? self.pc.setOptionsPrefix("mf_") > ? self.pc.setOperators(self.A) > ? self.pc.setFromOptions() > ? # Since u and v do not change, we can build the denominator > ? # and the action of A^{-1} on u only once, in the setup > ? # phase. > ? tmp = self.A.createVecLeft() > ? self.pc.apply(self.u, tmp) > ? self._Ainvu = tmp > ? self._denom = 1 + self.k*self.v.dot(self._Ainvu) > ? > ? def apply(self, pc, x, y): > ? # y <- A^{-1}x > ? self.pc.apply(x, y) > ? # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u) > ? alpha = (self.k*self.v.dot(y)) / self._denom > ? # y <- y - alpha * A^{-1}u > ? y.axpy(-alpha, self._Ainvu) > ? > ? > ? class MatrixFreeB(object): > ? > ? def __init__(self, A, u, v, k): > ? self.A = A > ? self.u = u > ? self.v = v > ? self.k = k > ? > ? def mult(self, mat, x, y): > ? # y <- A x > ? self.A.mult(x, y) > ? > ? # alpha <- v^T x > ? alpha = self.v.dot(x) > ? > ? # y <- y + alpha*u > ? y.axpy(alpha, self.u) > However, this approach is not efficient as it requires many iterations due to the Newton step being fixed, so I would like to implement it using SNES and use line search. Unfortunately, I have not been able to find any documentation/tutorial on how to do so. Provided I have the FEniCS forms for F, K, and a, I'd like to do something along the lines of: > solver = PETScSNESSolver() # the FEniCS SNES wrapper > snes = solver.snes() # the petsc4py SNES object > ## ?? > ksp = snes.getKSP() > # set ksp option similar to above > solver.solve() > > I would be very grateful if anyone could could help or point me to a reference or demo that does something similar (or maybe a completely different way of solving the problem!). > Many thanks in advance! > Alex From dong-hao at outlook.com Wed Feb 5 04:36:26 2020 From: dong-hao at outlook.com (Hao DONG) Date: Wed, 5 Feb 2020 10:36:26 +0000 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? In-Reply-To: <264F91C4-8558-4850-9B4B-ABE4123C2A2C@anl.gov> References: , <264F91C4-8558-4850-9B4B-ABE4123C2A2C@anl.gov> Message-ID: Thanks a lot for your suggestions, Hong and Barry - As you suggested, I first tried the LU direct solvers (built-in and MUMPs) out this morning, which work perfectly, albeit slow. As my problem itself is a part of a PDE based optimization, the exact solution in the searching procedure is not necessary (I often set a relative tolerance of 1E-7/1E-8, etc. for Krylov subspace methods). Also tried BJACOBI with exact LU, the KSP just converges in one or two iterations, as expected. I added -kspview option for the D-ILU test (still with Block Jacobi as preconditioner and bcgs as KSP solver). The KSPview output from one of the examples in a toy model looks like: KSP Object: 1 MPI processes type: bcgs maximum iterations=120, nonzero initial guess tolerances: relative=1e-07, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: bjacobi number of blocks = 3 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using NONE norm type for convergence test PC Object: (sub_) 1 MPI processes type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1., needed 1. Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=11294, cols=11294 package used to perform factorization: petsc total: nonzeros=76008, allocated nonzeros=76008 total number of mallocs used during MatSetValues calls=0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=11294, cols=11294 total: nonzeros=76008, allocated nonzeros=76008 total number of mallocs used during MatSetValues calls=0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: mpiaij rows=33880, cols=33880 total: nonzeros=436968, allocated nonzeros=436968 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines do you see something wrong with my setup? I also tried a quick performance test with a small 278906 by 278906 matrix (3850990 nnz) with the following parameters: -ksp_type bcgs -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu -ksp_view Reducing the relative residual to 1E-7 Took 4.08s with 41 bcgs iterations. Merely changing the -pc_bjacobi_local_blocks to 6 Took 7.02s with 73 bcgs iterations. 9 blocks would further take 9.45s with 101 bcgs iterations. As a reference, my home-brew Fortran code solves the same problem (3-block D-ILU0) in 1.84s with 24 bcgs iterations (the bcgs code is also a home-brew one)? Well, by saying ?use explicit L/U matrix as preconditioner?, I wonder if a user is allowed to provide his own (separate) L and U Mat for preconditioning (like how it works in Matlab solvers), e.g. x = qmr(A,b,Tol,MaxIter,L,U,x) As I already explicitly constructed the L and U matrices in Fortran, it is not hard to convert them to Mat in Petsc to test Petsc and my Fortran code head-to-head. In that case, the A, b, x, and L/U are all identical, it would be easier to see where the problem came from. BTW, there is another thing I wondered - is there a way to output residual in unpreconditioned norm? I tried to call KSPSetNormType(ksp_local, KSP_NORM_UNPRECONDITIONED, ierr) But always get an error that current ksp does not support unpreconditioned in LEFT/RIGHT (either way). Is it possible to do that (output unpreconditioned residual) in PETSc at all? Cheers, Hao ________________________________ From: Smith, Barry F. Sent: Tuesday, February 4, 2020 8:27 PM To: Hao DONG Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? > On Feb 4, 2020, at 12:41 PM, Hao DONG wrote: > > Dear all, > > > I have a few questions about the implementation of diagonal ILU PC in PETSc. I want to solve a very simple system with KSP (in parallel), the nature of the system (finite difference time-harmonic Maxwell) is probably not important to the question itself. Long story short, I just need to solve a set of equations of Ax = b with a block diagonal system matrix, like (not sure if the mono font works): > > |X | > A =| Y | > | Z| > > Note that A is not really block-diagonal, it?s just a multi-diagonal matrix determined by the FD mesh, where most elements are close to diagonal. 
So instead of a full ILU decomposition, a D-ILU is a good approximation as a preconditioner, and the number of blocks should not matter too much: > > |Lx | |Ux | > L = | Ly | and U = | Uy | > | Lz| | Uz| > > Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in my sequential Fortran/Matlab code. > > So like most users, I am looking for a parallel implement with this problem in PETSc. After looking through the manual, it seems that the most straightforward way to do it is through PCBJACOBI. Not sure I understand it right, I just setup a 3-block PCJACOBI and give each of the block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little block of fortran code would look like: > ... > call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & > & isubs,ierr) > call PCBJacobiSetLocalBlocks(pc_local, Nsub, & > & isubs(istart:iend),ierr) > ! set up the block jacobi structure > call KSPSetup(ksp_local,ierr) > ! allocate sub ksps > allocate(ksp_sub(Nsub)) > call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & > & ksp_sub,ierr) > do i=1,Nsub > call KSPGetPC(ksp_sub(i),pc_sub,ierr) > !ILU preconditioner > call PCSetType(pc_sub,ptype,ierr) > call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here > call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] > end do > call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & > & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) > ? This code looks essentially right. You should call with -ksp_view to see exactly what PETSc is using for a solver. > > I understand that the parallel performance may not be comparable, so I first set up a one-process test (with MPIAij, but all the rows are local since there is only one process). The system is solved without any problem (identical results within error). But the performance is actually a lot worse (code built without debugging flags in performance tests) than my own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), which is hard to believe. I suspect the difference is from the PC as the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to the same relative tol. PETSc uses GMRES by default with a restart of 30 and left preconditioning. It could be different exact choices in the solver (which is why -ksp_view is so useful) can explain the differences in the runs between your code and PETSc's > > This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks instead of 3). While my Fortran/Matlab codes see minimal performance difference (<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a performance decrease of more than 50% in sequential test. This is odd, the more blocks the smaller each block so the quicker the ILU set up should be. You can run various cases with -log_view and send us the output to see what is happening at each part of the computation in time. > So my implementation IS somewhat different in PETSc. Do I miss something in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how PCBJACOBI works in PETSc? Probably not. > > If this is not the right way to implement a block diagonal ILU as (parallel) PC, please kindly point me to the right direction. I searched through the mail list to find some answers, only to find a couple of similar questions... An example would be nice. 
You approach seems fundamentally right but I cannot be sure of possible bugs. > > On the other hand, does PETSc support a simple way to use explicit L/U matrix as a preconditioner? I can import the D-ILU matrix (I already converted my A matrix into Mat) constructed in my Fortran code to make a better comparison. Or do I have to construct my own PC using PCshell? If so, is there a good tutorial/example to learn about how to use PCSHELL (in a more advanced sense, like how to setup pc side and transpose)? Not sure what you mean by explicit L/U matrix as a preconditioner. As Hong said, yes you can use a parallel LU from MUMPS or SuperLU_DIST or Pastix as the solver. You do not need any shell code. You simply need to set the PCType to lu You can also set all this options from the command line and don't need to write the code specifically. So call KSPSetFromOptions() and then for example -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu (this last one is applied to each block so you could use -pc_type lu and it would use lu on each block.) -ksp_type_none -pc_type lu -pc_factor_mat_solver_type mumps (do parallel LU with mumps) By not hardwiring in the code and just using options you can test out different cases much quicker Use -ksp_view to make sure that is using the solver the way you expect. Barry Barry > > Thanks in advance, > > Hao -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Feb 5 07:35:27 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 5 Feb 2020 08:35:27 -0500 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? In-Reply-To: References: Message-ID: Perhaps Barry is right that you want Picard, but suppose you really want Newton. "This problem can be solved efficiently using the Sherman-Morrison formula" Well, maybe. The main assumption here is that inverting K is cheap. I see two things you can do in a straightforward way: 1) Use MatCreateLRC() https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateLRC.html to create the Jacobian and solve using an iterative method. If you pass just K was the preconditioning matrix, you can use common PCs. 2) We only implemented MatMult() for LRC, but you could stick your SMW code in for MatSolve_LRC if you really want to factor K. We would of course help you do this. Thanks, Matt On Wed, Feb 5, 2020 at 1:36 AM Smith, Barry F. via petsc-users < petsc-users at mcs.anl.gov> wrote: > > I am not sure of everything in your email but it sounds like you want > to use a "Picard" iteration to solve [K(u)?kaaT]?u=?F(u). That is solve > > A(u^{n}) (u^{n+1} - u^{n}) = F(u^{n}) - A(u^{n})u^{n} where A(u) = K(u) > - kaaT > > PETSc provides code to this with SNESSetPicard() (see the manual pages) I > don't know if Petsc4py has bindings for this. > > Adding missing python bindings is not terribly difficult and you may be > able to do it yourself if this is the approach you want. > > Barry > > > > > On Feb 4, 2020, at 5:07 PM, Olek Niewiarowski > wrote: > > > > Hello, > > I am a FEniCS user but new to petsc4py. I am trying to modify the KSP > solver through the SNES object to implement the Sherman-Morrison > formula(e.g. http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html > ). I am solving a nonlinear system of the form [K(u)?kaaT]?u=?F(u). Here > the jacobian matrix K is modified by the term kaaT, where k is a scalar. > Notably, K is sparse, while the term kaaT results in a full matrix. 
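A rough petsc4py sketch of suggestion (1) above: build the Jacobian J = K - k a a^T as a low-rank correction with MatCreateLRC and hand the sparse K alone to the preconditioner. This assumes the installed petsc4py exposes Mat.createLRC (newer releases do); K, a, k and the right-hand side below are small serial stand-ins, not the FEniCS-assembled objects from this thread.

from petsc4py import PETSc

n, k = 100, 1.0e-6
K = PETSc.Mat().createAIJ([n, n], nnz=3)      # sparse stand-in for the stiffness matrix
for i in range(*K.getOwnershipRange()):
    K[i, i] = 2.0
    if i > 0:
        K[i, i - 1] = -1.0
    if i < n - 1:
        K[i, i + 1] = -1.0
K.assemble()

a = K.createVecRight(); a.set(1.0)            # stand-in for the rank-one vector a

U = PETSc.Mat().createDense([n, 1])           # low-rank factor: one column holding a
U.setUp()
U.setValues(list(range(n)), [0], a.getArray())
U.assemble()
c = PETSc.Vec().createSeq(1)                  # diagonal of C in J = K + U*diag(c)*U^T
c[0] = -k                                     # so J = K - k a a^T
c.assemble()

J = PETSc.Mat().createLRC(K, U, c, U)         # matrix-free low-rank-corrected Jacobian

ksp = PETSc.KSP().create()
ksp.setOperators(J, K)                        # J drives MatMult, K builds the PC
ksp.setType(PETSc.KSP.Type.GMRES)
ksp.getPC().setType(PETSc.PC.Type.ILU)        # any ordinary PC of K can go here
ksp.setFromOptions()

b = K.createVecRight(); b.set(1.0)
x = K.createVecRight()
ksp.solve(b, x)

With setOperators(J, K) the Krylov method applies the corrected operator J matrix-free while the preconditioner is built from the sparse K only, which is the point of suggestion (1).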
This > problem can be solved efficiently using the Sherman-Morrison formula : > > > > [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) > > I have managed to successfully implement this at the linear solve level > (by modifying the KSP solver) inside a custom Newton solver in python by > following an incomplete tutorial at > https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner > : > > ? while (norm(delU) > alpha): # while not converged > > ? > > ? self.update_F() # call to method to update r.h.s form > > ? self.update_K() # call to update the jacobian form > > ? K = assemble(self.K) # assemble the jacobian matrix > > ? F = assemble(self.F) # assemble the r.h.s vector > > ? a = assemble(self.a_form) # assemble the a_form (see > Sherman Morrison formula) > > ? > > ? for bc in self.mem.bc: # apply boundary conditions > > ? bc.apply(K, F) > > ? bc.apply(K, a) > > ? > > ? B = PETSc.Mat().create() > > ? > > ? # Assemble the bilinear form that defines A and get > the concrete > > ? # PETSc matrix > > ? A = as_backend_type(K).mat() # get the PETSc objects > for K and a > > ? u = as_backend_type(a).vec() > > ? > > ? # Build the matrix "context" # see firedrake docs > > ? Bctx = MatrixFreeB(A, u, u, self.k) > > ? > > ? # Set up B > > ? # B is the same size as A > > ? B.setSizes(*A.getSizes()) > > ? B.setType(B.Type.PYTHON) > > ? B.setPythonContext(Bctx) > > ? B.setUp() > > ? > > ? > > ? ksp = PETSc.KSP().create() # create the KSP linear > solver object > > ? ksp.setOperators(B) > > ? ksp.setUp() > > ? pc = ksp.pc > > ? pc.setType(pc.Type.PYTHON) > > ? pc.setPythonContext(MatrixFreePC()) > > ? ksp.setFromOptions() > > ? > > ? solution = delU # the incremental displacement at > this iteration > > ? > > ? b = as_backend_type(-F).vec() > > ? delu = solution.vector().vec() > > ? > > ? ksp.solve(b, delu) > > > > ? self.mem.u.vector().axpy(0.25, self.delU.vector()) # > poor man's linesearch > > ? counter += 1 > > Here is the corresponding petsc4py code adapted from the firedrake docs: > > > > ? class MatrixFreePC(object): > > ? > > ? def setUp(self, pc): > > ? B, P = pc.getOperators() > > ? # extract the MatrixFreeB object from B > > ? ctx = B.getPythonContext() > > ? self.A = ctx.A > > ? self.u = ctx.u > > ? self.v = ctx.v > > ? self.k = ctx.k > > ? # Here we build the PC object that uses the concrete, > > ? # assembled matrix A. We will use this to apply the > action > > ? # of A^{-1} > > ? self.pc = PETSc.PC().create() > > ? self.pc.setOptionsPrefix("mf_") > > ? self.pc.setOperators(self.A) > > ? self.pc.setFromOptions() > > ? # Since u and v do not change, we can build the > denominator > > ? # and the action of A^{-1} on u only once, in the setup > > ? # phase. > > ? tmp = self.A.createVecLeft() > > ? self.pc.apply(self.u, tmp) > > ? self._Ainvu = tmp > > ? self._denom = 1 + self.k*self.v.dot(self._Ainvu) > > ? > > ? def apply(self, pc, x, y): > > ? # y <- A^{-1}x > > ? self.pc.apply(x, y) > > ? # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u) > > ? alpha = (self.k*self.v.dot(y)) / self._denom > > ? # y <- y - alpha * A^{-1}u > > ? y.axpy(-alpha, self._Ainvu) > > ? > > ? > > ? class MatrixFreeB(object): > > ? > > ? def __init__(self, A, u, v, k): > > ? self.A = A > > ? self.u = u > > ? self.v = v > > ? self.k = k > > ? > > ? def mult(self, mat, x, y): > > ? # y <- A x > > ? self.A.mult(x, y) > > ? > > ? # alpha <- v^T x > > ? alpha = self.v.dot(x) > > ? > > ? # y <- y + alpha*u > > ? 
y.axpy(alpha, self.u) > > However, this approach is not efficient as it requires many iterations > due to the Newton step being fixed, so I would like to implement it using > SNES and use line search. Unfortunately, I have not been able to find any > documentation/tutorial on how to do so. Provided I have the FEniCS forms > for F, K, and a, I'd like to do something along the lines of: > > solver = PETScSNESSolver() # the FEniCS SNES wrapper > > snes = solver.snes() # the petsc4py SNES object > > ## ?? > > ksp = snes.getKSP() > > # set ksp option similar to above > > solver.solve() > > > > I would be very grateful if anyone could could help or point me to a > reference or demo that does something similar (or maybe a completely > different way of solving the problem!). > > Many thanks in advance! > > Alex > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dmitry.melnichuk at geosteertech.com Wed Feb 5 09:03:35 2020 From: dmitry.melnichuk at geosteertech.com (=?utf-8?B?0JTQvNC40YLRgNC40Lkg0JzQtdC70YzQvdC40YfRg9C6?=) Date: Wed, 05 Feb 2020 18:03:35 +0300 Subject: [petsc-users] Triple increasing of allocated memory during KSPSolve calling(GMRES preconditioned by ASM) In-Reply-To: <72B6812D-4131-41A6-9A30-878D2F9058D8@mcs.anl.gov> References: <9444561580744337@myt2-b8bf7a4d4ebc.qloud-c.yandex.net> <86DE2602-DCBD-47C4-819C-223F9CF9A503@mcs.anl.gov> <14855941580817898@vla4-d1c3bcedfacb.qloud-c.yandex.net> <72B6812D-4131-41A6-9A30-878D2F9058D8@mcs.anl.gov> Message-ID: <18751580915015@vla4-4046ec513d04.qloud-c.yandex.net> An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: ksp_view.txt URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: PDE.JPG Type: image/jpeg Size: 108816 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Zheng_poroelasticity.pdf Type: application/pdf Size: 3466224 bytes Desc: not available URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: full_log_ASM_factor_in_place.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: Error_gmres_sor.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: Error_ilu_pc_factor.txt URL: From knepley at gmail.com Wed Feb 5 09:46:29 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 5 Feb 2020 10:46:29 -0500 Subject: [petsc-users] Triple increasing of allocated memory during KSPSolve calling(GMRES preconditioned by ASM) In-Reply-To: <18751580915015@vla4-4046ec513d04.qloud-c.yandex.net> References: <9444561580744337@myt2-b8bf7a4d4ebc.qloud-c.yandex.net> <86DE2602-DCBD-47C4-819C-223F9CF9A503@mcs.anl.gov> <14855941580817898@vla4-d1c3bcedfacb.qloud-c.yandex.net> <72B6812D-4131-41A6-9A30-878D2F9058D8@mcs.anl.gov> <18751580915015@vla4-4046ec513d04.qloud-c.yandex.net> Message-ID: On Wed, Feb 5, 2020 at 10:04 AM ??????? ????????? < dmitry.melnichuk at geosteertech.com> wrote: > Barry, appreciate your response, as always. > > - You are saying that I am using ASM + ILU(0). 
However, I use PETSc only > with "ASM" as the input parameter for preconditioner. Does it mean that > ILU(0) is default sub-preconditioner for ASM? > Yes. > Can I change it using the option "-sub_pc_type"? > Yes Does it make sense to you within the scope of my general goal, which is > memory consumption decrease? Can it be useful to vary the "-sub_ksp_type" > option? > Yes. For example, try measuring the memory usage with -sub_pc_type jacobi > - I have run the computation for the same initial matrix with the > "-sub_pc_factor_in_place" option, PC = ASM. Now the process consumed ~400 > MB comparing to 550 MB without this option. > I used "-ksp_view" for this computation, two logs for this computation are > attached: > "*ksp_view.txt" - *ksp_view option only > *"full_log_ASM_factor_in_place.txt"* - full log without ksp_view option > > - Then I changed primary preconditioner from ASM to ILU(0) and ran the > computation: memory consumption was again about ~400 MB, no matter if I use > the "-sub_pc_factor_in_place" option. > > - Then I tried to run the computation with ILU(0) and > "-pc_factor_in_place", just in case: the computation did not start, I got > an error message, the log is attached:* "Error_ilu_pc_factor.txt"* > > - Then I ran the computation with SOR as a preconditioner. PETSc gave me > an error message, the log is attached: *"Error_gmres_sor.txt"* > > - As for the kind of PDEs: I am solving the standard poroelasticity > problem, the formulation can be found in the attached paper > (Zheng_poroelasticity.pdf), pages 2-3. > The file PDE.jpg is a snapshot of PDEs from this paper. > Proelasticity is elliptic (the kind that I am familiar with), so I would at least try Algebraic Multigrid, either GAMG, or ML, or Hypre (probably try all of them). Thanks, Matt > > So, if you may give me any further advice on how to decrease the consumed > amount of memory to approximately the matrix size (~200 MB in this case), > it will be great. Do I need to focus on searching a proper preconditioner? > BTW, the single ILU(0) did not give me any memory advantage comparing to > ASM with "-sub_pc_factor_in_place". > > Have a pleasant day! > > Kind regards, > Dmitry > > > > 04.02.2020, 19:04, "Smith, Barry F." : > > > Please run with the option -ksp_view so we know the exact solver > options you are using. > > From the lines > > MatCreateSubMats 1 1.0 1.9397e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 > 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 1.1066e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 > 0 0 0 0 0 0 0 0 0 0 > MatIncreaseOvrlp 1 1.0 3.0324e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 > 0 0 0 0 0 0 0 0 0 0 0 > > and the fact you have three matrices I would guess you are using the > additive Schwarz preconditioner (ASM) with ILU(0) on the blocks. (which > converges the same as ILU on one process but does use much more memory). > > Note: your code is still built with 32 bit integers. > > I would guess the basic matrix formed plus the vectors in this example > could take ~200 MB. It is the two matrices in the additive Schwarz that is > taking the additional memory. > > What kind of PDEs are you solving and what kind of formulation? > > ASM plus ILU is the "work mans" type preconditioner, relatively robust > but not particularly fast for convergence. Depending on your problem you > might be able to do much better convergence wise by using a PCFIELDSPLIT > and a PCGAMG on one of the splits. In your own run you see the ILU is > chugging along rather slowly to the solution. 
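A hedged sketch of the fieldsplit/AMG idea mentioned above, in petsc4py form: the two decoupled 1-D Laplacians below only stand in for the displacement and pressure blocks of the real poroelastic matrix, and the index sets defining the "u" and "p" splits would normally come from the application's degree-of-freedom numbering (all sizes and names here are illustrative).

from petsc4py import PETSc

n = 200                                    # unknowns per field (illustrative only)
N = 2 * n
A = PETSc.Mat().createAIJ([N, N], nnz=3)   # two decoupled 1-D Laplacians as a stand-in
for i in range(*A.getOwnershipRange()):
    j = i % n                              # position inside its own field
    A[i, i] = 2.0
    if j > 0:
        A[i, i - 1] = -1.0
    if j < n - 1:
        A[i, i + 1] = -1.0
A.assemble()
b = A.createVecRight(); b.set(1.0)
x = A.createVecRight()

is_u = PETSc.IS().createStride(n, first=0, step=1)   # "displacement" rows 0..n-1
is_p = PETSc.IS().createStride(n, first=n, step=1)   # "pressure" rows n..2n-1

opts = PETSc.Options()
opts["pc_fieldsplit_type"] = "multiplicative"
opts["fieldsplit_u_pc_type"] = "gamg"      # AMG on the elliptic displacement block
opts["fieldsplit_p_pc_type"] = "ilu"       # cheap and exact for this 1-D stand-in

ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.setType(PETSc.KSP.Type.GMRES)
pc = ksp.getPC()
pc.setType(PETSc.PC.Type.FIELDSPLIT)
pc.setFieldSplitIS(("u", is_u), ("p", is_p))
ksp.setFromOptions()
ksp.solve(b, x)
PETSc.Sys.Print("iterations:", ksp.getIterationNumber())

The same option names (-pc_type fieldsplit, -fieldsplit_u_pc_type gamg, and so on) can be passed to the Fortran code in this thread through its existing options string and KSPSetFromOptions call, so trying GAMG, ML or Hypre needs no code changes, provided PETSc was configured with those packages.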
> > With your current solvers you can use the option > -sub_pc_factor_in_place which will shave off one of the matrices memories. > Please try that. > > Avoiding the ASM you can avoid both extra matrices but at the cost of > even slower convergence. Use, for example -pc_type sor > > > The petroleum industry also has a variety of "custom" > preconditioners/solvers for particular models and formulations that can > beat the convergence of general purpose solvers; and require less memory. > Some of these can be implemented or simulated with PETSc. Some of these are > implemented in the commercial petroleum simulation codes and it can be > difficult to get a handle on exactly what they do because of proprietary > issues. I think I have an old text on these approaches in my office, there > may be modern books that discuss these. > > > Barry > > > > > On Feb 4, 2020, at 6:04 AM, ??????? ????????? < > dmitry.melnichuk at geosteertech.com> wrote: > > Hello again! > Thank you very much for your replies! > Log is attached. > > 1. The main problem now is following. To solve the matrix that is > attached to my previous e-mail PETSc consumes ~550 MB. > I know definitely that there are commercial softwares in petroleum > industry (e.g., Schlumberger Petrel) that solve the same initial problem > consuming only ~200 MB. > Moreover, I am sure that when I used 32-bit PETSc (GMRES + ASM) a year > ago, it also consumed ~200 MB for this matrix. > > So, my question is: do you have any advice on how to decrease the amount > of RAM consumed for such matrix from 550 MB to 200 MB? Maybe some specific > preconditioner or other ways? > > I will be very grateful for any thoughts! > > 2. The second problem is more particular. > According to resource manager in Windows 10, Fortran solver based on > PETSc consumes 548 MB RAM while solving the system of linear equations. > As I understand it form logs, it is required 459 MB and 52 MB for matrix > and vector storage respectively. After summing of all objects for which > memory is allocated we get only 517 MB. > > Thank you again for your time! Have a nice day. > > Kind regards, > Dmitry > > > 03.02.2020, 19:55, "Smith, Barry F." : > > GMRES also can by default require about 35 work vectors if it reaches > the full restart. You can set a smaller restart with -ksp_gmres_restart 15 > for example but this can also hurt the convergence of GMRES dramatically. > People sometimes use the KSPBCGS algorithm since it does not require all > the restart vectors but it can also converge more slowly. > > Depending on how much memory the sparse matrices use relative to the > vectors the vector memory may matter or not. > > If you are using a recent version of PETSc you can run with -log_view > -log_view_memory and it will show on the right side of the columns how much > memory is being allocated for each of the operations in various ways. > > Barry > > > > On Feb 3, 2020, at 10:34 AM, Matthew Knepley wrote: > > On Mon, Feb 3, 2020 at 10:38 AM ??????? ????????? < > dmitry.melnichuk at geosteertech.com> wrote: > Hello all! > > Now I am faced with a problem associated with the memory allocation when > calling of KSPSolve . > > GMRES preconditioned by ASM for solving linear algebra system (obtained > by the finite element spatial discretisation of Biot poroelasticity model) > was chosen. > According to the output value of PetscMallocGetCurrentUsage subroutine > 176 MB for matrix and RHS vector storage is required (before KSPSolve > calling). 
> But during solving linear algebra system 543 MB of RAM is required > (during KSPSolve calling). > Thus, the amount of allocated memory after preconditioning stage > increased three times. This kind of behaviour is critically for 3D models > with several millions of cells. > > 1) In order to know anything, we have to see the output of -ksp_view, > although I see you used an overlap of 2 > > 2) The overlap increases the size of submatrices beyond that of the > original matrix. Suppose that you used LU for the sub-preconditioner. > You would need at least 2x memory (with ILU(0)) since the matrix > dominates memory usage. Moreover, you have overlap > and you might have fill-in depending on the solver. > > 3) The massif tool from valgrind is a good fine-grained way to look at > memory allocation > > Thanks, > > Matt > > Is there a way to decrease amout of allocated memory? > Is that an expected behaviour for GMRES-ASM combination? > > As I remember, using previous version of PETSc didn't demonstrate so > significante memory increasing. > > ... > Vec :: Vec_F, Vec_U > Mat :: Mat_K > ... > ... > call MatAssemblyBegin(Mat_M,Mat_Final_Assembly,ierr) > call MatAssemblyEnd(Mat_M,Mat_Final_Assembly,ierr) > .... > call VecAssemblyBegin(Vec_F_mod,ierr) > call VecAssemblyEnd(Vec_F_mod,ierr) > ... > ... > call PetscMallocGetCurrentUsage(mem, ierr) > print *,"Memory used: ",mem > ... > ... > call KSPSetType(Krylov,KSPGMRES,ierr) > call KSPGetPC(Krylov,PreCon,ierr) > call PCSetType(PreCon,PCASM,ierr) > call KSPSetFromOptions(Krylov,ierr) > ... > call KSPSolve(Krylov,Vec_F,Vec_U,ierr) > ... > ... > options = "-pc_asm_overlap 2 -pc_asm_type basic -ksp_monitor > -ksp_converged_reason" > > > Kind regards, > Dmitry Melnichuk > Matrix.dat (265288024) > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Feb 5 14:16:32 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 5 Feb 2020 20:16:32 +0000 Subject: [petsc-users] Triple increasing of allocated memory during KSPSolve calling(GMRES preconditioned by ASM) In-Reply-To: <18751580915015@vla4-4046ec513d04.qloud-c.yandex.net> References: <9444561580744337@myt2-b8bf7a4d4ebc.qloud-c.yandex.net> <86DE2602-DCBD-47C4-819C-223F9CF9A503@mcs.anl.gov> <14855941580817898@vla4-d1c3bcedfacb.qloud-c.yandex.net> <72B6812D-4131-41A6-9A30-878D2F9058D8@mcs.anl.gov> <18751580915015@vla4-4046ec513d04.qloud-c.yandex.net> Message-ID: <6E2C0944-0849-461D-AA7C-A8120A2FCCD3@mcs.anl.gov> > On Feb 5, 2020, at 9:03 AM, ??????? ????????? wrote: > > Barry, appreciate your response, as always. > > - You are saying that I am using ASM + ILU(0). However, I use PETSc only with "ASM" as the input parameter for preconditioner. Does it mean that ILU(0) is default sub-preconditioner for ASM? Yes > Can I change it using the option "-sub_pc_type"? Yes -sub_pc_type for then it will use SOR on each block instead of ILU saves a matrix. > Does it make sense to you within the scope of my general goal, which is memory consumption decrease? 
Can it be useful to vary the "-sub_ksp_type" option? Probably not. > > - I have run the computation for the same initial matrix with the "-sub_pc_factor_in_place" option, PC = ASM. Now the process consumed ~400 MB comparing to 550 MB without this option. This is what I expected, good. > I used "-ksp_view" for this computation, two logs for this computation are attached: > "ksp_view.txt" - ksp_view option only > "full_log_ASM_factor_in_place.txt" - full log without ksp_view option > > - Then I changed primary preconditioner from ASM to ILU(0) and ran the computation: memory consumption was again about ~400 MB, no matter if I use the "-sub_pc_factor_in_place" option. > > - Then I tried to run the computation with ILU(0) and "-pc_factor_in_place", just in case: the computation did not start, I got an error message, the log is attached: "Error_ilu_pc_factor.txt" Since that matrix is used for the MatMuilt you cannot do the factorization in place since it replaces the original matrix entries with the factorization matrix entries > > - Then I ran the computation with SOR as a preconditioner. PETSc gave me an error message, the log is attached: "Error_gmres_sor.txt" This is because our SOR cannot handle zeros on the diagonal. > > - As for the kind of PDEs: I am solving the standard poroelasticity problem, the formulation can be found in the attached paper (Zheng_poroelasticity.pdf), pages 2-3. > The file PDE.jpg is a snapshot of PDEs from this paper. > > > So, if you may give me any further advice on how to decrease the consumed amount of memory to approximately the matrix size (~200 MB in this case), it will be great. Do I need to focus on searching a proper preconditioner? BTW, the single ILU(0) did not give me any memory advantage comparing to ASM with "-sub_pc_factor_in_place". Yes, because in both cases you need two copies of the matrix, for the multiple and for the ILU. But you want a preconditioner that doesn't require any new matrices in the preconditioner. This is difficult. You want an efficient preconditioner that requires essentially no additional memory? -ksp_type gmres or bcgs -pc_type jacobi (the sor won't work because the zero diagonals) It will not be good preconditioner. Are you sure you don't have additional memory for the preconditioner? A good preconditioner might require up to 5 to 6 the memory of the original matrix. > > Have a pleasant day! > > Kind regards, > Dmitry > > > > 04.02.2020, 19:04, "Smith, Barry F." : > > Please run with the option -ksp_view so we know the exact solver options you are using. > > From the lines > > MatCreateSubMats 1 1.0 1.9397e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 1.1066e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatIncreaseOvrlp 1 1.0 3.0324e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > and the fact you have three matrices I would guess you are using the additive Schwarz preconditioner (ASM) with ILU(0) on the blocks. (which converges the same as ILU on one process but does use much more memory). > > Note: your code is still built with 32 bit integers. > > I would guess the basic matrix formed plus the vectors in this example could take ~200 MB. It is the two matrices in the additive Schwarz that is taking the additional memory. > > What kind of PDEs are you solving and what kind of formulation? > > ASM plus ILU is the "work mans" type preconditioner, relatively robust but not particularly fast for convergence. 
Depending on your problem you might be able to do much better convergence wise by using a PCFIELDSPLIT and a PCGAMG on one of the splits. In your own run you see the ILU is chugging along rather slowly to the solution. > > With your current solvers you can use the option -sub_pc_factor_in_place which will shave off one of the matrices memories. Please try that. > > Avoiding the ASM you can avoid both extra matrices but at the cost of even slower convergence. Use, for example -pc_type sor > > > The petroleum industry also has a variety of "custom" preconditioners/solvers for particular models and formulations that can beat the convergence of general purpose solvers; and require less memory. Some of these can be implemented or simulated with PETSc. Some of these are implemented in the commercial petroleum simulation codes and it can be difficult to get a handle on exactly what they do because of proprietary issues. I think I have an old text on these approaches in my office, there may be modern books that discuss these. > > > Barry > > > > > On Feb 4, 2020, at 6:04 AM, ??????? ????????? wrote: > > Hello again! > Thank you very much for your replies! > Log is attached. > > 1. The main problem now is following. To solve the matrix that is attached to my previous e-mail PETSc consumes ~550 MB. > I know definitely that there are commercial softwares in petroleum industry (e.g., Schlumberger Petrel) that solve the same initial problem consuming only ~200 MB. > Moreover, I am sure that when I used 32-bit PETSc (GMRES + ASM) a year ago, it also consumed ~200 MB for this matrix. > > So, my question is: do you have any advice on how to decrease the amount of RAM consumed for such matrix from 550 MB to 200 MB? Maybe some specific preconditioner or other ways? > > I will be very grateful for any thoughts! > > 2. The second problem is more particular. > According to resource manager in Windows 10, Fortran solver based on PETSc consumes 548 MB RAM while solving the system of linear equations. > As I understand it form logs, it is required 459 MB and 52 MB for matrix and vector storage respectively. After summing of all objects for which memory is allocated we get only 517 MB. > > Thank you again for your time! Have a nice day. > > Kind regards, > Dmitry > > > 03.02.2020, 19:55, "Smith, Barry F." : > > GMRES also can by default require about 35 work vectors if it reaches the full restart. You can set a smaller restart with -ksp_gmres_restart 15 for example but this can also hurt the convergence of GMRES dramatically. People sometimes use the KSPBCGS algorithm since it does not require all the restart vectors but it can also converge more slowly. > > Depending on how much memory the sparse matrices use relative to the vectors the vector memory may matter or not. > > If you are using a recent version of PETSc you can run with -log_view -log_view_memory and it will show on the right side of the columns how much memory is being allocated for each of the operations in various ways. > > Barry > > > > On Feb 3, 2020, at 10:34 AM, Matthew Knepley wrote: > > On Mon, Feb 3, 2020 at 10:38 AM ??????? ????????? wrote: > Hello all! > > Now I am faced with a problem associated with the memory allocation when calling of KSPSolve . > > GMRES preconditioned by ASM for solving linear algebra system (obtained by the finite element spatial discretisation of Biot poroelasticity model) was chosen. 
> According to the output value of PetscMallocGetCurrentUsage subroutine 176 MB for matrix and RHS vector storage is required (before KSPSolve calling). > But during solving linear algebra system 543 MB of RAM is required (during KSPSolve calling). > Thus, the amount of allocated memory after preconditioning stage increased three times. This kind of behaviour is critically for 3D models with several millions of cells. > > 1) In order to know anything, we have to see the output of -ksp_view, although I see you used an overlap of 2 > > 2) The overlap increases the size of submatrices beyond that of the original matrix. Suppose that you used LU for the sub-preconditioner. > You would need at least 2x memory (with ILU(0)) since the matrix dominates memory usage. Moreover, you have overlap > and you might have fill-in depending on the solver. > > 3) The massif tool from valgrind is a good fine-grained way to look at memory allocation > > Thanks, > > Matt > > Is there a way to decrease amout of allocated memory? > Is that an expected behaviour for GMRES-ASM combination? > > As I remember, using previous version of PETSc didn't demonstrate so significante memory increasing. > > ... > Vec :: Vec_F, Vec_U > Mat :: Mat_K > ... > ... > call MatAssemblyBegin(Mat_M,Mat_Final_Assembly,ierr) > call MatAssemblyEnd(Mat_M,Mat_Final_Assembly,ierr) > .... > call VecAssemblyBegin(Vec_F_mod,ierr) > call VecAssemblyEnd(Vec_F_mod,ierr) > ... > ... > call PetscMallocGetCurrentUsage(mem, ierr) > print *,"Memory used: ",mem > ... > ... > call KSPSetType(Krylov,KSPGMRES,ierr) > call KSPGetPC(Krylov,PreCon,ierr) > call PCSetType(PreCon,PCASM,ierr) > call KSPSetFromOptions(Krylov,ierr) > ... > call KSPSolve(Krylov,Vec_F,Vec_U,ierr) > ... > ... > options = "-pc_asm_overlap 2 -pc_asm_type basic -ksp_monitor -ksp_converged_reason" > > > Kind regards, > Dmitry Melnichuk > Matrix.dat (265288024) > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > > From bsmith at mcs.anl.gov Wed Feb 5 14:46:43 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 5 Feb 2020 20:46:43 +0000 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? In-Reply-To: References: Message-ID: <20E8B18C-F71E-4B10-958B-6CD24DA869A3@mcs.anl.gov> https://gitlab.com/petsc/petsc/issues/557 > On Feb 5, 2020, at 7:35 AM, Matthew Knepley wrote: > > Perhaps Barry is right that you want Picard, but suppose you really want Newton. > > "This problem can be solved efficiently using the Sherman-Morrison formula" Well, maybe. The main assumption here is that inverting K is cheap. I see two things you can do in a straightforward way: > > 1) Use MatCreateLRC() https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateLRC.html to create the Jacobian > and solve using an iterative method. If you pass just K was the preconditioning matrix, you can use common PCs. > > 2) We only implemented MatMult() for LRC, but you could stick your SMW code in for MatSolve_LRC if you really want to factor K. We would > of course help you do this. > > Thanks, > > Matt > > On Wed, Feb 5, 2020 at 1:36 AM Smith, Barry F. via petsc-users wrote: > > I am not sure of everything in your email but it sounds like you want to use a "Picard" iteration to solve [K(u)?kaaT]?u=?F(u). 
That is solve > > A(u^{n}) (u^{n+1} - u^{n}) = F(u^{n}) - A(u^{n})u^{n} where A(u) = K(u) - kaaT > > PETSc provides code to this with SNESSetPicard() (see the manual pages) I don't know if Petsc4py has bindings for this. > > Adding missing python bindings is not terribly difficult and you may be able to do it yourself if this is the approach you want. > > Barry > > > > > On Feb 4, 2020, at 5:07 PM, Olek Niewiarowski wrote: > > > > Hello, > > I am a FEniCS user but new to petsc4py. I am trying to modify the KSP solver through the SNES object to implement the Sherman-Morrison formula(e.g. http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html ). I am solving a nonlinear system of the form [K(u)?kaaT]?u=?F(u). Here the jacobian matrix K is modified by the term kaaT, where k is a scalar. Notably, K is sparse, while the term kaaT results in a full matrix. This problem can be solved efficiently using the Sherman-Morrison formula : > > > > [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) > > I have managed to successfully implement this at the linear solve level (by modifying the KSP solver) inside a custom Newton solver in python by following an incomplete tutorial at https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner : > > ? while (norm(delU) > alpha): # while not converged > > ? > > ? self.update_F() # call to method to update r.h.s form > > ? self.update_K() # call to update the jacobian form > > ? K = assemble(self.K) # assemble the jacobian matrix > > ? F = assemble(self.F) # assemble the r.h.s vector > > ? a = assemble(self.a_form) # assemble the a_form (see Sherman Morrison formula) > > ? > > ? for bc in self.mem.bc: # apply boundary conditions > > ? bc.apply(K, F) > > ? bc.apply(K, a) > > ? > > ? B = PETSc.Mat().create() > > ? > > ? # Assemble the bilinear form that defines A and get the concrete > > ? # PETSc matrix > > ? A = as_backend_type(K).mat() # get the PETSc objects for K and a > > ? u = as_backend_type(a).vec() > > ? > > ? # Build the matrix "context" # see firedrake docs > > ? Bctx = MatrixFreeB(A, u, u, self.k) > > ? > > ? # Set up B > > ? # B is the same size as A > > ? B.setSizes(*A.getSizes()) > > ? B.setType(B.Type.PYTHON) > > ? B.setPythonContext(Bctx) > > ? B.setUp() > > ? > > ? > > ? ksp = PETSc.KSP().create() # create the KSP linear solver object > > ? ksp.setOperators(B) > > ? ksp.setUp() > > ? pc = ksp.pc > > ? pc.setType(pc.Type.PYTHON) > > ? pc.setPythonContext(MatrixFreePC()) > > ? ksp.setFromOptions() > > ? > > ? solution = delU # the incremental displacement at this iteration > > ? > > ? b = as_backend_type(-F).vec() > > ? delu = solution.vector().vec() > > ? > > ? ksp.solve(b, delu) > > > > ? self.mem.u.vector().axpy(0.25, self.delU.vector()) # poor man's linesearch > > ? counter += 1 > > Here is the corresponding petsc4py code adapted from the firedrake docs: > > > > ? class MatrixFreePC(object): > > ? > > ? def setUp(self, pc): > > ? B, P = pc.getOperators() > > ? # extract the MatrixFreeB object from B > > ? ctx = B.getPythonContext() > > ? self.A = ctx.A > > ? self.u = ctx.u > > ? self.v = ctx.v > > ? self.k = ctx.k > > ? # Here we build the PC object that uses the concrete, > > ? # assembled matrix A. We will use this to apply the action > > ? # of A^{-1} > > ? self.pc = PETSc.PC().create() > > ? self.pc.setOptionsPrefix("mf_") > > ? self.pc.setOperators(self.A) > > ? self.pc.setFromOptions() > > ? # Since u and v do not change, we can build the denominator > > ? # and the action of A^{-1} on u only once, in the setup > > ? 
# phase. > > ? tmp = self.A.createVecLeft() > > ? self.pc.apply(self.u, tmp) > > ? self._Ainvu = tmp > > ? self._denom = 1 + self.k*self.v.dot(self._Ainvu) > > ? > > ? def apply(self, pc, x, y): > > ? # y <- A^{-1}x > > ? self.pc.apply(x, y) > > ? # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u) > > ? alpha = (self.k*self.v.dot(y)) / self._denom > > ? # y <- y - alpha * A^{-1}u > > ? y.axpy(-alpha, self._Ainvu) > > ? > > ? > > ? class MatrixFreeB(object): > > ? > > ? def __init__(self, A, u, v, k): > > ? self.A = A > > ? self.u = u > > ? self.v = v > > ? self.k = k > > ? > > ? def mult(self, mat, x, y): > > ? # y <- A x > > ? self.A.mult(x, y) > > ? > > ? # alpha <- v^T x > > ? alpha = self.v.dot(x) > > ? > > ? # y <- y + alpha*u > > ? y.axpy(alpha, self.u) > > However, this approach is not efficient as it requires many iterations due to the Newton step being fixed, so I would like to implement it using SNES and use line search. Unfortunately, I have not been able to find any documentation/tutorial on how to do so. Provided I have the FEniCS forms for F, K, and a, I'd like to do something along the lines of: > > solver = PETScSNESSolver() # the FEniCS SNES wrapper > > snes = solver.snes() # the petsc4py SNES object > > ## ?? > > ksp = snes.getKSP() > > # set ksp option similar to above > > solver.solve() > > > > I would be very grateful if anyone could could help or point me to a reference or demo that does something similar (or maybe a completely different way of solving the problem!). > > Many thanks in advance! > > Alex > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From hong at aspiritech.org Wed Feb 5 15:26:10 2020 From: hong at aspiritech.org (hong at aspiritech.org) Date: Wed, 5 Feb 2020 15:26:10 -0600 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? In-Reply-To: References: <264F91C4-8558-4850-9B4B-ABE4123C2A2C@anl.gov> Message-ID: Hao: Try '-pc_sub_type lu -ksp_type gmres -ksp_monitor' Hong Thanks a lot for your suggestions, Hong and Barry - > > As you suggested, I first tried the LU direct solvers (built-in and MUMPs) > out this morning, which work perfectly, albeit slow. As my problem itself > is a part of a PDE based optimization, the exact solution in the > searching procedure is not necessary (I often set a relative tolerance of > 1E-7/1E-8, etc. for Krylov subspace methods). Also tried BJACOBI with exact > LU, the KSP just converges in one or two iterations, as expected. > > I added -kspview option for the D-ILU test (still with Block Jacobi as > preconditioner and bcgs as KSP solver). The KSPview output from one of the > examples in a toy model looks like: > > KSP Object: 1 MPI processes > type: bcgs > maximum iterations=120, nonzero initial guess > tolerances: relative=1e-07, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: bjacobi > number of blocks = 3 > Local solve is same for all blocks, in the following KSP and PC > objects: > KSP Object: (sub_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> left preconditioning > using NONE norm type for convergence test > PC Object: (sub_) 1 MPI processes > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1., needed 1. > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=11294, cols=11294 > package used to perform factorization: petsc > total: nonzeros=76008, allocated nonzeros=76008 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=11294, cols=11294 > total: nonzeros=76008, allocated nonzeros=76008 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: mpiaij > rows=33880, cols=33880 > total: nonzeros=436968, allocated nonzeros=436968 > total number of mallocs used during MatSetValues calls=0 > not using I-node (on process 0) routines > > do you see something wrong with my setup? > > I also tried a quick performance test with a small 278906 by 278906 > matrix (3850990 nnz) with the following parameters: > > -ksp_type bcgs -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type > ilu -ksp_view > > Reducing the relative residual to 1E-7 > > Took 4.08s with 41 bcgs iterations. > > Merely changing the -pc_bjacobi_local_blocks to 6 > > Took 7.02s with 73 bcgs iterations. 9 blocks would further take 9.45s with > 101 bcgs iterations. > > As a reference, my home-brew Fortran code solves the same problem (3-block > D-ILU0) in > > 1.84s with 24 bcgs iterations (the bcgs code is also a home-brew one)? > > > > Well, by saying ?use explicit L/U matrix as preconditioner?, I wonder if a > user is allowed to provide his own (separate) L and U Mat for > preconditioning (like how it works in Matlab solvers), e.g. > > x = qmr(A,b,Tol,MaxIter,L,U,x) > > As I already explicitly constructed the L and U matrices in Fortran, it is > not hard to convert them to Mat in Petsc to test Petsc and my Fortran code > head-to-head. In that case, the A, b, x, and L/U are all identical, it > would be easier to see where the problem came from. > > > > BTW, there is another thing I wondered - is there a way to output residual > in unpreconditioned norm? I tried to > > *call* KSPSetNormType(ksp_local, KSP_NORM_UNPRECONDITIONED, ierr) > > But always get an error that current ksp does not support unpreconditioned > in LEFT/RIGHT (either way). Is it possible to do that (output > unpreconditioned residual) in PETSc at all? > > Cheers, > Hao > > > ------------------------------ > *From:* Smith, Barry F. > *Sent:* Tuesday, February 4, 2020 8:27 PM > *To:* Hao DONG > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] What is the right way to implement a (block) > Diagonal ILU as PC? > > > > > On Feb 4, 2020, at 12:41 PM, Hao DONG wrote: > > > > Dear all, > > > > > > I have a few questions about the implementation of diagonal ILU PC in > PETSc. I want to solve a very simple system with KSP (in parallel), the > nature of the system (finite difference time-harmonic Maxwell) is probably > not important to the question itself. 
Long story short, I just need to > solve a set of equations of Ax = b with a block diagonal system matrix, > like (not sure if the mono font works): > > > > |X | > > A =| Y | > > | Z| > > > > Note that A is not really block-diagonal, it?s just a multi-diagonal > matrix determined by the FD mesh, where most elements are close to > diagonal. So instead of a full ILU decomposition, a D-ILU is a good > approximation as a preconditioner, and the number of blocks should not > matter too much: > > > > |Lx | |Ux | > > L = | Ly | and U = | Uy | > > | Lz| | Uz| > > > > Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N > blocks) is quite efficient with Krylov subspace methods like BiCGstab or > QMR in my sequential Fortran/Matlab code. > > > > So like most users, I am looking for a parallel implement with this > problem in PETSc. After looking through the manual, it seems that the most > straightforward way to do it is through PCBJACOBI. Not sure I understand it > right, I just setup a 3-block PCJACOBI and give each of the block a KSP > with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? > My little block of fortran code would look like: > > ... > > call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & > > & isubs,ierr) > > call PCBJacobiSetLocalBlocks(pc_local, Nsub, & > > & isubs(istart:iend),ierr) > > ! set up the block jacobi structure > > call KSPSetup(ksp_local,ierr) > > ! allocate sub ksps > > allocate(ksp_sub(Nsub)) > > call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & > > & ksp_sub,ierr) > > do i=1,Nsub > > call KSPGetPC(ksp_sub(i),pc_sub,ierr) > > !ILU preconditioner > > call PCSetType(pc_sub,ptype,ierr) > > call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here > > call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] > > end do > > call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & > > & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) > > ? > > This code looks essentially right. You should call with -ksp_view to > see exactly what PETSc is using for a solver. > > > > > I understand that the parallel performance may not be comparable, so I > first set up a one-process test (with MPIAij, but all the rows are local > since there is only one process). The system is solved without any problem > (identical results within error). But the performance is actually a lot > worse (code built without debugging flags in performance tests) than my own > home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse > matrix format), which is hard to believe. I suspect the difference is from > the PC as the PETSc version took much more BiCGstab iterations (60-ish vs > 100-ish) to converge to the same relative tol. > > PETSc uses GMRES by default with a restart of 30 and left > preconditioning. It could be different exact choices in the solver (which > is why -ksp_view is so useful) can explain the differences in the runs > between your code and PETSc's > > > > This is further confirmed when I change the setup of D-ILU (using 6 or 9 > blocks instead of 3). While my Fortran/Matlab codes see minimal performance > difference (<5%) when I play with the D-ILU setup, increasing the number of > D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a > performance decrease of more than 50% in sequential test. > > This is odd, the more blocks the smaller each block so the quicker the > ILU set up should be. You can run various cases with -log_view and send us > the output to see what is happening at each part of the computation in time. 
> > > So my implementation IS somewhat different in PETSc. Do I miss something > in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of > how PCBJACOBI works in PETSc? > > Probably not. > > > > If this is not the right way to implement a block diagonal ILU as > (parallel) PC, please kindly point me to the right direction. I searched > through the mail list to find some answers, only to find a couple of > similar questions... An example would be nice. > > You approach seems fundamentally right but I cannot be sure of possible > bugs. > > > > On the other hand, does PETSc support a simple way to use explicit L/U > matrix as a preconditioner? I can import the D-ILU matrix (I already > converted my A matrix into Mat) constructed in my Fortran code to make a > better comparison. Or do I have to construct my own PC using PCshell? If > so, is there a good tutorial/example to learn about how to use PCSHELL (in > a more advanced sense, like how to setup pc side and transpose)? > > Not sure what you mean by explicit L/U matrix as a preconditioner. As > Hong said, yes you can use a parallel LU from MUMPS or SuperLU_DIST or > Pastix as the solver. You do not need any shell code. You simply need to > set the PCType to lu > > You can also set all this options from the command line and don't need > to write the code specifically. So call KSPSetFromOptions() and then for > example > > -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu (this > last one is applied to each block so you could use -pc_type lu and it would > use lu on each block.) > > -ksp_type_none -pc_type lu -pc_factor_mat_solver_type mumps (do > parallel LU with mumps) > > By not hardwiring in the code and just using options you can test out > different cases much quicker > > Use -ksp_view to make sure that is using the solver the way you expect. > > Barry > > > > Barry > > > > > Thanks in advance, > > > > Hao > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Feb 5 15:42:04 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 5 Feb 2020 21:42:04 +0000 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? In-Reply-To: References: <264F91C4-8558-4850-9B4B-ABE4123C2A2C@anl.gov> Message-ID: <4A373D93-4018-45E0-B805-3ECC528472DD@mcs.anl.gov> > On Feb 5, 2020, at 4:36 AM, Hao DONG wrote: > > Thanks a lot for your suggestions, Hong and Barry - > > As you suggested, I first tried the LU direct solvers (built-in and MUMPs) out this morning, which work perfectly, albeit slow. As my problem itself is a part of a PDE based optimization, the exact solution in the searching procedure is not necessary (I often set a relative tolerance of 1E-7/1E-8, etc. for Krylov subspace methods). Also tried BJACOBI with exact LU, the KSP just converges in one or two iterations, as expected. > > I added -kspview option for the D-ILU test (still with Block Jacobi as preconditioner and bcgs as KSP solver). The KSPview output from one of the examples in a toy model looks like: > > KSP Object: 1 MPI processes > type: bcgs > maximum iterations=120, nonzero initial guess > tolerances: relative=1e-07, absolute=1e-50, divergence=10000. 
> left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: bjacobi > number of blocks = 3 > Local solve is same for all blocks, in the following KSP and PC objects: > KSP Object: (sub_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (sub_) 1 MPI processes > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1., needed 1. > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=11294, cols=11294 > package used to perform factorization: petsc > total: nonzeros=76008, allocated nonzeros=76008 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=11294, cols=11294 > total: nonzeros=76008, allocated nonzeros=76008 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: mpiaij > rows=33880, cols=33880 > total: nonzeros=436968, allocated nonzeros=436968 > total number of mallocs used during MatSetValues calls=0 > not using I-node (on process 0) routines > > do you see something wrong with my setup? > > I also tried a quick performance test with a small 278906 by 278906 matrix (3850990 nnz) with the following parameters: > > -ksp_type bcgs -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu -ksp_view > > Reducing the relative residual to 1E-7 > > Took 4.08s with 41 bcgs iterations. > > Merely changing the -pc_bjacobi_local_blocks to 6 > > Took 7.02s with 73 bcgs iterations. 9 blocks would further take 9.45s with 101 bcgs iterations. This is normal. more blocks slower convergence > > As a reference, my home-brew Fortran code solves the same problem (3-block D-ILU0) in > > 1.84s with 24 bcgs iterations (the bcgs code is also a home-brew one)? > Run the PETSc code with optimization ./configure --with-debugging=0 an run the code with -log_view this will show where the PETSc code is spending the time (send it to use) > > > Well, by saying ?use explicit L/U matrix as preconditioner?, I wonder if a user is allowed to provide his own (separate) L and U Mat for preconditioning (like how it works in Matlab solvers), e.g. > > x = qmr(A,b,Tol,MaxIter,L,U,x) > > As I already explicitly constructed the L and U matrices in Fortran, it is not hard to convert them to Mat in Petsc to test Petsc and my Fortran code head-to-head. In that case, the A, b, x, and L/U are all identical, it would be easier to see where the problem came from. > > No, we don't provide this kind of support > > BTW, there is another thing I wondered - is there a way to output residual in unpreconditioned norm? I tried to > > call KSPSetNormType(ksp_local, KSP_NORM_UNPRECONDITIONED, ierr) > > But always get an error that current ksp does not support unpreconditioned in LEFT/RIGHT (either way). Is it possible to do that (output unpreconditioned residual) in PETSc at all? 
-ksp_monitor_true_residual You can also run GMRES (and some other methods) with right preconditioning, -ksp_pc_side right then the residual computed is by the algorithm the unpreconditioned residual KSPSetNormType sets the norm used in the algorithm, it generally always has to left or right, only a couple algorithms support unpreconditioned directly. Barry > > Cheers, > Hao > > > From: Smith, Barry F. > Sent: Tuesday, February 4, 2020 8:27 PM > To: Hao DONG > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? > > > > > On Feb 4, 2020, at 12:41 PM, Hao DONG wrote: > > > > Dear all, > > > > > > I have a few questions about the implementation of diagonal ILU PC in PETSc. I want to solve a very simple system with KSP (in parallel), the nature of the system (finite difference time-harmonic Maxwell) is probably not important to the question itself. Long story short, I just need to solve a set of equations of Ax = b with a block diagonal system matrix, like (not sure if the mono font works): > > > > |X | > > A =| Y | > > | Z| > > > > Note that A is not really block-diagonal, it?s just a multi-diagonal matrix determined by the FD mesh, where most elements are close to diagonal. So instead of a full ILU decomposition, a D-ILU is a good approximation as a preconditioner, and the number of blocks should not matter too much: > > > > |Lx | |Ux | > > L = | Ly | and U = | Uy | > > | Lz| | Uz| > > > > Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in my sequential Fortran/Matlab code. > > > > So like most users, I am looking for a parallel implement with this problem in PETSc. After looking through the manual, it seems that the most straightforward way to do it is through PCBJACOBI. Not sure I understand it right, I just setup a 3-block PCJACOBI and give each of the block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little block of fortran code would look like: > > ... > > call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & > > & isubs,ierr) > > call PCBJacobiSetLocalBlocks(pc_local, Nsub, & > > & isubs(istart:iend),ierr) > > ! set up the block jacobi structure > > call KSPSetup(ksp_local,ierr) > > ! allocate sub ksps > > allocate(ksp_sub(Nsub)) > > call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & > > & ksp_sub,ierr) > > do i=1,Nsub > > call KSPGetPC(ksp_sub(i),pc_sub,ierr) > > !ILU preconditioner > > call PCSetType(pc_sub,ptype,ierr) > > call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here > > call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] > > end do > > call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & > > & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) > > ? > > This code looks essentially right. You should call with -ksp_view to see exactly what PETSc is using for a solver. > > > > > I understand that the parallel performance may not be comparable, so I first set up a one-process test (with MPIAij, but all the rows are local since there is only one process). The system is solved without any problem (identical results within error). But the performance is actually a lot worse (code built without debugging flags in performance tests) than my own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), which is hard to believe. 
I suspect the difference is from the PC as the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to the same relative tol. > > PETSc uses GMRES by default with a restart of 30 and left preconditioning. It could be different exact choices in the solver (which is why -ksp_view is so useful) can explain the differences in the runs between your code and PETSc's > > > > This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks instead of 3). While my Fortran/Matlab codes see minimal performance difference (<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a performance decrease of more than 50% in sequential test. > > This is odd, the more blocks the smaller each block so the quicker the ILU set up should be. You can run various cases with -log_view and send us the output to see what is happening at each part of the computation in time. > > > So my implementation IS somewhat different in PETSc. Do I miss something in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how PCBJACOBI works in PETSc? > > Probably not. > > > > If this is not the right way to implement a block diagonal ILU as (parallel) PC, please kindly point me to the right direction. I searched through the mail list to find some answers, only to find a couple of similar questions... An example would be nice. > > You approach seems fundamentally right but I cannot be sure of possible bugs. > > > > On the other hand, does PETSc support a simple way to use explicit L/U matrix as a preconditioner? I can import the D-ILU matrix (I already converted my A matrix into Mat) constructed in my Fortran code to make a better comparison. Or do I have to construct my own PC using PCshell? If so, is there a good tutorial/example to learn about how to use PCSHELL (in a more advanced sense, like how to setup pc side and transpose)? > > Not sure what you mean by explicit L/U matrix as a preconditioner. As Hong said, yes you can use a parallel LU from MUMPS or SuperLU_DIST or Pastix as the solver. You do not need any shell code. You simply need to set the PCType to lu > > You can also set all this options from the command line and don't need to write the code specifically. So call KSPSetFromOptions() and then for example > > -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu (this last one is applied to each block so you could use -pc_type lu and it would use lu on each block.) > > -ksp_type_none -pc_type lu -pc_factor_mat_solver_type mumps (do parallel LU with mumps) > > By not hardwiring in the code and just using options you can test out different cases much quicker > > Use -ksp_view to make sure that is using the solver the way you expect. > > Barry > > > > Barry > > > > > Thanks in advance, > > > > Hao From aan2 at princeton.edu Wed Feb 5 19:53:51 2020 From: aan2 at princeton.edu (Olek Niewiarowski) Date: Thu, 6 Feb 2020 01:53:51 +0000 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? In-Reply-To: <20E8B18C-F71E-4B10-958B-6CD24DA869A3@mcs.anl.gov> References: , <20E8B18C-F71E-4B10-958B-6CD24DA869A3@mcs.anl.gov> Message-ID: Hi Barry and Matt, Thank you for your input and for creating a new issue in the repo. My initial question was more basic (how to configure the SNES's KSP solver as in my first message with a and k), but now I see there's more to the implementation. 
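For the narrower question of reaching and configuring the KSP inside a petsc4py SNES, a minimal skeleton is below; nothing in it is FEniCS-specific, MatrixFreePC refers to the class defined in the first message of this thread, and the ILU line is only a placeholder.

from petsc4py import PETSc

snes = PETSc.SNES().create()
# snes.setFunction(...) and snes.setJacobian(...) go here as usual
ksp = snes.getKSP()                 # the linear solver used inside each Newton step
ksp.setType(PETSc.KSP.Type.GMRES)
pc = ksp.getPC()
pc.setType(PETSc.PC.Type.ILU)       # placeholder; any PC can be configured here
# To reuse the user-defined preconditioner from the first message instead:
# pc.setType(PETSc.PC.Type.PYTHON)
# pc.setPythonContext(MatrixFreePC())
ksp.setTolerances(rtol=1e-8)
snes.setFromOptions()

Line search then comes for free from the SNES (for example -snes_linesearch_type bt), instead of the fixed 0.25 step used in the hand-written Newton loop quoted earlier.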
To reiterate, for my problem's structure, a good solution algorithm (on the algebraic level) is the following "double back-substitution": For each nonlinear iteration: 1. define intermediate vectors u_1 and u_2 2. solve Ku_1 = -F --> u_1 = -K^{-1}F (this solve is cheap, don't actually need K^-1) 3. solve Ku_2 = -a --> u_2 = -K^{-1}a (ditto) 4. define \beta = 1/(1 + k a^Tu_2) 5. \Delta u = u_1 + \beta k u_2^T F u_2 6. u = u + \Delta u I don't need the Jacobian inverse, [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) just the solution ?u = [K?kaaT]-1F = K-1F - (kK-1 aFK-1a)/(1 + kaTK-1a) = u_1 + beta k u_2^T F u_2 (so I never need to invert K either). (To Matt's point on gitlab, K is a symmetric sparse matrix arising from a bilinear form. ) Also, eventually, I want to have more than one low-rank updates to K, but again, Sherman Morrisson Woodbury should still work. Being new to PETSc, I don't know if this algorithm directly translates into an efficient numerical solver. (I'm also not sure if Picard iteration would be useful here.) What would it take to set up the KSP solver in SNES like I did below? Is it possible "out of the box"? I looked at MatCreateLRC() - what would I pass this to? (A pointer to demo/tutorial would be very appreciated.) If there's a better way to go about all of this, I'm open to any and all ideas. My only limitation is that I use petsc4py exclusively since I/future users of my code will not be comfortable with C. Thanks again for your help! Alexander (Olek) Niewiarowski PhD Candidate, Civil & Environmental Engineering Princeton University, 2020 Cell: +1 (610) 393-2978 ________________________________ From: Smith, Barry F. Sent: Wednesday, February 5, 2020 15:46 To: Matthew Knepley Cc: Olek Niewiarowski ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? https://gitlab.com/petsc/petsc/issues/557 > On Feb 5, 2020, at 7:35 AM, Matthew Knepley wrote: > > Perhaps Barry is right that you want Picard, but suppose you really want Newton. > > "This problem can be solved efficiently using the Sherman-Morrison formula" Well, maybe. The main assumption here is that inverting K is cheap. I see two things you can do in a straightforward way: > > 1) Use MatCreateLRC() https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateLRC.html to create the Jacobian > and solve using an iterative method. If you pass just K was the preconditioning matrix, you can use common PCs. > > 2) We only implemented MatMult() for LRC, but you could stick your SMW code in for MatSolve_LRC if you really want to factor K. We would > of course help you do this. > > Thanks, > > Matt > > On Wed, Feb 5, 2020 at 1:36 AM Smith, Barry F. via petsc-users wrote: > > I am not sure of everything in your email but it sounds like you want to use a "Picard" iteration to solve [K(u)?kaaT]?u=?F(u). That is solve > > A(u^{n}) (u^{n+1} - u^{n}) = F(u^{n}) - A(u^{n})u^{n} where A(u) = K(u) - kaaT > > PETSc provides code to this with SNESSetPicard() (see the manual pages) I don't know if Petsc4py has bindings for this. > > Adding missing python bindings is not terribly difficult and you may be able to do it yourself if this is the approach you want. > > Barry > > > > > On Feb 4, 2020, at 5:07 PM, Olek Niewiarowski wrote: > > > > Hello, > > I am a FEniCS user but new to petsc4py. I am trying to modify the KSP solver through the SNES object to implement the Sherman-Morrison formula(e.g. 
http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html ). I am solving a nonlinear system of the form [K(u)?kaaT]?u=?F(u). Here the jacobian matrix K is modified by the term kaaT, where k is a scalar. Notably, K is sparse, while the term kaaT results in a full matrix. This problem can be solved efficiently using the Sherman-Morrison formula : > > > > [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) > > I have managed to successfully implement this at the linear solve level (by modifying the KSP solver) inside a custom Newton solver in python by following an incomplete tutorial at https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner : > > ? while (norm(delU) > alpha): # while not converged > > ? > > ? self.update_F() # call to method to update r.h.s form > > ? self.update_K() # call to update the jacobian form > > ? K = assemble(self.K) # assemble the jacobian matrix > > ? F = assemble(self.F) # assemble the r.h.s vector > > ? a = assemble(self.a_form) # assemble the a_form (see Sherman Morrison formula) > > ? > > ? for bc in self.mem.bc: # apply boundary conditions > > ? bc.apply(K, F) > > ? bc.apply(K, a) > > ? > > ? B = PETSc.Mat().create() > > ? > > ? # Assemble the bilinear form that defines A and get the concrete > > ? # PETSc matrix > > ? A = as_backend_type(K).mat() # get the PETSc objects for K and a > > ? u = as_backend_type(a).vec() > > ? > > ? # Build the matrix "context" # see firedrake docs > > ? Bctx = MatrixFreeB(A, u, u, self.k) > > ? > > ? # Set up B > > ? # B is the same size as A > > ? B.setSizes(*A.getSizes()) > > ? B.setType(B.Type.PYTHON) > > ? B.setPythonContext(Bctx) > > ? B.setUp() > > ? > > ? > > ? ksp = PETSc.KSP().create() # create the KSP linear solver object > > ? ksp.setOperators(B) > > ? ksp.setUp() > > ? pc = ksp.pc > > ? pc.setType(pc.Type.PYTHON) > > ? pc.setPythonContext(MatrixFreePC()) > > ? ksp.setFromOptions() > > ? > > ? solution = delU # the incremental displacement at this iteration > > ? > > ? b = as_backend_type(-F).vec() > > ? delu = solution.vector().vec() > > ? > > ? ksp.solve(b, delu) > > > > ? self.mem.u.vector().axpy(0.25, self.delU.vector()) # poor man's linesearch > > ? counter += 1 > > Here is the corresponding petsc4py code adapted from the firedrake docs: > > > > ? class MatrixFreePC(object): > > ? > > ? def setUp(self, pc): > > ? B, P = pc.getOperators() > > ? # extract the MatrixFreeB object from B > > ? ctx = B.getPythonContext() > > ? self.A = ctx.A > > ? self.u = ctx.u > > ? self.v = ctx.v > > ? self.k = ctx.k > > ? # Here we build the PC object that uses the concrete, > > ? # assembled matrix A. We will use this to apply the action > > ? # of A^{-1} > > ? self.pc = PETSc.PC().create() > > ? self.pc.setOptionsPrefix("mf_") > > ? self.pc.setOperators(self.A) > > ? self.pc.setFromOptions() > > ? # Since u and v do not change, we can build the denominator > > ? # and the action of A^{-1} on u only once, in the setup > > ? # phase. > > ? tmp = self.A.createVecLeft() > > ? self.pc.apply(self.u, tmp) > > ? self._Ainvu = tmp > > ? self._denom = 1 + self.k*self.v.dot(self._Ainvu) > > ? > > ? def apply(self, pc, x, y): > > ? # y <- A^{-1}x > > ? self.pc.apply(x, y) > > ? # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u) > > ? alpha = (self.k*self.v.dot(y)) / self._denom > > ? # y <- y - alpha * A^{-1}u > > ? y.axpy(-alpha, self._Ainvu) > > ? > > ? > > ? class MatrixFreeB(object): > > ? > > ? def __init__(self, A, u, v, k): > > ? self.A = A > > ? self.u = u > > ? self.v = v > > ? self.k = k > > ? > > ? 
def mult(self, mat, x, y): > > ? # y <- A x > > ? self.A.mult(x, y) > > ? > > ? # alpha <- v^T x > > ? alpha = self.v.dot(x) > > ? > > ? # y <- y + alpha*u > > ? y.axpy(alpha, self.u) > > However, this approach is not efficient as it requires many iterations due to the Newton step being fixed, so I would like to implement it using SNES and use line search. Unfortunately, I have not been able to find any documentation/tutorial on how to do so. Provided I have the FEniCS forms for F, K, and a, I'd like to do something along the lines of: > > solver = PETScSNESSolver() # the FEniCS SNES wrapper > > snes = solver.snes() # the petsc4py SNES object > > ## ?? > > ksp = snes.getKSP() > > # set ksp option similar to above > > solver.solve() > > > > I would be very grateful if anyone could could help or point me to a reference or demo that does something similar (or maybe a completely different way of solving the problem!). > > Many thanks in advance! > > Alex > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dong-hao at outlook.com Thu Feb 6 03:43:05 2020 From: dong-hao at outlook.com (Hao DONG) Date: Thu, 6 Feb 2020 09:43:05 +0000 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? In-Reply-To: <4A373D93-4018-45E0-B805-3ECC528472DD@mcs.anl.gov> References: <264F91C4-8558-4850-9B4B-ABE4123C2A2C@anl.gov> , <4A373D93-4018-45E0-B805-3ECC528472DD@mcs.anl.gov> Message-ID: Dear Hong and Barry, Thanks for the suggestions. So there could be some problems in my PETSc configuration? - but my PETSc lib was indeed compiled without the debug flags (--with-debugging=0). I use GCC/GFortran (Home-brew GCC 9.2.0) for the compiling and building of PETSc and my own fortran code. My Fortran compiling flags are simply like: -O3 -ffree-line-length-none -fastsse Which is also used for -FOPTFLAGS in PETSc (I added -openmp for PETSc, but not my fortran code, as I don?t have any OMP optimizations in my program). Note the performance test results I listed yesterday (e.g. 4.08s with 41 bcgs iterations.) are without any CSR-array->PETSc translation overhead (only include the set and solve part). I have two questions about the performance difference - 1. Is ilu only factorized once for each iteration, or ilu is performed at every outer ksp iteration steps? Sounds unlikely - but if so, this could cause some extra overheads. 2. Some KSP solvers like BCGS or TFQMR has two ?half-iterations? for each iteration step. Not sure how it works in PETSc, but is that possible that both the two ?half" relative residuals are counted in the output array, doubling the number of iterations (but that cannot explain the extra time consumed)? 
Anyway, the output with -log_view from the same 278906 by 278906 matrix with 3-block D-ILU in PETSc is as follows: ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- MEMsolv.lu on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Thu Feb 6 09:07:35 2020 Using Petsc Release Version 3.12.3, unknown Max Max/Min Avg Total Time (sec): 4.443e+00 1.000 4.443e+00 Objects: 1.155e+03 1.000 1.155e+03 Flop: 4.311e+09 1.000 4.311e+09 4.311e+09 Flop/sec: 9.703e+08 1.000 9.703e+08 9.703e+08 MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 4.4435e+00 100.0% 4.3113e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ???????????????????????????????????????????????????????????? See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 83 1.0 1.7815e+00 1.0 2.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40 48 0 0 0 40 48 0 0 0 1168 MatSolve 252 1.0 1.2708e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 939 MatLUFactorNum 3 1.0 7.9725e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 MatILUFactorSym 3 1.0 2.6998e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatAssemblyBegin 5 1.0 3.6000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 5 1.0 3.1619e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCreateSubMats 1 1.0 3.9659e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetOrdering 3 1.0 4.3070e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 3 1.0 1.3600e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecDot 82 1.0 1.8948e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 966 
VecDotNorm2 41 1.0 1.6812e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 1088 VecNorm 43 1.0 9.5099e-02 1.0 9.59e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1009 VecCopy 2 1.0 1.0120e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 271 1.0 3.8922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 1 1.0 7.7200e-04 1.0 2.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2890 VecAXPBYCZ 82 1.0 2.4370e-01 1.0 3.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 8 0 0 0 5 8 0 0 0 1502 VecWAXPY 82 1.0 1.4148e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 4 0 0 0 3 4 0 0 0 1293 VecAssemblyBegin 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 84 1.0 5.9300e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 4 1.0 1.4167e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 4.0250e+00 1.0 4.31e+09 1.0 0.0e+00 0.0e+00 0.0e+00 91100 0 0 0 91100 0 0 0 1071 PCSetUp 4 1.0 1.5207e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 156 PCSetUpOnBlocks 1 1.0 1.1116e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 214 PCApply 84 1.0 1.2912e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 PCApplyOnBlocks 252 1.0 1.2909e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 ------------------------------------------------------------------------------------------------------------------------ # I skipped the memory part - the options (and compiler options) are as follows: #PETSc Option Table entries: -ksp_type bcgs -ksp_view -log_view -pc_bjacobi_local_blocks 3 -pc_factor_levels 0 -pc_sub_type ilu -pc_type bjacobi #End of PETSc Option Table entries Compiled with FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 Configure options: --with-scalar-type=complex --download-mumps --download-scalapack --with-fortran-kernels=1 -- FOPTFLAGS=?-O3 -fastsse -mp -openmp? --COPTFLAGS=?-O3 -fastsse -mp -openmp? 
--CXXOPTFLAGS="-O3 -fastsse -mp -openmp" -- with-debugging=0 ----------------------------------------- Libraries compiled on 2020-02-03 10:44:31 on Haos-MBP Machine characteristics: Darwin-19.2.0-x86_64-i386-64bit Using PETSc directory: /Users/donghao/src/git/PETSc-current Using PETSc arch: arch-darwin-c-opt ----------------------------------------- Using C compiler: mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fno-stack- check -Qunused-arguments -fvisibility=hidden Using Fortran compiler: mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument Using include paths: -I/Users/donghao/src/git/PETSc-current/include -I/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/donghao/src/git/PETSc- current/arch-darwin-c-opt/lib -lpetsc -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/ donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -Wl,-rpath,/usr/local/opt/libevent/lib -L/usr/local/opt/libevent/ lib -Wl,-rpath,/usr/local/Cellar/open-mpi/4.0.2/lib -L/usr/local/Cellar/open-mpi/4.0.2/lib -Wl,-rpath,/usr/local/Cellar/ gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin19/9.2.0 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple- darwin19/9.2.0 -Wl,-rpath,/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -lcmumps - ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lc++ -ldl -lmpi_usempif08 - lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lquadmath -lm -lc++ -ldl On the other hand, running PETSc with -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type lu -ksp_type gmres -ksp_monitor -ksp_view -log_view For the same problem takes 5.37s and 72 GMRES iterations. Our previous testings show that BiCGstab (bcgs in PETSc) is almost always the most effective KSP solver for our non-symmetrical complex system. Strangely, the system is still using ilu instead of lu for sub blocks. 
The output is like: 0 KSP Residual norm 2.480412407430e+02 1 KSP Residual norm 8.848059967835e+01 2 KSP Residual norm 3.415272863261e+01 3 KSP Residual norm 1.563045190939e+01 4 KSP Residual norm 6.241296940043e+00 5 KSP Residual norm 2.739710899854e+00 6 KSP Residual norm 1.391304148888e+00 7 KSP Residual norm 7.959262020849e-01 8 KSP Residual norm 4.828323055231e-01 9 KSP Residual norm 2.918529739200e-01 10 KSP Residual norm 1.905508589557e-01 11 KSP Residual norm 1.291541892702e-01 12 KSP Residual norm 8.827145774707e-02 13 KSP Residual norm 6.521331095889e-02 14 KSP Residual norm 5.095787952595e-02 15 KSP Residual norm 4.043060387395e-02 16 KSP Residual norm 3.232590200012e-02 17 KSP Residual norm 2.593944982216e-02 18 KSP Residual norm 2.064639483533e-02 19 KSP Residual norm 1.653916663492e-02 20 KSP Residual norm 1.334946415452e-02 21 KSP Residual norm 1.092886880597e-02 22 KSP Residual norm 8.988004105542e-03 23 KSP Residual norm 7.466501315240e-03 24 KSP Residual norm 6.284389135436e-03 25 KSP Residual norm 5.425231669964e-03 26 KSP Residual norm 4.766338253084e-03 27 KSP Residual norm 4.241238878242e-03 28 KSP Residual norm 3.808113525685e-03 29 KSP Residual norm 3.449383788116e-03 30 KSP Residual norm 3.126025526388e-03 31 KSP Residual norm 2.958328054299e-03 32 KSP Residual norm 2.802344900403e-03 33 KSP Residual norm 2.621993580492e-03 34 KSP Residual norm 2.430066269304e-03 35 KSP Residual norm 2.259043079597e-03 36 KSP Residual norm 2.104287972986e-03 37 KSP Residual norm 1.952916080045e-03 38 KSP Residual norm 1.804988937999e-03 39 KSP Residual norm 1.643302117377e-03 40 KSP Residual norm 1.471661332036e-03 41 KSP Residual norm 1.286445911163e-03 42 KSP Residual norm 1.127543025848e-03 43 KSP Residual norm 9.777148275484e-04 44 KSP Residual norm 8.293314450006e-04 45 KSP Residual norm 6.989331136622e-04 46 KSP Residual norm 5.852307780220e-04 47 KSP Residual norm 4.926715539762e-04 48 KSP Residual norm 4.215941372075e-04 49 KSP Residual norm 3.699489548162e-04 50 KSP Residual norm 3.293897163533e-04 51 KSP Residual norm 2.959954542998e-04 52 KSP Residual norm 2.700193032414e-04 53 KSP Residual norm 2.461789791204e-04 54 KSP Residual norm 2.218839085563e-04 55 KSP Residual norm 1.945154309976e-04 56 KSP Residual norm 1.661128781744e-04 57 KSP Residual norm 1.413198766258e-04 58 KSP Residual norm 1.213984003195e-04 59 KSP Residual norm 1.044317450754e-04 60 KSP Residual norm 8.919957502977e-05 61 KSP Residual norm 8.042584301275e-05 62 KSP Residual norm 7.292784493581e-05 63 KSP Residual norm 6.481935501872e-05 64 KSP Residual norm 5.718564652679e-05 65 KSP Residual norm 5.072589750116e-05 66 KSP Residual norm 4.487930741285e-05 67 KSP Residual norm 3.941040674119e-05 68 KSP Residual norm 3.492873281291e-05 69 KSP Residual norm 3.103798339845e-05 70 KSP Residual norm 2.822943237409e-05 71 KSP Residual norm 2.610615023776e-05 72 KSP Residual norm 2.441692671173e-05 KSP Object: 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=150, nonzero initial guess tolerances: relative=1e-07, absolute=1e-50, divergence=10000. 
left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: bjacobi number of blocks = 3 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (sub_) 1 MPI processes type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1., needed 1. Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=92969, cols=92969 package used to perform factorization: petsc total: nonzeros=638417, allocated nonzeros=638417 total number of mallocs used during MatSetValues calls=0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=92969, cols=92969 total: nonzeros=638417, allocated nonzeros=638417 total number of mallocs used during MatSetValues calls=0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: mpiaij rows=278906, cols=278906 total: nonzeros=3274027, allocated nonzeros=3274027 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines ... ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 75 1.0 1.5812e+00 1.0 1.88e+09 1.0 0.0e+00 0.0e+00 0.0e+00 28 24 0 0 0 28 24 0 0 0 1189 MatSolve 228 1.0 1.1442e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 944 MatLUFactorNum 3 1.0 8.1930e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 290 MatILUFactorSym 3 1.0 2.7102e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 5 1.0 3.7000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 5 1.0 3.1895e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCreateSubMats 1 1.0 4.0904e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetOrdering 3 1.0 4.2640e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 3 1.0 1.4400e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecMDot 72 1.0 1.1984e+00 1.0 2.25e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 28 0 0 0 21 28 0 0 0 1877 VecNorm 76 1.0 1.6841e-01 1.0 1.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 1007 VecScale 75 1.0 1.8241e-02 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 4587 VecCopy 3 1.0 1.4970e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 276 1.0 9.1970e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecAXPY 6 1.0 3.7450e-03 1.0 1.34e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3575 VecMAXPY 75 1.0 1.0022e+00 1.0 2.41e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 30 0 0 0 18 30 0 0 0 2405 VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 76 1.0 5.5100e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 75 1.0 1.8462e-01 1.0 2.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 1360 KSPSetUp 4 1.0 1.1341e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 5.3123e+00 1.0 7.91e+09 1.0 0.0e+00 0.0e+00 0.0e+00 93100 0 0 0 93100 0 0 0 1489 KSPGMRESOrthog 72 1.0 2.1316e+00 1.0 4.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 57 0 0 0 37 57 0 0 0 2110 PCSetUp 4 1.0 1.5531e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 153 PCSetUpOnBlocks 1 1.0 1.1343e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 209 PCApply 76 1.0 1.1671e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 PCApplyOnBlocks 228 1.0 1.1668e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 ???????????????????????????????????????????????????????????? ... #PETSc Option Table entries: -ksp_monitor -ksp_type gmres -ksp_view -log_view -pc_bjacobi_local_blocks 3 -pc_sub_type lu -pc_type bjacobi #End of PETSc Option Table entries ... Does any of the setup/output ring a bell? BTW, out of curiosity - what is a ?I-node? routine? Cheers, Hao ________________________________ From: Smith, Barry F. Sent: Wednesday, February 5, 2020 9:42 PM To: Hao DONG Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? > On Feb 5, 2020, at 4:36 AM, Hao DONG wrote: > > Thanks a lot for your suggestions, Hong and Barry - > > As you suggested, I first tried the LU direct solvers (built-in and MUMPs) out this morning, which work perfectly, albeit slow. As my problem itself is a part of a PDE based optimization, the exact solution in the searching procedure is not necessary (I often set a relative tolerance of 1E-7/1E-8, etc. for Krylov subspace methods). Also tried BJACOBI with exact LU, the KSP just converges in one or two iterations, as expected. > > I added -kspview option for the D-ILU test (still with Block Jacobi as preconditioner and bcgs as KSP solver). The KSPview output from one of the examples in a toy model looks like: > > KSP Object: 1 MPI processes > type: bcgs > maximum iterations=120, nonzero initial guess > tolerances: relative=1e-07, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: bjacobi > number of blocks = 3 > Local solve is same for all blocks, in the following KSP and PC objects: > KSP Object: (sub_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (sub_) 1 MPI processes > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1., needed 1. 
> Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=11294, cols=11294 > package used to perform factorization: petsc > total: nonzeros=76008, allocated nonzeros=76008 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=11294, cols=11294 > total: nonzeros=76008, allocated nonzeros=76008 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: mpiaij > rows=33880, cols=33880 > total: nonzeros=436968, allocated nonzeros=436968 > total number of mallocs used during MatSetValues calls=0 > not using I-node (on process 0) routines > > do you see something wrong with my setup? > > I also tried a quick performance test with a small 278906 by 278906 matrix (3850990 nnz) with the following parameters: > > -ksp_type bcgs -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu -ksp_view > > Reducing the relative residual to 1E-7 > > Took 4.08s with 41 bcgs iterations. > > Merely changing the -pc_bjacobi_local_blocks to 6 > > Took 7.02s with 73 bcgs iterations. 9 blocks would further take 9.45s with 101 bcgs iterations. This is normal. more blocks slower convergence > > As a reference, my home-brew Fortran code solves the same problem (3-block D-ILU0) in > > 1.84s with 24 bcgs iterations (the bcgs code is also a home-brew one)? > Run the PETSc code with optimization ./configure --with-debugging=0 an run the code with -log_view this will show where the PETSc code is spending the time (send it to use) > > > Well, by saying ?use explicit L/U matrix as preconditioner?, I wonder if a user is allowed to provide his own (separate) L and U Mat for preconditioning (like how it works in Matlab solvers), e.g. > > x = qmr(A,b,Tol,MaxIter,L,U,x) > > As I already explicitly constructed the L and U matrices in Fortran, it is not hard to convert them to Mat in Petsc to test Petsc and my Fortran code head-to-head. In that case, the A, b, x, and L/U are all identical, it would be easier to see where the problem came from. > > No, we don't provide this kind of support > > BTW, there is another thing I wondered - is there a way to output residual in unpreconditioned norm? I tried to > > call KSPSetNormType(ksp_local, KSP_NORM_UNPRECONDITIONED, ierr) > > But always get an error that current ksp does not support unpreconditioned in LEFT/RIGHT (either way). Is it possible to do that (output unpreconditioned residual) in PETSc at all? -ksp_monitor_true_residual You can also run GMRES (and some other methods) with right preconditioning, -ksp_pc_side right then the residual computed is by the algorithm the unpreconditioned residual KSPSetNormType sets the norm used in the algorithm, it generally always has to left or right, only a couple algorithms support unpreconditioned directly. Barry > > Cheers, > Hao > > > From: Smith, Barry F. > Sent: Tuesday, February 4, 2020 8:27 PM > To: Hao DONG > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? > > > > > On Feb 4, 2020, at 12:41 PM, Hao DONG wrote: > > > > Dear all, > > > > > > I have a few questions about the implementation of diagonal ILU PC in PETSc. I want to solve a very simple system with KSP (in parallel), the nature of the system (finite difference time-harmonic Maxwell) is probably not important to the question itself. 
Long story short, I just need to solve a set of equations of Ax = b with a block diagonal system matrix, like (not sure if the mono font works): > > > > |X | > > A =| Y | > > | Z| > > > > Note that A is not really block-diagonal, it?s just a multi-diagonal matrix determined by the FD mesh, where most elements are close to diagonal. So instead of a full ILU decomposition, a D-ILU is a good approximation as a preconditioner, and the number of blocks should not matter too much: > > > > |Lx | |Ux | > > L = | Ly | and U = | Uy | > > | Lz| | Uz| > > > > Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in my sequential Fortran/Matlab code. > > > > So like most users, I am looking for a parallel implement with this problem in PETSc. After looking through the manual, it seems that the most straightforward way to do it is through PCBJACOBI. Not sure I understand it right, I just setup a 3-block PCJACOBI and give each of the block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little block of fortran code would look like: > > ... > > call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & > > & isubs,ierr) > > call PCBJacobiSetLocalBlocks(pc_local, Nsub, & > > & isubs(istart:iend),ierr) > > ! set up the block jacobi structure > > call KSPSetup(ksp_local,ierr) > > ! allocate sub ksps > > allocate(ksp_sub(Nsub)) > > call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & > > & ksp_sub,ierr) > > do i=1,Nsub > > call KSPGetPC(ksp_sub(i),pc_sub,ierr) > > !ILU preconditioner > > call PCSetType(pc_sub,ptype,ierr) > > call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here > > call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] > > end do > > call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & > > & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) > > ? > > This code looks essentially right. You should call with -ksp_view to see exactly what PETSc is using for a solver. > > > > > I understand that the parallel performance may not be comparable, so I first set up a one-process test (with MPIAij, but all the rows are local since there is only one process). The system is solved without any problem (identical results within error). But the performance is actually a lot worse (code built without debugging flags in performance tests) than my own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), which is hard to believe. I suspect the difference is from the PC as the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to the same relative tol. > > PETSc uses GMRES by default with a restart of 30 and left preconditioning. It could be different exact choices in the solver (which is why -ksp_view is so useful) can explain the differences in the runs between your code and PETSc's > > > > This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks instead of 3). While my Fortran/Matlab codes see minimal performance difference (<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a performance decrease of more than 50% in sequential test. > > This is odd, the more blocks the smaller each block so the quicker the ILU set up should be. You can run various cases with -log_view and send us the output to see what is happening at each part of the computation in time. > > > So my implementation IS somewhat different in PETSc. 
Do I miss something in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how PCBJACOBI works in PETSc? > > Probably not. > > > > If this is not the right way to implement a block diagonal ILU as (parallel) PC, please kindly point me to the right direction. I searched through the mail list to find some answers, only to find a couple of similar questions... An example would be nice. > > You approach seems fundamentally right but I cannot be sure of possible bugs. > > > > On the other hand, does PETSc support a simple way to use explicit L/U matrix as a preconditioner? I can import the D-ILU matrix (I already converted my A matrix into Mat) constructed in my Fortran code to make a better comparison. Or do I have to construct my own PC using PCshell? If so, is there a good tutorial/example to learn about how to use PCSHELL (in a more advanced sense, like how to setup pc side and transpose)? > > Not sure what you mean by explicit L/U matrix as a preconditioner. As Hong said, yes you can use a parallel LU from MUMPS or SuperLU_DIST or Pastix as the solver. You do not need any shell code. You simply need to set the PCType to lu > > You can also set all this options from the command line and don't need to write the code specifically. So call KSPSetFromOptions() and then for example > > -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu (this last one is applied to each block so you could use -pc_type lu and it would use lu on each block.) > > -ksp_type_none -pc_type lu -pc_factor_mat_solver_type mumps (do parallel LU with mumps) > > By not hardwiring in the code and just using options you can test out different cases much quicker > > Use -ksp_view to make sure that is using the solver the way you expect. > > Barry > > > > Barry > > > > > Thanks in advance, > > > > Hao -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Feb 6 04:33:44 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 6 Feb 2020 05:33:44 -0500 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? In-Reply-To: References: <20E8B18C-F71E-4B10-958B-6CD24DA869A3@mcs.anl.gov> Message-ID: On Wed, Feb 5, 2020 at 8:53 PM Olek Niewiarowski wrote: > Hi Barry and Matt, > > Thank you for your input and for creating a new issue in the repo. > My initial question was more basic (how to configure the SNES's KSP solver > as in my first message with *a* and *k*), but now I see there's more to > the implementation. To reiterate, for my problem's structure, a good > solution algorithm (on the algebraic level) is the following "double > back-substitution": > For each nonlinear iteration: > > 1. define intermediate vectors u_1 and u_2 > 2. solve Ku_1 = -F --> u_1 = -K^{-1}F (this solve is cheap, don't > actually need K^-1) > 3. solve Ku_2 = -a --> u_2 = -K^{-1}a (ditto) > 4. define \beta = 1/(1 + k a^Tu_2) > 5. \Delta u = u_1 + \beta k u_2^T F u_2 > 6. u = u + \Delta u > > This is very easy to setup: 1) Create a KSP object KSPCreate(comm, &ksp) 2) Call KSPSetOperators(ksp, K, K,) 3) Solve the first system KSPSolve(ksp, -F, u_1) 4) Solve the second system KSPSolve(ksp, a, u_2) 5) Calculate beta VecDot(a, u_2, &gamma); beta = 1./(1. 
+ k*gamma); 6) Update the guess VecDot(u_2, F, &delta); VecAXPBYPCZ(u, 1.0, beta*delta, 1.0, u_1, u_2) Thanks, Matt I don't need the Jacobian inverse, [*K*?k*aa**T*]-1 = *K*-1 - (k*K*-1 *a* > *a**T**K*-1)/(1+k*a**T**K*-1*a*) just the solution ?*u =* [*K*?k*aa**T*]-1 > *F *= *K*-1*F* - (k*K*-1 *a**F**K*-1*a*)/(1 + k*a**T**K*-1*a*) > = *u*_1 + beta k *u*_2^T *F u*_2 (so I never need to invert *K *either). (To > Matt's point on gitlab, K is a symmetric sparse matrix arising from a > bilinear form. ) Also, eventually, I want to have more than one low-rank > updates to K, but again, Sherman Morrisson Woodbury should still work. > > Being new to PETSc, I don't know if this algorithm directly translates > into an efficient numerical solver. (I'm also not sure if Picard iteration > would be useful here.) What would it take to set up the KSP solver in SNES > like I did below? Is it possible "out of the box"? I looked at > MatCreateLRC() - what would I pass this to? (A pointer to demo/tutorial > would be very appreciated.) If there's a better way to go about all of > this, I'm open to any and all ideas. My only limitation is that I use > petsc4py exclusively since I/future users of my code will not be > comfortable with C. > > Thanks again for your help! > > > *Alexander (Olek) Niewiarowski* > PhD Candidate, Civil & Environmental Engineering > Princeton University, 2020 > Cell: +1 (610) 393-2978 > ------------------------------ > *From:* Smith, Barry F. > *Sent:* Wednesday, February 5, 2020 15:46 > *To:* Matthew Knepley > *Cc:* Olek Niewiarowski ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] Implementing the Sherman Morisson formula > (low rank update) in petsc4py and FEniCS? > > > https://gitlab.com/petsc/petsc/issues/557 > > > > On Feb 5, 2020, at 7:35 AM, Matthew Knepley wrote: > > > > Perhaps Barry is right that you want Picard, but suppose you really want > Newton. > > > > "This problem can be solved efficiently using the Sherman-Morrison > formula" Well, maybe. The main assumption here is that inverting K is > cheap. I see two things you can do in a straightforward way: > > > > 1) Use MatCreateLRC() > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateLRC.html > to create the Jacobian > > and solve using an iterative method. If you pass just K was the > preconditioning matrix, you can use common PCs. > > > > 2) We only implemented MatMult() for LRC, but you could stick your SMW > code in for MatSolve_LRC if you really want to factor K. We would > > of course help you do this. > > > > Thanks, > > > > Matt > > > > On Wed, Feb 5, 2020 at 1:36 AM Smith, Barry F. via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > I am not sure of everything in your email but it sounds like you want > to use a "Picard" iteration to solve [K(u)?kaaT]?u=?F(u). That is solve > > > > A(u^{n}) (u^{n+1} - u^{n}) = F(u^{n}) - A(u^{n})u^{n} where A(u) = > K(u) - kaaT > > > > PETSc provides code to this with SNESSetPicard() (see the manual pages) > I don't know if Petsc4py has bindings for this. > > > > Adding missing python bindings is not terribly difficult and you may > be able to do it yourself if this is the approach you want. > > > > Barry > > > > > > > > > On Feb 4, 2020, at 5:07 PM, Olek Niewiarowski > wrote: > > > > > > Hello, > > > I am a FEniCS user but new to petsc4py. I am trying to modify the KSP > solver through the SNES object to implement the Sherman-Morrison > formula(e.g. 
http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html > ). I am solving a nonlinear system of the form [K(u)?kaaT]?u=?F(u). Here > the jacobian matrix K is modified by the term kaaT, where k is a scalar. > Notably, K is sparse, while the term kaaT results in a full matrix. This > problem can be solved efficiently using the Sherman-Morrison formula : > > > > > > [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) > > > I have managed to successfully implement this at the linear solve > level (by modifying the KSP solver) inside a custom Newton solver in python > by following an incomplete tutorial at > https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner > : > > > ? while (norm(delU) > alpha): # while not converged > > > ? > > > ? self.update_F() # call to method to update r.h.s > form > > > ? self.update_K() # call to update the jacobian form > > > ? K = assemble(self.K) # assemble the jacobian matrix > > > ? F = assemble(self.F) # assemble the r.h.s vector > > > ? a = assemble(self.a_form) # assemble the a_form > (see Sherman Morrison formula) > > > ? > > > ? for bc in self.mem.bc: # apply boundary conditions > > > ? bc.apply(K, F) > > > ? bc.apply(K, a) > > > ? > > > ? B = PETSc.Mat().create() > > > ? > > > ? # Assemble the bilinear form that defines A and get > the concrete > > > ? # PETSc matrix > > > ? A = as_backend_type(K).mat() # get the PETSc > objects for K and a > > > ? u = as_backend_type(a).vec() > > > ? > > > ? # Build the matrix "context" # see firedrake docs > > > ? Bctx = MatrixFreeB(A, u, u, self.k) > > > ? > > > ? # Set up B > > > ? # B is the same size as A > > > ? B.setSizes(*A.getSizes()) > > > ? B.setType(B.Type.PYTHON) > > > ? B.setPythonContext(Bctx) > > > ? B.setUp() > > > ? > > > ? > > > ? ksp = PETSc.KSP().create() # create the KSP linear > solver object > > > ? ksp.setOperators(B) > > > ? ksp.setUp() > > > ? pc = ksp.pc > > > ? pc.setType(pc.Type.PYTHON) > > > ? pc.setPythonContext(MatrixFreePC()) > > > ? ksp.setFromOptions() > > > ? > > > ? solution = delU # the incremental displacement at > this iteration > > > ? > > > ? b = as_backend_type(-F).vec() > > > ? delu = solution.vector().vec() > > > ? > > > ? ksp.solve(b, delu) > > > > > > ? self.mem.u.vector().axpy(0.25, self.delU.vector()) > # poor man's linesearch > > > ? counter += 1 > > > Here is the corresponding petsc4py code adapted from the firedrake > docs: > > > > > > ? class MatrixFreePC(object): > > > ? > > > ? def setUp(self, pc): > > > ? B, P = pc.getOperators() > > > ? # extract the MatrixFreeB object from B > > > ? ctx = B.getPythonContext() > > > ? self.A = ctx.A > > > ? self.u = ctx.u > > > ? self.v = ctx.v > > > ? self.k = ctx.k > > > ? # Here we build the PC object that uses the concrete, > > > ? # assembled matrix A. We will use this to apply the > action > > > ? # of A^{-1} > > > ? self.pc = PETSc.PC().create() > > > ? self.pc.setOptionsPrefix("mf_") > > > ? self.pc.setOperators(self.A) > > > ? self.pc.setFromOptions() > > > ? # Since u and v do not change, we can build the > denominator > > > ? # and the action of A^{-1} on u only once, in the > setup > > > ? # phase. > > > ? tmp = self.A.createVecLeft() > > > ? self.pc.apply(self.u, tmp) > > > ? self._Ainvu = tmp > > > ? self._denom = 1 + self.k*self.v.dot(self._Ainvu) > > > ? > > > ? def apply(self, pc, x, y): > > > ? # y <- A^{-1}x > > > ? self.pc.apply(x, y) > > > ? # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u) > > > ? alpha = (self.k*self.v.dot(y)) / self._denom > > > ? # y <- y - alpha * A^{-1}u > > > ? 
y.axpy(-alpha, self._Ainvu) > > > ? > > > ? > > > ? class MatrixFreeB(object): > > > ? > > > ? def __init__(self, A, u, v, k): > > > ? self.A = A > > > ? self.u = u > > > ? self.v = v > > > ? self.k = k > > > ? > > > ? def mult(self, mat, x, y): > > > ? # y <- A x > > > ? self.A.mult(x, y) > > > ? > > > ? # alpha <- v^T x > > > ? alpha = self.v.dot(x) > > > ? > > > ? # y <- y + alpha*u > > > ? y.axpy(alpha, self.u) > > > However, this approach is not efficient as it requires many iterations > due to the Newton step being fixed, so I would like to implement it using > SNES and use line search. Unfortunately, I have not been able to find any > documentation/tutorial on how to do so. Provided I have the FEniCS forms > for F, K, and a, I'd like to do something along the lines of: > > > solver = PETScSNESSolver() # the FEniCS SNES wrapper > > > snes = solver.snes() # the petsc4py SNES object > > > ## ?? > > > ksp = snes.getKSP() > > > # set ksp option similar to above > > > solver.solve() > > > > > > I would be very grateful if anyone could could help or point me to a > reference or demo that does something similar (or maybe a completely > different way of solving the problem!). > > > Many thanks in advance! > > > Alex > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From aan2 at princeton.edu Thu Feb 6 09:02:46 2020 From: aan2 at princeton.edu (Olek Niewiarowski) Date: Thu, 6 Feb 2020 15:02:46 +0000 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? In-Reply-To: References: <20E8B18C-F71E-4B10-958B-6CD24DA869A3@mcs.anl.gov> , Message-ID: Hi Matt, What you suggested in your last email was exactly what I did on my very first attempt at the problem, and while it "worked," convergence was not satisfactory due to the Newton step being fixed in step 6. This is the reason I would like to use the linesearch in SNES instead. Indeed in your manual you "recommend most PETSc users work directly with SNES, rather than using PETSc for the linear problem within a nonlinear solver." Ideally I'd like to create a SNES solver, pass in the functions to evaluate K, F, a, and k, and set up the underlying KSP object as in my first message. Is this possible? Thanks, Alexander (Olek) Niewiarowski PhD Candidate, Civil & Environmental Engineering Princeton University, 2020 Cell: +1 (610) 393-2978 ________________________________ From: Matthew Knepley Sent: Thursday, February 6, 2020 5:33 To: Olek Niewiarowski Cc: Smith, Barry F. ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? On Wed, Feb 5, 2020 at 8:53 PM Olek Niewiarowski > wrote: Hi Barry and Matt, Thank you for your input and for creating a new issue in the repo. My initial question was more basic (how to configure the SNES's KSP solver as in my first message with a and k), but now I see there's more to the implementation. 
To reiterate, for my problem's structure, a good solution algorithm (on the algebraic level) is the following "double back-substitution": For each nonlinear iteration: 1. define intermediate vectors u_1 and u_2 2. solve Ku_1 = -F --> u_1 = -K^{-1}F (this solve is cheap, don't actually need K^-1) 3. solve Ku_2 = -a --> u_2 = -K^{-1}a (ditto) 4. define \beta = 1/(1 + k a^Tu_2) 5. \Delta u = u_1 + \beta k u_2^T F u_2 6. u = u + \Delta u This is very easy to setup: 1) Create a KSP object KSPCreate(comm, &ksp) 2) Call KSPSetOperators(ksp, K, K,) 3) Solve the first system KSPSolve(ksp, -F, u_1) 4) Solve the second system KSPSolve(ksp, a, u_2) 5) Calculate beta VecDot(a, u_2, &gamma); beta = 1./(1. + k*gamma); 6) Update the guess VecDot(u_2, F, &delta); VecAXPBYPCZ(u, 1.0, beta*delta, 1.0, u_1, u_2) Thanks, Matt I don't need the Jacobian inverse, [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) just the solution ?u = [K?kaaT]-1F = K-1F - (kK-1 aFK-1a)/(1 + kaTK-1a) = u_1 + beta k u_2^T F u_2 (so I never need to invert K either). (To Matt's point on gitlab, K is a symmetric sparse matrix arising from a bilinear form. ) Also, eventually, I want to have more than one low-rank updates to K, but again, Sherman Morrisson Woodbury should still work. Being new to PETSc, I don't know if this algorithm directly translates into an efficient numerical solver. (I'm also not sure if Picard iteration would be useful here.) What would it take to set up the KSP solver in SNES like I did below? Is it possible "out of the box"? I looked at MatCreateLRC() - what would I pass this to? (A pointer to demo/tutorial would be very appreciated.) If there's a better way to go about all of this, I'm open to any and all ideas. My only limitation is that I use petsc4py exclusively since I/future users of my code will not be comfortable with C. Thanks again for your help! Alexander (Olek) Niewiarowski PhD Candidate, Civil & Environmental Engineering Princeton University, 2020 Cell: +1 (610) 393-2978 ________________________________ From: Smith, Barry F. > Sent: Wednesday, February 5, 2020 15:46 To: Matthew Knepley > Cc: Olek Niewiarowski >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? https://gitlab.com/petsc/petsc/issues/557 > On Feb 5, 2020, at 7:35 AM, Matthew Knepley > wrote: > > Perhaps Barry is right that you want Picard, but suppose you really want Newton. > > "This problem can be solved efficiently using the Sherman-Morrison formula" Well, maybe. The main assumption here is that inverting K is cheap. I see two things you can do in a straightforward way: > > 1) Use MatCreateLRC() https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateLRC.html to create the Jacobian > and solve using an iterative method. If you pass just K was the preconditioning matrix, you can use common PCs. > > 2) We only implemented MatMult() for LRC, but you could stick your SMW code in for MatSolve_LRC if you really want to factor K. We would > of course help you do this. > > Thanks, > > Matt > > On Wed, Feb 5, 2020 at 1:36 AM Smith, Barry F. via petsc-users > wrote: > > I am not sure of everything in your email but it sounds like you want to use a "Picard" iteration to solve [K(u)?kaaT]?u=?F(u). That is solve > > A(u^{n}) (u^{n+1} - u^{n}) = F(u^{n}) - A(u^{n})u^{n} where A(u) = K(u) - kaaT > > PETSc provides code to this with SNESSetPicard() (see the manual pages) I don't know if Petsc4py has bindings for this. 
> > Adding missing python bindings is not terribly difficult and you may be able to do it yourself if this is the approach you want. > > Barry > > > > > On Feb 4, 2020, at 5:07 PM, Olek Niewiarowski > wrote: > > > > Hello, > > I am a FEniCS user but new to petsc4py. I am trying to modify the KSP solver through the SNES object to implement the Sherman-Morrison formula(e.g. http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html ). I am solving a nonlinear system of the form [K(u)?kaaT]?u=?F(u). Here the jacobian matrix K is modified by the term kaaT, where k is a scalar. Notably, K is sparse, while the term kaaT results in a full matrix. This problem can be solved efficiently using the Sherman-Morrison formula : > > > > [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) > > I have managed to successfully implement this at the linear solve level (by modifying the KSP solver) inside a custom Newton solver in python by following an incomplete tutorial at https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner : > > ? while (norm(delU) > alpha): # while not converged > > ? > > ? self.update_F() # call to method to update r.h.s form > > ? self.update_K() # call to update the jacobian form > > ? K = assemble(self.K) # assemble the jacobian matrix > > ? F = assemble(self.F) # assemble the r.h.s vector > > ? a = assemble(self.a_form) # assemble the a_form (see Sherman Morrison formula) > > ? > > ? for bc in self.mem.bc: # apply boundary conditions > > ? bc.apply(K, F) > > ? bc.apply(K, a) > > ? > > ? B = PETSc.Mat().create() > > ? > > ? # Assemble the bilinear form that defines A and get the concrete > > ? # PETSc matrix > > ? A = as_backend_type(K).mat() # get the PETSc objects for K and a > > ? u = as_backend_type(a).vec() > > ? > > ? # Build the matrix "context" # see firedrake docs > > ? Bctx = MatrixFreeB(A, u, u, self.k) > > ? > > ? # Set up B > > ? # B is the same size as A > > ? B.setSizes(*A.getSizes()) > > ? B.setType(B.Type.PYTHON) > > ? B.setPythonContext(Bctx) > > ? B.setUp() > > ? > > ? > > ? ksp = PETSc.KSP().create() # create the KSP linear solver object > > ? ksp.setOperators(B) > > ? ksp.setUp() > > ? pc = ksp.pc > > ? pc.setType(pc.Type.PYTHON) > > ? pc.setPythonContext(MatrixFreePC()) > > ? ksp.setFromOptions() > > ? > > ? solution = delU # the incremental displacement at this iteration > > ? > > ? b = as_backend_type(-F).vec() > > ? delu = solution.vector().vec() > > ? > > ? ksp.solve(b, delu) > > > > ? self.mem.u.vector().axpy(0.25, self.delU.vector()) # poor man's linesearch > > ? counter += 1 > > Here is the corresponding petsc4py code adapted from the firedrake docs: > > > > ? class MatrixFreePC(object): > > ? > > ? def setUp(self, pc): > > ? B, P = pc.getOperators() > > ? # extract the MatrixFreeB object from B > > ? ctx = B.getPythonContext() > > ? self.A = ctx.A > > ? self.u = ctx.u > > ? self.v = ctx.v > > ? self.k = ctx.k > > ? # Here we build the PC object that uses the concrete, > > ? # assembled matrix A. We will use this to apply the action > > ? # of A^{-1} > > ? self.pc = PETSc.PC().create() > > ? self.pc.setOptionsPrefix("mf_") > > ? self.pc.setOperators(self.A) > > ? self.pc.setFromOptions() > > ? # Since u and v do not change, we can build the denominator > > ? # and the action of A^{-1} on u only once, in the setup > > ? # phase. > > ? tmp = self.A.createVecLeft() > > ? self.pc.apply(self.u, tmp) > > ? self._Ainvu = tmp > > ? self._denom = 1 + self.k*self.v.dot(self._Ainvu) > > ? > > ? def apply(self, pc, x, y): > > ? 
# y <- A^{-1}x > > ? self.pc.apply(x, y) > > ? # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u) > > ? alpha = (self.k*self.v.dot(y)) / self._denom > > ? # y <- y - alpha * A^{-1}u > > ? y.axpy(-alpha, self._Ainvu) > > ? > > ? > > ? class MatrixFreeB(object): > > ? > > ? def __init__(self, A, u, v, k): > > ? self.A = A > > ? self.u = u > > ? self.v = v > > ? self.k = k > > ? > > ? def mult(self, mat, x, y): > > ? # y <- A x > > ? self.A.mult(x, y) > > ? > > ? # alpha <- v^T x > > ? alpha = self.v.dot(x) > > ? > > ? # y <- y + alpha*u > > ? y.axpy(alpha, self.u) > > However, this approach is not efficient as it requires many iterations due to the Newton step being fixed, so I would like to implement it using SNES and use line search. Unfortunately, I have not been able to find any documentation/tutorial on how to do so. Provided I have the FEniCS forms for F, K, and a, I'd like to do something along the lines of: > > solver = PETScSNESSolver() # the FEniCS SNES wrapper > > snes = solver.snes() # the petsc4py SNES object > > ## ?? > > ksp = snes.getKSP() > > # set ksp option similar to above > > solver.solve() > > > > I would be very grateful if anyone could could help or point me to a reference or demo that does something similar (or maybe a completely different way of solving the problem!). > > Many thanks in advance! > > Alex > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From amin.sadeghi at live.com Thu Feb 6 11:36:37 2020 From: amin.sadeghi at live.com (Amin Sadeghi) Date: Thu, 6 Feb 2020 17:36:37 +0000 Subject: [petsc-users] PETSc scaling for solving system of equations Message-ID: Hi, Recently, I've been playing around with petsc4py to solve a battery simulation, which takes too long to solve using scipy solvers. I also have access to an HPC cluster with a few nodes, each with a dozen CPU cores. However, I can't seem to get any further speedup past 4 processors. Very likely, I'm doing something wrong. I'd really appreciate it if someone could shed some light on this. For the record, I'm using PETSc's cg solver. Best, Amin -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Feb 6 12:25:03 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 6 Feb 2020 18:25:03 +0000 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? In-Reply-To: References: <20E8B18C-F71E-4B10-958B-6CD24DA869A3@mcs.anl.gov> Message-ID: <35929586-4D5D-4B31-A34E-8D8D266FEA0A@mcs.anl.gov> If I remember your problem is K(u) + kaa' = F(u) You should start by creating a SNES and calling SNESSetPicard. Read its manual page. Your matrix should be a MatCreateLRC() for the Mat argument to SNESSetPicard and the Peat should be just your K matrix. If you run with -ksp_fd_operator -pc_type lu will be using K to precondition K + kaa' + d F(U)/dU . Newton's method should converge at quadratic order. You can use -ksp_fd_operator -pc_type anything else to use an iterative linear solver as the preconditioner of K. 
If you really want to use the Sherman-Morrison formula then you would create a PC shell and do

typedef struct{ KSP innerksp; Vec u_1,u_2; } YourStruct;

YourStruct yourstruct;
SNESGetKSP(snes,&ksp);
KSPGetPC(ksp,&pc);
PCSetType(pc,PCSHELL);
PCShellSetApply(pc,YourPCApply);
PetscMemzero(&yourstruct,sizeof(YourStruct));
PCShellSetContext(pc,&yourstruct);

Then

YourPCApply(PC pc,Vec in,Vec out)
{
  YourStruct *yourstruct;
  PCShellGetContext(pc,(void**)&yourstruct);
  if (!yourstruct->innerksp) {
    Mat A,B;
    KSPCreate(comm,&yourstruct->innerksp);
    KSPSetOptionsPrefix(yourstruct->innerksp,"yourpc_");
    PCGetOperators(pc,&A,&B);
    KSPSetOperators(yourstruct->innerksp,A,B);
    /* create work vectors */
  }
  /* Apply the solve as you do for the linear case with the Sherman-Morrison formula */
}

This you can run with for example -yourpc_pc_type cholesky

Barry

Looks complicated, conceptually simple.

> 2) Call KSPSetOperators(ksp, K, K) > > 3) Solve the first system KSPSolve(ksp, -F, u_1) > > 4) Solve the second system KSPSolve(ksp, a, u_2) > > 5) Calculate beta VecDot(a, u_2, &gamma); beta = 1./(1. + k*gamma); > > 6) Update the guess VecDot(u_2, F, &delta); VecAXPBYPCZ(u, 1.0, beta*delta, 1.0, u_1, u_2)

No

> On Feb 6, 2020, at 9:02 AM, Olek Niewiarowski wrote: > > Hi Matt, > > What you suggested in your last email was exactly what I did on my very first attempt at the problem, and while it "worked," convergence was not satisfactory due to the Newton step being fixed in step 6. This is the reason I would like to use the line search in SNES instead. Indeed in your manual you "recommend most PETSc users work directly with SNES, rather than using PETSc for the linear problem within a nonlinear solver." Ideally I'd like to create a SNES solver, pass in the functions to evaluate K, F, a, and k, and set up the underlying KSP object as in my first message. Is this possible? > > Thanks, > > Alexander (Olek) Niewiarowski > PhD Candidate, Civil & Environmental Engineering > Princeton University, 2020 > Cell: +1 (610) 393-2978 > From: Matthew Knepley > Sent: Thursday, February 6, 2020 5:33 > To: Olek Niewiarowski > Cc: Smith, Barry F. ; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? > > On Wed, Feb 5, 2020 at 8:53 PM Olek Niewiarowski wrote: > Hi Barry and Matt, > > Thank you for your input and for creating a new issue in the repo. > My initial question was more basic (how to configure the SNES's KSP solver as in my first message with a and k), but now I see there's more to the implementation. To reiterate, for my problem's structure, a good solution algorithm (on the algebraic level) is the following "double back-substitution": > For each nonlinear iteration: > - define intermediate vectors u_1 and u_2 > - solve Ku_1 = -F --> u_1 = -K^{-1}F (this solve is cheap, don't actually need K^-1) > - solve Ku_2 = -a --> u_2 = -K^{-1}a (ditto) > - define \beta = 1/(1 + k a^Tu_2) > - \Delta u = u_1 + \beta k u_2^T F u_2 > - u = u + \Delta u > This is very easy to set up: > > 1) Create a KSP object KSPCreate(comm, &ksp) > > 2) Call KSPSetOperators(ksp, K, K) > > 3) Solve the first system KSPSolve(ksp, -F, u_1) > > 4) Solve the second system KSPSolve(ksp, a, u_2) > > 5) Calculate beta VecDot(a, u_2, &gamma); beta = 1./(1.
+ k*gamma); > > 6) Update the guess VecDot(u_2, F, &delta); VecAXPBYPCZ(u, 1.0, beta*delta, 1.0, u_1, u_2) > > Thanks, > > Matt > > I don't need the Jacobian inverse, [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) just the solution ?u = [K?kaaT]-1F = K-1F - (kK-1 aFK-1a)/(1 + kaTK-1a) > = u_1 + beta k u_2^T F u_2 (so I never need to invert K either). (To Matt's point on gitlab, K is a symmetric sparse matrix arising from a bilinear form. ) Also, eventually, I want to have more than one low-rank updates to K, but again, Sherman Morrisson Woodbury should still work. > > Being new to PETSc, I don't know if this algorithm directly translates into an efficient numerical solver. (I'm also not sure if Picard iteration would be useful here.) What would it take to set up the KSP solver in SNES like I did below? Is it possible "out of the box"? I looked at MatCreateLRC() - what would I pass this to? (A pointer to demo/tutorial would be very appreciated.) If there's a better way to go about all of this, I'm open to any and all ideas. My only limitation is that I use petsc4py exclusively since I/future users of my code will not be comfortable with C. > > Thanks again for your help! > > > Alexander (Olek) Niewiarowski > PhD Candidate, Civil & Environmental Engineering > Princeton University, 2020 > Cell: +1 (610) 393-2978 > From: Smith, Barry F. > Sent: Wednesday, February 5, 2020 15:46 > To: Matthew Knepley > Cc: Olek Niewiarowski ; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? > > > https://gitlab.com/petsc/petsc/issues/557 > > > > On Feb 5, 2020, at 7:35 AM, Matthew Knepley wrote: > > > > Perhaps Barry is right that you want Picard, but suppose you really want Newton. > > > > "This problem can be solved efficiently using the Sherman-Morrison formula" Well, maybe. The main assumption here is that inverting K is cheap. I see two things you can do in a straightforward way: > > > > 1) Use MatCreateLRC() https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateLRC.html to create the Jacobian > > and solve using an iterative method. If you pass just K was the preconditioning matrix, you can use common PCs. > > > > 2) We only implemented MatMult() for LRC, but you could stick your SMW code in for MatSolve_LRC if you really want to factor K. We would > > of course help you do this. > > > > Thanks, > > > > Matt > > > > On Wed, Feb 5, 2020 at 1:36 AM Smith, Barry F. via petsc-users wrote: > > > > I am not sure of everything in your email but it sounds like you want to use a "Picard" iteration to solve [K(u)?kaaT]?u=?F(u). That is solve > > > > A(u^{n}) (u^{n+1} - u^{n}) = F(u^{n}) - A(u^{n})u^{n} where A(u) = K(u) - kaaT > > > > PETSc provides code to this with SNESSetPicard() (see the manual pages) I don't know if Petsc4py has bindings for this. > > > > Adding missing python bindings is not terribly difficult and you may be able to do it yourself if this is the approach you want. > > > > Barry > > > > > > > > > On Feb 4, 2020, at 5:07 PM, Olek Niewiarowski wrote: > > > > > > Hello, > > > I am a FEniCS user but new to petsc4py. I am trying to modify the KSP solver through the SNES object to implement the Sherman-Morrison formula(e.g. http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html ). I am solving a nonlinear system of the form [K(u)?kaaT]?u=?F(u). Here the jacobian matrix K is modified by the term kaaT, where k is a scalar. Notably, K is sparse, while the term kaaT results in a full matrix. 
This problem can be solved efficiently using the Sherman-Morrison formula : > > > > > > [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) > > > I have managed to successfully implement this at the linear solve level (by modifying the KSP solver) inside a custom Newton solver in python by following an incomplete tutorial at https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner : > > > ? while (norm(delU) > alpha): # while not converged > > > ? > > > ? self.update_F() # call to method to update r.h.s form > > > ? self.update_K() # call to update the jacobian form > > > ? K = assemble(self.K) # assemble the jacobian matrix > > > ? F = assemble(self.F) # assemble the r.h.s vector > > > ? a = assemble(self.a_form) # assemble the a_form (see Sherman Morrison formula) > > > ? > > > ? for bc in self.mem.bc: # apply boundary conditions > > > ? bc.apply(K, F) > > > ? bc.apply(K, a) > > > ? > > > ? B = PETSc.Mat().create() > > > ? > > > ? # Assemble the bilinear form that defines A and get the concrete > > > ? # PETSc matrix > > > ? A = as_backend_type(K).mat() # get the PETSc objects for K and a > > > ? u = as_backend_type(a).vec() > > > ? > > > ? # Build the matrix "context" # see firedrake docs > > > ? Bctx = MatrixFreeB(A, u, u, self.k) > > > ? > > > ? # Set up B > > > ? # B is the same size as A > > > ? B.setSizes(*A.getSizes()) > > > ? B.setType(B.Type.PYTHON) > > > ? B.setPythonContext(Bctx) > > > ? B.setUp() > > > ? > > > ? > > > ? ksp = PETSc.KSP().create() # create the KSP linear solver object > > > ? ksp.setOperators(B) > > > ? ksp.setUp() > > > ? pc = ksp.pc > > > ? pc.setType(pc.Type.PYTHON) > > > ? pc.setPythonContext(MatrixFreePC()) > > > ? ksp.setFromOptions() > > > ? > > > ? solution = delU # the incremental displacement at this iteration > > > ? > > > ? b = as_backend_type(-F).vec() > > > ? delu = solution.vector().vec() > > > ? > > > ? ksp.solve(b, delu) > > > > > > ? self.mem.u.vector().axpy(0.25, self.delU.vector()) # poor man's linesearch > > > ? counter += 1 > > > Here is the corresponding petsc4py code adapted from the firedrake docs: > > > > > > ? class MatrixFreePC(object): > > > ? > > > ? def setUp(self, pc): > > > ? B, P = pc.getOperators() > > > ? # extract the MatrixFreeB object from B > > > ? ctx = B.getPythonContext() > > > ? self.A = ctx.A > > > ? self.u = ctx.u > > > ? self.v = ctx.v > > > ? self.k = ctx.k > > > ? # Here we build the PC object that uses the concrete, > > > ? # assembled matrix A. We will use this to apply the action > > > ? # of A^{-1} > > > ? self.pc = PETSc.PC().create() > > > ? self.pc.setOptionsPrefix("mf_") > > > ? self.pc.setOperators(self.A) > > > ? self.pc.setFromOptions() > > > ? # Since u and v do not change, we can build the denominator > > > ? # and the action of A^{-1} on u only once, in the setup > > > ? # phase. > > > ? tmp = self.A.createVecLeft() > > > ? self.pc.apply(self.u, tmp) > > > ? self._Ainvu = tmp > > > ? self._denom = 1 + self.k*self.v.dot(self._Ainvu) > > > ? > > > ? def apply(self, pc, x, y): > > > ? # y <- A^{-1}x > > > ? self.pc.apply(x, y) > > > ? # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u) > > > ? alpha = (self.k*self.v.dot(y)) / self._denom > > > ? # y <- y - alpha * A^{-1}u > > > ? y.axpy(-alpha, self._Ainvu) > > > ? > > > ? > > > ? class MatrixFreeB(object): > > > ? > > > ? def __init__(self, A, u, v, k): > > > ? self.A = A > > > ? self.u = u > > > ? self.v = v > > > ? self.k = k > > > ? > > > ? def mult(self, mat, x, y): > > > ? # y <- A x > > > ? self.A.mult(x, y) > > > ? > > > ? 
# alpha <- v^T x > > > ? alpha = self.v.dot(x) > > > ? > > > ? # y <- y + alpha*u > > > ? y.axpy(alpha, self.u) > > > However, this approach is not efficient as it requires many iterations due to the Newton step being fixed, so I would like to implement it using SNES and use line search. Unfortunately, I have not been able to find any documentation/tutorial on how to do so. Provided I have the FEniCS forms for F, K, and a, I'd like to do something along the lines of: > > > solver = PETScSNESSolver() # the FEniCS SNES wrapper > > > snes = solver.snes() # the petsc4py SNES object > > > ## ?? > > > ksp = snes.getKSP() > > > # set ksp option similar to above > > > solver.solve() > > > > > > I would be very grateful if anyone could could help or point me to a reference or demo that does something similar (or maybe a completely different way of solving the problem!). > > > Many thanks in advance! > > > Alex > > > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From jed at jedbrown.org Thu Feb 6 12:25:56 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 06 Feb 2020 11:25:56 -0700 Subject: [petsc-users] PETSc scaling for solving system of equations In-Reply-To: References: Message-ID: <87pnerd13f.fsf@jedbrown.org> As a first step, please run with -log_view and send results. What is the size of your problem and what preconditioner are you using? Amin Sadeghi writes: > Hi, > > Recently, I've been playing around with petsc4py to solve a battery simulation, which takes too long to solve using scipy solvers. I also have access to an HPC cluster with a few nodes, each with a dozen CPU cores. However, I can't seem to get any further speedup past 4 processors. Very likely, I'm doing something wrong. I'd really appreciate it if someone could shed some light on this. > > For the record, I'm using PETSc's cg solver. > > Best, > Amin From bsmith at mcs.anl.gov Thu Feb 6 13:03:50 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 6 Feb 2020 19:03:50 +0000 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? In-Reply-To: References: <264F91C4-8558-4850-9B4B-ABE4123C2A2C@anl.gov> <4A373D93-4018-45E0-B805-3ECC528472DD@mcs.anl.gov> Message-ID: Read my comments ALL the way down, they go a long way. > On Feb 6, 2020, at 3:43 AM, Hao DONG wrote: > > Dear Hong and Barry, > > Thanks for the suggestions. So there could be some problems in my PETSc configuration? - but my PETSc lib was indeed compiled without the debug flags (--with-debugging=0). I use GCC/GFortran (Home-brew GCC 9.2.0) for the compiling and building of PETSc and my own fortran code. My Fortran compiling flags are simply like: > > -O3 -ffree-line-length-none -fastsse > > Which is also used for -FOPTFLAGS in PETSc (I added -openmp for PETSc, but not my fortran code, as I don?t have any OMP optimizations in my program). Note the performance test results I listed yesterday (e.g. 4.08s with 41 bcgs iterations.) are without any CSR-array->PETSc translation overhead (only include the set and solve part). PETSc doesn't use -openmp in any way for its solvers. 
Do not use this option, it may be slowing the code down. Please send configure.log > > I have two questions about the performance difference - > > 1. Is ilu only factorized once for each iteration, or ilu is performed at every outer ksp iteration steps? Sounds unlikely - but if so, this could cause some extra overheads. ILU is ONLY done if the matrix has changed (which seems wrong). > > 2. Some KSP solvers like BCGS or TFQMR has two ?half-iterations? for each iteration step. Not sure how it works in PETSc, but is that possible that both the two ?half" relative residuals are counted in the output array, doubling the number of iterations (but that cannot explain the extra time consumed)? Yes, PETSc might report them as two, you need to check the exact code. > > Anyway, the output with -log_view from the same 278906 by 278906 matrix with 3-block D-ILU in PETSc is as follows: > > > ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- > > MEMsolv.lu on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Thu Feb 6 09:07:35 2020 > Using Petsc Release Version 3.12.3, unknown > > Max Max/Min Avg Total > Time (sec): 4.443e+00 1.000 4.443e+00 > Objects: 1.155e+03 1.000 1.155e+03 > Flop: 4.311e+09 1.000 4.311e+09 4.311e+09 > Flop/sec: 9.703e+08 1.000 9.703e+08 9.703e+08 > MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.000 > > Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N --> 2N flop > and VecAXPY() for complex vectors of length N --> 8N flop > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total Count %Total Avg %Total Count %Total > 0: Main Stage: 4.4435e+00 100.0% 4.3113e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > ???????????????????????????????????????????????????????????? > See the 'Profiling' chapter of the users' manual for details on interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flop: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > AvgLen: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flop in this phase > %M - percent messages in this phase %L - percent message lengths in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop --- Global --- --- Stage ---- Total > Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatMult 83 1.0 1.7815e+00 1.0 2.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40 48 0 0 0 40 48 0 0 0 1168 > MatSolve 252 1.0 1.2708e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 939 > MatLUFactorNum 3 1.0 7.9725e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 > MatILUFactorSym 3 1.0 2.6998e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatAssemblyBegin 5 1.0 3.6000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 5 1.0 3.1619e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatCreateSubMats 1 1.0 3.9659e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetOrdering 3 1.0 4.3070e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatView 3 1.0 1.3600e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecDot 82 1.0 1.8948e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 966 > VecDotNorm2 41 1.0 1.6812e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 1088 > VecNorm 43 1.0 9.5099e-02 1.0 9.59e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1009 > VecCopy 2 1.0 1.0120e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 271 1.0 3.8922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecAXPY 1 1.0 7.7200e-04 1.0 2.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2890 > VecAXPBYCZ 82 1.0 2.4370e-01 1.0 3.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 8 0 0 0 5 8 0 0 0 1502 > VecWAXPY 82 1.0 1.4148e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 4 0 0 0 3 4 0 0 0 1293 > VecAssemblyBegin 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 84 1.0 5.9300e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSetUp 4 1.0 1.4167e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 4.0250e+00 1.0 4.31e+09 1.0 0.0e+00 0.0e+00 0.0e+00 91100 0 0 0 91100 0 0 0 1071 > PCSetUp 4 1.0 1.5207e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 156 > PCSetUpOnBlocks 1 1.0 1.1116e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 214 > PCApply 84 1.0 1.2912e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 > PCApplyOnBlocks 252 1.0 1.2909e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 > ------------------------------------------------------------------------------------------------------------------------ > > # I skipped the 
memory part - the options (and compiler options) are as follows: > > #PETSc Option Table entries: > -ksp_type bcgs > -ksp_view > -log_view > -pc_bjacobi_local_blocks 3 > -pc_factor_levels 0 > -pc_sub_type ilu > -pc_type bjacobi > #End of PETSc Option Table entries > Compiled with FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 > Configure options: --with-scalar-type=complex --download-mumps --download-scalapack --with-fortran-kernels=1 -- FOPTFLAGS=?-O3 -fastsse -mp -openmp? --COPTFLAGS=?-O3 -fastsse -mp -openmp? --CXXOPTFLAGS="-O3 -fastsse -mp -openmp" -- with-debugging=0 > ----------------------------------------- > Libraries compiled on 2020-02-03 10:44:31 on Haos-MBP > Machine characteristics: Darwin-19.2.0-x86_64-i386-64bit > Using PETSc directory: /Users/donghao/src/git/PETSc-current > Using PETSc arch: arch-darwin-c-opt > ----------------------------------------- > > Using C compiler: mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fno-stack- check -Qunused-arguments -fvisibility=hidden > Using Fortran compiler: mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument > > Using include paths: -I/Users/donghao/src/git/PETSc-current/include -I/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/include > ----------------------------------------- > > Using C linker: mpicc > Using Fortran linker: mpif90 > Using libraries: -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/donghao/src/git/PETSc- current/arch-darwin-c-opt/lib -lpetsc -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/ donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -Wl,-rpath,/usr/local/opt/libevent/lib -L/usr/local/opt/libevent/ lib -Wl,-rpath,/usr/local/Cellar/open-mpi/4.0.2/lib -L/usr/local/Cellar/open-mpi/4.0.2/lib -Wl,-rpath,/usr/local/Cellar/ gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin19/9.2.0 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple- darwin19/9.2.0 -Wl,-rpath,/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -lcmumps - ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lc++ -ldl -lmpi_usempif08 - lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lquadmath -lm -lc++ -ldl > > > On the other hand, running PETSc with > > -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type lu -ksp_type gmres -ksp_monitor -ksp_view -log_view > > For the same problem takes 5.37s and 72 GMRES iterations. Our previous testings show that BiCGstab (bcgs in PETSc) is almost always the most effective KSP solver for our non-symmetrical complex system. Strangely, the system is still using ilu instead of lu for sub blocks. 
The output is like: -sub_pc_type lu > > 0 KSP Residual norm 2.480412407430e+02 > 1 KSP Residual norm 8.848059967835e+01 > 2 KSP Residual norm 3.415272863261e+01 > 3 KSP Residual norm 1.563045190939e+01 > 4 KSP Residual norm 6.241296940043e+00 > 5 KSP Residual norm 2.739710899854e+00 > 6 KSP Residual norm 1.391304148888e+00 > 7 KSP Residual norm 7.959262020849e-01 > 8 KSP Residual norm 4.828323055231e-01 > 9 KSP Residual norm 2.918529739200e-01 > 10 KSP Residual norm 1.905508589557e-01 > 11 KSP Residual norm 1.291541892702e-01 > 12 KSP Residual norm 8.827145774707e-02 > 13 KSP Residual norm 6.521331095889e-02 > 14 KSP Residual norm 5.095787952595e-02 > 15 KSP Residual norm 4.043060387395e-02 > 16 KSP Residual norm 3.232590200012e-02 > 17 KSP Residual norm 2.593944982216e-02 > 18 KSP Residual norm 2.064639483533e-02 > 19 KSP Residual norm 1.653916663492e-02 > 20 KSP Residual norm 1.334946415452e-02 > 21 KSP Residual norm 1.092886880597e-02 > 22 KSP Residual norm 8.988004105542e-03 > 23 KSP Residual norm 7.466501315240e-03 > 24 KSP Residual norm 6.284389135436e-03 > 25 KSP Residual norm 5.425231669964e-03 > 26 KSP Residual norm 4.766338253084e-03 > 27 KSP Residual norm 4.241238878242e-03 > 28 KSP Residual norm 3.808113525685e-03 > 29 KSP Residual norm 3.449383788116e-03 > 30 KSP Residual norm 3.126025526388e-03 > 31 KSP Residual norm 2.958328054299e-03 > 32 KSP Residual norm 2.802344900403e-03 > 33 KSP Residual norm 2.621993580492e-03 > 34 KSP Residual norm 2.430066269304e-03 > 35 KSP Residual norm 2.259043079597e-03 > 36 KSP Residual norm 2.104287972986e-03 > 37 KSP Residual norm 1.952916080045e-03 > 38 KSP Residual norm 1.804988937999e-03 > 39 KSP Residual norm 1.643302117377e-03 > 40 KSP Residual norm 1.471661332036e-03 > 41 KSP Residual norm 1.286445911163e-03 > 42 KSP Residual norm 1.127543025848e-03 > 43 KSP Residual norm 9.777148275484e-04 > 44 KSP Residual norm 8.293314450006e-04 > 45 KSP Residual norm 6.989331136622e-04 > 46 KSP Residual norm 5.852307780220e-04 > 47 KSP Residual norm 4.926715539762e-04 > 48 KSP Residual norm 4.215941372075e-04 > 49 KSP Residual norm 3.699489548162e-04 > 50 KSP Residual norm 3.293897163533e-04 > 51 KSP Residual norm 2.959954542998e-04 > 52 KSP Residual norm 2.700193032414e-04 > 53 KSP Residual norm 2.461789791204e-04 > 54 KSP Residual norm 2.218839085563e-04 > 55 KSP Residual norm 1.945154309976e-04 > 56 KSP Residual norm 1.661128781744e-04 > 57 KSP Residual norm 1.413198766258e-04 > 58 KSP Residual norm 1.213984003195e-04 > 59 KSP Residual norm 1.044317450754e-04 > 60 KSP Residual norm 8.919957502977e-05 > 61 KSP Residual norm 8.042584301275e-05 > 62 KSP Residual norm 7.292784493581e-05 > 63 KSP Residual norm 6.481935501872e-05 > 64 KSP Residual norm 5.718564652679e-05 > 65 KSP Residual norm 5.072589750116e-05 > 66 KSP Residual norm 4.487930741285e-05 > 67 KSP Residual norm 3.941040674119e-05 > 68 KSP Residual norm 3.492873281291e-05 > 69 KSP Residual norm 3.103798339845e-05 > 70 KSP Residual norm 2.822943237409e-05 > 71 KSP Residual norm 2.610615023776e-05 > 72 KSP Residual norm 2.441692671173e-05 > KSP Object: 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=150, nonzero initial guess > tolerances: relative=1e-07, absolute=1e-50, divergence=10000. 
> left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: bjacobi > number of blocks = 3 > Local solve is same for all blocks, in the following KSP and PC objects: > KSP Object: (sub_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (sub_) 1 MPI processes > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1., needed 1. > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=92969, cols=92969 > package used to perform factorization: petsc > total: nonzeros=638417, allocated nonzeros=638417 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=92969, cols=92969 > total: nonzeros=638417, allocated nonzeros=638417 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: mpiaij > rows=278906, cols=278906 > total: nonzeros=3274027, allocated nonzeros=3274027 > total number of mallocs used during MatSetValues calls=0 > not using I-node (on process 0) routines > ... > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop --- Global --- --- Stage ---- Total > Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatMult 75 1.0 1.5812e+00 1.0 1.88e+09 1.0 0.0e+00 0.0e+00 0.0e+00 28 24 0 0 0 28 24 0 0 0 1189 > MatSolve 228 1.0 1.1442e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 944 These flop rates are ok, but not great. Perhaps an older machine. > MatLUFactorNum 3 1.0 8.1930e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 290 > MatILUFactorSym 3 1.0 2.7102e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyBegin 5 1.0 3.7000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 5 1.0 3.1895e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatCreateSubMats 1 1.0 4.0904e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetOrdering 3 1.0 4.2640e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatView 3 1.0 1.4400e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecMDot 72 1.0 1.1984e+00 1.0 2.25e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 28 0 0 0 21 28 0 0 0 1877 21 percent of the time in VecMDOT this is huge for s sequential fun. I think maybe you are using a terrible OpenMP BLAS? 
Send configure.log > VecNorm 76 1.0 1.6841e-01 1.0 1.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 1007 > VecScale 75 1.0 1.8241e-02 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 4587 > VecCopy 3 1.0 1.4970e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 276 1.0 9.1970e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > VecAXPY 6 1.0 3.7450e-03 1.0 1.34e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3575 > VecMAXPY 75 1.0 1.0022e+00 1.0 2.41e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 30 0 0 0 18 30 0 0 0 2405 > VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 76 1.0 5.5100e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecNormalize 75 1.0 1.8462e-01 1.0 2.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 1360 > KSPSetUp 4 1.0 1.1341e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 5.3123e+00 1.0 7.91e+09 1.0 0.0e+00 0.0e+00 0.0e+00 93100 0 0 0 93100 0 0 0 1489 > KSPGMRESOrthog 72 1.0 2.1316e+00 1.0 4.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 57 0 0 0 37 57 0 0 0 2110 > PCSetUp 4 1.0 1.5531e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 153 > PCSetUpOnBlocks 1 1.0 1.1343e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 209 > PCApply 76 1.0 1.1671e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 > PCApplyOnBlocks 228 1.0 1.1668e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 > ???????????????????????????????????????????????????????????? > ... > #PETSc Option Table entries: > -ksp_monitor > -ksp_type gmres > -ksp_view > -log_view > -pc_bjacobi_local_blocks 3 > -pc_sub_type lu > -pc_type bjacobi > #End of PETSc Option Table entries > ... > > Does any of the setup/output ring a bell? > > BTW, out of curiosity - what is a ?I-node? routine? > > > Cheers, > Hao > > > From: Smith, Barry F. > Sent: Wednesday, February 5, 2020 9:42 PM > To: Hao DONG > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? > > > > > On Feb 5, 2020, at 4:36 AM, Hao DONG wrote: > > > > Thanks a lot for your suggestions, Hong and Barry - > > > > As you suggested, I first tried the LU direct solvers (built-in and MUMPs) out this morning, which work perfectly, albeit slow. As my problem itself is a part of a PDE based optimization, the exact solution in the searching procedure is not necessary (I often set a relative tolerance of 1E-7/1E-8, etc. for Krylov subspace methods). Also tried BJACOBI with exact LU, the KSP just converges in one or two iterations, as expected. > > > > I added -kspview option for the D-ILU test (still with Block Jacobi as preconditioner and bcgs as KSP solver). The KSPview output from one of the examples in a toy model looks like: > > > > KSP Object: 1 MPI processes > > type: bcgs > > maximum iterations=120, nonzero initial guess > > tolerances: relative=1e-07, absolute=1e-50, divergence=10000. 
> > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: 1 MPI processes > > type: bjacobi > > number of blocks = 3 > > Local solve is same for all blocks, in the following KSP and PC objects: > > KSP Object: (sub_) 1 MPI processes > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using NONE norm type for convergence test > > PC Object: (sub_) 1 MPI processes > > type: ilu > > out-of-place factorization > > 0 levels of fill > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: natural > > factor fill ratio given 1., needed 1. > > Factored matrix follows: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=11294, cols=11294 > > package used to perform factorization: petsc > > total: nonzeros=76008, allocated nonzeros=76008 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=11294, cols=11294 > > total: nonzeros=76008, allocated nonzeros=76008 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: mpiaij > > rows=33880, cols=33880 > > total: nonzeros=436968, allocated nonzeros=436968 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node (on process 0) routines > > > > do you see something wrong with my setup? > > > > I also tried a quick performance test with a small 278906 by 278906 matrix (3850990 nnz) with the following parameters: > > > > -ksp_type bcgs -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu -ksp_view > > > > Reducing the relative residual to 1E-7 > > > > Took 4.08s with 41 bcgs iterations. > > > > Merely changing the -pc_bjacobi_local_blocks to 6 > > > > Took 7.02s with 73 bcgs iterations. 9 blocks would further take 9.45s with 101 bcgs iterations. > > This is normal. more blocks slower convergence > > > > As a reference, my home-brew Fortran code solves the same problem (3-block D-ILU0) in > > > > 1.84s with 24 bcgs iterations (the bcgs code is also a home-brew one)? > > > Run the PETSc code with optimization ./configure --with-debugging=0 an run the code with -log_view this will show where the PETSc code is spending the time (send it to use) > > > > > > > > Well, by saying ?use explicit L/U matrix as preconditioner?, I wonder if a user is allowed to provide his own (separate) L and U Mat for preconditioning (like how it works in Matlab solvers), e.g. > > > > x = qmr(A,b,Tol,MaxIter,L,U,x) > > > > As I already explicitly constructed the L and U matrices in Fortran, it is not hard to convert them to Mat in Petsc to test Petsc and my Fortran code head-to-head. In that case, the A, b, x, and L/U are all identical, it would be easier to see where the problem came from. > > > > > No, we don't provide this kind of support > > > > > > BTW, there is another thing I wondered - is there a way to output residual in unpreconditioned norm? I tried to > > > > call KSPSetNormType(ksp_local, KSP_NORM_UNPRECONDITIONED, ierr) > > > > But always get an error that current ksp does not support unpreconditioned in LEFT/RIGHT (either way). Is it possible to do that (output unpreconditioned residual) in PETSc at all? 
> > -ksp_monitor_true_residual You can also run GMRES (and some other methods) with right preconditioning, -ksp_pc_side right then the residual computed is by the algorithm the unpreconditioned residual > > KSPSetNormType sets the norm used in the algorithm, it generally always has to left or right, only a couple algorithms support unpreconditioned directly. > > Barry > > > > > > Cheers, > > Hao > > > > > > From: Smith, Barry F. > > Sent: Tuesday, February 4, 2020 8:27 PM > > To: Hao DONG > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? > > > > > > > > > On Feb 4, 2020, at 12:41 PM, Hao DONG wrote: > > > > > > Dear all, > > > > > > > > > I have a few questions about the implementation of diagonal ILU PC in PETSc. I want to solve a very simple system with KSP (in parallel), the nature of the system (finite difference time-harmonic Maxwell) is probably not important to the question itself. Long story short, I just need to solve a set of equations of Ax = b with a block diagonal system matrix, like (not sure if the mono font works): > > > > > > |X | > > > A =| Y | > > > | Z| > > > > > > Note that A is not really block-diagonal, it?s just a multi-diagonal matrix determined by the FD mesh, where most elements are close to diagonal. So instead of a full ILU decomposition, a D-ILU is a good approximation as a preconditioner, and the number of blocks should not matter too much: > > > > > > |Lx | |Ux | > > > L = | Ly | and U = | Uy | > > > | Lz| | Uz| > > > > > > Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in my sequential Fortran/Matlab code. > > > > > > So like most users, I am looking for a parallel implement with this problem in PETSc. After looking through the manual, it seems that the most straightforward way to do it is through PCBJACOBI. Not sure I understand it right, I just setup a 3-block PCJACOBI and give each of the block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little block of fortran code would look like: > > > ... > > > call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & > > > & isubs,ierr) > > > call PCBJacobiSetLocalBlocks(pc_local, Nsub, & > > > & isubs(istart:iend),ierr) > > > ! set up the block jacobi structure > > > call KSPSetup(ksp_local,ierr) > > > ! allocate sub ksps > > > allocate(ksp_sub(Nsub)) > > > call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & > > > & ksp_sub,ierr) > > > do i=1,Nsub > > > call KSPGetPC(ksp_sub(i),pc_sub,ierr) > > > !ILU preconditioner > > > call PCSetType(pc_sub,ptype,ierr) > > > call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here > > > call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] > > > end do > > > call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & > > > & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) > > > ? > > > > This code looks essentially right. You should call with -ksp_view to see exactly what PETSc is using for a solver. > > > > > > > > I understand that the parallel performance may not be comparable, so I first set up a one-process test (with MPIAij, but all the rows are local since there is only one process). The system is solved without any problem (identical results within error). But the performance is actually a lot worse (code built without debugging flags in performance tests) than my own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), which is hard to believe. 
I suspect the difference is from the PC as the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to the same relative tol. > > > > PETSc uses GMRES by default with a restart of 30 and left preconditioning. It could be different exact choices in the solver (which is why -ksp_view is so useful) can explain the differences in the runs between your code and PETSc's > > > > > > This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks instead of 3). While my Fortran/Matlab codes see minimal performance difference (<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a performance decrease of more than 50% in sequential test. > > > > This is odd, the more blocks the smaller each block so the quicker the ILU set up should be. You can run various cases with -log_view and send us the output to see what is happening at each part of the computation in time. > > > > > So my implementation IS somewhat different in PETSc. Do I miss something in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how PCBJACOBI works in PETSc? > > > > Probably not. > > > > > > If this is not the right way to implement a block diagonal ILU as (parallel) PC, please kindly point me to the right direction. I searched through the mail list to find some answers, only to find a couple of similar questions... An example would be nice. > > > > You approach seems fundamentally right but I cannot be sure of possible bugs. > > > > > > On the other hand, does PETSc support a simple way to use explicit L/U matrix as a preconditioner? I can import the D-ILU matrix (I already converted my A matrix into Mat) constructed in my Fortran code to make a better comparison. Or do I have to construct my own PC using PCshell? If so, is there a good tutorial/example to learn about how to use PCSHELL (in a more advanced sense, like how to setup pc side and transpose)? > > > > Not sure what you mean by explicit L/U matrix as a preconditioner. As Hong said, yes you can use a parallel LU from MUMPS or SuperLU_DIST or Pastix as the solver. You do not need any shell code. You simply need to set the PCType to lu > > > > You can also set all this options from the command line and don't need to write the code specifically. So call KSPSetFromOptions() and then for example > > > > -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu (this last one is applied to each block so you could use -pc_type lu and it would use lu on each block.) > > > > -ksp_type_none -pc_type lu -pc_factor_mat_solver_type mumps (do parallel LU with mumps) > > > > By not hardwiring in the code and just using options you can test out different cases much quicker > > > > Use -ksp_view to make sure that is using the solver the way you expect. > > > > Barry > > > > > > > > Barry > > > > > > > > Thanks in advance, > > > > > > Hao From juaneah at gmail.com Thu Feb 6 18:00:01 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Thu, 6 Feb 2020 18:00:01 -0600 Subject: [petsc-users] SLEPc: The inner product is not well defined Message-ID: Hi everyone, I'm solving the eigenvalue problem of three bodies in the same program, it generates a three sets of matrices. 
I installed PETSc as optimized version: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-superlu_dist --download-metis --download-parmetis --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 Then I installed SLEPc in the standard form and referring to PETSc optimized directory. I did NOT install SLEPc with --with-debugging=0, because I'm still testing my code. My matrices comes from DMCreateMatrix, the stiffness matrix and mass matrix. I use the function MatIsSymmetric to check if my matrices are symmetric or not, and always the matrices are symmetric (even when the program crash). For that reason I use: ierr = EPSSetProblemType(eps,EPS_GHEP); CHKERRQ(ierr); ierr = EPSSetType(eps,EPSLOBPCG); CHKERRQ(ierr); // because I need the smallest The problem is: sometimes the code works well for the three sets of matrices and I get the expected results, but sometimes it does not happen, it only works for the first sets of matrices, and then it crashes, and when that occurs the error message is: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: The inner product is not well defined: indefinite matrix [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 [0]PETSC ERROR: ./comp on a linux-opt-02 named lnx by ayala Thu Feb 6 17:20:16 2020 [0]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-superlu_dist --download-metis --download-parmetis --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 [0]PETSC ERROR: #1 BV_SafeSqrt() line 130 in /home/ayala/Documents/SLEPc/slepc-3.12.2/include/slepc/private/bvimpl.h [0]PETSC ERROR: #2 BVNorm_Private() line 473 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvglobal.c [0]PETSC ERROR: #3 BVNormColumn() line 718 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvglobal.c [0]PETSC ERROR: #4 BV_NormVecOrColumn() line 26 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvorthog.c [0]PETSC ERROR: #5 BVOrthogonalizeCGS1() line 136 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvorthog.c [0]PETSC ERROR: #6 BVOrthogonalizeGS() line 188 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvorthog.c [0]PETSC ERROR: #7 BVOrthonormalizeColumn() line 416 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvorthog.c [0]PETSC ERROR: #8 EPSSolve_LOBPCG() line 144 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/impls/cg/lobpcg/lobpcg.c [0]PETSC ERROR: #9 EPSSolve() line 149 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/interface/epssolve.c Number of iterations of the method: 0 Number of linear iterations of the method: 0 Solution method: lobpcg Number of requested eigenvalues: 6 Stopping condition: tol=1e-06, maxit=10000 Number of converged eigenpairs: 0 The problem appears even when I run the same compilation. Anyone have any suggestions to solve the problem? Kind regards. -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From fdkong.jd at gmail.com Thu Feb 6 18:35:56 2020 From: fdkong.jd at gmail.com (Fande Kong) Date: Thu, 6 Feb 2020 17:35:56 -0700 Subject: [petsc-users] Condition Number and GMRES iteration Message-ID:

Hi All,

MOOSE team, Alex and I are working on some variable scaling techniques to improve the condition number of the matrix of linear systems. The goal of variable scaling is to make the diagonal of the matrix as close to unity as possible. After scaling (for a certain example), the condition number of the linear system is indeed reduced, but the GMRES iteration count does not decrease at all.

From my understanding, the condition number gives the worst-case estimate for GMRES convergence. That is, the GMRES iteration count should not increase when the condition number decreases. This could explain what we saw: the improved condition number does not necessarily lead to a decrease in GMRES iterations. We are trying to understand this a bit more, and we guess that the number of eigenvalue clusters of the matrix of the linear system might be related to the convergence rate of GMRES. We plotted the eigenvalues of the scaled and unscaled systems, and the clusters look different from each other, but the GMRES iteration counts are the same.

Does anyone know the right relationship between the condition number and GMRES iterations? How does the number of eigenvalue clusters affect GMRES iterations? How do we count eigenvalue clusters? For example, how many eigenvalue clusters do we have in the attached images?

If you need more details, please let us know. Alex and I are happy to provide any details you are interested in.

Thanks,

Fande Kong,

[Attachments: imgo-1.jpg, imgo.jpg - eigenvalue plots]

From alexlindsay239 at gmail.com Thu Feb 6 19:05:48 2020 From: alexlindsay239 at gmail.com (Alexander Lindsay) Date: Thu, 6 Feb 2020 17:05:48 -0800 Subject: Re: [petsc-users] Condition Number and GMRES iteration In-Reply-To: References: Message-ID:

It looks like Fande has attached the eigenvalue plots with the real axis having a logarithmic scale. The same plots with a linear scale are attached here. The system has 306 degrees of freedom. 12 eigenvalues are unity for both scaled and unscaled cases; this number corresponds to the number of mesh nodes with Dirichlet boundary conditions (just a 1 on the diagonal for the corresponding rows). The rest of the eigenvalues are orders of magnitude smaller for the unscaled case; using scaling these eigenvalues are brought much closer to 1. This particular problem is linear but we solve it with SNES, so constant Jacobian. We run with options `-pc_type none -ksp_gmres_restart 1000 -snes_rtol 1e-8 -ksp_rtol 1e-5` so for this linear problem it takes two non-linear iterations to solve.

Unscaled: first nonlinear iteration takes 2 linear iterations; second nonlinear iteration takes 99 linear iterations
Scaled: first nonlinear iteration takes 94 linear iterations; second nonlinear iteration takes 100 linear iterations

Running with `-pc_type svd` the condition number for the unscaled simulation is 4e9 while it is 2e3 for the scaled simulation.
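For reference, the standard convergence bound for (unrestarted) GMRES applied to a diagonalizable matrix A = V \Lambda V^{-1} makes precise why a smaller condition number of A need not reduce the iteration count, while the location and clustering of the eigenvalues, together with the conditioning of the eigenvector basis, do enter (see, e.g., Saad, Iterative Methods for Sparse Linear Systems):

  \[
    \frac{\lVert r_k \rVert_2}{\lVert r_0 \rVert_2} \;\le\;
    \kappa_2(V)\,
    \min_{\substack{p \in \mathcal{P}_k \\ p(0)=1}}\;
    \max_{\lambda \in \sigma(A)} \lvert p(\lambda) \rvert
  \]

The iteration count is therefore governed by how small a degree-k polynomial with p(0) = 1 can be made on the spectrum (a few tight clusters well separated from the origin are favorable), weighted by kappa_2(V), not by kappa(A) itself; for strongly non-normal matrices even the spectrum alone does not determine the convergence curve, so counting clusters in an eigenvalue plot gives at best a rough indication.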
On Thu, Feb 6, 2020 at 4:36 PM Fande Kong wrote: > Hi All, > > MOOSE team, Alex and I are working on some variable scaling techniques to > improve the condition number of the matrix of linear systems. The goal of > variable scaling is to make the diagonal of matrix as close to unity as > possible. After scaling (for certain example), the condition number of the > linear system is actually reduced, but the GMRES iteration does not > decrease at all. > > From my understanding, the condition number is the worst estimation for > GMRES convergence. That is, the GMRES iteration should not increases when > the condition number decreases. This actually could example what we saw: > the improved condition number does not necessary lead to a decrease in > GMRES iteration. We try to understand this a bit more, and we guess that > the number of eigenvalue clusters of the matrix of the linear system > may/might be related to the convergence rate of GMRES. We plot eigenvalues > of scaled system and unscaled system, and the clusters look different from > each other, but the GMRRES iterations are the same. > > Anyone know what is the right relationship between the condition number > and GMRES iteration? How does the number of eigenvalue clusters affect > GMRES iteration? How to count eigenvalue clusters? For example, how many > eigenvalue clusters we have in the attach image respectively? > > If you need more details, please let us know. Alex and I are happy to > provide any details you are interested in. > > > Thanks, > > Fande Kong, > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: no-scaling-linear.png Type: image/png Size: 11539 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: with-scaling-linear.png Type: image/png Size: 12940 bytes Desc: not available URL: From jroman at dsic.upv.es Fri Feb 7 03:06:38 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Fri, 7 Feb 2020 10:06:38 +0100 Subject: [petsc-users] SLEPc: The inner product is not well defined In-Reply-To: References: Message-ID: This error appears when computing the B-norm of a vector x, as sqrt(x'*B*x). Probably your B matrix is semi-definite, and due to floating-point error the value x'*B*x becomes negative for a certain vector x. The code uses a tolerance of 10*PETSC_MACHINE_EPSILON, but it seems the rounding errors are larger in your case. Or maybe your B-matrix is indefinite, in which case you should solve the problem as non-symmetric (or as symmetric-indefinite GHIEP). Do you get the same problem with the Krylov-Schur solver? A workaround is to edit the source code and remove the check or increase the tolerance, but this may be catastrophic if your B is indefinite. A better solution is to reformulate the problem, solving the matrix pair (A,C) where C=alpha*A+beta*B is positive definite (note that then the eigenvalues become lambda/(beta+alpha*lambda)). Jose > El 7 feb 2020, a las 1:00, Emmanuel Ayala escribi?: > > Hi everyone, > > I'm solving the eigenvalue problem of three bodies in the same program, it generates a three sets of matrices. 
> > I installed PETSc as optimized version: > > Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-superlu_dist --download-metis --download-parmetis --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 > > Then I installed SLEPc in the standard form and referring to PETSc optimized directory. I did NOT install SLEPc with --with-debugging=0, because I'm still testing my code. > > My matrices comes from DMCreateMatrix, the stiffness matrix and mass matrix. I use the function MatIsSymmetric to check if my matrices are symmetric or not, and always the matrices are symmetric (even when the program crash). For that reason I use: > > ierr = EPSSetProblemType(eps,EPS_GHEP); CHKERRQ(ierr); > ierr = EPSSetType(eps,EPSLOBPCG); CHKERRQ(ierr); // because I need the smallest > > The problem is: sometimes the code works well for the three sets of matrices and I get the expected results, but sometimes it does not happen, it only works for the first sets of matrices, and then it crashes, and when that occurs the error message is: > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: The inner product is not well defined: indefinite matrix > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 > [0]PETSC ERROR: ./comp on a linux-opt-02 named lnx by ayala Thu Feb 6 17:20:16 2020 > [0]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-superlu_dist --download-metis --download-parmetis --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 > [0]PETSC ERROR: #1 BV_SafeSqrt() line 130 in /home/ayala/Documents/SLEPc/slepc-3.12.2/include/slepc/private/bvimpl.h > [0]PETSC ERROR: #2 BVNorm_Private() line 473 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvglobal.c > [0]PETSC ERROR: #3 BVNormColumn() line 718 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvglobal.c > [0]PETSC ERROR: #4 BV_NormVecOrColumn() line 26 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvorthog.c > [0]PETSC ERROR: #5 BVOrthogonalizeCGS1() line 136 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvorthog.c > [0]PETSC ERROR: #6 BVOrthogonalizeGS() line 188 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvorthog.c > [0]PETSC ERROR: #7 BVOrthonormalizeColumn() line 416 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvorthog.c > [0]PETSC ERROR: #8 EPSSolve_LOBPCG() line 144 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/impls/cg/lobpcg/lobpcg.c > [0]PETSC ERROR: #9 EPSSolve() line 149 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/interface/epssolve.c > > Number of iterations of the method: 0 > Number of linear iterations of the method: 0 > Solution method: lobpcg > > Number of requested eigenvalues: 6 > Stopping condition: tol=1e-06, maxit=10000 > Number of converged eigenpairs: 0 > > The problem appears even when I run the same compilation. > > Anyone have any suggestions to solve the problem? > > Kind regards. 
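A minimal C sketch of that reformulation follows (error checking omitted). Here alpha and beta are user-chosen placeholders that must make C = alpha*A + beta*B positive definite, and each computed eigenvalue theta of the pair (A, C) is mapped back through lambda = beta*theta/(1 - alpha*theta), the inverse of theta = lambda/(beta + alpha*lambda):

  #include <slepceps.h>

  /* Sketch only: A, B are the assembled stiffness/mass matrices. */
  PetscErrorCode SolveShiftedGHEP(Mat A, Mat B, PetscScalar alpha, PetscScalar beta)
  {
    Mat         C;
    EPS         eps;
    PetscScalar theta, thetai, lambda;
    PetscInt    i, nconv;

    MatDuplicate(B, MAT_COPY_VALUES, &C);
    MatScale(C, beta);                               /* C = beta*B           */
    MatAXPY(C, alpha, A, DIFFERENT_NONZERO_PATTERN); /* C = alpha*A + beta*B */

    EPSCreate(PetscObjectComm((PetscObject)A), &eps);
    EPSSetOperators(eps, A, C);
    EPSSetProblemType(eps, EPS_GHEP);   /* C now plays the role of the definite B matrix */
    EPSSetFromOptions(eps);
    EPSSolve(eps);

    EPSGetConverged(eps, &nconv);
    for (i = 0; i < nconv; i++) {
      EPSGetEigenvalue(eps, i, &theta, &thetai);
      lambda = beta*theta/(1.0 - alpha*theta);  /* undo theta = lambda/(beta + alpha*lambda) */
      /* ... use lambda ... */
    }

    EPSDestroy(&eps);
    MatDestroy(&C);
    return 0;
  }

With alpha = 0 and beta = 1 this reduces to the original pair; note also that a selection such as "smallest eigenvalues" has to be reinterpreted through the same mapping.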
-------------- next part -------------- An HTML attachment was scrubbed... URL: From dong-hao at outlook.com Fri Feb 7 07:44:59 2020 From: dong-hao at outlook.com (Hao DONG) Date: Fri, 7 Feb 2020 13:44:59 +0000 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? In-Reply-To: References: <264F91C4-8558-4850-9B4B-ABE4123C2A2C@anl.gov> <4A373D93-4018-45E0-B805-3ECC528472DD@mcs.anl.gov> , Message-ID: Thanks, Barry, I really appreciate your help - I removed the OpenMP flags and rebuilt PETSc. So the problem is from the BLAS lib I linked? Not sure which version my BLAS is, though? But I also included the -download-Scalapack option. Shouldn?t that enable linking with PBLAS automatically? After looking at the bcgs code in PETSc, I suppose the iteration residual recorded is indeed recorded twice per one "actual iteration?. So that can explain the difference of iteration numbers. My laptop is indeed an old machine (MBP15 mid-2014). I just cannot work with vi without a physical ESC key... I have attached the configure.log -didn?t know that it is so large! Anyway, it seems that the removal of -openmp changes quite a lot of things, the performance is indeed getting much better - the flop/sec increases by a factor of 3. Still, I am getting 20 percent of VecMDot, but no VecMDot in BCGS all (see output below), is that a feature of gmres method? here is the output of the same problem with: -pc_type bjacobi -pc_bjacobi_local_blocks 3 -sub_pc_type ilu -ksp_type gmres -ksp_monitor -ksp_view ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- Mod3DMT.test on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Fri Feb 7 10:26:19 2020 Using Petsc Release Version 3.12.3, unknown Max Max/Min Avg Total Time (sec): 2.520e+00 1.000 2.520e+00 Objects: 1.756e+03 1.000 1.756e+03 Flop: 7.910e+09 1.000 7.910e+09 7.910e+09 Flop/sec: 3.138e+09 1.000 3.138e+09 3.138e+09 MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 2.5204e+00 100.0% 7.9096e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ ? 
------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSidedF 1 1.0 3.4000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 75 1.0 6.2884e-01 1.0 1.88e+09 1.0 0.0e+00 0.0e+00 0.0e+00 25 24 0 0 0 25 24 0 0 0 2991 MatSolve 228 1.0 4.4164e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2445 MatLUFactorNum 3 1.0 4.1317e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 574 MatILUFactorSym 3 1.0 2.3858e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatAssemblyBegin 5 1.0 4.4000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 5 1.0 1.5067e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetRowIJ 3 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCreateSubMats 1 1.0 2.4558e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetOrdering 3 1.0 1.3290e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 3 1.0 1.2800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecMDot 72 1.0 4.9875e-01 1.0 2.25e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 28 0 0 0 20 28 0 0 0 4509 VecNorm 76 1.0 6.6666e-02 1.0 1.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 2544 VecScale 75 1.0 1.7982e-02 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 4653 VecCopy 3 1.0 1.5080e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 276 1.0 9.6784e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 VecAXPY 6 1.0 3.6860e-03 1.0 1.34e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3632 VecMAXPY 75 1.0 4.0490e-01 1.0 2.41e+09 1.0 0.0e+00 0.0e+00 0.0e+00 16 30 0 0 0 16 30 0 0 0 5951 VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 76 1.0 5.3800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 75 1.0 8.3690e-02 1.0 2.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 2999 KSPSetUp 4 1.0 1.1663e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 2.2119e+00 1.0 7.91e+09 1.0 0.0e+00 0.0e+00 0.0e+00 88100 0 0 0 88100 0 0 0 3576 KSPGMRESOrthog 72 1.0 8.7843e-01 1.0 4.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 35 57 0 0 0 35 57 0 0 0 5121 PCSetUp 4 1.0 9.2448e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 257 PCSetUpOnBlocks 1 1.0 6.6597e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 356 PCApply 76 1.0 4.6281e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2333 PCApplyOnBlocks 228 1.0 4.6262e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2334 ------------------------------------------------------------------------------------------------------------------------ Average time to get PetscTime(): 1e-07 #PETSc Option Table entries: -I LBFGS -ksp_type gmres -ksp_view -log_view -pc_bjacobi_local_blocks 3 -pc_type bjacobi -sub_pc_type ilu #End of PETSc Option Table entries Compiled with FORTRAN kernels Compiled with 
full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 Configure options: --with-scalar-type=complex --download-mumps --download-scalapack --with-fortran-kernels=1 -- FOPTFLAGS="-O3 -ffree-line-length-0 -msse2" --COPTFLAGS="-O3 -msse2" --CXXOPTFLAGS="-O3 -msse2" --with-debugging=0 ----------------------------------------- Libraries compiled on 2020-02-07 10:07:42 on Haos-MBP Machine characteristics: Darwin-19.3.0-x86_64-i386-64bit Using PETSc directory: /Users/donghao/src/git/PETSc-current Using PETSc arch: arch-darwin-c-opt ----------------------------------------- Using C compiler: mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fno-stack- check -Qunused-arguments -fvisibility=hidden -O3 -msse2 Using Fortran compiler: mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -O3 -ffree-line-length-0 - msse2 ----------------------------------------- Using include paths: -I/Users/donghao/src/git/PETSc-current/include -I/Users/donghao/src/git/PETSc-current/arch-darwin- c-opt/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/donghao/src/git/PETSc- current/arch-darwin-c-opt/lib -lpetsc -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/ donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -Wl,-rpath,/usr/local/opt/libevent/lib -L/usr/local/opt/libevent/ lib -Wl,-rpath,/usr/local/Cellar/open-mpi/4.0.2/lib -L/usr/local/Cellar/open-mpi/4.0.2/lib -Wl,-rpath,/usr/local/Cellar/ gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin19/9.2.0 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple- darwin19/9.2.0 -Wl,-rpath,/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -lcmumps - ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lc++ -ldl -lmpi_usempif08 - lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lquadmath -lm -lc++ -ldl ----------------------------------------- The BCGS solver performance is now comparable to my own Fortran code (1.84s). Still, I feel that there is something wrong hidden somewhere in my setup - a professional lib should to perform better, I believe. Any other ideas that I can look into? Interestingly there is no VecMDot operation at all! 
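One more thing that may make the -log_view numbers easier to compare against the hand-written Fortran solver is to put the KSPSolve call in its own logging stage, so that setup/assembly and the solve itself are reported separately. A minimal sketch (hedged: assumes ksp, b and x are already created and set up; the stage name is arbitrary):

  PetscLogStage  solve_stage;
  PetscErrorCode ierr;

  ierr = PetscLogStageRegister("KSPSolve only",&solve_stage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(solve_stage);CHKERRQ(ierr);
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);           /* only this call is attributed to the stage */
  ierr = PetscLogStagePop();CHKERRQ(ierr);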
Here is the output with the option of: -pc_type bjacobi -pc_bjacobi_local_blocks 3 -sub_pc_type ilu -ksp_type bcgs -ksp_monitor -ksp_view ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- Mod3DMT.test on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Fri Feb 7 10:38:00 2020 Using Petsc Release Version 3.12.3, unknown Max Max/Min Avg Total Time (sec): 2.187e+00 1.000 2.187e+00 Objects: 1.155e+03 1.000 1.155e+03 Flop: 4.311e+09 1.000 4.311e+09 4.311e+09 Flop/sec: 1.971e+09 1.000 1.971e+09 1.971e+09 MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 2.1870e+00 100.0% 4.3113e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSidedF 1 1.0 2.2000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 83 1.0 7.8726e-01 1.0 2.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 36 48 0 0 0 36 48 0 0 0 2644 MatSolve 252 1.0 5.5656e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 25 28 0 0 0 25 28 0 0 0 2144 MatLUFactorNum 3 1.0 4.5115e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 526 MatILUFactorSym 3 1.0 2.5103e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatAssemblyBegin 5 1.0 3.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 5 1.0 1.5709e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetRowIJ 3 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCreateSubMats 1 1.0 2.8989e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetOrdering 3 1.0 1.1200e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 3 1.0 1.2600e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecDot 82 1.0 8.9328e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 2048 VecDotNorm2 41 1.0 9.9019e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 4 0 0 0 5 4 0 0 0 1848 VecNorm 43 1.0 3.9988e-02 1.0 9.59e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 2399 VecCopy 2 1.0 1.1150e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 271 1.0 4.2833e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecAXPY 1 1.0 5.9200e-04 1.0 2.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3769 VecAXPBYCZ 82 1.0 1.1448e-01 1.0 3.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 8 0 0 0 5 8 0 0 0 3196 VecWAXPY 82 1.0 6.7460e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 4 0 0 0 3 4 0 0 0 2712 VecAssemblyBegin 2 
1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 84 1.0 5.2800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 4 1.0 1.4765e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 KSPSolve 1 1.0 1.8514e+00 1.0 4.31e+09 1.0 0.0e+00 0.0e+00 0.0e+00 85100 0 0 0 85100 0 0 0 2329 PCSetUp 4 1.0 1.0193e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 5 1 0 0 0 5 1 0 0 0 233 PCSetUpOnBlocks 1 1.0 7.1421e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 332 PCApply 84 1.0 5.7927e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 26 28 0 0 0 26 28 0 0 0 2060 PCApplyOnBlocks 252 1.0 5.7902e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 26 28 0 0 0 26 28 0 0 0 2061 ------------------------------------------------------------------------------------------------------------------------ Cheers, Hao ________________________________ From: Smith, Barry F. Sent: Thursday, February 6, 2020 7:03 PM To: Hao DONG Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? Read my comments ALL the way down, they go a long way. > On Feb 6, 2020, at 3:43 AM, Hao DONG wrote: > > Dear Hong and Barry, > > Thanks for the suggestions. So there could be some problems in my PETSc configuration? - but my PETSc lib was indeed compiled without the debug flags (--with-debugging=0). I use GCC/GFortran (Home-brew GCC 9.2.0) for the compiling and building of PETSc and my own fortran code. My Fortran compiling flags are simply like: > > -O3 -ffree-line-length-none -fastsse > > Which is also used for -FOPTFLAGS in PETSc (I added -openmp for PETSc, but not my fortran code, as I don?t have any OMP optimizations in my program). Note the performance test results I listed yesterday (e.g. 4.08s with 41 bcgs iterations.) are without any CSR-array->PETSc translation overhead (only include the set and solve part). PETSc doesn't use -openmp in any way for its solvers. Do not use this option, it may be slowing the code down. Please send configure.log > > I have two questions about the performance difference - > > 1. Is ilu only factorized once for each iteration, or ilu is performed at every outer ksp iteration steps? Sounds unlikely - but if so, this could cause some extra overheads. ILU is ONLY done if the matrix has changed (which seems wrong). > > 2. Some KSP solvers like BCGS or TFQMR has two ?half-iterations? for each iteration step. Not sure how it works in PETSc, but is that possible that both the two ?half" relative residuals are counted in the output array, doubling the number of iterations (but that cannot explain the extra time consumed)? Yes, PETSc might report them as two, you need to check the exact code. 
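A simple way to compare iteration counts independently of what the monitor prints is to query the solver after the solve: KSPGetIterationNumber() returns the count as the KSP implementation itself defines it, and -ksp_converged_reason prints why it stopped. A minimal sketch (hedged: assumes ksp, b and x are already set up):

  PetscInt       its;
  PetscReal      rnorm;
  PetscErrorCode ierr;

  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  ierr = KSPGetIterationNumber(ksp,&its);CHKERRQ(ierr);   /* count as defined by the KSP implementation */
  ierr = KSPGetResidualNorm(ksp,&rnorm);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD,"KSP iterations %D, final residual norm %g\n",its,(double)rnorm);CHKERRQ(ierr);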
> > Anyway, the output with -log_view from the same 278906 by 278906 matrix with 3-block D-ILU in PETSc is as follows: > > > ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- > > MEMsolv.lu on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Thu Feb 6 09:07:35 2020 > Using Petsc Release Version 3.12.3, unknown > > Max Max/Min Avg Total > Time (sec): 4.443e+00 1.000 4.443e+00 > Objects: 1.155e+03 1.000 1.155e+03 > Flop: 4.311e+09 1.000 4.311e+09 4.311e+09 > Flop/sec: 9.703e+08 1.000 9.703e+08 9.703e+08 > MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.000 > > Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N --> 2N flop > and VecAXPY() for complex vectors of length N --> 8N flop > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total Count %Total Avg %Total Count %Total > 0: Main Stage: 4.4435e+00 100.0% 4.3113e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > ???????????????????????????????????????????????????????????? > See the 'Profiling' chapter of the users' manual for details on interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flop: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > AvgLen: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flop in this phase > %M - percent messages in this phase %L - percent message lengths in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop --- Global --- --- Stage ---- Total > Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatMult 83 1.0 1.7815e+00 1.0 2.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40 48 0 0 0 40 48 0 0 0 1168 > MatSolve 252 1.0 1.2708e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 939 > MatLUFactorNum 3 1.0 7.9725e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 > MatILUFactorSym 3 1.0 2.6998e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatAssemblyBegin 5 1.0 3.6000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 5 1.0 3.1619e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatCreateSubMats 1 1.0 3.9659e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetOrdering 3 1.0 4.3070e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatView 3 1.0 1.3600e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecDot 82 1.0 1.8948e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 966 > VecDotNorm2 41 1.0 1.6812e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 1088 > VecNorm 43 1.0 9.5099e-02 1.0 9.59e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1009 > VecCopy 2 1.0 1.0120e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 271 1.0 3.8922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecAXPY 1 1.0 7.7200e-04 1.0 2.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2890 > VecAXPBYCZ 82 1.0 2.4370e-01 1.0 3.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 8 0 0 0 5 8 0 0 0 1502 > VecWAXPY 82 1.0 1.4148e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 4 0 0 0 3 4 0 0 0 1293 > VecAssemblyBegin 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 84 1.0 5.9300e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSetUp 4 1.0 1.4167e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 4.0250e+00 1.0 4.31e+09 1.0 0.0e+00 0.0e+00 0.0e+00 91100 0 0 0 91100 0 0 0 1071 > PCSetUp 4 1.0 1.5207e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 156 > PCSetUpOnBlocks 1 1.0 1.1116e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 214 > PCApply 84 1.0 1.2912e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 > PCApplyOnBlocks 252 1.0 1.2909e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 > ------------------------------------------------------------------------------------------------------------------------ > > # I skipped the 
memory part - the options (and compiler options) are as follows: > > #PETSc Option Table entries: > -ksp_type bcgs > -ksp_view > -log_view > -pc_bjacobi_local_blocks 3 > -pc_factor_levels 0 > -pc_sub_type ilu > -pc_type bjacobi > #End of PETSc Option Table entries > Compiled with FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 > Configure options: --with-scalar-type=complex --download-mumps --download-scalapack --with-fortran-kernels=1 -- FOPTFLAGS=?-O3 -fastsse -mp -openmp? --COPTFLAGS=?-O3 -fastsse -mp -openmp? --CXXOPTFLAGS="-O3 -fastsse -mp -openmp" -- with-debugging=0 > ----------------------------------------- > Libraries compiled on 2020-02-03 10:44:31 on Haos-MBP > Machine characteristics: Darwin-19.2.0-x86_64-i386-64bit > Using PETSc directory: /Users/donghao/src/git/PETSc-current > Using PETSc arch: arch-darwin-c-opt > ----------------------------------------- > > Using C compiler: mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fno-stack- check -Qunused-arguments -fvisibility=hidden > Using Fortran compiler: mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument > > Using include paths: -I/Users/donghao/src/git/PETSc-current/include -I/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/include > ----------------------------------------- > > Using C linker: mpicc > Using Fortran linker: mpif90 > Using libraries: -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/donghao/src/git/PETSc- current/arch-darwin-c-opt/lib -lpetsc -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/ donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -Wl,-rpath,/usr/local/opt/libevent/lib -L/usr/local/opt/libevent/ lib -Wl,-rpath,/usr/local/Cellar/open-mpi/4.0.2/lib -L/usr/local/Cellar/open-mpi/4.0.2/lib -Wl,-rpath,/usr/local/Cellar/ gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin19/9.2.0 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple- darwin19/9.2.0 -Wl,-rpath,/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -lcmumps - ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lc++ -ldl -lmpi_usempif08 - lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lquadmath -lm -lc++ -ldl > > > On the other hand, running PETSc with > > -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type lu -ksp_type gmres -ksp_monitor -ksp_view -log_view > > For the same problem takes 5.37s and 72 GMRES iterations. Our previous testings show that BiCGstab (bcgs in PETSc) is almost always the most effective KSP solver for our non-symmetrical complex system. Strangely, the system is still using ilu instead of lu for sub blocks. 
The output is like: -sub_pc_type lu > > 0 KSP Residual norm 2.480412407430e+02 > 1 KSP Residual norm 8.848059967835e+01 > 2 KSP Residual norm 3.415272863261e+01 > 3 KSP Residual norm 1.563045190939e+01 > 4 KSP Residual norm 6.241296940043e+00 > 5 KSP Residual norm 2.739710899854e+00 > 6 KSP Residual norm 1.391304148888e+00 > 7 KSP Residual norm 7.959262020849e-01 > 8 KSP Residual norm 4.828323055231e-01 > 9 KSP Residual norm 2.918529739200e-01 > 10 KSP Residual norm 1.905508589557e-01 > 11 KSP Residual norm 1.291541892702e-01 > 12 KSP Residual norm 8.827145774707e-02 > 13 KSP Residual norm 6.521331095889e-02 > 14 KSP Residual norm 5.095787952595e-02 > 15 KSP Residual norm 4.043060387395e-02 > 16 KSP Residual norm 3.232590200012e-02 > 17 KSP Residual norm 2.593944982216e-02 > 18 KSP Residual norm 2.064639483533e-02 > 19 KSP Residual norm 1.653916663492e-02 > 20 KSP Residual norm 1.334946415452e-02 > 21 KSP Residual norm 1.092886880597e-02 > 22 KSP Residual norm 8.988004105542e-03 > 23 KSP Residual norm 7.466501315240e-03 > 24 KSP Residual norm 6.284389135436e-03 > 25 KSP Residual norm 5.425231669964e-03 > 26 KSP Residual norm 4.766338253084e-03 > 27 KSP Residual norm 4.241238878242e-03 > 28 KSP Residual norm 3.808113525685e-03 > 29 KSP Residual norm 3.449383788116e-03 > 30 KSP Residual norm 3.126025526388e-03 > 31 KSP Residual norm 2.958328054299e-03 > 32 KSP Residual norm 2.802344900403e-03 > 33 KSP Residual norm 2.621993580492e-03 > 34 KSP Residual norm 2.430066269304e-03 > 35 KSP Residual norm 2.259043079597e-03 > 36 KSP Residual norm 2.104287972986e-03 > 37 KSP Residual norm 1.952916080045e-03 > 38 KSP Residual norm 1.804988937999e-03 > 39 KSP Residual norm 1.643302117377e-03 > 40 KSP Residual norm 1.471661332036e-03 > 41 KSP Residual norm 1.286445911163e-03 > 42 KSP Residual norm 1.127543025848e-03 > 43 KSP Residual norm 9.777148275484e-04 > 44 KSP Residual norm 8.293314450006e-04 > 45 KSP Residual norm 6.989331136622e-04 > 46 KSP Residual norm 5.852307780220e-04 > 47 KSP Residual norm 4.926715539762e-04 > 48 KSP Residual norm 4.215941372075e-04 > 49 KSP Residual norm 3.699489548162e-04 > 50 KSP Residual norm 3.293897163533e-04 > 51 KSP Residual norm 2.959954542998e-04 > 52 KSP Residual norm 2.700193032414e-04 > 53 KSP Residual norm 2.461789791204e-04 > 54 KSP Residual norm 2.218839085563e-04 > 55 KSP Residual norm 1.945154309976e-04 > 56 KSP Residual norm 1.661128781744e-04 > 57 KSP Residual norm 1.413198766258e-04 > 58 KSP Residual norm 1.213984003195e-04 > 59 KSP Residual norm 1.044317450754e-04 > 60 KSP Residual norm 8.919957502977e-05 > 61 KSP Residual norm 8.042584301275e-05 > 62 KSP Residual norm 7.292784493581e-05 > 63 KSP Residual norm 6.481935501872e-05 > 64 KSP Residual norm 5.718564652679e-05 > 65 KSP Residual norm 5.072589750116e-05 > 66 KSP Residual norm 4.487930741285e-05 > 67 KSP Residual norm 3.941040674119e-05 > 68 KSP Residual norm 3.492873281291e-05 > 69 KSP Residual norm 3.103798339845e-05 > 70 KSP Residual norm 2.822943237409e-05 > 71 KSP Residual norm 2.610615023776e-05 > 72 KSP Residual norm 2.441692671173e-05 > KSP Object: 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=150, nonzero initial guess > tolerances: relative=1e-07, absolute=1e-50, divergence=10000. 
> left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: bjacobi > number of blocks = 3 > Local solve is same for all blocks, in the following KSP and PC objects: > KSP Object: (sub_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (sub_) 1 MPI processes > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1., needed 1. > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=92969, cols=92969 > package used to perform factorization: petsc > total: nonzeros=638417, allocated nonzeros=638417 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=92969, cols=92969 > total: nonzeros=638417, allocated nonzeros=638417 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: mpiaij > rows=278906, cols=278906 > total: nonzeros=3274027, allocated nonzeros=3274027 > total number of mallocs used during MatSetValues calls=0 > not using I-node (on process 0) routines > ... > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop --- Global --- --- Stage ---- Total > Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatMult 75 1.0 1.5812e+00 1.0 1.88e+09 1.0 0.0e+00 0.0e+00 0.0e+00 28 24 0 0 0 28 24 0 0 0 1189 > MatSolve 228 1.0 1.1442e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 944 These flop rates are ok, but not great. Perhaps an older machine. > MatLUFactorNum 3 1.0 8.1930e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 290 > MatILUFactorSym 3 1.0 2.7102e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyBegin 5 1.0 3.7000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 5 1.0 3.1895e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatCreateSubMats 1 1.0 4.0904e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetOrdering 3 1.0 4.2640e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatView 3 1.0 1.4400e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecMDot 72 1.0 1.1984e+00 1.0 2.25e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 28 0 0 0 21 28 0 0 0 1877 21 percent of the time in VecMDOT this is huge for s sequential fun. I think maybe you are using a terrible OpenMP BLAS? 
Send configure.log > VecNorm 76 1.0 1.6841e-01 1.0 1.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 1007 > VecScale 75 1.0 1.8241e-02 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 4587 > VecCopy 3 1.0 1.4970e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 276 1.0 9.1970e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > VecAXPY 6 1.0 3.7450e-03 1.0 1.34e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3575 > VecMAXPY 75 1.0 1.0022e+00 1.0 2.41e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 30 0 0 0 18 30 0 0 0 2405 > VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 76 1.0 5.5100e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecNormalize 75 1.0 1.8462e-01 1.0 2.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 1360 > KSPSetUp 4 1.0 1.1341e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 5.3123e+00 1.0 7.91e+09 1.0 0.0e+00 0.0e+00 0.0e+00 93100 0 0 0 93100 0 0 0 1489 > KSPGMRESOrthog 72 1.0 2.1316e+00 1.0 4.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 57 0 0 0 37 57 0 0 0 2110 > PCSetUp 4 1.0 1.5531e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 153 > PCSetUpOnBlocks 1 1.0 1.1343e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 209 > PCApply 76 1.0 1.1671e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 > PCApplyOnBlocks 228 1.0 1.1668e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 > ???????????????????????????????????????????????????????????? > ... > #PETSc Option Table entries: > -ksp_monitor > -ksp_type gmres > -ksp_view > -log_view > -pc_bjacobi_local_blocks 3 > -pc_sub_type lu > -pc_type bjacobi > #End of PETSc Option Table entries > ... > > Does any of the setup/output ring a bell? > > BTW, out of curiosity - what is a ?I-node? routine? > > > Cheers, > Hao > > > From: Smith, Barry F. > Sent: Wednesday, February 5, 2020 9:42 PM > To: Hao DONG > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? > > > > > On Feb 5, 2020, at 4:36 AM, Hao DONG wrote: > > > > Thanks a lot for your suggestions, Hong and Barry - > > > > As you suggested, I first tried the LU direct solvers (built-in and MUMPs) out this morning, which work perfectly, albeit slow. As my problem itself is a part of a PDE based optimization, the exact solution in the searching procedure is not necessary (I often set a relative tolerance of 1E-7/1E-8, etc. for Krylov subspace methods). Also tried BJACOBI with exact LU, the KSP just converges in one or two iterations, as expected. > > > > I added -kspview option for the D-ILU test (still with Block Jacobi as preconditioner and bcgs as KSP solver). The KSPview output from one of the examples in a toy model looks like: > > > > KSP Object: 1 MPI processes > > type: bcgs > > maximum iterations=120, nonzero initial guess > > tolerances: relative=1e-07, absolute=1e-50, divergence=10000. 
> > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: 1 MPI processes > > type: bjacobi > > number of blocks = 3 > > Local solve is same for all blocks, in the following KSP and PC objects: > > KSP Object: (sub_) 1 MPI processes > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using NONE norm type for convergence test > > PC Object: (sub_) 1 MPI processes > > type: ilu > > out-of-place factorization > > 0 levels of fill > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: natural > > factor fill ratio given 1., needed 1. > > Factored matrix follows: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=11294, cols=11294 > > package used to perform factorization: petsc > > total: nonzeros=76008, allocated nonzeros=76008 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=11294, cols=11294 > > total: nonzeros=76008, allocated nonzeros=76008 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: mpiaij > > rows=33880, cols=33880 > > total: nonzeros=436968, allocated nonzeros=436968 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node (on process 0) routines > > > > do you see something wrong with my setup? > > > > I also tried a quick performance test with a small 278906 by 278906 matrix (3850990 nnz) with the following parameters: > > > > -ksp_type bcgs -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu -ksp_view > > > > Reducing the relative residual to 1E-7 > > > > Took 4.08s with 41 bcgs iterations. > > > > Merely changing the -pc_bjacobi_local_blocks to 6 > > > > Took 7.02s with 73 bcgs iterations. 9 blocks would further take 9.45s with 101 bcgs iterations. > > This is normal. more blocks slower convergence > > > > As a reference, my home-brew Fortran code solves the same problem (3-block D-ILU0) in > > > > 1.84s with 24 bcgs iterations (the bcgs code is also a home-brew one)? > > > Run the PETSc code with optimization ./configure --with-debugging=0 an run the code with -log_view this will show where the PETSc code is spending the time (send it to use) > > > > > > > > Well, by saying ?use explicit L/U matrix as preconditioner?, I wonder if a user is allowed to provide his own (separate) L and U Mat for preconditioning (like how it works in Matlab solvers), e.g. > > > > x = qmr(A,b,Tol,MaxIter,L,U,x) > > > > As I already explicitly constructed the L and U matrices in Fortran, it is not hard to convert them to Mat in Petsc to test Petsc and my Fortran code head-to-head. In that case, the A, b, x, and L/U are all identical, it would be easier to see where the problem came from. > > > > > No, we don't provide this kind of support > > > > > > BTW, there is another thing I wondered - is there a way to output residual in unpreconditioned norm? I tried to > > > > call KSPSetNormType(ksp_local, KSP_NORM_UNPRECONDITIONED, ierr) > > > > But always get an error that current ksp does not support unpreconditioned in LEFT/RIGHT (either way). Is it possible to do that (output unpreconditioned residual) in PETSc at all? 
> > -ksp_monitor_true_residual You can also run GMRES (and some other methods) with right preconditioning, -ksp_pc_side right then the residual computed is by the algorithm the unpreconditioned residual > > KSPSetNormType sets the norm used in the algorithm, it generally always has to left or right, only a couple algorithms support unpreconditioned directly. > > Barry > > > > > > Cheers, > > Hao > > > > > > From: Smith, Barry F. > > Sent: Tuesday, February 4, 2020 8:27 PM > > To: Hao DONG > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? > > > > > > > > > On Feb 4, 2020, at 12:41 PM, Hao DONG wrote: > > > > > > Dear all, > > > > > > > > > I have a few questions about the implementation of diagonal ILU PC in PETSc. I want to solve a very simple system with KSP (in parallel), the nature of the system (finite difference time-harmonic Maxwell) is probably not important to the question itself. Long story short, I just need to solve a set of equations of Ax = b with a block diagonal system matrix, like (not sure if the mono font works): > > > > > > |X | > > > A =| Y | > > > | Z| > > > > > > Note that A is not really block-diagonal, it?s just a multi-diagonal matrix determined by the FD mesh, where most elements are close to diagonal. So instead of a full ILU decomposition, a D-ILU is a good approximation as a preconditioner, and the number of blocks should not matter too much: > > > > > > |Lx | |Ux | > > > L = | Ly | and U = | Uy | > > > | Lz| | Uz| > > > > > > Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in my sequential Fortran/Matlab code. > > > > > > So like most users, I am looking for a parallel implement with this problem in PETSc. After looking through the manual, it seems that the most straightforward way to do it is through PCBJACOBI. Not sure I understand it right, I just setup a 3-block PCJACOBI and give each of the block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little block of fortran code would look like: > > > ... > > > call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & > > > & isubs,ierr) > > > call PCBJacobiSetLocalBlocks(pc_local, Nsub, & > > > & isubs(istart:iend),ierr) > > > ! set up the block jacobi structure > > > call KSPSetup(ksp_local,ierr) > > > ! allocate sub ksps > > > allocate(ksp_sub(Nsub)) > > > call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & > > > & ksp_sub,ierr) > > > do i=1,Nsub > > > call KSPGetPC(ksp_sub(i),pc_sub,ierr) > > > !ILU preconditioner > > > call PCSetType(pc_sub,ptype,ierr) > > > call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here > > > call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] > > > end do > > > call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & > > > & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) > > > ? > > > > This code looks essentially right. You should call with -ksp_view to see exactly what PETSc is using for a solver. > > > > > > > > I understand that the parallel performance may not be comparable, so I first set up a one-process test (with MPIAij, but all the rows are local since there is only one process). The system is solved without any problem (identical results within error). But the performance is actually a lot worse (code built without debugging flags in performance tests) than my own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), which is hard to believe. 
I suspect the difference is from the PC as the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to the same relative tol. > > > > PETSc uses GMRES by default with a restart of 30 and left preconditioning. It could be different exact choices in the solver (which is why -ksp_view is so useful) can explain the differences in the runs between your code and PETSc's > > > > > > This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks instead of 3). While my Fortran/Matlab codes see minimal performance difference (<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a performance decrease of more than 50% in sequential test. > > > > This is odd, the more blocks the smaller each block so the quicker the ILU set up should be. You can run various cases with -log_view and send us the output to see what is happening at each part of the computation in time. > > > > > So my implementation IS somewhat different in PETSc. Do I miss something in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how PCBJACOBI works in PETSc? > > > > Probably not. > > > > > > If this is not the right way to implement a block diagonal ILU as (parallel) PC, please kindly point me to the right direction. I searched through the mail list to find some answers, only to find a couple of similar questions... An example would be nice. > > > > You approach seems fundamentally right but I cannot be sure of possible bugs. > > > > > > On the other hand, does PETSc support a simple way to use explicit L/U matrix as a preconditioner? I can import the D-ILU matrix (I already converted my A matrix into Mat) constructed in my Fortran code to make a better comparison. Or do I have to construct my own PC using PCshell? If so, is there a good tutorial/example to learn about how to use PCSHELL (in a more advanced sense, like how to setup pc side and transpose)? > > > > Not sure what you mean by explicit L/U matrix as a preconditioner. As Hong said, yes you can use a parallel LU from MUMPS or SuperLU_DIST or Pastix as the solver. You do not need any shell code. You simply need to set the PCType to lu > > > > You can also set all this options from the command line and don't need to write the code specifically. So call KSPSetFromOptions() and then for example > > > > -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu (this last one is applied to each block so you could use -pc_type lu and it would use lu on each block.) > > > > -ksp_type_none -pc_type lu -pc_factor_mat_solver_type mumps (do parallel LU with mumps) > > > > By not hardwiring in the code and just using options you can test out different cases much quicker > > > > Use -ksp_view to make sure that is using the solver the way you expect. > > > > Barry > > > > > > > > Barry > > > > > > > > Thanks in advance, > > > > > > Hao -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 622592 bytes Desc: configure.log URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: configure.log Type: application/octet-stream Size: 1682411 bytes Desc: configure.log URL: From knepley at gmail.com Fri Feb 7 07:51:07 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 7 Feb 2020 08:51:07 -0500 Subject: [petsc-users] Condition Number and GMRES iteration In-Reply-To: References: Message-ID: On Thu, Feb 6, 2020 at 7:37 PM Fande Kong wrote: > Hi All, > > MOOSE team, Alex and I are working on some variable scaling techniques to > improve the condition number of the matrix of linear systems. The goal of > variable scaling is to make the diagonal of matrix as close to unity as > possible. After scaling (for certain example), the condition number of the > linear system is actually reduced, but the GMRES iteration does not > decrease at all. > > From my understanding, the condition number is the worst estimation for > GMRES convergence. That is, the GMRES iteration should not increases when > the condition number decreases. This actually could example what we saw: > the improved condition number does not necessary lead to a decrease in > GMRES iteration. We try to understand this a bit more, and we guess that > the number of eigenvalue clusters of the matrix of the linear system > may/might be related to the convergence rate of GMRES. We plot eigenvalues > of scaled system and unscaled system, and the clusters look different from > each other, but the GMRRES iterations are the same. > > Anyone know what is the right relationship between the condition number > and GMRES iteration? How does the number of eigenvalue clusters affect > GMRES iteration? How to count eigenvalue clusters? For example, how many > eigenvalue clusters we have in the attach image respectively? > > If you need more details, please let us know. Alex and I are happy to > provide any details you are interested in. > Hi Fande, This is one of my favorite papers of all time: https://epubs.siam.org/doi/abs/10.1137/S0895479894275030 It shows that the spectrum alone tells you nothing at all about GMRES convergence. You need other things, like symmetry (almost everything is known) or normality (a little bit is known). Thanks, Matt > Thanks, > > Fande Kong, > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Feb 7 07:53:45 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 7 Feb 2020 08:53:45 -0500 Subject: [petsc-users] Condition Number and GMRES iteration In-Reply-To: References: Message-ID: On Thu, Feb 6, 2020 at 8:07 PM Alexander Lindsay wrote: > It looks like Fande has attached the eigenvalue plots with the real axis > having a logarithmic scale. The same plots with a linear scale are attached > here. > > The system has 306 degrees of freedom. 12 eigenvalues are unity for both > scaled and unscaled cases; this number corresponds to the number of mesh > nodes with Dirichlet boundary conditions (just a 1 on the diagonal for the > corresponding rows). The rest of the eigenvalues are orders of magnitude > smaller for the unscaled case; using scaling these eigenvalues are brought > much closer 1. > > This particular problem is linear but we solve it with SNES, so constant > Jacobian. 
We run with options '-pc_type none -ksp_gmres_restart 1000 > -snes_rtol 1e-8 -ksp_rtol 1e-5` so for this linear problem it takes two > non-linear iterations to solve. > Why not just make -ksp_rtol 1e-8? Thanks, Matt > Unscaled: > > first nonlinear iteration takes 2 linear iterations > second nonlinear iteration takes 99 linear iterations > > Scaled: > > first nonlinear iteration takes 94 linear iterations > second nonlinear iteration takes 100 linear iterations > > Running with `-pc_type svd` the condition number for the unscaled > simulation is 4e9 while it is 2e3 for the scaled simulation. > > > > On Thu, Feb 6, 2020 at 4:36 PM Fande Kong wrote: > >> Hi All, >> >> MOOSE team, Alex and I are working on some variable scaling techniques to >> improve the condition number of the matrix of linear systems. The goal of >> variable scaling is to make the diagonal of matrix as close to unity as >> possible. After scaling (for certain example), the condition number of the >> linear system is actually reduced, but the GMRES iteration does not >> decrease at all. >> >> From my understanding, the condition number is the worst estimation for >> GMRES convergence. That is, the GMRES iteration should not increases when >> the condition number decreases. This actually could example what we saw: >> the improved condition number does not necessary lead to a decrease in >> GMRES iteration. We try to understand this a bit more, and we guess that >> the number of eigenvalue clusters of the matrix of the linear system >> may/might be related to the convergence rate of GMRES. We plot eigenvalues >> of scaled system and unscaled system, and the clusters look different from >> each other, but the GMRRES iterations are the same. >> >> Anyone know what is the right relationship between the condition number >> and GMRES iteration? How does the number of eigenvalue clusters affect >> GMRES iteration? How to count eigenvalue clusters? For example, how many >> eigenvalue clusters we have in the attach image respectively? >> >> If you need more details, please let us know. Alex and I are happy to >> provide any details you are interested in. >> >> >> Thanks, >> >> Fande Kong, >> >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Feb 7 12:31:43 2020 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 7 Feb 2020 13:31:43 -0500 Subject: [petsc-users] Condition Number and GMRES iteration In-Reply-To: References: Message-ID: On Thu, Feb 6, 2020 at 8:07 PM Alexander Lindsay wrote: > It looks like Fande has attached the eigenvalue plots with the real axis > having a logarithmic scale. The same plots with a linear scale are attached > here. > > The system has 306 degrees of freedom. 12 eigenvalues are unity for both > scaled and unscaled cases; this number corresponds to the number of mesh > nodes with Dirichlet boundary conditions (just a 1 on the diagonal for the > corresponding rows). The rest of the eigenvalues are orders of magnitude > smaller for the unscaled case; using scaling these eigenvalues are brought > much closer 1. > So you are running un-preconditioned GMRES, I assume. So your condition number is like 10^23 because if these 1s on the diagonal. 
I suggest always scaling by the diagonal for that reason, but if you want to run un-preconditioned then you have to be careful about what amount to penalty terms. In this case just take them out of the system entirely. They are just mudding up your numerical studies. (Now Krylov is just nailing these in the first iteration, espicaily GMRES which focuses on the largest eigenvalue vectors). BTW, one of my earliest talks, in grad school before I had any real results, was called "condition number does not matter" and I showed examples problems where solvers, multigrid to be specific in some cases, work great on poorly conditioned problems (eg, scale and your problem) and fail on well conditioned problems (eg, incompressibility) > > This particular problem is linear but we solve it with SNES, so constant > Jacobian. We run with options '-pc_type none -ksp_gmres_restart 1000 > -snes_rtol 1e-8 -ksp_rtol 1e-5` so for this linear problem it takes two > non-linear iterations to solve. > > Unscaled: > > first nonlinear iteration takes 2 linear iterations > second nonlinear iteration takes 99 linear iterations > > Scaled: > > first nonlinear iteration takes 94 linear iterations > second nonlinear iteration takes 100 linear iterations > > Running with `-pc_type svd` the condition number for the unscaled > simulation is 4e9 while it is 2e3 for the scaled simulation. > > > > On Thu, Feb 6, 2020 at 4:36 PM Fande Kong wrote: > >> Hi All, >> >> MOOSE team, Alex and I are working on some variable scaling techniques to >> improve the condition number of the matrix of linear systems. The goal of >> variable scaling is to make the diagonal of matrix as close to unity as >> possible. After scaling (for certain example), the condition number of the >> linear system is actually reduced, but the GMRES iteration does not >> decrease at all. >> >> From my understanding, the condition number is the worst estimation for >> GMRES convergence. That is, the GMRES iteration should not increases when >> the condition number decreases. This actually could example what we saw: >> the improved condition number does not necessary lead to a decrease in >> GMRES iteration. We try to understand this a bit more, and we guess that >> the number of eigenvalue clusters of the matrix of the linear system >> may/might be related to the convergence rate of GMRES. We plot eigenvalues >> of scaled system and unscaled system, and the clusters look different from >> each other, but the GMRRES iterations are the same. >> >> Anyone know what is the right relationship between the condition number >> and GMRES iteration? How does the number of eigenvalue clusters affect >> GMRES iteration? How to count eigenvalue clusters? For example, how many >> eigenvalue clusters we have in the attach image respectively? >> >> If you need more details, please let us know. Alex and I are happy to >> provide any details you are interested in. >> >> >> Thanks, >> >> Fande Kong, >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From eijkhout at tacc.utexas.edu Fri Feb 7 12:43:27 2020 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Fri, 7 Feb 2020 18:43:27 +0000 Subject: [petsc-users] Condition Number and GMRES iteration In-Reply-To: References: Message-ID: <78AF6D48-807E-4014-A8D1-B31207A8C3FC@tacc.utexas.edu> On , 2020Feb7, at 12:31, Mark Adams > wrote: BTW, one of my earliest talks, in grad school before I had any real results, was called "condition number does not matter? 
After you learn that the condition number gives an _upper_bound_ on the number of iterations, you learn that if a few eigenvalues are separated from a cluster of other eigenvalues, your number of iterations is 1 for each separated one, and then a bound based on the remaining cluster. (Condition number predicts a number of iterations based on Chebychev polynomials. Since the CG polynomials are optimal, they are at least as good as Chebychev. Hence the number of iterations is at most what you got from Chebychev, which is the condition number bound.) Victor. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Feb 7 12:55:45 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 7 Feb 2020 13:55:45 -0500 Subject: [petsc-users] Condition Number and GMRES iteration In-Reply-To: <78AF6D48-807E-4014-A8D1-B31207A8C3FC@tacc.utexas.edu> References: <78AF6D48-807E-4014-A8D1-B31207A8C3FC@tacc.utexas.edu> Message-ID: On Fri, Feb 7, 2020 at 1:43 PM Victor Eijkhout wrote: > > > On , 2020Feb7, at 12:31, Mark Adams wrote: > > BTW, one of my earliest talks, in grad school before I had any real > results, was called "condition number does not matter? > > > After you learn that the condition number gives an _upper_bound_ on the > number of iterations, you learn that if a few eigenvalues are separated > from a cluster of other eigenvalues, your number of iterations is 1 for > each separated one, and then a bound based on the remaining cluster. > This is _only_ for normal matrices. Not true for general matrices. Matt > (Condition number predicts a number of iterations based on Chebychev > polynomials. Since the CG polynomials are optimal, they are at least as > good as Chebychev. Hence the number of iterations is at most what you got > from Chebychev, which is the condition number bound.) > > Victor. > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdkong.jd at gmail.com Fri Feb 7 13:14:08 2020 From: fdkong.jd at gmail.com (Fande Kong) Date: Fri, 7 Feb 2020 12:14:08 -0700 Subject: [petsc-users] Condition Number and GMRES iteration In-Reply-To: References: Message-ID: Thanks, Matt, It is a great paper. According to the paper, here is my understanding: for normal matrices, the eigenvalues of the matrix together with the initial residual completely determine the GMRES convergence rate. For non-normal matrices, eigenvalues are NOT the relevant quantities in determining the behavior of GMRES. What quantities we should look at for non-normal matrices? In other words, how do we know one matrix is easier than others to solve? Possibly they are still open problems?! Thanks, Fande, On Fri, Feb 7, 2020 at 6:51 AM Matthew Knepley wrote: > On Thu, Feb 6, 2020 at 7:37 PM Fande Kong wrote: > >> Hi All, >> >> MOOSE team, Alex and I are working on some variable scaling techniques to >> improve the condition number of the matrix of linear systems. The goal of >> variable scaling is to make the diagonal of matrix as close to unity as >> possible. After scaling (for certain example), the condition number of the >> linear system is actually reduced, but the GMRES iteration does not >> decrease at all. >> >> From my understanding, the condition number is the worst estimation for >> GMRES convergence. 
That is, the GMRES iteration should not increases when >> the condition number decreases. This actually could example what we saw: >> the improved condition number does not necessary lead to a decrease in >> GMRES iteration. We try to understand this a bit more, and we guess that >> the number of eigenvalue clusters of the matrix of the linear system >> may/might be related to the convergence rate of GMRES. We plot eigenvalues >> of scaled system and unscaled system, and the clusters look different from >> each other, but the GMRRES iterations are the same. >> >> Anyone know what is the right relationship between the condition number >> and GMRES iteration? How does the number of eigenvalue clusters affect >> GMRES iteration? How to count eigenvalue clusters? For example, how many >> eigenvalue clusters we have in the attach image respectively? >> >> If you need more details, please let us know. Alex and I are happy to >> provide any details you are interested in. >> > > Hi Fande, > > This is one of my favorite papers of all time: > > https://epubs.siam.org/doi/abs/10.1137/S0895479894275030 > > It shows that the spectrum alone tells you nothing at all about GMRES > convergence. You need other things, like symmetry (almost > everything is known) or normality (a little bit is known). > > Thanks, > > Matt > > >> Thanks, >> >> Fande Kong, >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdkong.jd at gmail.com Fri Feb 7 13:15:14 2020 From: fdkong.jd at gmail.com (Fande Kong) Date: Fri, 7 Feb 2020 12:15:14 -0700 Subject: [petsc-users] Condition Number and GMRES iteration In-Reply-To: <78AF6D48-807E-4014-A8D1-B31207A8C3FC@tacc.utexas.edu> References: <78AF6D48-807E-4014-A8D1-B31207A8C3FC@tacc.utexas.edu> Message-ID: On Fri, Feb 7, 2020 at 11:43 AM Victor Eijkhout wrote: > > > On , 2020Feb7, at 12:31, Mark Adams wrote: > > BTW, one of my earliest talks, in grad school before I had any real > results, was called "condition number does not matter? > > > After you learn that the condition number gives an _upper_bound_ on the > number of iterations, you learn that if a few eigenvalues are separated > from a cluster of other eigenvalues, your number of iterations is 1 for > each separated one, and then a bound based on the remaining cluster. > > (Condition number predicts a number of iterations based on Chebychev > polynomials. Since the CG polynomials are optimal, they are at least as > good as Chebychev. Hence the number of iterations is at most what you got > from Chebychev, which is the condition number bound.) > I like this explanation for normal matrices. Thanks so much, Victor, Fande, > > Victor. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wence at gmx.li Fri Feb 7 15:17:36 2020 From: wence at gmx.li (Lawrence Mitchell) Date: Fri, 7 Feb 2020 21:17:36 +0000 Subject: [petsc-users] Condition Number and GMRES iteration In-Reply-To: References: Message-ID: On Fri, 7 Feb 2020 at 19:15, Fande Kong wrote: > Thanks, Matt, > > It is a great paper. According to the paper, here is my understanding: for > normal matrices, the eigenvalues of the matrix together with the > initial residual completely determine the GMRES convergence rate. 
For > non-normal matrices, eigenvalues are NOT the relevant quantities in > determining the behavior of GMRES. > > What quantities we should look at for non-normal matrices? In other words, > how do we know one matrix is easier than others to solve? > You need to do a field of values analysis to provide information. This can give you bounds on convergence but it is often very weak. J?rg Liesen has a bunch of papers on this. Lawrence -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Fri Feb 7 16:27:59 2020 From: jed at jedbrown.org (Jed Brown) Date: Fri, 07 Feb 2020 15:27:59 -0700 Subject: [petsc-users] Condition Number and GMRES iteration In-Reply-To: References: Message-ID: <87pneqav80.fsf@jedbrown.org> Fande Kong writes: > Thanks, Matt, > > It is a great paper. According to the paper, here is my understanding: for > normal matrices, the eigenvalues of the matrix together with the > initial residual completely determine the GMRES convergence rate. For > non-normal matrices, eigenvalues are NOT the relevant quantities in > determining the behavior of GMRES. > > What quantities we should look at for non-normal matrices? In other words, > how do we know one matrix is easier than others to solve? Possibly they > are still open problems?! You can use the pseudospectrum, but that isn't a convenient thing to compute for large systems. With respect to condition number: an orthogonal matrix is a normal matrix of condition number 1 for which GMRES requires n iterations. From bsmith at mcs.anl.gov Fri Feb 7 18:02:24 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Sat, 8 Feb 2020 00:02:24 +0000 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? In-Reply-To: References: <264F91C4-8558-4850-9B4B-ABE4123C2A2C@anl.gov> <4A373D93-4018-45E0-B805-3ECC528472DD@mcs.anl.gov> Message-ID: > On Feb 7, 2020, at 7:44 AM, Hao DONG wrote: > > Thanks, Barry, I really appreciate your help - > > I removed the OpenMP flags and rebuilt PETSc. So the problem is from the BLAS lib I linked? Yes, the openmp causes it to run in parallel, but the problem is not big enough and the machine is not good enough for parallel BLAS to speed things up, instead it slows things down a lot. We see this often, parallel BLAS must be used with care > Not sure which version my BLAS is, though? But I also included the -download-Scalapack option. Shouldn?t that enable linking with PBLAS automatically? > > After looking at the bcgs code in PETSc, I suppose the iteration residual recorded is indeed recorded twice per one "actual iteration?. So that can explain the difference of iteration numbers. > > My laptop is indeed an old machine (MBP15 mid-2014). I just cannot work with vi without a physical ESC key... The latest has a physical ESC, I am stuff without the ESC for a couple more years. > I have attached the configure.log -didn?t know that it is so large! > > Anyway, it seems that the removal of -openmp changes quite a lot of things, the performance is indeed getting much better - the flop/sec increases by a factor of 3. Still, I am getting 20 percent of VecMDot, but no VecMDot in BCGS all (see output below), is that a feature of gmres method? Yes, GMRES orthogonalizes against the last restart directions which uses these routines while BCGS does not, this is why BCGS is cheaper per iteration. PETSc is no faster than your code because the algorithm is the same, the compilers the same, and the hardware the same. 
No way to have clever tricks for PETSc to be much faster. What PETS provides is a huge variety of tested algorithms that no single person could code on their own. Anything in PETSc you could code yourself if you had endless time and get basically the same performance. Barry > > here is the output of the same problem with: > > -pc_type bjacobi -pc_bjacobi_local_blocks 3 -sub_pc_type ilu -ksp_type gmres -ksp_monitor -ksp_view > > > ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- > > Mod3DMT.test on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Fri Feb 7 10:26:19 2020 > Using Petsc Release Version 3.12.3, unknown > > Max Max/Min Avg Total > Time (sec): 2.520e+00 1.000 2.520e+00 > Objects: 1.756e+03 1.000 1.756e+03 > Flop: 7.910e+09 1.000 7.910e+09 7.910e+09 > Flop/sec: 3.138e+09 1.000 3.138e+09 3.138e+09 > MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.000 > > Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N --> 2N flop > and VecAXPY() for complex vectors of length N --> 8N flop > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total Count %Total Avg %Total Count %Total > 0: Main Stage: 2.5204e+00 100.0% 7.9096e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > ------------------------------------------------------------------------------------------------------------------------ > ? > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop --- Global --- --- Stage ---- Total > Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > BuildTwoSidedF 1 1.0 3.4000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatMult 75 1.0 6.2884e-01 1.0 1.88e+09 1.0 0.0e+00 0.0e+00 0.0e+00 25 24 0 0 0 25 24 0 0 0 2991 > MatSolve 228 1.0 4.4164e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2445 > MatLUFactorNum 3 1.0 4.1317e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 574 > MatILUFactorSym 3 1.0 2.3858e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatAssemblyBegin 5 1.0 4.4000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 5 1.0 1.5067e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetRowIJ 3 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatCreateSubMats 1 1.0 2.4558e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetOrdering 3 1.0 1.3290e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatView 3 1.0 1.2800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecMDot 72 1.0 4.9875e-01 1.0 2.25e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 28 0 0 0 20 28 0 0 0 4509 > VecNorm 76 1.0 6.6666e-02 1.0 1.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 2544 > VecScale 75 1.0 1.7982e-02 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 4653 > VecCopy 3 1.0 1.5080e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 
> VecSet 276 1.0 9.6784e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 > VecAXPY 6 1.0 3.6860e-03 1.0 1.34e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3632 > VecMAXPY 75 1.0 4.0490e-01 1.0 2.41e+09 1.0 0.0e+00 0.0e+00 0.0e+00 16 30 0 0 0 16 30 0 0 0 5951 > VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 76 1.0 5.3800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecNormalize 75 1.0 8.3690e-02 1.0 2.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 2999 > KSPSetUp 4 1.0 1.1663e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 2.2119e+00 1.0 7.91e+09 1.0 0.0e+00 0.0e+00 0.0e+00 88100 0 0 0 88100 0 0 0 3576 > KSPGMRESOrthog 72 1.0 8.7843e-01 1.0 4.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 35 57 0 0 0 35 57 0 0 0 5121 > PCSetUp 4 1.0 9.2448e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 257 > PCSetUpOnBlocks 1 1.0 6.6597e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 356 > PCApply 76 1.0 4.6281e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2333 > PCApplyOnBlocks 228 1.0 4.6262e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2334 > ------------------------------------------------------------------------------------------------------------------------ > > Average time to get PetscTime(): 1e-07 > #PETSc Option Table entries: > -I LBFGS > -ksp_type gmres > -ksp_view > -log_view > -pc_bjacobi_local_blocks 3 > -pc_type bjacobi > -sub_pc_type ilu > #End of PETSc Option Table entries > Compiled with FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 > Configure options: --with-scalar-type=complex --download-mumps --download-scalapack --with-fortran-kernels=1 -- FOPTFLAGS="-O3 -ffree-line-length-0 -msse2" --COPTFLAGS="-O3 -msse2" --CXXOPTFLAGS="-O3 -msse2" --with-debugging=0 > ----------------------------------------- > Libraries compiled on 2020-02-07 10:07:42 on Haos-MBP > Machine characteristics: Darwin-19.3.0-x86_64-i386-64bit > Using PETSc directory: /Users/donghao/src/git/PETSc-current > Using PETSc arch: arch-darwin-c-opt > ----------------------------------------- > > Using C compiler: mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fno-stack- check -Qunused-arguments -fvisibility=hidden -O3 -msse2 > Using Fortran compiler: mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -O3 -ffree-line-length-0 - msse2 > ----------------------------------------- > > Using include paths: -I/Users/donghao/src/git/PETSc-current/include -I/Users/donghao/src/git/PETSc-current/arch-darwin- c-opt/include > ----------------------------------------- > > Using C linker: mpicc > Using Fortran linker: mpif90 > Using libraries: -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/donghao/src/git/PETSc- current/arch-darwin-c-opt/lib -lpetsc -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/ donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -Wl,-rpath,/usr/local/opt/libevent/lib -L/usr/local/opt/libevent/ lib -Wl,-rpath,/usr/local/Cellar/open-mpi/4.0.2/lib -L/usr/local/Cellar/open-mpi/4.0.2/lib -Wl,-rpath,/usr/local/Cellar/ gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin19/9.2.0 
-L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple- darwin19/9.2.0 -Wl,-rpath,/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -lcmumps - ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lc++ -ldl -lmpi_usempif08 - lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lquadmath -lm -lc++ -ldl > ----------------------------------------- > > > > The BCGS solver performance is now comparable to my own Fortran code (1.84s). Still, I feel that there is something wrong hidden somewhere in my setup - a professional lib should to perform better, I believe. Any other ideas that I can look into? Interestingly there is no VecMDot operation at all! Here is the output with the option of: > > -pc_type bjacobi -pc_bjacobi_local_blocks 3 -sub_pc_type ilu -ksp_type bcgs -ksp_monitor -ksp_view > > > > > ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- > > Mod3DMT.test on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Fri Feb 7 10:38:00 2020 > Using Petsc Release Version 3.12.3, unknown > > Max Max/Min Avg Total > Time (sec): 2.187e+00 1.000 2.187e+00 > Objects: 1.155e+03 1.000 1.155e+03 > Flop: 4.311e+09 1.000 4.311e+09 4.311e+09 > Flop/sec: 1.971e+09 1.000 1.971e+09 1.971e+09 > MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.000 > > Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N --> 2N flop > and VecAXPY() for complex vectors of length N --> 8N flop > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total Count %Total Avg %Total Count %Total > 0: Main Stage: 2.1870e+00 100.0% 4.3113e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > ------------------------------------------------------------------------------------------------------------------------ > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop --- Global --- --- Stage ---- Total > Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > BuildTwoSidedF 1 1.0 2.2000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatMult 83 1.0 7.8726e-01 1.0 2.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 36 48 0 0 0 36 48 0 0 0 2644 > MatSolve 252 1.0 5.5656e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 25 28 0 0 0 25 28 0 0 0 2144 > MatLUFactorNum 3 1.0 4.5115e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 526 > MatILUFactorSym 3 1.0 2.5103e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatAssemblyBegin 5 1.0 3.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 5 1.0 1.5709e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetRowIJ 3 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatCreateSubMats 1 1.0 2.8989e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetOrdering 3 1.0 1.1200e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatView 3 1.0 1.2600e-04 1.0 
0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecDot 82 1.0 8.9328e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 2048 > VecDotNorm2 41 1.0 9.9019e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 4 0 0 0 5 4 0 0 0 1848 > VecNorm 43 1.0 3.9988e-02 1.0 9.59e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 2399 > VecCopy 2 1.0 1.1150e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 271 1.0 4.2833e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > VecAXPY 1 1.0 5.9200e-04 1.0 2.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3769 > VecAXPBYCZ 82 1.0 1.1448e-01 1.0 3.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 8 0 0 0 5 8 0 0 0 3196 > VecWAXPY 82 1.0 6.7460e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 4 0 0 0 3 4 0 0 0 2712 > VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 84 1.0 5.2800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSetUp 4 1.0 1.4765e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > KSPSolve 1 1.0 1.8514e+00 1.0 4.31e+09 1.0 0.0e+00 0.0e+00 0.0e+00 85100 0 0 0 85100 0 0 0 2329 > PCSetUp 4 1.0 1.0193e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 5 1 0 0 0 5 1 0 0 0 233 > PCSetUpOnBlocks 1 1.0 7.1421e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 332 > PCApply 84 1.0 5.7927e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 26 28 0 0 0 26 28 0 0 0 2060 > PCApplyOnBlocks 252 1.0 5.7902e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 26 28 0 0 0 26 28 0 0 0 2061 > ------------------------------------------------------------------------------------------------------------------------ > > > Cheers, > Hao > > > > From: Smith, Barry F. > Sent: Thursday, February 6, 2020 7:03 PM > To: Hao DONG > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? > > > Read my comments ALL the way down, they go a long way. > > > On Feb 6, 2020, at 3:43 AM, Hao DONG wrote: > > > > Dear Hong and Barry, > > > > Thanks for the suggestions. So there could be some problems in my PETSc configuration? - but my PETSc lib was indeed compiled without the debug flags (--with-debugging=0). I use GCC/GFortran (Home-brew GCC 9.2.0) for the compiling and building of PETSc and my own fortran code. My Fortran compiling flags are simply like: > > > > -O3 -ffree-line-length-none -fastsse > > > > Which is also used for -FOPTFLAGS in PETSc (I added -openmp for PETSc, but not my fortran code, as I don?t have any OMP optimizations in my program). Note the performance test results I listed yesterday (e.g. 4.08s with 41 bcgs iterations.) are without any CSR-array->PETSc translation overhead (only include the set and solve part). > > PETSc doesn't use -openmp in any way for its solvers. Do not use this option, it may be slowing the code down. Please send configure.log > > > > > I have two questions about the performance difference - > > > > 1. Is ilu only factorized once for each iteration, or ilu is performed at every outer ksp iteration steps? Sounds unlikely - but if so, this could cause some extra overheads. > > ILU is ONLY done if the matrix has changed (which seems wrong). > > > > 2. Some KSP solvers like BCGS or TFQMR has two ?half-iterations? for each iteration step. 
Not sure how it works in PETSc, but is that possible that both the two ?half" relative residuals are counted in the output array, doubling the number of iterations (but that cannot explain the extra time consumed)? > > Yes, PETSc might report them as two, you need to check the exact code. > > > > > Anyway, the output with -log_view from the same 278906 by 278906 matrix with 3-block D-ILU in PETSc is as follows: > > > > > > ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- > > > > MEMsolv.lu on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Thu Feb 6 09:07:35 2020 > > Using Petsc Release Version 3.12.3, unknown > > > > Max Max/Min Avg Total > > Time (sec): 4.443e+00 1.000 4.443e+00 > > Objects: 1.155e+03 1.000 1.155e+03 > > Flop: 4.311e+09 1.000 4.311e+09 4.311e+09 > > Flop/sec: 9.703e+08 1.000 9.703e+08 9.703e+08 > > MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 > > MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 > > MPI Reductions: 0.000e+00 0.000 > > > > Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) > > e.g., VecAXPY() for real vectors of length N --> 2N flop > > and VecAXPY() for complex vectors of length N --> 8N flop > > > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- > > Avg %Total Avg %Total Count %Total Avg %Total Count %Total > > 0: Main Stage: 4.4435e+00 100.0% 4.3113e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > > > ???????????????????????????????????????????????????????????? > > See the 'Profiling' chapter of the users' manual for details on interpreting output. > > Phase summary info: > > Count: number of times phase was executed > > Time and Flop: Max - maximum over all processors > > Ratio - ratio of maximum to minimum over all processors > > Mess: number of messages sent > > AvgLen: average message length (bytes) > > Reduct: number of global reductions > > Global: entire computation > > Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
> > %T - percent time in this phase %F - percent flop in this phase > > %M - percent messages in this phase %L - percent message lengths in this phase > > %R - percent reductions in this phase > > Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) > > ------------------------------------------------------------------------------------------------------------------------ > > Event Count Time (sec) Flop --- Global --- --- Stage ---- Total > > Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > > > --- Event Stage 0: Main Stage > > > > BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatMult 83 1.0 1.7815e+00 1.0 2.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40 48 0 0 0 40 48 0 0 0 1168 > > MatSolve 252 1.0 1.2708e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 939 > > MatLUFactorNum 3 1.0 7.9725e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 > > MatILUFactorSym 3 1.0 2.6998e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > > MatAssemblyBegin 5 1.0 3.6000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyEnd 5 1.0 3.1619e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > > MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatCreateSubMats 1 1.0 3.9659e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > > MatGetOrdering 3 1.0 4.3070e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatView 3 1.0 1.3600e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecDot 82 1.0 1.8948e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 966 > > VecDotNorm2 41 1.0 1.6812e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 1088 > > VecNorm 43 1.0 9.5099e-02 1.0 9.59e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1009 > > VecCopy 2 1.0 1.0120e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecSet 271 1.0 3.8922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > > VecAXPY 1 1.0 7.7200e-04 1.0 2.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2890 > > VecAXPBYCZ 82 1.0 2.4370e-01 1.0 3.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 8 0 0 0 5 8 0 0 0 1502 > > VecWAXPY 82 1.0 1.4148e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 4 0 0 0 3 4 0 0 0 1293 > > VecAssemblyBegin 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecScatterBegin 84 1.0 5.9300e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > KSPSetUp 4 1.0 1.4167e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > KSPSolve 1 1.0 4.0250e+00 1.0 4.31e+09 1.0 0.0e+00 0.0e+00 0.0e+00 91100 0 0 0 91100 0 0 0 1071 > > PCSetUp 4 1.0 1.5207e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 156 > > PCSetUpOnBlocks 1 1.0 1.1116e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 214 > > PCApply 84 1.0 1.2912e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 > > PCApplyOnBlocks 252 1.0 1.2909e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 > > 
------------------------------------------------------------------------------------------------------------------------ > > > > # I skipped the memory part - the options (and compiler options) are as follows: > > > > #PETSc Option Table entries: > > -ksp_type bcgs > > -ksp_view > > -log_view > > -pc_bjacobi_local_blocks 3 > > -pc_factor_levels 0 > > -pc_sub_type ilu > > -pc_type bjacobi > > #End of PETSc Option Table entries > > Compiled with FORTRAN kernels > > Compiled with full precision matrices (default) > > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 > > Configure options: --with-scalar-type=complex --download-mumps --download-scalapack --with-fortran-kernels=1 -- FOPTFLAGS=?-O3 -fastsse -mp -openmp? --COPTFLAGS=?-O3 -fastsse -mp -openmp? --CXXOPTFLAGS="-O3 -fastsse -mp -openmp" -- with-debugging=0 > > ----------------------------------------- > > Libraries compiled on 2020-02-03 10:44:31 on Haos-MBP > > Machine characteristics: Darwin-19.2.0-x86_64-i386-64bit > > Using PETSc directory: /Users/donghao/src/git/PETSc-current > > Using PETSc arch: arch-darwin-c-opt > > ----------------------------------------- > > > > Using C compiler: mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fno-stack- check -Qunused-arguments -fvisibility=hidden > > Using Fortran compiler: mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument > > > > Using include paths: -I/Users/donghao/src/git/PETSc-current/include -I/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/include > > ----------------------------------------- > > > > Using C linker: mpicc > > Using Fortran linker: mpif90 > > Using libraries: -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/donghao/src/git/PETSc- current/arch-darwin-c-opt/lib -lpetsc -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/ donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -Wl,-rpath,/usr/local/opt/libevent/lib -L/usr/local/opt/libevent/ lib -Wl,-rpath,/usr/local/Cellar/open-mpi/4.0.2/lib -L/usr/local/Cellar/open-mpi/4.0.2/lib -Wl,-rpath,/usr/local/Cellar/ gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin19/9.2.0 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple- darwin19/9.2.0 -Wl,-rpath,/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -lcmumps - ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lc++ -ldl -lmpi_usempif08 - lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lquadmath -lm -lc++ -ldl > > > > > > On the other hand, running PETSc with > > > > -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type lu -ksp_type gmres -ksp_monitor -ksp_view -log_view > > > > For the same problem takes 5.37s and 72 GMRES iterations. Our previous testings show that BiCGstab (bcgs in PETSc) is almost always the most effective KSP solver for our non-symmetrical complex system. Strangely, the system is still using ilu instead of lu for sub blocks. 
The output is like: > > -sub_pc_type lu > > > > > 0 KSP Residual norm 2.480412407430e+02 > > 1 KSP Residual norm 8.848059967835e+01 > > 2 KSP Residual norm 3.415272863261e+01 > > 3 KSP Residual norm 1.563045190939e+01 > > 4 KSP Residual norm 6.241296940043e+00 > > 5 KSP Residual norm 2.739710899854e+00 > > 6 KSP Residual norm 1.391304148888e+00 > > 7 KSP Residual norm 7.959262020849e-01 > > 8 KSP Residual norm 4.828323055231e-01 > > 9 KSP Residual norm 2.918529739200e-01 > > 10 KSP Residual norm 1.905508589557e-01 > > 11 KSP Residual norm 1.291541892702e-01 > > 12 KSP Residual norm 8.827145774707e-02 > > 13 KSP Residual norm 6.521331095889e-02 > > 14 KSP Residual norm 5.095787952595e-02 > > 15 KSP Residual norm 4.043060387395e-02 > > 16 KSP Residual norm 3.232590200012e-02 > > 17 KSP Residual norm 2.593944982216e-02 > > 18 KSP Residual norm 2.064639483533e-02 > > 19 KSP Residual norm 1.653916663492e-02 > > 20 KSP Residual norm 1.334946415452e-02 > > 21 KSP Residual norm 1.092886880597e-02 > > 22 KSP Residual norm 8.988004105542e-03 > > 23 KSP Residual norm 7.466501315240e-03 > > 24 KSP Residual norm 6.284389135436e-03 > > 25 KSP Residual norm 5.425231669964e-03 > > 26 KSP Residual norm 4.766338253084e-03 > > 27 KSP Residual norm 4.241238878242e-03 > > 28 KSP Residual norm 3.808113525685e-03 > > 29 KSP Residual norm 3.449383788116e-03 > > 30 KSP Residual norm 3.126025526388e-03 > > 31 KSP Residual norm 2.958328054299e-03 > > 32 KSP Residual norm 2.802344900403e-03 > > 33 KSP Residual norm 2.621993580492e-03 > > 34 KSP Residual norm 2.430066269304e-03 > > 35 KSP Residual norm 2.259043079597e-03 > > 36 KSP Residual norm 2.104287972986e-03 > > 37 KSP Residual norm 1.952916080045e-03 > > 38 KSP Residual norm 1.804988937999e-03 > > 39 KSP Residual norm 1.643302117377e-03 > > 40 KSP Residual norm 1.471661332036e-03 > > 41 KSP Residual norm 1.286445911163e-03 > > 42 KSP Residual norm 1.127543025848e-03 > > 43 KSP Residual norm 9.777148275484e-04 > > 44 KSP Residual norm 8.293314450006e-04 > > 45 KSP Residual norm 6.989331136622e-04 > > 46 KSP Residual norm 5.852307780220e-04 > > 47 KSP Residual norm 4.926715539762e-04 > > 48 KSP Residual norm 4.215941372075e-04 > > 49 KSP Residual norm 3.699489548162e-04 > > 50 KSP Residual norm 3.293897163533e-04 > > 51 KSP Residual norm 2.959954542998e-04 > > 52 KSP Residual norm 2.700193032414e-04 > > 53 KSP Residual norm 2.461789791204e-04 > > 54 KSP Residual norm 2.218839085563e-04 > > 55 KSP Residual norm 1.945154309976e-04 > > 56 KSP Residual norm 1.661128781744e-04 > > 57 KSP Residual norm 1.413198766258e-04 > > 58 KSP Residual norm 1.213984003195e-04 > > 59 KSP Residual norm 1.044317450754e-04 > > 60 KSP Residual norm 8.919957502977e-05 > > 61 KSP Residual norm 8.042584301275e-05 > > 62 KSP Residual norm 7.292784493581e-05 > > 63 KSP Residual norm 6.481935501872e-05 > > 64 KSP Residual norm 5.718564652679e-05 > > 65 KSP Residual norm 5.072589750116e-05 > > 66 KSP Residual norm 4.487930741285e-05 > > 67 KSP Residual norm 3.941040674119e-05 > > 68 KSP Residual norm 3.492873281291e-05 > > 69 KSP Residual norm 3.103798339845e-05 > > 70 KSP Residual norm 2.822943237409e-05 > > 71 KSP Residual norm 2.610615023776e-05 > > 72 KSP Residual norm 2.441692671173e-05 > > KSP Object: 1 MPI processes > > type: gmres > > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=150, nonzero initial guess > > tolerances: relative=1e-07, absolute=1e-50, divergence=10000. 
> > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: 1 MPI processes > > type: bjacobi > > number of blocks = 3 > > Local solve is same for all blocks, in the following KSP and PC objects: > > KSP Object: (sub_) 1 MPI processes > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using NONE norm type for convergence test > > PC Object: (sub_) 1 MPI processes > > type: ilu > > out-of-place factorization > > 0 levels of fill > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: natural > > factor fill ratio given 1., needed 1. > > Factored matrix follows: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=92969, cols=92969 > > package used to perform factorization: petsc > > total: nonzeros=638417, allocated nonzeros=638417 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=92969, cols=92969 > > total: nonzeros=638417, allocated nonzeros=638417 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: mpiaij > > rows=278906, cols=278906 > > total: nonzeros=3274027, allocated nonzeros=3274027 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node (on process 0) routines > > ... > > ------------------------------------------------------------------------------------------------------------------------ > > Event Count Time (sec) Flop --- Global --- --- Stage ---- Total > > Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > > > --- Event Stage 0: Main Stage > > > > BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatMult 75 1.0 1.5812e+00 1.0 1.88e+09 1.0 0.0e+00 0.0e+00 0.0e+00 28 24 0 0 0 28 24 0 0 0 1189 > > MatSolve 228 1.0 1.1442e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 944 > > These flop rates are ok, but not great. Perhaps an older machine. > > > MatLUFactorNum 3 1.0 8.1930e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 290 > > MatILUFactorSym 3 1.0 2.7102e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyBegin 5 1.0 3.7000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyEnd 5 1.0 3.1895e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > > MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatCreateSubMats 1 1.0 4.0904e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > > MatGetOrdering 3 1.0 4.2640e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatView 3 1.0 1.4400e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecMDot 72 1.0 1.1984e+00 1.0 2.25e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 28 0 0 0 21 28 0 0 0 1877 > > 21 percent of the time in VecMDOT this is huge for s sequential fun. I think maybe you are using a terrible OpenMP BLAS? 
> > Send configure.log > > > > VecNorm 76 1.0 1.6841e-01 1.0 1.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 1007 > > VecScale 75 1.0 1.8241e-02 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 4587 > > VecCopy 3 1.0 1.4970e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecSet 276 1.0 9.1970e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > > VecAXPY 6 1.0 3.7450e-03 1.0 1.34e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3575 > > VecMAXPY 75 1.0 1.0022e+00 1.0 2.41e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 30 0 0 0 18 30 0 0 0 2405 > > VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecScatterBegin 76 1.0 5.5100e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecNormalize 75 1.0 1.8462e-01 1.0 2.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 1360 > > KSPSetUp 4 1.0 1.1341e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > KSPSolve 1 1.0 5.3123e+00 1.0 7.91e+09 1.0 0.0e+00 0.0e+00 0.0e+00 93100 0 0 0 93100 0 0 0 1489 > > KSPGMRESOrthog 72 1.0 2.1316e+00 1.0 4.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 57 0 0 0 37 57 0 0 0 2110 > > PCSetUp 4 1.0 1.5531e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 153 > > PCSetUpOnBlocks 1 1.0 1.1343e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 209 > > PCApply 76 1.0 1.1671e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 > > PCApplyOnBlocks 228 1.0 1.1668e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 > > ???????????????????????????????????????????????????????????? > > ... > > #PETSc Option Table entries: > > -ksp_monitor > > -ksp_type gmres > > -ksp_view > > -log_view > > -pc_bjacobi_local_blocks 3 > > -pc_sub_type lu > > -pc_type bjacobi > > #End of PETSc Option Table entries > > ... > > > > Does any of the setup/output ring a bell? > > > > BTW, out of curiosity - what is a ?I-node? routine? > > > > > > Cheers, > > Hao > > > > > > From: Smith, Barry F. > > Sent: Wednesday, February 5, 2020 9:42 PM > > To: Hao DONG > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? > > > > > > > > > On Feb 5, 2020, at 4:36 AM, Hao DONG wrote: > > > > > > Thanks a lot for your suggestions, Hong and Barry - > > > > > > As you suggested, I first tried the LU direct solvers (built-in and MUMPs) out this morning, which work perfectly, albeit slow. As my problem itself is a part of a PDE based optimization, the exact solution in the searching procedure is not necessary (I often set a relative tolerance of 1E-7/1E-8, etc. for Krylov subspace methods). Also tried BJACOBI with exact LU, the KSP just converges in one or two iterations, as expected. > > > > > > I added -kspview option for the D-ILU test (still with Block Jacobi as preconditioner and bcgs as KSP solver). The KSPview output from one of the examples in a toy model looks like: > > > > > > KSP Object: 1 MPI processes > > > type: bcgs > > > maximum iterations=120, nonzero initial guess > > > tolerances: relative=1e-07, absolute=1e-50, divergence=10000. 
> > > left preconditioning > > > using PRECONDITIONED norm type for convergence test > > > PC Object: 1 MPI processes > > > type: bjacobi > > > number of blocks = 3 > > > Local solve is same for all blocks, in the following KSP and PC objects: > > > KSP Object: (sub_) 1 MPI processes > > > type: preonly > > > maximum iterations=10000, initial guess is zero > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > > left preconditioning > > > using NONE norm type for convergence test > > > PC Object: (sub_) 1 MPI processes > > > type: ilu > > > out-of-place factorization > > > 0 levels of fill > > > tolerance for zero pivot 2.22045e-14 > > > matrix ordering: natural > > > factor fill ratio given 1., needed 1. > > > Factored matrix follows: > > > Mat Object: 1 MPI processes > > > type: seqaij > > > rows=11294, cols=11294 > > > package used to perform factorization: petsc > > > total: nonzeros=76008, allocated nonzeros=76008 > > > total number of mallocs used during MatSetValues calls=0 > > > not using I-node routines > > > linear system matrix = precond matrix: > > > Mat Object: 1 MPI processes > > > type: seqaij > > > rows=11294, cols=11294 > > > total: nonzeros=76008, allocated nonzeros=76008 > > > total number of mallocs used during MatSetValues calls=0 > > > not using I-node routines > > > linear system matrix = precond matrix: > > > Mat Object: 1 MPI processes > > > type: mpiaij > > > rows=33880, cols=33880 > > > total: nonzeros=436968, allocated nonzeros=436968 > > > total number of mallocs used during MatSetValues calls=0 > > > not using I-node (on process 0) routines > > > > > > do you see something wrong with my setup? > > > > > > I also tried a quick performance test with a small 278906 by 278906 matrix (3850990 nnz) with the following parameters: > > > > > > -ksp_type bcgs -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu -ksp_view > > > > > > Reducing the relative residual to 1E-7 > > > > > > Took 4.08s with 41 bcgs iterations. > > > > > > Merely changing the -pc_bjacobi_local_blocks to 6 > > > > > > Took 7.02s with 73 bcgs iterations. 9 blocks would further take 9.45s with 101 bcgs iterations. > > > > This is normal. more blocks slower convergence > > > > > > As a reference, my home-brew Fortran code solves the same problem (3-block D-ILU0) in > > > > > > 1.84s with 24 bcgs iterations (the bcgs code is also a home-brew one)? > > > > > Run the PETSc code with optimization ./configure --with-debugging=0 an run the code with -log_view this will show where the PETSc code is spending the time (send it to use) > > > > > > > > > > > > > Well, by saying ?use explicit L/U matrix as preconditioner?, I wonder if a user is allowed to provide his own (separate) L and U Mat for preconditioning (like how it works in Matlab solvers), e.g. > > > > > > x = qmr(A,b,Tol,MaxIter,L,U,x) > > > > > > As I already explicitly constructed the L and U matrices in Fortran, it is not hard to convert them to Mat in Petsc to test Petsc and my Fortran code head-to-head. In that case, the A, b, x, and L/U are all identical, it would be easier to see where the problem came from. > > > > > > > > No, we don't provide this kind of support > > > > > > > > > > BTW, there is another thing I wondered - is there a way to output residual in unpreconditioned norm? I tried to > > > > > > call KSPSetNormType(ksp_local, KSP_NORM_UNPRECONDITIONED, ierr) > > > > > > But always get an error that current ksp does not support unpreconditioned in LEFT/RIGHT (either way). 
Is it possible to do that (output unpreconditioned residual) in PETSc at all? > > > > -ksp_monitor_true_residual You can also run GMRES (and some other methods) with right preconditioning, -ksp_pc_side right then the residual computed is by the algorithm the unpreconditioned residual > > > > KSPSetNormType sets the norm used in the algorithm, it generally always has to left or right, only a couple algorithms support unpreconditioned directly. > > > > Barry > > > > > > > > > > Cheers, > > > Hao > > > > > > > > > From: Smith, Barry F. > > > Sent: Tuesday, February 4, 2020 8:27 PM > > > To: Hao DONG > > > Cc: petsc-users at mcs.anl.gov > > > Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? > > > > > > > > > > > > > On Feb 4, 2020, at 12:41 PM, Hao DONG wrote: > > > > > > > > Dear all, > > > > > > > > > > > > I have a few questions about the implementation of diagonal ILU PC in PETSc. I want to solve a very simple system with KSP (in parallel), the nature of the system (finite difference time-harmonic Maxwell) is probably not important to the question itself. Long story short, I just need to solve a set of equations of Ax = b with a block diagonal system matrix, like (not sure if the mono font works): > > > > > > > > |X | > > > > A =| Y | > > > > | Z| > > > > > > > > Note that A is not really block-diagonal, it?s just a multi-diagonal matrix determined by the FD mesh, where most elements are close to diagonal. So instead of a full ILU decomposition, a D-ILU is a good approximation as a preconditioner, and the number of blocks should not matter too much: > > > > > > > > |Lx | |Ux | > > > > L = | Ly | and U = | Uy | > > > > | Lz| | Uz| > > > > > > > > Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in my sequential Fortran/Matlab code. > > > > > > > > So like most users, I am looking for a parallel implement with this problem in PETSc. After looking through the manual, it seems that the most straightforward way to do it is through PCBJACOBI. Not sure I understand it right, I just setup a 3-block PCJACOBI and give each of the block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little block of fortran code would look like: > > > > ... > > > > call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & > > > > & isubs,ierr) > > > > call PCBJacobiSetLocalBlocks(pc_local, Nsub, & > > > > & isubs(istart:iend),ierr) > > > > ! set up the block jacobi structure > > > > call KSPSetup(ksp_local,ierr) > > > > ! allocate sub ksps > > > > allocate(ksp_sub(Nsub)) > > > > call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & > > > > & ksp_sub,ierr) > > > > do i=1,Nsub > > > > call KSPGetPC(ksp_sub(i),pc_sub,ierr) > > > > !ILU preconditioner > > > > call PCSetType(pc_sub,ptype,ierr) > > > > call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here > > > > call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] > > > > end do > > > > call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & > > > > & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) > > > > ? > > > > > > This code looks essentially right. You should call with -ksp_view to see exactly what PETSc is using for a solver. > > > > > > > > > > > I understand that the parallel performance may not be comparable, so I first set up a one-process test (with MPIAij, but all the rows are local since there is only one process). The system is solved without any problem (identical results within error). 
But the performance is actually a lot worse (code built without debugging flags in performance tests) than my own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), which is hard to believe. I suspect the difference is from the PC as the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to the same relative tol. > > > > > > PETSc uses GMRES by default with a restart of 30 and left preconditioning. It could be different exact choices in the solver (which is why -ksp_view is so useful) can explain the differences in the runs between your code and PETSc's > > > > > > > > This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks instead of 3). While my Fortran/Matlab codes see minimal performance difference (<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a performance decrease of more than 50% in sequential test. > > > > > > This is odd, the more blocks the smaller each block so the quicker the ILU set up should be. You can run various cases with -log_view and send us the output to see what is happening at each part of the computation in time. > > > > > > > So my implementation IS somewhat different in PETSc. Do I miss something in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how PCBJACOBI works in PETSc? > > > > > > Probably not. > > > > > > > > If this is not the right way to implement a block diagonal ILU as (parallel) PC, please kindly point me to the right direction. I searched through the mail list to find some answers, only to find a couple of similar questions... An example would be nice. > > > > > > You approach seems fundamentally right but I cannot be sure of possible bugs. > > > > > > > > On the other hand, does PETSc support a simple way to use explicit L/U matrix as a preconditioner? I can import the D-ILU matrix (I already converted my A matrix into Mat) constructed in my Fortran code to make a better comparison. Or do I have to construct my own PC using PCshell? If so, is there a good tutorial/example to learn about how to use PCSHELL (in a more advanced sense, like how to setup pc side and transpose)? > > > > > > Not sure what you mean by explicit L/U matrix as a preconditioner. As Hong said, yes you can use a parallel LU from MUMPS or SuperLU_DIST or Pastix as the solver. You do not need any shell code. You simply need to set the PCType to lu > > > > > > You can also set all this options from the command line and don't need to write the code specifically. So call KSPSetFromOptions() and then for example > > > > > > -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu (this last one is applied to each block so you could use -pc_type lu and it would use lu on each block.) > > > > > > -ksp_type_none -pc_type lu -pc_factor_mat_solver_type mumps (do parallel LU with mumps) > > > > > > By not hardwiring in the code and just using options you can test out different cases much quicker > > > > > > Use -ksp_view to make sure that is using the solver the way you expect. 
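To make the options-driven workflow described above concrete, here is a minimal sketch (not code from this thread; the routine name is made up, PetscInitialize is assumed to have been called already, and the matrix A and vectors b, x are assumed to be assembled elsewhere) in the same Fortran style as the snippets quoted in this thread:

      subroutine solve_with_options(A,b,x,ierr)
#include <petsc/finclude/petscksp.h>
      use petscksp
      implicit none
      Mat            A        ! assembled system matrix (assumption)
      Vec            b, x     ! right-hand side and solution vectors (assumption)
      PetscErrorCode ierr
      KSP            ksp

      call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
      call KSPSetOperators(ksp,A,A,ierr)
      ! all solver choices are deferred to the command line, e.g.
      !   -ksp_type bcgs -pc_type bjacobi -pc_bjacobi_local_blocks 3 -sub_pc_type ilu
      ! or
      !   -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type mumps
      call KSPSetFromOptions(ksp,ierr)
      call KSPSolve(ksp,b,x,ierr)
      call KSPDestroy(ksp,ierr)
      end subroutine solve_with_options

Running with -ksp_view, as advised above, then confirms which KSP and PC were actually selected at run time.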
> > > > > > Barry > > > > > > > > > > > > Barry > > > > > > > > > > > Thanks in advance, > > > > > > > > Hao > > From griesser.jan at googlemail.com Mon Feb 10 07:32:37 2020 From: griesser.jan at googlemail.com (=?UTF-8?B?SmFuIEdyaWXDn2Vy?=) Date: Mon, 10 Feb 2020 14:32:37 +0100 Subject: [petsc-users] Spectrum slicing, Cholesky factorization for positive semidefinite matrices Message-ID: Hello, everybody, i want to use the spectrum slicing method in Slepc4py to compute a subset of the eigenvalues and associated eigenvectors of my matrix. To do this I need a factorization that provids the Matrix Inertia. The Cholesky decomposition is given as an example in the user manual. The problem ist that my matrix is not positive definit but positive semidefinit (Three eigenvalues are zero). The PETSc user forum only states that for the Cholesky factorization a symmetric matrix is zero, but as far is i remember the Chosleky factorization is only numerical stable for positive definite matrices. Can i use an LU factorization for the spectrum slicing, although the PETSc user manual states that the Inertia is accessible when using Cholseky? Or can is still use Chollesky? Greetings Jan -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Mon Feb 10 07:41:50 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 10 Feb 2020 14:41:50 +0100 Subject: [petsc-users] Spectrum slicing, Cholesky factorization for positive semidefinite matrices In-Reply-To: References: Message-ID: <5ED476D2-A8FE-49A2-A95D-33A5383BAC2B@dsic.upv.es> The spectrum slicing method computes the Cholesky factorization of (A-sigma*B) or (A-sigma*I) for several values of sigma. This matrix is indefinite, it does not matter if your B matrix is semi-definite. If B is singular, the only precaution is that you have to use purification, but this option is turned on by default so no problem. Jose > El 10 feb 2020, a las 14:32, Jan Grie?er via petsc-users escribi?: > > Hello, everybody, > i want to use the spectrum slicing method in Slepc4py to compute a subset of the eigenvalues and associated eigenvectors of my matrix. To do this I need a factorization that provids the Matrix Inertia. The Cholesky decomposition is given as an example in the user manual. The problem ist that my matrix is not positive definit but positive semidefinit (Three eigenvalues are zero). The PETSc user forum only states that for the Cholesky factorization a symmetric matrix is zero, but as far is i remember the Chosleky factorization is only numerical stable for positive definite matrices. Can i use an LU factorization for the spectrum slicing, although the PETSc user manual states that the Inertia is accessible when using Cholseky? Or can is still use Chollesky? > Greetings Jan From dong-hao at outlook.com Mon Feb 10 08:47:50 2020 From: dong-hao at outlook.com (Hao DONG) Date: Mon, 10 Feb 2020 14:47:50 +0000 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? In-Reply-To: References: <264F91C4-8558-4850-9B4B-ABE4123C2A2C@anl.gov> <4A373D93-4018-45E0-B805-3ECC528472DD@mcs.anl.gov> Message-ID: <9DF1BA10-D81B-4BCF-98EE-0179B9A681BA@outlook.com> Hi Barry, Thank you for you suggestions (and insights)! Indeed my initial motivation to try out PETSc is the different methods. 
As my matrix pattern is relatively simple (3D time-harmonic Maxwell equation arises from stagger-grid finite difference), also considering the fact that I am not wealthy enough to utilize the direct solvers, I was looking for a fast Krylov subspace method / preconditioner that scale well with, say, tens of cpu cores. As a simple block-Jacobian preconditioner seems to lose its efficiency with more than a handful of blocks, I planned to look into other methods/preconditioners, e.g. multigrid (as preconditioner) or domain decomposition methods. But I will probably need to look through a number of literatures before laying my hands on those (or bother you with more questions!). Anyway, thanks again for your kind help. All the best, Hao > On Feb 8, 2020, at 8:02 AM, Smith, Barry F. wrote: > > > >> On Feb 7, 2020, at 7:44 AM, Hao DONG wrote: >> >> Thanks, Barry, I really appreciate your help - >> >> I removed the OpenMP flags and rebuilt PETSc. So the problem is from the BLAS lib I linked? > > Yes, the openmp causes it to run in parallel, but the problem is not big enough and the machine is not good enough for parallel BLAS to speed things up, instead it slows things down a lot. We see this often, parallel BLAS must be used with care > >> Not sure which version my BLAS is, though? But I also included the -download-Scalapack option. Shouldn?t that enable linking with PBLAS automatically? >> >> After looking at the bcgs code in PETSc, I suppose the iteration residual recorded is indeed recorded twice per one "actual iteration?. So that can explain the difference of iteration numbers. >> >> My laptop is indeed an old machine (MBP15 mid-2014). I just cannot work with vi without a physical ESC key... > > The latest has a physical ESC, I am stuff without the ESC for a couple more years. > >> I have attached the configure.log -didn?t know that it is so large! >> >> Anyway, it seems that the removal of -openmp changes quite a lot of things, the performance is indeed getting much better - the flop/sec increases by a factor of 3. Still, I am getting 20 percent of VecMDot, but no VecMDot in BCGS all (see output below), is that a feature of gmres method? > > Yes, GMRES orthogonalizes against the last restart directions which uses these routines while BCGS does not, this is why BCGS is cheaper per iteration. > > PETSc is no faster than your code because the algorithm is the same, the compilers the same, and the hardware the same. No way to have clever tricks for PETSc to be much faster. What PETS provides is a huge variety of tested algorithms that no single person could code on their own. Anything in PETSc you could code yourself if you had endless time and get basically the same performance. 
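The multigrid and domain-decomposition preconditioners mentioned a little further up are reachable through exactly this options mechanism; the following is assumed usage, not a recommendation made anywhere in this thread: -pc_type gamg selects PETSc's algebraic multigrid and -pc_type asm -sub_pc_type ilu an additive Schwarz method, both of which can simply be swapped in at run time and inspected with -ksp_view / -log_view. The equivalent calls in code would look like the sketch below, where ksp, pc and ierr are assumed to be declared as KSP, PC and PetscErrorCode with the operators already set:

      call KSPGetPC(ksp,pc,ierr)
      call PCSetType(pc,PCGAMG,ierr)     ! smoothed-aggregation algebraic multigrid
!     call PCSetType(pc,PCASM,ierr)      ! or: additive Schwarz domain decomposition
      call KSPSetFromOptions(ksp,ierr)   ! keep command-line overrides possible

Whether either pays off for a complex-valued time-harmonic Maxwell system is problem dependent and has to be tested case by case.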
> > Barry > > >> >> here is the output of the same problem with: >> >> -pc_type bjacobi -pc_bjacobi_local_blocks 3 -sub_pc_type ilu -ksp_type gmres -ksp_monitor -ksp_view >> >> >> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >> >> Mod3DMT.test on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Fri Feb 7 10:26:19 2020 >> Using Petsc Release Version 3.12.3, unknown >> >> Max Max/Min Avg Total >> Time (sec): 2.520e+00 1.000 2.520e+00 >> Objects: 1.756e+03 1.000 1.756e+03 >> Flop: 7.910e+09 1.000 7.910e+09 7.910e+09 >> Flop/sec: 3.138e+09 1.000 3.138e+09 3.138e+09 >> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 >> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 >> MPI Reductions: 0.000e+00 0.000 >> >> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N --> 2N flop >> and VecAXPY() for complex vectors of length N --> 8N flop >> >> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total Count %Total Avg %Total Count %Total >> 0: Main Stage: 2.5204e+00 100.0% 7.9096e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >> >> ------------------------------------------------------------------------------------------------------------------------ >> ? >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> BuildTwoSidedF 1 1.0 3.4000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatMult 75 1.0 6.2884e-01 1.0 1.88e+09 1.0 0.0e+00 0.0e+00 0.0e+00 25 24 0 0 0 25 24 0 0 0 2991 >> MatSolve 228 1.0 4.4164e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2445 >> MatLUFactorNum 3 1.0 4.1317e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 574 >> MatILUFactorSym 3 1.0 2.3858e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> MatAssemblyBegin 5 1.0 4.4000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 5 1.0 1.5067e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> MatGetRowIJ 3 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatCreateSubMats 1 1.0 2.4558e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> MatGetOrdering 3 1.0 1.3290e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatView 3 1.0 1.2800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecMDot 72 1.0 4.9875e-01 1.0 2.25e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 28 0 0 0 20 28 0 0 0 4509 >> VecNorm 76 1.0 6.6666e-02 1.0 1.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 2544 >> VecScale 75 1.0 1.7982e-02 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 4653 >> VecCopy 3 1.0 1.5080e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 276 1.0 9.6784e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 >> VecAXPY 6 1.0 3.6860e-03 1.0 1.34e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3632 >> VecMAXPY 75 1.0 
4.0490e-01 1.0 2.41e+09 1.0 0.0e+00 0.0e+00 0.0e+00 16 30 0 0 0 16 30 0 0 0 5951 >> VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecScatterBegin 76 1.0 5.3800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecNormalize 75 1.0 8.3690e-02 1.0 2.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 2999 >> KSPSetUp 4 1.0 1.1663e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSolve 1 1.0 2.2119e+00 1.0 7.91e+09 1.0 0.0e+00 0.0e+00 0.0e+00 88100 0 0 0 88100 0 0 0 3576 >> KSPGMRESOrthog 72 1.0 8.7843e-01 1.0 4.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 35 57 0 0 0 35 57 0 0 0 5121 >> PCSetUp 4 1.0 9.2448e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 257 >> PCSetUpOnBlocks 1 1.0 6.6597e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 356 >> PCApply 76 1.0 4.6281e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2333 >> PCApplyOnBlocks 228 1.0 4.6262e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2334 >> ------------------------------------------------------------------------------------------------------------------------ >> >> Average time to get PetscTime(): 1e-07 >> #PETSc Option Table entries: >> -I LBFGS >> -ksp_type gmres >> -ksp_view >> -log_view >> -pc_bjacobi_local_blocks 3 >> -pc_type bjacobi >> -sub_pc_type ilu >> #End of PETSc Option Table entries >> Compiled with FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 >> Configure options: --with-scalar-type=complex --download-mumps --download-scalapack --with-fortran-kernels=1 -- FOPTFLAGS="-O3 -ffree-line-length-0 -msse2" --COPTFLAGS="-O3 -msse2" --CXXOPTFLAGS="-O3 -msse2" --with-debugging=0 >> ----------------------------------------- >> Libraries compiled on 2020-02-07 10:07:42 on Haos-MBP >> Machine characteristics: Darwin-19.3.0-x86_64-i386-64bit >> Using PETSc directory: /Users/donghao/src/git/PETSc-current >> Using PETSc arch: arch-darwin-c-opt >> ----------------------------------------- >> >> Using C compiler: mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fno-stack- check -Qunused-arguments -fvisibility=hidden -O3 -msse2 >> Using Fortran compiler: mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -O3 -ffree-line-length-0 - msse2 >> ----------------------------------------- >> >> Using include paths: -I/Users/donghao/src/git/PETSc-current/include -I/Users/donghao/src/git/PETSc-current/arch-darwin- c-opt/include >> ----------------------------------------- >> >> Using C linker: mpicc >> Using Fortran linker: mpif90 >> Using libraries: -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/donghao/src/git/PETSc- current/arch-darwin-c-opt/lib -lpetsc -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/ donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -Wl,-rpath,/usr/local/opt/libevent/lib -L/usr/local/opt/libevent/ lib -Wl,-rpath,/usr/local/Cellar/open-mpi/4.0.2/lib -L/usr/local/Cellar/open-mpi/4.0.2/lib -Wl,-rpath,/usr/local/Cellar/ gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin19/9.2.0 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple- darwin19/9.2.0 -Wl,-rpath,/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 
-L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -lcmumps - ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lc++ -ldl -lmpi_usempif08 - lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lquadmath -lm -lc++ -ldl >> ----------------------------------------- >> >> >> >> The BCGS solver performance is now comparable to my own Fortran code (1.84s). Still, I feel that there is something wrong hidden somewhere in my setup - a professional lib should to perform better, I believe. Any other ideas that I can look into? Interestingly there is no VecMDot operation at all! Here is the output with the option of: >> >> -pc_type bjacobi -pc_bjacobi_local_blocks 3 -sub_pc_type ilu -ksp_type bcgs -ksp_monitor -ksp_view >> >> >> >> >> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >> >> Mod3DMT.test on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Fri Feb 7 10:38:00 2020 >> Using Petsc Release Version 3.12.3, unknown >> >> Max Max/Min Avg Total >> Time (sec): 2.187e+00 1.000 2.187e+00 >> Objects: 1.155e+03 1.000 1.155e+03 >> Flop: 4.311e+09 1.000 4.311e+09 4.311e+09 >> Flop/sec: 1.971e+09 1.000 1.971e+09 1.971e+09 >> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 >> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 >> MPI Reductions: 0.000e+00 0.000 >> >> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N --> 2N flop >> and VecAXPY() for complex vectors of length N --> 8N flop >> >> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total Count %Total Avg %Total Count %Total >> 0: Main Stage: 2.1870e+00 100.0% 4.3113e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> BuildTwoSidedF 1 1.0 2.2000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatMult 83 1.0 7.8726e-01 1.0 2.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 36 48 0 0 0 36 48 0 0 0 2644 >> MatSolve 252 1.0 5.5656e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 25 28 0 0 0 25 28 0 0 0 2144 >> MatLUFactorNum 3 1.0 4.5115e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 526 >> MatILUFactorSym 3 1.0 2.5103e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> MatAssemblyBegin 5 1.0 3.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 5 1.0 1.5709e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> MatGetRowIJ 3 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatCreateSubMats 1 1.0 2.8989e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> MatGetOrdering 3 1.0 1.1200e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatView 3 1.0 1.2600e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecDot 82 1.0 
8.9328e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 2048 >> VecDotNorm2 41 1.0 9.9019e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 4 0 0 0 5 4 0 0 0 1848 >> VecNorm 43 1.0 3.9988e-02 1.0 9.59e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 2399 >> VecCopy 2 1.0 1.1150e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 271 1.0 4.2833e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> VecAXPY 1 1.0 5.9200e-04 1.0 2.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3769 >> VecAXPBYCZ 82 1.0 1.1448e-01 1.0 3.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 8 0 0 0 5 8 0 0 0 3196 >> VecWAXPY 82 1.0 6.7460e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 4 0 0 0 3 4 0 0 0 2712 >> VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecScatterBegin 84 1.0 5.2800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSetUp 4 1.0 1.4765e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> KSPSolve 1 1.0 1.8514e+00 1.0 4.31e+09 1.0 0.0e+00 0.0e+00 0.0e+00 85100 0 0 0 85100 0 0 0 2329 >> PCSetUp 4 1.0 1.0193e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 5 1 0 0 0 5 1 0 0 0 233 >> PCSetUpOnBlocks 1 1.0 7.1421e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 332 >> PCApply 84 1.0 5.7927e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 26 28 0 0 0 26 28 0 0 0 2060 >> PCApplyOnBlocks 252 1.0 5.7902e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 26 28 0 0 0 26 28 0 0 0 2061 >> ------------------------------------------------------------------------------------------------------------------------ >> >> >> Cheers, >> Hao >> >> >> >> From: Smith, Barry F. >> Sent: Thursday, February 6, 2020 7:03 PM >> To: Hao DONG >> Cc: petsc-users at mcs.anl.gov >> Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? >> >> >> Read my comments ALL the way down, they go a long way. >> >>> On Feb 6, 2020, at 3:43 AM, Hao DONG wrote: >>> >>> Dear Hong and Barry, >>> >>> Thanks for the suggestions. So there could be some problems in my PETSc configuration? - but my PETSc lib was indeed compiled without the debug flags (--with-debugging=0). I use GCC/GFortran (Home-brew GCC 9.2.0) for the compiling and building of PETSc and my own fortran code. My Fortran compiling flags are simply like: >>> >>> -O3 -ffree-line-length-none -fastsse >>> >>> Which is also used for -FOPTFLAGS in PETSc (I added -openmp for PETSc, but not my fortran code, as I don?t have any OMP optimizations in my program). Note the performance test results I listed yesterday (e.g. 4.08s with 41 bcgs iterations.) are without any CSR-array->PETSc translation overhead (only include the set and solve part). >> >> PETSc doesn't use -openmp in any way for its solvers. Do not use this option, it may be slowing the code down. Please send configure.log >> >>> >>> I have two questions about the performance difference - >>> >>> 1. Is ilu only factorized once for each iteration, or ilu is performed at every outer ksp iteration steps? Sounds unlikely - but if so, this could cause some extra overheads. >> >> ILU is ONLY done if the matrix has changed (which seems wrong). >>> >>> 2. Some KSP solvers like BCGS or TFQMR has two ?half-iterations? for each iteration step. 
Not sure how it works in PETSc, but is that possible that both the two ?half" relative residuals are counted in the output array, doubling the number of iterations (but that cannot explain the extra time consumed)? >> >> Yes, PETSc might report them as two, you need to check the exact code. >> >>> >>> Anyway, the output with -log_view from the same 278906 by 278906 matrix with 3-block D-ILU in PETSc is as follows: >>> >>> >>> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >>> >>> MEMsolv.lu on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Thu Feb 6 09:07:35 2020 >>> Using Petsc Release Version 3.12.3, unknown >>> >>> Max Max/Min Avg Total >>> Time (sec): 4.443e+00 1.000 4.443e+00 >>> Objects: 1.155e+03 1.000 1.155e+03 >>> Flop: 4.311e+09 1.000 4.311e+09 4.311e+09 >>> Flop/sec: 9.703e+08 1.000 9.703e+08 9.703e+08 >>> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 >>> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 >>> MPI Reductions: 0.000e+00 0.000 >>> >>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >>> e.g., VecAXPY() for real vectors of length N --> 2N flop >>> and VecAXPY() for complex vectors of length N --> 8N flop >>> >>> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- >>> Avg %Total Avg %Total Count %Total Avg %Total Count %Total >>> 0: Main Stage: 4.4435e+00 100.0% 4.3113e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >>> >>> ???????????????????????????????????????????????????????????? >>> See the 'Profiling' chapter of the users' manual for details on interpreting output. >>> Phase summary info: >>> Count: number of times phase was executed >>> Time and Flop: Max - maximum over all processors >>> Ratio - ratio of maximum to minimum over all processors >>> Mess: number of messages sent >>> AvgLen: average message length (bytes) >>> Reduct: number of global reductions >>> Global: entire computation >>> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
>>> %T - percent time in this phase %F - percent flop in this phase >>> %M - percent messages in this phase %L - percent message lengths in this phase >>> %R - percent reductions in this phase >>> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) >>> ------------------------------------------------------------------------------------------------------------------------ >>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> --- Event Stage 0: Main Stage >>> >>> BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatMult 83 1.0 1.7815e+00 1.0 2.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40 48 0 0 0 40 48 0 0 0 1168 >>> MatSolve 252 1.0 1.2708e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 939 >>> MatLUFactorNum 3 1.0 7.9725e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 >>> MatILUFactorSym 3 1.0 2.6998e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatAssemblyBegin 5 1.0 3.6000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAssemblyEnd 5 1.0 3.1619e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatCreateSubMats 1 1.0 3.9659e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatGetOrdering 3 1.0 4.3070e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatView 3 1.0 1.3600e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecDot 82 1.0 1.8948e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 966 >>> VecDotNorm2 41 1.0 1.6812e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 1088 >>> VecNorm 43 1.0 9.5099e-02 1.0 9.59e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1009 >>> VecCopy 2 1.0 1.0120e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecSet 271 1.0 3.8922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> VecAXPY 1 1.0 7.7200e-04 1.0 2.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2890 >>> VecAXPBYCZ 82 1.0 2.4370e-01 1.0 3.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 8 0 0 0 5 8 0 0 0 1502 >>> VecWAXPY 82 1.0 1.4148e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 4 0 0 0 3 4 0 0 0 1293 >>> VecAssemblyBegin 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecScatterBegin 84 1.0 5.9300e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> KSPSetUp 4 1.0 1.4167e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> KSPSolve 1 1.0 4.0250e+00 1.0 4.31e+09 1.0 0.0e+00 0.0e+00 0.0e+00 91100 0 0 0 91100 0 0 0 1071 >>> PCSetUp 4 1.0 1.5207e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 156 >>> PCSetUpOnBlocks 1 1.0 1.1116e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 214 >>> PCApply 84 1.0 1.2912e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 >>> PCApplyOnBlocks 252 1.0 1.2909e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 >>> 
------------------------------------------------------------------------------------------------------------------------ >>> >>> # I skipped the memory part - the options (and compiler options) are as follows: >>> >>> #PETSc Option Table entries: >>> -ksp_type bcgs >>> -ksp_view >>> -log_view >>> -pc_bjacobi_local_blocks 3 >>> -pc_factor_levels 0 >>> -pc_sub_type ilu >>> -pc_type bjacobi >>> #End of PETSc Option Table entries >>> Compiled with FORTRAN kernels >>> Compiled with full precision matrices (default) >>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 >>> Configure options: --with-scalar-type=complex --download-mumps --download-scalapack --with-fortran-kernels=1 -- FOPTFLAGS=?-O3 -fastsse -mp -openmp? --COPTFLAGS=?-O3 -fastsse -mp -openmp? --CXXOPTFLAGS="-O3 -fastsse -mp -openmp" -- with-debugging=0 >>> ----------------------------------------- >>> Libraries compiled on 2020-02-03 10:44:31 on Haos-MBP >>> Machine characteristics: Darwin-19.2.0-x86_64-i386-64bit >>> Using PETSc directory: /Users/donghao/src/git/PETSc-current >>> Using PETSc arch: arch-darwin-c-opt >>> ----------------------------------------- >>> >>> Using C compiler: mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fno-stack- check -Qunused-arguments -fvisibility=hidden >>> Using Fortran compiler: mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument >>> >>> Using include paths: -I/Users/donghao/src/git/PETSc-current/include -I/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/include >>> ----------------------------------------- >>> >>> Using C linker: mpicc >>> Using Fortran linker: mpif90 >>> Using libraries: -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/donghao/src/git/PETSc- current/arch-darwin-c-opt/lib -lpetsc -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/ donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -Wl,-rpath,/usr/local/opt/libevent/lib -L/usr/local/opt/libevent/ lib -Wl,-rpath,/usr/local/Cellar/open-mpi/4.0.2/lib -L/usr/local/Cellar/open-mpi/4.0.2/lib -Wl,-rpath,/usr/local/Cellar/ gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin19/9.2.0 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple- darwin19/9.2.0 -Wl,-rpath,/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -lcmumps - ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lc++ -ldl -lmpi_usempif08 - lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lquadmath -lm -lc++ -ldl >>> >>> >>> On the other hand, running PETSc with >>> >>> -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type lu -ksp_type gmres -ksp_monitor -ksp_view -log_view >>> >>> For the same problem takes 5.37s and 72 GMRES iterations. Our previous testings show that BiCGstab (bcgs in PETSc) is almost always the most effective KSP solver for our non-symmetrical complex system. Strangely, the system is still using ilu instead of lu for sub blocks. 
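A note on the option table just above: the entry is spelled -pc_sub_type, but the inner solvers of PCBJACOBI take the prefix sub_, so the requested LU never reaches the blocks and they fall back to the default ILU; Barry's one-line reply below gives the corrected spelling. A minimal sketch of the options-driven setup (KSPSetFromOptions is a standard PETSc call; ksp_local is the handle name used in the Fortran snippet quoted later in this thread):

      ! let the command line choose the whole solver stack
      call KSPSetFromOptions(ksp_local, ierr)

run with, for example,

      -ksp_type gmres -pc_type bjacobi -pc_bjacobi_local_blocks 3 -sub_pc_type lu -ksp_view

and check in the -ksp_view output that the (sub_) PC object now reports type: lu.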
The output is like: >> >> -sub_pc_type lu >> >>> >>> 0 KSP Residual norm 2.480412407430e+02 >>> 1 KSP Residual norm 8.848059967835e+01 >>> 2 KSP Residual norm 3.415272863261e+01 >>> 3 KSP Residual norm 1.563045190939e+01 >>> 4 KSP Residual norm 6.241296940043e+00 >>> 5 KSP Residual norm 2.739710899854e+00 >>> 6 KSP Residual norm 1.391304148888e+00 >>> 7 KSP Residual norm 7.959262020849e-01 >>> 8 KSP Residual norm 4.828323055231e-01 >>> 9 KSP Residual norm 2.918529739200e-01 >>> 10 KSP Residual norm 1.905508589557e-01 >>> 11 KSP Residual norm 1.291541892702e-01 >>> 12 KSP Residual norm 8.827145774707e-02 >>> 13 KSP Residual norm 6.521331095889e-02 >>> 14 KSP Residual norm 5.095787952595e-02 >>> 15 KSP Residual norm 4.043060387395e-02 >>> 16 KSP Residual norm 3.232590200012e-02 >>> 17 KSP Residual norm 2.593944982216e-02 >>> 18 KSP Residual norm 2.064639483533e-02 >>> 19 KSP Residual norm 1.653916663492e-02 >>> 20 KSP Residual norm 1.334946415452e-02 >>> 21 KSP Residual norm 1.092886880597e-02 >>> 22 KSP Residual norm 8.988004105542e-03 >>> 23 KSP Residual norm 7.466501315240e-03 >>> 24 KSP Residual norm 6.284389135436e-03 >>> 25 KSP Residual norm 5.425231669964e-03 >>> 26 KSP Residual norm 4.766338253084e-03 >>> 27 KSP Residual norm 4.241238878242e-03 >>> 28 KSP Residual norm 3.808113525685e-03 >>> 29 KSP Residual norm 3.449383788116e-03 >>> 30 KSP Residual norm 3.126025526388e-03 >>> 31 KSP Residual norm 2.958328054299e-03 >>> 32 KSP Residual norm 2.802344900403e-03 >>> 33 KSP Residual norm 2.621993580492e-03 >>> 34 KSP Residual norm 2.430066269304e-03 >>> 35 KSP Residual norm 2.259043079597e-03 >>> 36 KSP Residual norm 2.104287972986e-03 >>> 37 KSP Residual norm 1.952916080045e-03 >>> 38 KSP Residual norm 1.804988937999e-03 >>> 39 KSP Residual norm 1.643302117377e-03 >>> 40 KSP Residual norm 1.471661332036e-03 >>> 41 KSP Residual norm 1.286445911163e-03 >>> 42 KSP Residual norm 1.127543025848e-03 >>> 43 KSP Residual norm 9.777148275484e-04 >>> 44 KSP Residual norm 8.293314450006e-04 >>> 45 KSP Residual norm 6.989331136622e-04 >>> 46 KSP Residual norm 5.852307780220e-04 >>> 47 KSP Residual norm 4.926715539762e-04 >>> 48 KSP Residual norm 4.215941372075e-04 >>> 49 KSP Residual norm 3.699489548162e-04 >>> 50 KSP Residual norm 3.293897163533e-04 >>> 51 KSP Residual norm 2.959954542998e-04 >>> 52 KSP Residual norm 2.700193032414e-04 >>> 53 KSP Residual norm 2.461789791204e-04 >>> 54 KSP Residual norm 2.218839085563e-04 >>> 55 KSP Residual norm 1.945154309976e-04 >>> 56 KSP Residual norm 1.661128781744e-04 >>> 57 KSP Residual norm 1.413198766258e-04 >>> 58 KSP Residual norm 1.213984003195e-04 >>> 59 KSP Residual norm 1.044317450754e-04 >>> 60 KSP Residual norm 8.919957502977e-05 >>> 61 KSP Residual norm 8.042584301275e-05 >>> 62 KSP Residual norm 7.292784493581e-05 >>> 63 KSP Residual norm 6.481935501872e-05 >>> 64 KSP Residual norm 5.718564652679e-05 >>> 65 KSP Residual norm 5.072589750116e-05 >>> 66 KSP Residual norm 4.487930741285e-05 >>> 67 KSP Residual norm 3.941040674119e-05 >>> 68 KSP Residual norm 3.492873281291e-05 >>> 69 KSP Residual norm 3.103798339845e-05 >>> 70 KSP Residual norm 2.822943237409e-05 >>> 71 KSP Residual norm 2.610615023776e-05 >>> 72 KSP Residual norm 2.441692671173e-05 >>> KSP Object: 1 MPI processes >>> type: gmres >>> restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement >>> happy breakdown tolerance 1e-30 >>> maximum iterations=150, nonzero initial guess >>> tolerances: relative=1e-07, absolute=1e-50, 
divergence=10000. >>> left preconditioning >>> using PRECONDITIONED norm type for convergence test >>> PC Object: 1 MPI processes >>> type: bjacobi >>> number of blocks = 3 >>> Local solve is same for all blocks, in the following KSP and PC objects: >>> KSP Object: (sub_) 1 MPI processes >>> type: preonly >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>> left preconditioning >>> using NONE norm type for convergence test >>> PC Object: (sub_) 1 MPI processes >>> type: ilu >>> out-of-place factorization >>> 0 levels of fill >>> tolerance for zero pivot 2.22045e-14 >>> matrix ordering: natural >>> factor fill ratio given 1., needed 1. >>> Factored matrix follows: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=92969, cols=92969 >>> package used to perform factorization: petsc >>> total: nonzeros=638417, allocated nonzeros=638417 >>> total number of mallocs used during MatSetValues calls=0 >>> not using I-node routines >>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=92969, cols=92969 >>> total: nonzeros=638417, allocated nonzeros=638417 >>> total number of mallocs used during MatSetValues calls=0 >>> not using I-node routines >>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: mpiaij >>> rows=278906, cols=278906 >>> total: nonzeros=3274027, allocated nonzeros=3274027 >>> total number of mallocs used during MatSetValues calls=0 >>> not using I-node (on process 0) routines >>> ... >>> ------------------------------------------------------------------------------------------------------------------------ >>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> --- Event Stage 0: Main Stage >>> >>> BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatMult 75 1.0 1.5812e+00 1.0 1.88e+09 1.0 0.0e+00 0.0e+00 0.0e+00 28 24 0 0 0 28 24 0 0 0 1189 >>> MatSolve 228 1.0 1.1442e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 944 >> >> These flop rates are ok, but not great. Perhaps an older machine. >> >>> MatLUFactorNum 3 1.0 8.1930e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 290 >>> MatILUFactorSym 3 1.0 2.7102e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAssemblyBegin 5 1.0 3.7000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAssemblyEnd 5 1.0 3.1895e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatCreateSubMats 1 1.0 4.0904e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatGetOrdering 3 1.0 4.2640e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatView 3 1.0 1.4400e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecMDot 72 1.0 1.1984e+00 1.0 2.25e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 28 0 0 0 21 28 0 0 0 1877 >> >> 21 percent of the time in VecMDOT this is huge for s sequential fun. I think maybe you are using a terrible OpenMP BLAS? 
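Since much of this exchange is about where KSPSolve spends its time, it may help to isolate the solve in its own logging stage, as the profiling legend above notes with PetscLogStagePush()/PetscLogStagePop(). A minimal Fortran sketch; ksp_local, b and x are placeholder handle names in the style of the user's snippet:

      PetscLogStage stage_solve

      call PetscLogStageRegister('KSPSolve only', stage_solve, ierr)
      call PetscLogStagePush(stage_solve, ierr)
      call KSPSolve(ksp_local, b, x, ierr)
      call PetscLogStagePop(ierr)

-log_view then reports the solve in a separate stage, which makes it easier to see whether VecMDot, MatSolve or the factorization dominates, without the assembly and setup events mixed in.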
>> >> Send configure.log >> >> >>> VecNorm 76 1.0 1.6841e-01 1.0 1.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 1007 >>> VecScale 75 1.0 1.8241e-02 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 4587 >>> VecCopy 3 1.0 1.4970e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecSet 276 1.0 9.1970e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>> VecAXPY 6 1.0 3.7450e-03 1.0 1.34e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3575 >>> VecMAXPY 75 1.0 1.0022e+00 1.0 2.41e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 30 0 0 0 18 30 0 0 0 2405 >>> VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecScatterBegin 76 1.0 5.5100e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecNormalize 75 1.0 1.8462e-01 1.0 2.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 1360 >>> KSPSetUp 4 1.0 1.1341e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> KSPSolve 1 1.0 5.3123e+00 1.0 7.91e+09 1.0 0.0e+00 0.0e+00 0.0e+00 93100 0 0 0 93100 0 0 0 1489 >>> KSPGMRESOrthog 72 1.0 2.1316e+00 1.0 4.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 57 0 0 0 37 57 0 0 0 2110 >>> PCSetUp 4 1.0 1.5531e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 153 >>> PCSetUpOnBlocks 1 1.0 1.1343e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 209 >>> PCApply 76 1.0 1.1671e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 >>> PCApplyOnBlocks 228 1.0 1.1668e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 >>> ???????????????????????????????????????????????????????????? >>> ... >>> #PETSc Option Table entries: >>> -ksp_monitor >>> -ksp_type gmres >>> -ksp_view >>> -log_view >>> -pc_bjacobi_local_blocks 3 >>> -pc_sub_type lu >>> -pc_type bjacobi >>> #End of PETSc Option Table entries >>> ... >>> >>> Does any of the setup/output ring a bell? >>> >>> BTW, out of curiosity - what is a ?I-node? routine? >>> >>> >>> Cheers, >>> Hao >>> >>> >>> From: Smith, Barry F. >>> Sent: Wednesday, February 5, 2020 9:42 PM >>> To: Hao DONG >>> Cc: petsc-users at mcs.anl.gov >>> Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? >>> >>> >>> >>>> On Feb 5, 2020, at 4:36 AM, Hao DONG wrote: >>>> >>>> Thanks a lot for your suggestions, Hong and Barry - >>>> >>>> As you suggested, I first tried the LU direct solvers (built-in and MUMPs) out this morning, which work perfectly, albeit slow. As my problem itself is a part of a PDE based optimization, the exact solution in the searching procedure is not necessary (I often set a relative tolerance of 1E-7/1E-8, etc. for Krylov subspace methods). Also tried BJACOBI with exact LU, the KSP just converges in one or two iterations, as expected. >>>> >>>> I added -kspview option for the D-ILU test (still with Block Jacobi as preconditioner and bcgs as KSP solver). The KSPview output from one of the examples in a toy model looks like: >>>> >>>> KSP Object: 1 MPI processes >>>> type: bcgs >>>> maximum iterations=120, nonzero initial guess >>>> tolerances: relative=1e-07, absolute=1e-50, divergence=10000. 
>>>> left preconditioning >>>> using PRECONDITIONED norm type for convergence test >>>> PC Object: 1 MPI processes >>>> type: bjacobi >>>> number of blocks = 3 >>>> Local solve is same for all blocks, in the following KSP and PC objects: >>>> KSP Object: (sub_) 1 MPI processes >>>> type: preonly >>>> maximum iterations=10000, initial guess is zero >>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>>> left preconditioning >>>> using NONE norm type for convergence test >>>> PC Object: (sub_) 1 MPI processes >>>> type: ilu >>>> out-of-place factorization >>>> 0 levels of fill >>>> tolerance for zero pivot 2.22045e-14 >>>> matrix ordering: natural >>>> factor fill ratio given 1., needed 1. >>>> Factored matrix follows: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=11294, cols=11294 >>>> package used to perform factorization: petsc >>>> total: nonzeros=76008, allocated nonzeros=76008 >>>> total number of mallocs used during MatSetValues calls=0 >>>> not using I-node routines >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=11294, cols=11294 >>>> total: nonzeros=76008, allocated nonzeros=76008 >>>> total number of mallocs used during MatSetValues calls=0 >>>> not using I-node routines >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: mpiaij >>>> rows=33880, cols=33880 >>>> total: nonzeros=436968, allocated nonzeros=436968 >>>> total number of mallocs used during MatSetValues calls=0 >>>> not using I-node (on process 0) routines >>>> >>>> do you see something wrong with my setup? >>>> >>>> I also tried a quick performance test with a small 278906 by 278906 matrix (3850990 nnz) with the following parameters: >>>> >>>> -ksp_type bcgs -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu -ksp_view >>>> >>>> Reducing the relative residual to 1E-7 >>>> >>>> Took 4.08s with 41 bcgs iterations. >>>> >>>> Merely changing the -pc_bjacobi_local_blocks to 6 >>>> >>>> Took 7.02s with 73 bcgs iterations. 9 blocks would further take 9.45s with 101 bcgs iterations. >>> >>> This is normal. more blocks slower convergence >>>> >>>> As a reference, my home-brew Fortran code solves the same problem (3-block D-ILU0) in >>>> >>>> 1.84s with 24 bcgs iterations (the bcgs code is also a home-brew one)? >>>> >>> Run the PETSc code with optimization ./configure --with-debugging=0 an run the code with -log_view this will show where the PETSc code is spending the time (send it to use) >>> >>> >>>> >>>> >>>> Well, by saying ?use explicit L/U matrix as preconditioner?, I wonder if a user is allowed to provide his own (separate) L and U Mat for preconditioning (like how it works in Matlab solvers), e.g. >>>> >>>> x = qmr(A,b,Tol,MaxIter,L,U,x) >>>> >>>> As I already explicitly constructed the L and U matrices in Fortran, it is not hard to convert them to Mat in Petsc to test Petsc and my Fortran code head-to-head. In that case, the A, b, x, and L/U are all identical, it would be easier to see where the problem came from. >>>> >>>> >>> No, we don't provide this kind of support >>> >>> >>>> >>>> BTW, there is another thing I wondered - is there a way to output residual in unpreconditioned norm? I tried to >>>> >>>> call KSPSetNormType(ksp_local, KSP_NORM_UNPRECONDITIONED, ierr) >>>> >>>> But always get an error that current ksp does not support unpreconditioned in LEFT/RIGHT (either way). Is it possible to do that (output unpreconditioned residual) in PETSc at all? 
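(Barry's answer just below is -ksp_monitor_true_residual, optionally combined with right preconditioning. For reference, the code-side way to request the latter is a single call; the sketch assumes the ksp_local handle name used earlier in the thread:)

      ! with right preconditioning, GMRES's own residual is the true residual
      call KSPSetPCSide(ksp_local, PC_RIGHT, ierr)

-ksp_monitor_true_residual works in either case and prints the true residual norm alongside the preconditioned one.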
>>> >>> -ksp_monitor_true_residual You can also run GMRES (and some other methods) with right preconditioning, -ksp_pc_side right then the residual computed is by the algorithm the unpreconditioned residual >>> >>> KSPSetNormType sets the norm used in the algorithm, it generally always has to left or right, only a couple algorithms support unpreconditioned directly. >>> >>> Barry >>> >>> >>>> >>>> Cheers, >>>> Hao >>>> >>>> >>>> From: Smith, Barry F. >>>> Sent: Tuesday, February 4, 2020 8:27 PM >>>> To: Hao DONG >>>> Cc: petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? >>>> >>>> >>>> >>>>> On Feb 4, 2020, at 12:41 PM, Hao DONG wrote: >>>>> >>>>> Dear all, >>>>> >>>>> >>>>> I have a few questions about the implementation of diagonal ILU PC in PETSc. I want to solve a very simple system with KSP (in parallel), the nature of the system (finite difference time-harmonic Maxwell) is probably not important to the question itself. Long story short, I just need to solve a set of equations of Ax = b with a block diagonal system matrix, like (not sure if the mono font works): >>>>> >>>>> |X | >>>>> A =| Y | >>>>> | Z| >>>>> >>>>> Note that A is not really block-diagonal, it?s just a multi-diagonal matrix determined by the FD mesh, where most elements are close to diagonal. So instead of a full ILU decomposition, a D-ILU is a good approximation as a preconditioner, and the number of blocks should not matter too much: >>>>> >>>>> |Lx | |Ux | >>>>> L = | Ly | and U = | Uy | >>>>> | Lz| | Uz| >>>>> >>>>> Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in my sequential Fortran/Matlab code. >>>>> >>>>> So like most users, I am looking for a parallel implement with this problem in PETSc. After looking through the manual, it seems that the most straightforward way to do it is through PCBJACOBI. Not sure I understand it right, I just setup a 3-block PCJACOBI and give each of the block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little block of fortran code would look like: >>>>> ... >>>>> call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & >>>>> & isubs,ierr) >>>>> call PCBJacobiSetLocalBlocks(pc_local, Nsub, & >>>>> & isubs(istart:iend),ierr) >>>>> ! set up the block jacobi structure >>>>> call KSPSetup(ksp_local,ierr) >>>>> ! allocate sub ksps >>>>> allocate(ksp_sub(Nsub)) >>>>> call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & >>>>> & ksp_sub,ierr) >>>>> do i=1,Nsub >>>>> call KSPGetPC(ksp_sub(i),pc_sub,ierr) >>>>> !ILU preconditioner >>>>> call PCSetType(pc_sub,ptype,ierr) >>>>> call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here >>>>> call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] >>>>> end do >>>>> call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & >>>>> & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) >>>>> ? >>>> >>>> This code looks essentially right. You should call with -ksp_view to see exactly what PETSc is using for a solver. >>>> >>>>> >>>>> I understand that the parallel performance may not be comparable, so I first set up a one-process test (with MPIAij, but all the rows are local since there is only one process). The system is solved without any problem (identical results within error). 
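One more aid for head-to-head comparisons like the one described next: rather than counting monitor lines (which, as noted above, BCGS may print at a finer granularity than one per full iteration), the iteration count and convergence reason can be read back from the KSP after the solve. A sketch with the same placeholder handle name:

      PetscInt its
      KSPConvergedReason reason

      call KSPGetIterationNumber(ksp_local, its, ierr)
      call KSPGetConvergedReason(ksp_local, reason, ierr)

Here its is the solver's own iteration counter and reason records why it stopped (converged on rtol, diverged, hit the iteration limit), which is the safer pair of numbers to compare against an external code.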
But the performance is actually a lot worse (code built without debugging flags in performance tests) than my own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), which is hard to believe. I suspect the difference is from the PC as the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to the same relative tol. >>>> >>>> PETSc uses GMRES by default with a restart of 30 and left preconditioning. It could be different exact choices in the solver (which is why -ksp_view is so useful) can explain the differences in the runs between your code and PETSc's >>>>> >>>>> This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks instead of 3). While my Fortran/Matlab codes see minimal performance difference (<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a performance decrease of more than 50% in sequential test. >>>> >>>> This is odd, the more blocks the smaller each block so the quicker the ILU set up should be. You can run various cases with -log_view and send us the output to see what is happening at each part of the computation in time. >>>> >>>>> So my implementation IS somewhat different in PETSc. Do I miss something in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how PCBJACOBI works in PETSc? >>>> >>>> Probably not. >>>>> >>>>> If this is not the right way to implement a block diagonal ILU as (parallel) PC, please kindly point me to the right direction. I searched through the mail list to find some answers, only to find a couple of similar questions... An example would be nice. >>>> >>>> You approach seems fundamentally right but I cannot be sure of possible bugs. >>>>> >>>>> On the other hand, does PETSc support a simple way to use explicit L/U matrix as a preconditioner? I can import the D-ILU matrix (I already converted my A matrix into Mat) constructed in my Fortran code to make a better comparison. Or do I have to construct my own PC using PCshell? If so, is there a good tutorial/example to learn about how to use PCSHELL (in a more advanced sense, like how to setup pc side and transpose)? >>>> >>>> Not sure what you mean by explicit L/U matrix as a preconditioner. As Hong said, yes you can use a parallel LU from MUMPS or SuperLU_DIST or Pastix as the solver. You do not need any shell code. You simply need to set the PCType to lu >>>> >>>> You can also set all this options from the command line and don't need to write the code specifically. So call KSPSetFromOptions() and then for example >>>> >>>> -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu (this last one is applied to each block so you could use -pc_type lu and it would use lu on each block.) >>>> >>>> -ksp_type_none -pc_type lu -pc_factor_mat_solver_type mumps (do parallel LU with mumps) >>>> >>>> By not hardwiring in the code and just using options you can test out different cases much quicker >>>> >>>> Use -ksp_view to make sure that is using the solver the way you expect. >>>> >>>> Barry >>>> >>>> >>>> >>>> Barry >>>> >>>>> >>>>> Thanks in advance, >>>>> >>>>> Hao >> >> > From bsmith at mcs.anl.gov Mon Feb 10 09:07:33 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Mon, 10 Feb 2020 15:07:33 +0000 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? 
In-Reply-To: <9DF1BA10-D81B-4BCF-98EE-0179B9A681BA@outlook.com> References: <264F91C4-8558-4850-9B4B-ABE4123C2A2C@anl.gov> <4A373D93-4018-45E0-B805-3ECC528472DD@mcs.anl.gov> <9DF1BA10-D81B-4BCF-98EE-0179B9A681BA@outlook.com> Message-ID: You should google for preconditioners/solvers for 3D time-harmonic Maxwell equation arises from stagger-grid finite difference. Maxwell has its own particular structure and difficulties with iterative solvers depending on the parameters. Also see page 87 in https://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf Barry -pc_type asm may work well for a handful of processors. > On Feb 10, 2020, at 8:47 AM, Hao DONG wrote: > > Hi Barry, > > Thank you for you suggestions (and insights)! Indeed my initial motivation to try out PETSc is the different methods. As my matrix pattern is relatively simple (3D time-harmonic Maxwell equation arises from stagger-grid finite difference), also considering the fact that I am not wealthy enough to utilize the direct solvers, I was looking for a fast Krylov subspace method / preconditioner that scale well with, say, tens of cpu cores. > > As a simple block-Jacobian preconditioner seems to lose its efficiency with more than a handful of blocks, I planned to look into other methods/preconditioners, e.g. multigrid (as preconditioner) or domain decomposition methods. But I will probably need to look through a number of literatures before laying my hands on those (or bother you with more questions!). Anyway, thanks again for your kind help. > > > All the best, > Hao > >> On Feb 8, 2020, at 8:02 AM, Smith, Barry F. wrote: >> >> >> >>> On Feb 7, 2020, at 7:44 AM, Hao DONG wrote: >>> >>> Thanks, Barry, I really appreciate your help - >>> >>> I removed the OpenMP flags and rebuilt PETSc. So the problem is from the BLAS lib I linked? >> >> Yes, the openmp causes it to run in parallel, but the problem is not big enough and the machine is not good enough for parallel BLAS to speed things up, instead it slows things down a lot. We see this often, parallel BLAS must be used with care >> >>> Not sure which version my BLAS is, though? But I also included the -download-Scalapack option. Shouldn?t that enable linking with PBLAS automatically? >>> >>> After looking at the bcgs code in PETSc, I suppose the iteration residual recorded is indeed recorded twice per one "actual iteration?. So that can explain the difference of iteration numbers. >>> >>> My laptop is indeed an old machine (MBP15 mid-2014). I just cannot work with vi without a physical ESC key... >> >> The latest has a physical ESC, I am stuff without the ESC for a couple more years. >> >>> I have attached the configure.log -didn?t know that it is so large! >>> >>> Anyway, it seems that the removal of -openmp changes quite a lot of things, the performance is indeed getting much better - the flop/sec increases by a factor of 3. Still, I am getting 20 percent of VecMDot, but no VecMDot in BCGS all (see output below), is that a feature of gmres method? >> >> Yes, GMRES orthogonalizes against the last restart directions which uses these routines while BCGS does not, this is why BCGS is cheaper per iteration. >> >> PETSc is no faster than your code because the algorithm is the same, the compilers the same, and the hardware the same. No way to have clever tricks for PETSc to be much faster. What PETS provides is a huge variety of tested algorithms that no single person could code on their own. 
Anything in PETSc you could code yourself if you had endless time and get basically the same performance. >> >> Barry >> >> >>> >>> here is the output of the same problem with: >>> >>> -pc_type bjacobi -pc_bjacobi_local_blocks 3 -sub_pc_type ilu -ksp_type gmres -ksp_monitor -ksp_view >>> >>> >>> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >>> >>> Mod3DMT.test on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Fri Feb 7 10:26:19 2020 >>> Using Petsc Release Version 3.12.3, unknown >>> >>> Max Max/Min Avg Total >>> Time (sec): 2.520e+00 1.000 2.520e+00 >>> Objects: 1.756e+03 1.000 1.756e+03 >>> Flop: 7.910e+09 1.000 7.910e+09 7.910e+09 >>> Flop/sec: 3.138e+09 1.000 3.138e+09 3.138e+09 >>> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 >>> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 >>> MPI Reductions: 0.000e+00 0.000 >>> >>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >>> e.g., VecAXPY() for real vectors of length N --> 2N flop >>> and VecAXPY() for complex vectors of length N --> 8N flop >>> >>> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- >>> Avg %Total Avg %Total Count %Total Avg %Total Count %Total >>> 0: Main Stage: 2.5204e+00 100.0% 7.9096e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> ? >>> ------------------------------------------------------------------------------------------------------------------------ >>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> --- Event Stage 0: Main Stage >>> >>> BuildTwoSidedF 1 1.0 3.4000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatMult 75 1.0 6.2884e-01 1.0 1.88e+09 1.0 0.0e+00 0.0e+00 0.0e+00 25 24 0 0 0 25 24 0 0 0 2991 >>> MatSolve 228 1.0 4.4164e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2445 >>> MatLUFactorNum 3 1.0 4.1317e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 574 >>> MatILUFactorSym 3 1.0 2.3858e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatAssemblyBegin 5 1.0 4.4000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAssemblyEnd 5 1.0 1.5067e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatGetRowIJ 3 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatCreateSubMats 1 1.0 2.4558e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatGetOrdering 3 1.0 1.3290e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatView 3 1.0 1.2800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecMDot 72 1.0 4.9875e-01 1.0 2.25e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 28 0 0 0 20 28 0 0 0 4509 >>> VecNorm 76 1.0 6.6666e-02 1.0 1.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 2544 >>> VecScale 75 1.0 1.7982e-02 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 4653 >>> VecCopy 3 1.0 1.5080e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecSet 276 1.0 9.6784e-02 1.0 0.00e+00 0.0 
0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 >>> VecAXPY 6 1.0 3.6860e-03 1.0 1.34e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3632 >>> VecMAXPY 75 1.0 4.0490e-01 1.0 2.41e+09 1.0 0.0e+00 0.0e+00 0.0e+00 16 30 0 0 0 16 30 0 0 0 5951 >>> VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecAssemblyEnd 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecScatterBegin 76 1.0 5.3800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecNormalize 75 1.0 8.3690e-02 1.0 2.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 2999 >>> KSPSetUp 4 1.0 1.1663e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> KSPSolve 1 1.0 2.2119e+00 1.0 7.91e+09 1.0 0.0e+00 0.0e+00 0.0e+00 88100 0 0 0 88100 0 0 0 3576 >>> KSPGMRESOrthog 72 1.0 8.7843e-01 1.0 4.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 35 57 0 0 0 35 57 0 0 0 5121 >>> PCSetUp 4 1.0 9.2448e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 257 >>> PCSetUpOnBlocks 1 1.0 6.6597e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 356 >>> PCApply 76 1.0 4.6281e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2333 >>> PCApplyOnBlocks 228 1.0 4.6262e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2334 >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> Average time to get PetscTime(): 1e-07 >>> #PETSc Option Table entries: >>> -I LBFGS >>> -ksp_type gmres >>> -ksp_view >>> -log_view >>> -pc_bjacobi_local_blocks 3 >>> -pc_type bjacobi >>> -sub_pc_type ilu >>> #End of PETSc Option Table entries >>> Compiled with FORTRAN kernels >>> Compiled with full precision matrices (default) >>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 >>> Configure options: --with-scalar-type=complex --download-mumps --download-scalapack --with-fortran-kernels=1 -- FOPTFLAGS="-O3 -ffree-line-length-0 -msse2" --COPTFLAGS="-O3 -msse2" --CXXOPTFLAGS="-O3 -msse2" --with-debugging=0 >>> ----------------------------------------- >>> Libraries compiled on 2020-02-07 10:07:42 on Haos-MBP >>> Machine characteristics: Darwin-19.3.0-x86_64-i386-64bit >>> Using PETSc directory: /Users/donghao/src/git/PETSc-current >>> Using PETSc arch: arch-darwin-c-opt >>> ----------------------------------------- >>> >>> Using C compiler: mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fno-stack- check -Qunused-arguments -fvisibility=hidden -O3 -msse2 >>> Using Fortran compiler: mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -O3 -ffree-line-length-0 - msse2 >>> ----------------------------------------- >>> >>> Using include paths: -I/Users/donghao/src/git/PETSc-current/include -I/Users/donghao/src/git/PETSc-current/arch-darwin- c-opt/include >>> ----------------------------------------- >>> >>> Using C linker: mpicc >>> Using Fortran linker: mpif90 >>> Using libraries: -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/donghao/src/git/PETSc- current/arch-darwin-c-opt/lib -lpetsc -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/ donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -Wl,-rpath,/usr/local/opt/libevent/lib -L/usr/local/opt/libevent/ lib -Wl,-rpath,/usr/local/Cellar/open-mpi/4.0.2/lib -L/usr/local/Cellar/open-mpi/4.0.2/lib -Wl,-rpath,/usr/local/Cellar/ 
gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin19/9.2.0 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple- darwin19/9.2.0 -Wl,-rpath,/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -lcmumps - ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lc++ -ldl -lmpi_usempif08 - lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lquadmath -lm -lc++ -ldl >>> ----------------------------------------- >>> >>> >>> >>> The BCGS solver performance is now comparable to my own Fortran code (1.84s). Still, I feel that there is something wrong hidden somewhere in my setup - a professional lib should to perform better, I believe. Any other ideas that I can look into? Interestingly there is no VecMDot operation at all! Here is the output with the option of: >>> >>> -pc_type bjacobi -pc_bjacobi_local_blocks 3 -sub_pc_type ilu -ksp_type bcgs -ksp_monitor -ksp_view >>> >>> >>> >>> >>> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >>> >>> Mod3DMT.test on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Fri Feb 7 10:38:00 2020 >>> Using Petsc Release Version 3.12.3, unknown >>> >>> Max Max/Min Avg Total >>> Time (sec): 2.187e+00 1.000 2.187e+00 >>> Objects: 1.155e+03 1.000 1.155e+03 >>> Flop: 4.311e+09 1.000 4.311e+09 4.311e+09 >>> Flop/sec: 1.971e+09 1.000 1.971e+09 1.971e+09 >>> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 >>> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 >>> MPI Reductions: 0.000e+00 0.000 >>> >>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >>> e.g., VecAXPY() for real vectors of length N --> 2N flop >>> and VecAXPY() for complex vectors of length N --> 8N flop >>> >>> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- >>> Avg %Total Avg %Total Count %Total Avg %Total Count %Total >>> 0: Main Stage: 2.1870e+00 100.0% 4.3113e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> --- Event Stage 0: Main Stage >>> >>> BuildTwoSidedF 1 1.0 2.2000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatMult 83 1.0 7.8726e-01 1.0 2.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 36 48 0 0 0 36 48 0 0 0 2644 >>> MatSolve 252 1.0 5.5656e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 25 28 0 0 0 25 28 0 0 0 2144 >>> MatLUFactorNum 3 1.0 4.5115e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 526 >>> MatILUFactorSym 3 1.0 2.5103e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatAssemblyBegin 5 1.0 3.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAssemblyEnd 5 1.0 1.5709e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatGetRowIJ 3 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatCreateSubMats 1 1.0 2.8989e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 
1 0 0 0 0 1 0 0 0 0 0 >>> MatGetOrdering 3 1.0 1.1200e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatView 3 1.0 1.2600e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecDot 82 1.0 8.9328e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 2048 >>> VecDotNorm2 41 1.0 9.9019e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 4 0 0 0 5 4 0 0 0 1848 >>> VecNorm 43 1.0 3.9988e-02 1.0 9.59e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 2399 >>> VecCopy 2 1.0 1.1150e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecSet 271 1.0 4.2833e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>> VecAXPY 1 1.0 5.9200e-04 1.0 2.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3769 >>> VecAXPBYCZ 82 1.0 1.1448e-01 1.0 3.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 8 0 0 0 5 8 0 0 0 3196 >>> VecWAXPY 82 1.0 6.7460e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 4 0 0 0 3 4 0 0 0 2712 >>> VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecScatterBegin 84 1.0 5.2800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> KSPSetUp 4 1.0 1.4765e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> KSPSolve 1 1.0 1.8514e+00 1.0 4.31e+09 1.0 0.0e+00 0.0e+00 0.0e+00 85100 0 0 0 85100 0 0 0 2329 >>> PCSetUp 4 1.0 1.0193e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 5 1 0 0 0 5 1 0 0 0 233 >>> PCSetUpOnBlocks 1 1.0 7.1421e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 332 >>> PCApply 84 1.0 5.7927e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 26 28 0 0 0 26 28 0 0 0 2060 >>> PCApplyOnBlocks 252 1.0 5.7902e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 26 28 0 0 0 26 28 0 0 0 2061 >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> >>> Cheers, >>> Hao >>> >>> >>> >>> From: Smith, Barry F. >>> Sent: Thursday, February 6, 2020 7:03 PM >>> To: Hao DONG >>> Cc: petsc-users at mcs.anl.gov >>> Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? >>> >>> >>> Read my comments ALL the way down, they go a long way. >>> >>>> On Feb 6, 2020, at 3:43 AM, Hao DONG wrote: >>>> >>>> Dear Hong and Barry, >>>> >>>> Thanks for the suggestions. So there could be some problems in my PETSc configuration? - but my PETSc lib was indeed compiled without the debug flags (--with-debugging=0). I use GCC/GFortran (Home-brew GCC 9.2.0) for the compiling and building of PETSc and my own fortran code. My Fortran compiling flags are simply like: >>>> >>>> -O3 -ffree-line-length-none -fastsse >>>> >>>> Which is also used for -FOPTFLAGS in PETSc (I added -openmp for PETSc, but not my fortran code, as I don?t have any OMP optimizations in my program). Note the performance test results I listed yesterday (e.g. 4.08s with 41 bcgs iterations.) are without any CSR-array->PETSc translation overhead (only include the set and solve part). >>> >>> PETSc doesn't use -openmp in any way for its solvers. Do not use this option, it may be slowing the code down. Please send configure.log >>> >>>> >>>> I have two questions about the performance difference - >>>> >>>> 1. Is ilu only factorized once for each iteration, or ilu is performed at every outer ksp iteration steps? Sounds unlikely - but if so, this could cause some extra overheads. 
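(Barry's answer just below: the factorization is only redone when the operator changes. The pattern this implies, sketched with placeholder names A, b1, x1, b2, x2 for the matrix and vectors:)

      call KSPSetOperators(ksp_local, A, A, ierr)
      call KSPSolve(ksp_local, b1, x1, ierr)   ! ILU factors are built during this first solve
      call KSPSolve(ksp_local, b2, x2, ierr)   ! same operator, the factors are reused

so repeated right-hand sides with an unchanged matrix pay the ILU setup cost only once.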
>>> >>> ILU is ONLY done if the matrix has changed (which seems wrong). >>>> >>>> 2. Some KSP solvers like BCGS or TFQMR has two ?half-iterations? for each iteration step. Not sure how it works in PETSc, but is that possible that both the two ?half" relative residuals are counted in the output array, doubling the number of iterations (but that cannot explain the extra time consumed)? >>> >>> Yes, PETSc might report them as two, you need to check the exact code. >>> >>>> >>>> Anyway, the output with -log_view from the same 278906 by 278906 matrix with 3-block D-ILU in PETSc is as follows: >>>> >>>> >>>> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >>>> >>>> MEMsolv.lu on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Thu Feb 6 09:07:35 2020 >>>> Using Petsc Release Version 3.12.3, unknown >>>> >>>> Max Max/Min Avg Total >>>> Time (sec): 4.443e+00 1.000 4.443e+00 >>>> Objects: 1.155e+03 1.000 1.155e+03 >>>> Flop: 4.311e+09 1.000 4.311e+09 4.311e+09 >>>> Flop/sec: 9.703e+08 1.000 9.703e+08 9.703e+08 >>>> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 >>>> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 >>>> MPI Reductions: 0.000e+00 0.000 >>>> >>>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >>>> e.g., VecAXPY() for real vectors of length N --> 2N flop >>>> and VecAXPY() for complex vectors of length N --> 8N flop >>>> >>>> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- >>>> Avg %Total Avg %Total Count %Total Avg %Total Count %Total >>>> 0: Main Stage: 4.4435e+00 100.0% 4.3113e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >>>> >>>> ???????????????????????????????????????????????????????????? >>>> See the 'Profiling' chapter of the users' manual for details on interpreting output. >>>> Phase summary info: >>>> Count: number of times phase was executed >>>> Time and Flop: Max - maximum over all processors >>>> Ratio - ratio of maximum to minimum over all processors >>>> Mess: number of messages sent >>>> AvgLen: average message length (bytes) >>>> Reduct: number of global reductions >>>> Global: entire computation >>>> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
>>>> %T - percent time in this phase %F - percent flop in this phase >>>> %M - percent messages in this phase %L - percent message lengths in this phase >>>> %R - percent reductions in this phase >>>> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) >>>> ------------------------------------------------------------------------------------------------------------------------ >>>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >>>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>>> ------------------------------------------------------------------------------------------------------------------------ >>>> >>>> --- Event Stage 0: Main Stage >>>> >>>> BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatMult 83 1.0 1.7815e+00 1.0 2.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40 48 0 0 0 40 48 0 0 0 1168 >>>> MatSolve 252 1.0 1.2708e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 939 >>>> MatLUFactorNum 3 1.0 7.9725e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 >>>> MatILUFactorSym 3 1.0 2.6998e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>> MatAssemblyBegin 5 1.0 3.6000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatAssemblyEnd 5 1.0 3.1619e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>> MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatCreateSubMats 1 1.0 3.9659e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>> MatGetOrdering 3 1.0 4.3070e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatView 3 1.0 1.3600e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecDot 82 1.0 1.8948e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 966 >>>> VecDotNorm2 41 1.0 1.6812e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 1088 >>>> VecNorm 43 1.0 9.5099e-02 1.0 9.59e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1009 >>>> VecCopy 2 1.0 1.0120e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecSet 271 1.0 3.8922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>> VecAXPY 1 1.0 7.7200e-04 1.0 2.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2890 >>>> VecAXPBYCZ 82 1.0 2.4370e-01 1.0 3.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 8 0 0 0 5 8 0 0 0 1502 >>>> VecWAXPY 82 1.0 1.4148e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 4 0 0 0 3 4 0 0 0 1293 >>>> VecAssemblyBegin 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecScatterBegin 84 1.0 5.9300e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> KSPSetUp 4 1.0 1.4167e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> KSPSolve 1 1.0 4.0250e+00 1.0 4.31e+09 1.0 0.0e+00 0.0e+00 0.0e+00 91100 0 0 0 91100 0 0 0 1071 >>>> PCSetUp 4 1.0 1.5207e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 156 >>>> PCSetUpOnBlocks 1 1.0 1.1116e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 214 >>>> PCApply 84 1.0 1.2912e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 >>>> PCApplyOnBlocks 252 1.0 1.2909e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 >>>> 
------------------------------------------------------------------------------------------------------------------------ >>>> >>>> # I skipped the memory part - the options (and compiler options) are as follows: >>>> >>>> #PETSc Option Table entries: >>>> -ksp_type bcgs >>>> -ksp_view >>>> -log_view >>>> -pc_bjacobi_local_blocks 3 >>>> -pc_factor_levels 0 >>>> -pc_sub_type ilu >>>> -pc_type bjacobi >>>> #End of PETSc Option Table entries >>>> Compiled with FORTRAN kernels >>>> Compiled with full precision matrices (default) >>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 >>>> Configure options: --with-scalar-type=complex --download-mumps --download-scalapack --with-fortran-kernels=1 -- FOPTFLAGS=?-O3 -fastsse -mp -openmp? --COPTFLAGS=?-O3 -fastsse -mp -openmp? --CXXOPTFLAGS="-O3 -fastsse -mp -openmp" -- with-debugging=0 >>>> ----------------------------------------- >>>> Libraries compiled on 2020-02-03 10:44:31 on Haos-MBP >>>> Machine characteristics: Darwin-19.2.0-x86_64-i386-64bit >>>> Using PETSc directory: /Users/donghao/src/git/PETSc-current >>>> Using PETSc arch: arch-darwin-c-opt >>>> ----------------------------------------- >>>> >>>> Using C compiler: mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fno-stack- check -Qunused-arguments -fvisibility=hidden >>>> Using Fortran compiler: mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument >>>> >>>> Using include paths: -I/Users/donghao/src/git/PETSc-current/include -I/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/include >>>> ----------------------------------------- >>>> >>>> Using C linker: mpicc >>>> Using Fortran linker: mpif90 >>>> Using libraries: -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/donghao/src/git/PETSc- current/arch-darwin-c-opt/lib -lpetsc -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/ donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -Wl,-rpath,/usr/local/opt/libevent/lib -L/usr/local/opt/libevent/ lib -Wl,-rpath,/usr/local/Cellar/open-mpi/4.0.2/lib -L/usr/local/Cellar/open-mpi/4.0.2/lib -Wl,-rpath,/usr/local/Cellar/ gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin19/9.2.0 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple- darwin19/9.2.0 -Wl,-rpath,/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -lcmumps - ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lc++ -ldl -lmpi_usempif08 - lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lquadmath -lm -lc++ -ldl >>>> >>>> >>>> On the other hand, running PETSc with >>>> >>>> -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type lu -ksp_type gmres -ksp_monitor -ksp_view -log_view >>>> >>>> For the same problem takes 5.37s and 72 GMRES iterations. Our previous testings show that BiCGstab (bcgs in PETSc) is almost always the most effective KSP solver for our non-symmetrical complex system. Strangely, the system is still using ilu instead of lu for sub blocks. 
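The culprit here is almost certainly the option name: the sub-block solvers of PCBJACOBI take the "sub_" prefix, so the LU request has to be spelled -sub_pc_type lu (as the reply just below notes); an unrecognized -pc_sub_type is silently ignored, leaving the default ILU(0) on each block. A minimal petsc4py sketch of the intended configuration, assuming the system matrix A is already assembled (the thread itself uses Fortran, but the option names are the same):

    from petsc4py import PETSc

    opts = PETSc.Options()
    opts["ksp_type"] = "gmres"
    opts["pc_type"] = "bjacobi"
    opts["pc_bjacobi_local_blocks"] = 3
    opts["sub_pc_type"] = "lu"      # note the sub_ prefix, not -pc_sub_type

    ksp = PETSc.KSP().create()
    ksp.setOperators(A)             # A: assembled system matrix (assumed to exist)
    ksp.setFromOptions()            # verify the resulting solver with -ksp_view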
The output is like: >>> >>> -sub_pc_type lu >>> >>>> >>>> 0 KSP Residual norm 2.480412407430e+02 >>>> 1 KSP Residual norm 8.848059967835e+01 >>>> 2 KSP Residual norm 3.415272863261e+01 >>>> 3 KSP Residual norm 1.563045190939e+01 >>>> 4 KSP Residual norm 6.241296940043e+00 >>>> 5 KSP Residual norm 2.739710899854e+00 >>>> 6 KSP Residual norm 1.391304148888e+00 >>>> 7 KSP Residual norm 7.959262020849e-01 >>>> 8 KSP Residual norm 4.828323055231e-01 >>>> 9 KSP Residual norm 2.918529739200e-01 >>>> 10 KSP Residual norm 1.905508589557e-01 >>>> 11 KSP Residual norm 1.291541892702e-01 >>>> 12 KSP Residual norm 8.827145774707e-02 >>>> 13 KSP Residual norm 6.521331095889e-02 >>>> 14 KSP Residual norm 5.095787952595e-02 >>>> 15 KSP Residual norm 4.043060387395e-02 >>>> 16 KSP Residual norm 3.232590200012e-02 >>>> 17 KSP Residual norm 2.593944982216e-02 >>>> 18 KSP Residual norm 2.064639483533e-02 >>>> 19 KSP Residual norm 1.653916663492e-02 >>>> 20 KSP Residual norm 1.334946415452e-02 >>>> 21 KSP Residual norm 1.092886880597e-02 >>>> 22 KSP Residual norm 8.988004105542e-03 >>>> 23 KSP Residual norm 7.466501315240e-03 >>>> 24 KSP Residual norm 6.284389135436e-03 >>>> 25 KSP Residual norm 5.425231669964e-03 >>>> 26 KSP Residual norm 4.766338253084e-03 >>>> 27 KSP Residual norm 4.241238878242e-03 >>>> 28 KSP Residual norm 3.808113525685e-03 >>>> 29 KSP Residual norm 3.449383788116e-03 >>>> 30 KSP Residual norm 3.126025526388e-03 >>>> 31 KSP Residual norm 2.958328054299e-03 >>>> 32 KSP Residual norm 2.802344900403e-03 >>>> 33 KSP Residual norm 2.621993580492e-03 >>>> 34 KSP Residual norm 2.430066269304e-03 >>>> 35 KSP Residual norm 2.259043079597e-03 >>>> 36 KSP Residual norm 2.104287972986e-03 >>>> 37 KSP Residual norm 1.952916080045e-03 >>>> 38 KSP Residual norm 1.804988937999e-03 >>>> 39 KSP Residual norm 1.643302117377e-03 >>>> 40 KSP Residual norm 1.471661332036e-03 >>>> 41 KSP Residual norm 1.286445911163e-03 >>>> 42 KSP Residual norm 1.127543025848e-03 >>>> 43 KSP Residual norm 9.777148275484e-04 >>>> 44 KSP Residual norm 8.293314450006e-04 >>>> 45 KSP Residual norm 6.989331136622e-04 >>>> 46 KSP Residual norm 5.852307780220e-04 >>>> 47 KSP Residual norm 4.926715539762e-04 >>>> 48 KSP Residual norm 4.215941372075e-04 >>>> 49 KSP Residual norm 3.699489548162e-04 >>>> 50 KSP Residual norm 3.293897163533e-04 >>>> 51 KSP Residual norm 2.959954542998e-04 >>>> 52 KSP Residual norm 2.700193032414e-04 >>>> 53 KSP Residual norm 2.461789791204e-04 >>>> 54 KSP Residual norm 2.218839085563e-04 >>>> 55 KSP Residual norm 1.945154309976e-04 >>>> 56 KSP Residual norm 1.661128781744e-04 >>>> 57 KSP Residual norm 1.413198766258e-04 >>>> 58 KSP Residual norm 1.213984003195e-04 >>>> 59 KSP Residual norm 1.044317450754e-04 >>>> 60 KSP Residual norm 8.919957502977e-05 >>>> 61 KSP Residual norm 8.042584301275e-05 >>>> 62 KSP Residual norm 7.292784493581e-05 >>>> 63 KSP Residual norm 6.481935501872e-05 >>>> 64 KSP Residual norm 5.718564652679e-05 >>>> 65 KSP Residual norm 5.072589750116e-05 >>>> 66 KSP Residual norm 4.487930741285e-05 >>>> 67 KSP Residual norm 3.941040674119e-05 >>>> 68 KSP Residual norm 3.492873281291e-05 >>>> 69 KSP Residual norm 3.103798339845e-05 >>>> 70 KSP Residual norm 2.822943237409e-05 >>>> 71 KSP Residual norm 2.610615023776e-05 >>>> 72 KSP Residual norm 2.441692671173e-05 >>>> KSP Object: 1 MPI processes >>>> type: gmres >>>> restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement >>>> happy breakdown tolerance 1e-30 >>>> maximum iterations=150, 
nonzero initial guess >>>> tolerances: relative=1e-07, absolute=1e-50, divergence=10000. >>>> left preconditioning >>>> using PRECONDITIONED norm type for convergence test >>>> PC Object: 1 MPI processes >>>> type: bjacobi >>>> number of blocks = 3 >>>> Local solve is same for all blocks, in the following KSP and PC objects: >>>> KSP Object: (sub_) 1 MPI processes >>>> type: preonly >>>> maximum iterations=10000, initial guess is zero >>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>>> left preconditioning >>>> using NONE norm type for convergence test >>>> PC Object: (sub_) 1 MPI processes >>>> type: ilu >>>> out-of-place factorization >>>> 0 levels of fill >>>> tolerance for zero pivot 2.22045e-14 >>>> matrix ordering: natural >>>> factor fill ratio given 1., needed 1. >>>> Factored matrix follows: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=92969, cols=92969 >>>> package used to perform factorization: petsc >>>> total: nonzeros=638417, allocated nonzeros=638417 >>>> total number of mallocs used during MatSetValues calls=0 >>>> not using I-node routines >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=92969, cols=92969 >>>> total: nonzeros=638417, allocated nonzeros=638417 >>>> total number of mallocs used during MatSetValues calls=0 >>>> not using I-node routines >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: mpiaij >>>> rows=278906, cols=278906 >>>> total: nonzeros=3274027, allocated nonzeros=3274027 >>>> total number of mallocs used during MatSetValues calls=0 >>>> not using I-node (on process 0) routines >>>> ... >>>> ------------------------------------------------------------------------------------------------------------------------ >>>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >>>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>>> ------------------------------------------------------------------------------------------------------------------------ >>>> >>>> --- Event Stage 0: Main Stage >>>> >>>> BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatMult 75 1.0 1.5812e+00 1.0 1.88e+09 1.0 0.0e+00 0.0e+00 0.0e+00 28 24 0 0 0 28 24 0 0 0 1189 >>>> MatSolve 228 1.0 1.1442e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 944 >>> >>> These flop rates are ok, but not great. Perhaps an older machine. >>> >>>> MatLUFactorNum 3 1.0 8.1930e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 290 >>>> MatILUFactorSym 3 1.0 2.7102e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatAssemblyBegin 5 1.0 3.7000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatAssemblyEnd 5 1.0 3.1895e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>> MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatCreateSubMats 1 1.0 4.0904e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>> MatGetOrdering 3 1.0 4.2640e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatView 3 1.0 1.4400e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecMDot 72 1.0 1.1984e+00 1.0 2.25e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 28 0 0 0 21 28 0 0 0 1877 >>> >>> 21 percent of the time in VecMDOT this is huge for s sequential fun. I think maybe you are using a terrible OpenMP BLAS? 
>>> >>> Send configure.log >>> >>> >>>> VecNorm 76 1.0 1.6841e-01 1.0 1.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 1007 >>>> VecScale 75 1.0 1.8241e-02 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 4587 >>>> VecCopy 3 1.0 1.4970e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecSet 276 1.0 9.1970e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>>> VecAXPY 6 1.0 3.7450e-03 1.0 1.34e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3575 >>>> VecMAXPY 75 1.0 1.0022e+00 1.0 2.41e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 30 0 0 0 18 30 0 0 0 2405 >>>> VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecScatterBegin 76 1.0 5.5100e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecNormalize 75 1.0 1.8462e-01 1.0 2.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 1360 >>>> KSPSetUp 4 1.0 1.1341e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> KSPSolve 1 1.0 5.3123e+00 1.0 7.91e+09 1.0 0.0e+00 0.0e+00 0.0e+00 93100 0 0 0 93100 0 0 0 1489 >>>> KSPGMRESOrthog 72 1.0 2.1316e+00 1.0 4.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 57 0 0 0 37 57 0 0 0 2110 >>>> PCSetUp 4 1.0 1.5531e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 153 >>>> PCSetUpOnBlocks 1 1.0 1.1343e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 209 >>>> PCApply 76 1.0 1.1671e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 >>>> PCApplyOnBlocks 228 1.0 1.1668e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 >>>> ???????????????????????????????????????????????????????????? >>>> ... >>>> #PETSc Option Table entries: >>>> -ksp_monitor >>>> -ksp_type gmres >>>> -ksp_view >>>> -log_view >>>> -pc_bjacobi_local_blocks 3 >>>> -pc_sub_type lu >>>> -pc_type bjacobi >>>> #End of PETSc Option Table entries >>>> ... >>>> >>>> Does any of the setup/output ring a bell? >>>> >>>> BTW, out of curiosity - what is a ?I-node? routine? >>>> >>>> >>>> Cheers, >>>> Hao >>>> >>>> >>>> From: Smith, Barry F. >>>> Sent: Wednesday, February 5, 2020 9:42 PM >>>> To: Hao DONG >>>> Cc: petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? >>>> >>>> >>>> >>>>> On Feb 5, 2020, at 4:36 AM, Hao DONG wrote: >>>>> >>>>> Thanks a lot for your suggestions, Hong and Barry - >>>>> >>>>> As you suggested, I first tried the LU direct solvers (built-in and MUMPs) out this morning, which work perfectly, albeit slow. As my problem itself is a part of a PDE based optimization, the exact solution in the searching procedure is not necessary (I often set a relative tolerance of 1E-7/1E-8, etc. for Krylov subspace methods). Also tried BJACOBI with exact LU, the KSP just converges in one or two iterations, as expected. >>>>> >>>>> I added -kspview option for the D-ILU test (still with Block Jacobi as preconditioner and bcgs as KSP solver). The KSPview output from one of the examples in a toy model looks like: >>>>> >>>>> KSP Object: 1 MPI processes >>>>> type: bcgs >>>>> maximum iterations=120, nonzero initial guess >>>>> tolerances: relative=1e-07, absolute=1e-50, divergence=10000. 
>>>>> left preconditioning >>>>> using PRECONDITIONED norm type for convergence test >>>>> PC Object: 1 MPI processes >>>>> type: bjacobi >>>>> number of blocks = 3 >>>>> Local solve is same for all blocks, in the following KSP and PC objects: >>>>> KSP Object: (sub_) 1 MPI processes >>>>> type: preonly >>>>> maximum iterations=10000, initial guess is zero >>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>>>> left preconditioning >>>>> using NONE norm type for convergence test >>>>> PC Object: (sub_) 1 MPI processes >>>>> type: ilu >>>>> out-of-place factorization >>>>> 0 levels of fill >>>>> tolerance for zero pivot 2.22045e-14 >>>>> matrix ordering: natural >>>>> factor fill ratio given 1., needed 1. >>>>> Factored matrix follows: >>>>> Mat Object: 1 MPI processes >>>>> type: seqaij >>>>> rows=11294, cols=11294 >>>>> package used to perform factorization: petsc >>>>> total: nonzeros=76008, allocated nonzeros=76008 >>>>> total number of mallocs used during MatSetValues calls=0 >>>>> not using I-node routines >>>>> linear system matrix = precond matrix: >>>>> Mat Object: 1 MPI processes >>>>> type: seqaij >>>>> rows=11294, cols=11294 >>>>> total: nonzeros=76008, allocated nonzeros=76008 >>>>> total number of mallocs used during MatSetValues calls=0 >>>>> not using I-node routines >>>>> linear system matrix = precond matrix: >>>>> Mat Object: 1 MPI processes >>>>> type: mpiaij >>>>> rows=33880, cols=33880 >>>>> total: nonzeros=436968, allocated nonzeros=436968 >>>>> total number of mallocs used during MatSetValues calls=0 >>>>> not using I-node (on process 0) routines >>>>> >>>>> do you see something wrong with my setup? >>>>> >>>>> I also tried a quick performance test with a small 278906 by 278906 matrix (3850990 nnz) with the following parameters: >>>>> >>>>> -ksp_type bcgs -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu -ksp_view >>>>> >>>>> Reducing the relative residual to 1E-7 >>>>> >>>>> Took 4.08s with 41 bcgs iterations. >>>>> >>>>> Merely changing the -pc_bjacobi_local_blocks to 6 >>>>> >>>>> Took 7.02s with 73 bcgs iterations. 9 blocks would further take 9.45s with 101 bcgs iterations. >>>> >>>> This is normal. more blocks slower convergence >>>>> >>>>> As a reference, my home-brew Fortran code solves the same problem (3-block D-ILU0) in >>>>> >>>>> 1.84s with 24 bcgs iterations (the bcgs code is also a home-brew one)? >>>>> >>>> Run the PETSc code with optimization ./configure --with-debugging=0 an run the code with -log_view this will show where the PETSc code is spending the time (send it to use) >>>> >>>> >>>>> >>>>> >>>>> Well, by saying ?use explicit L/U matrix as preconditioner?, I wonder if a user is allowed to provide his own (separate) L and U Mat for preconditioning (like how it works in Matlab solvers), e.g. >>>>> >>>>> x = qmr(A,b,Tol,MaxIter,L,U,x) >>>>> >>>>> As I already explicitly constructed the L and U matrices in Fortran, it is not hard to convert them to Mat in Petsc to test Petsc and my Fortran code head-to-head. In that case, the A, b, x, and L/U are all identical, it would be easier to see where the problem came from. >>>>> >>>>> >>>> No, we don't provide this kind of support >>>> >>>> >>>>> >>>>> BTW, there is another thing I wondered - is there a way to output residual in unpreconditioned norm? I tried to >>>>> >>>>> call KSPSetNormType(ksp_local, KSP_NORM_UNPRECONDITIONED, ierr) >>>>> >>>>> But always get an error that current ksp does not support unpreconditioned in LEFT/RIGHT (either way). 
Is it possible to do that (output unpreconditioned residual) in PETSc at all? >>>> >>>> -ksp_monitor_true_residual You can also run GMRES (and some other methods) with right preconditioning, -ksp_pc_side right then the residual computed is by the algorithm the unpreconditioned residual >>>> >>>> KSPSetNormType sets the norm used in the algorithm, it generally always has to left or right, only a couple algorithms support unpreconditioned directly. >>>> >>>> Barry >>>> >>>> >>>>> >>>>> Cheers, >>>>> Hao >>>>> >>>>> >>>>> From: Smith, Barry F. >>>>> Sent: Tuesday, February 4, 2020 8:27 PM >>>>> To: Hao DONG >>>>> Cc: petsc-users at mcs.anl.gov >>>>> Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? >>>>> >>>>> >>>>> >>>>>> On Feb 4, 2020, at 12:41 PM, Hao DONG wrote: >>>>>> >>>>>> Dear all, >>>>>> >>>>>> >>>>>> I have a few questions about the implementation of diagonal ILU PC in PETSc. I want to solve a very simple system with KSP (in parallel), the nature of the system (finite difference time-harmonic Maxwell) is probably not important to the question itself. Long story short, I just need to solve a set of equations of Ax = b with a block diagonal system matrix, like (not sure if the mono font works): >>>>>> >>>>>> |X | >>>>>> A =| Y | >>>>>> | Z| >>>>>> >>>>>> Note that A is not really block-diagonal, it?s just a multi-diagonal matrix determined by the FD mesh, where most elements are close to diagonal. So instead of a full ILU decomposition, a D-ILU is a good approximation as a preconditioner, and the number of blocks should not matter too much: >>>>>> >>>>>> |Lx | |Ux | >>>>>> L = | Ly | and U = | Uy | >>>>>> | Lz| | Uz| >>>>>> >>>>>> Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in my sequential Fortran/Matlab code. >>>>>> >>>>>> So like most users, I am looking for a parallel implement with this problem in PETSc. After looking through the manual, it seems that the most straightforward way to do it is through PCBJACOBI. Not sure I understand it right, I just setup a 3-block PCJACOBI and give each of the block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little block of fortran code would look like: >>>>>> ... >>>>>> call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & >>>>>> & isubs,ierr) >>>>>> call PCBJacobiSetLocalBlocks(pc_local, Nsub, & >>>>>> & isubs(istart:iend),ierr) >>>>>> ! set up the block jacobi structure >>>>>> call KSPSetup(ksp_local,ierr) >>>>>> ! allocate sub ksps >>>>>> allocate(ksp_sub(Nsub)) >>>>>> call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & >>>>>> & ksp_sub,ierr) >>>>>> do i=1,Nsub >>>>>> call KSPGetPC(ksp_sub(i),pc_sub,ierr) >>>>>> !ILU preconditioner >>>>>> call PCSetType(pc_sub,ptype,ierr) >>>>>> call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here >>>>>> call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] >>>>>> end do >>>>>> call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & >>>>>> & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) >>>>>> ? >>>>> >>>>> This code looks essentially right. You should call with -ksp_view to see exactly what PETSc is using for a solver. >>>>> >>>>>> >>>>>> I understand that the parallel performance may not be comparable, so I first set up a one-process test (with MPIAij, but all the rows are local since there is only one process). The system is solved without any problem (identical results within error). 
But the performance is actually a lot worse (code built without debugging flags in performance tests) than my own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), which is hard to believe. I suspect the difference is from the PC as the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to the same relative tol. >>>>> >>>>> PETSc uses GMRES by default with a restart of 30 and left preconditioning. It could be different exact choices in the solver (which is why -ksp_view is so useful) can explain the differences in the runs between your code and PETSc's >>>>>> >>>>>> This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks instead of 3). While my Fortran/Matlab codes see minimal performance difference (<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a performance decrease of more than 50% in sequential test. >>>>> >>>>> This is odd, the more blocks the smaller each block so the quicker the ILU set up should be. You can run various cases with -log_view and send us the output to see what is happening at each part of the computation in time. >>>>> >>>>>> So my implementation IS somewhat different in PETSc. Do I miss something in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how PCBJACOBI works in PETSc? >>>>> >>>>> Probably not. >>>>>> >>>>>> If this is not the right way to implement a block diagonal ILU as (parallel) PC, please kindly point me to the right direction. I searched through the mail list to find some answers, only to find a couple of similar questions... An example would be nice. >>>>> >>>>> You approach seems fundamentally right but I cannot be sure of possible bugs. >>>>>> >>>>>> On the other hand, does PETSc support a simple way to use explicit L/U matrix as a preconditioner? I can import the D-ILU matrix (I already converted my A matrix into Mat) constructed in my Fortran code to make a better comparison. Or do I have to construct my own PC using PCshell? If so, is there a good tutorial/example to learn about how to use PCSHELL (in a more advanced sense, like how to setup pc side and transpose)? >>>>> >>>>> Not sure what you mean by explicit L/U matrix as a preconditioner. As Hong said, yes you can use a parallel LU from MUMPS or SuperLU_DIST or Pastix as the solver. You do not need any shell code. You simply need to set the PCType to lu >>>>> >>>>> You can also set all this options from the command line and don't need to write the code specifically. So call KSPSetFromOptions() and then for example >>>>> >>>>> -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu (this last one is applied to each block so you could use -pc_type lu and it would use lu on each block.) >>>>> >>>>> -ksp_type_none -pc_type lu -pc_factor_mat_solver_type mumps (do parallel LU with mumps) >>>>> >>>>> By not hardwiring in the code and just using options you can test out different cases much quicker >>>>> >>>>> Use -ksp_view to make sure that is using the solver the way you expect. >>>>> >>>>> Barry >>>>> >>>>> >>>>> >>>>> Barry >>>>> >>>>>> >>>>>> Thanks in advance, >>>>>> >>>>>> Hao >>> >>> >> > From aan2 at princeton.edu Mon Feb 10 11:09:22 2020 From: aan2 at princeton.edu (Olek Niewiarowski) Date: Mon, 10 Feb 2020 17:09:22 +0000 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? 
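On the recurring question above about reusing an externally built D-ILU factorization as the preconditioner: a PCSHELL is the usual route when the factors live outside PETSc. A rough petsc4py sketch follows (the thread's own code is Fortran, where the analogous calls are PCShellSetApply and PCShellSetContext). Here apply_dilu stands for the user's own forward/backward substitution with the prebuilt block L/U factors; it is a placeholder, not a PETSc routine, and A is the already assembled system matrix:

    from petsc4py import PETSc

    class DILUShell(object):
        """Apply a user-supplied D-ILU factorization as a preconditioner."""
        def __init__(self, apply_dilu):
            # apply_dilu: callable mapping a numpy array r to M^{-1} r,
            # i.e. the forward/backward solves with the prebuilt block factors
            self.apply_dilu = apply_dilu

        def apply(self, pc, x, y):
            y.setArray(self.apply_dilu(x.getArray()))   # y <- M^{-1} x

    ksp = PETSc.KSP().create()
    ksp.setType(PETSc.KSP.Type.BCGS)
    ksp.setOperators(A)
    pc = ksp.getPC()
    pc.setType(PETSc.PC.Type.PYTHON)
    pc.setPythonContext(DILUShell(apply_dilu))          # user-supplied callable
    ksp.setFromOptions()

With this in place the solver can still be steered from the command line as usual, for example with -ksp_monitor_true_residual to watch the unpreconditioned residual discussed above.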
In-Reply-To: <35929586-4D5D-4B31-A34E-8D8D266FEA0A@mcs.anl.gov> References: <20E8B18C-F71E-4B10-958B-6CD24DA869A3@mcs.anl.gov> , <35929586-4D5D-4B31-A34E-8D8D266FEA0A@mcs.anl.gov> Message-ID: Barry, Thank you for your help and detailed suggestions. I will try to implement what you proposed and will follow-up with any questions. In the meantime, I just want to make sure I understand the use of SNESSetPicard: r - vector to store function value b - function evaluation routine - my F(u) function Amat - matrix with which A(x) x - b(x) is to be computed - a MatCreateLRC() -- what's the best way of passing in scalar k? Pmat - matrix from which preconditioner is computed (usually the same as Amat) - a regular Mat() J - function to compute matrix value, see SNESJacobianFunction for details on its calling sequence -- computes K + kaa' By the way, the manual page states "we do not recommend using this routine. It is far better to provide the nonlinear function F() and some approximation to the Jacobian and use an approximate Newton solver :-)" Thanks again! Alexander (Olek) Niewiarowski PhD Candidate, Civil & Environmental Engineering Princeton University, 2020 Cell: +1 (610) 393-2978 ________________________________ From: Smith, Barry F. Sent: Thursday, February 6, 2020 13:25 To: Olek Niewiarowski Cc: Matthew Knepley ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? If I remember your problem is K(u) + kaa' = F(u) You should start by creating a SNES and calling SNESSetPicard. Read its manual page. Your matrix should be a MatCreateLRC() for the Mat argument to SNESSetPicard and the Peat should be just your K matrix. If you run with -ksp_fd_operator -pc_type lu will be using K to precondition K + kaa' + d F(U)/dU . Newton's method should converge at quadratic order. You can use -ksp_fd_operator -pc_type anything else to use an iterative linear solver as the preconditioner of K. If you really want to use Sherman Morisson formula then you would create a PC shell and do typedef struct{ KSP innerksp; Vec u_1,u_2; } YourStruct; SNESGetKSP(&ksp); KSPGetPC(&pc); PCSetType(pc,PCSHELL); PCShellSetApply(pc,YourPCApply) PetscMemclear(yourstruct,si PCShellSetContext(pc,&yourstruct); Then YourPCApply(PC pc,Vec in, Vec out) { YourStruct *yourstruct; PCShellGetContext(pc,(void**)&yourstruct) if (!yourstruct->ksp) { PCCreate(comm,&yourstruct->ksp); KSPSetPrefix(yourstruct->ksp,"yourpc_"); Mat A,B; KSPGetOperators(ksp,&A,&B); KSPSetOperators(yourstruct->ksp,A,B); create work vectors } Apply the solve as you do for the linear case with Sherman Morisson formula } This you can run with for example -yourpc_pc_type cholesky Barry Looks complicated, conceptually simple. > 2) Call KSPSetOperators(ksp, K, K,) > > 3) Solve the first system KSPSolve(ksp, -F, u_1) > > 4) Solve the second system KSPSolve(ksp, a, u_2) > > 5) Calculate beta VecDot(a, u_2, &gamma); beta = 1./(1. + k*gamma); > > 6) Update the guess VecDot(u_2, F, &delta); VecAXPBYPCZ(u, 1.0, beta*delta, 1.0, u_1, u_2) No > On Feb 6, 2020, at 9:02 AM, Olek Niewiarowski wrote: > > Hi Matt, > > What you suggested in your last email was exactly what I did on my very first attempt at the problem, and while it "worked," convergence was not satisfactory due to the Newton step being fixed in step 6. This is the reason I would like to use the linesearch in SNES instead. 
Indeed in your manual you "recommend most PETSc users work directly with SNES, rather than using PETSc for the linear problem within a nonlinear solver." Ideally I'd like to create a SNES solver, pass in the functions to evaluate K, F, a, and k, and set up the underlying KSP object as in my first message. Is this possible? > > Thanks, > > Alexander (Olek) Niewiarowski > PhD Candidate, Civil & Environmental Engineering > Princeton University, 2020 > Cell: +1 (610) 393-2978 > From: Matthew Knepley > Sent: Thursday, February 6, 2020 5:33 > To: Olek Niewiarowski > Cc: Smith, Barry F. ; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? > > On Wed, Feb 5, 2020 at 8:53 PM Olek Niewiarowski wrote: > Hi Barry and Matt, > > Thank you for your input and for creating a new issue in the repo. > My initial question was more basic (how to configure the SNES's KSP solver as in my first message with a and k), but now I see there's more to the implementation. To reiterate, for my problem's structure, a good solution algorithm (on the algebraic level) is the following "double back-substitution": > For each nonlinear iteration: > ? define intermediate vectors u_1 and u_2 > ? solve Ku_1 = -F --> u_1 = -K^{-1}F (this solve is cheap, don't actually need K^-1) > ? solve Ku_2 = -a --> u_2 = -K^{-1}a (ditto) > ? define \beta = 1/(1 + k a^Tu_2) > ? \Delta u = u_1 + \beta k u_2^T F u_2 > ? u = u + \Delta u > This is very easy to setup: > > 1) Create a KSP object KSPCreate(comm, &ksp) > > 2) Call KSPSetOperators(ksp, K, K,) > > 3) Solve the first system KSPSolve(ksp, -F, u_1) > > 4) Solve the second system KSPSolve(ksp, a, u_2) > > 5) Calculate beta VecDot(a, u_2, &gamma); beta = 1./(1. + k*gamma); > > 6) Update the guess VecDot(u_2, F, &delta); VecAXPBYPCZ(u, 1.0, beta*delta, 1.0, u_1, u_2) > > Thanks, > > Matt > > I don't need the Jacobian inverse, [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) just the solution ?u = [K?kaaT]-1F = K-1F - (kK-1 aFK-1a)/(1 + kaTK-1a) > = u_1 + beta k u_2^T F u_2 (so I never need to invert K either). (To Matt's point on gitlab, K is a symmetric sparse matrix arising from a bilinear form. ) Also, eventually, I want to have more than one low-rank updates to K, but again, Sherman Morrisson Woodbury should still work. > > Being new to PETSc, I don't know if this algorithm directly translates into an efficient numerical solver. (I'm also not sure if Picard iteration would be useful here.) What would it take to set up the KSP solver in SNES like I did below? Is it possible "out of the box"? I looked at MatCreateLRC() - what would I pass this to? (A pointer to demo/tutorial would be very appreciated.) If there's a better way to go about all of this, I'm open to any and all ideas. My only limitation is that I use petsc4py exclusively since I/future users of my code will not be comfortable with C. > > Thanks again for your help! > > > Alexander (Olek) Niewiarowski > PhD Candidate, Civil & Environmental Engineering > Princeton University, 2020 > Cell: +1 (610) 393-2978 > From: Smith, Barry F. > Sent: Wednesday, February 5, 2020 15:46 > To: Matthew Knepley > Cc: Olek Niewiarowski ; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? 
> > > https://gitlab.com/petsc/petsc/issues/557 > > > > On Feb 5, 2020, at 7:35 AM, Matthew Knepley wrote: > > > > Perhaps Barry is right that you want Picard, but suppose you really want Newton. > > > > "This problem can be solved efficiently using the Sherman-Morrison formula" Well, maybe. The main assumption here is that inverting K is cheap. I see two things you can do in a straightforward way: > > > > 1) Use MatCreateLRC() https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateLRC.html to create the Jacobian > > and solve using an iterative method. If you pass just K was the preconditioning matrix, you can use common PCs. > > > > 2) We only implemented MatMult() for LRC, but you could stick your SMW code in for MatSolve_LRC if you really want to factor K. We would > > of course help you do this. > > > > Thanks, > > > > Matt > > > > On Wed, Feb 5, 2020 at 1:36 AM Smith, Barry F. via petsc-users wrote: > > > > I am not sure of everything in your email but it sounds like you want to use a "Picard" iteration to solve [K(u)?kaaT]?u=?F(u). That is solve > > > > A(u^{n}) (u^{n+1} - u^{n}) = F(u^{n}) - A(u^{n})u^{n} where A(u) = K(u) - kaaT > > > > PETSc provides code to this with SNESSetPicard() (see the manual pages) I don't know if Petsc4py has bindings for this. > > > > Adding missing python bindings is not terribly difficult and you may be able to do it yourself if this is the approach you want. > > > > Barry > > > > > > > > > On Feb 4, 2020, at 5:07 PM, Olek Niewiarowski wrote: > > > > > > Hello, > > > I am a FEniCS user but new to petsc4py. I am trying to modify the KSP solver through the SNES object to implement the Sherman-Morrison formula(e.g. http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html ). I am solving a nonlinear system of the form [K(u)?kaaT]?u=?F(u). Here the jacobian matrix K is modified by the term kaaT, where k is a scalar. Notably, K is sparse, while the term kaaT results in a full matrix. This problem can be solved efficiently using the Sherman-Morrison formula : > > > > > > [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) > > > I have managed to successfully implement this at the linear solve level (by modifying the KSP solver) inside a custom Newton solver in python by following an incomplete tutorial at https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner : > > > ? while (norm(delU) > alpha): # while not converged > > > ? > > > ? self.update_F() # call to method to update r.h.s form > > > ? self.update_K() # call to update the jacobian form > > > ? K = assemble(self.K) # assemble the jacobian matrix > > > ? F = assemble(self.F) # assemble the r.h.s vector > > > ? a = assemble(self.a_form) # assemble the a_form (see Sherman Morrison formula) > > > ? > > > ? for bc in self.mem.bc: # apply boundary conditions > > > ? bc.apply(K, F) > > > ? bc.apply(K, a) > > > ? > > > ? B = PETSc.Mat().create() > > > ? > > > ? # Assemble the bilinear form that defines A and get the concrete > > > ? # PETSc matrix > > > ? A = as_backend_type(K).mat() # get the PETSc objects for K and a > > > ? u = as_backend_type(a).vec() > > > ? > > > ? # Build the matrix "context" # see firedrake docs > > > ? Bctx = MatrixFreeB(A, u, u, self.k) > > > ? > > > ? # Set up B > > > ? # B is the same size as A > > > ? B.setSizes(*A.getSizes()) > > > ? B.setType(B.Type.PYTHON) > > > ? B.setPythonContext(Bctx) > > > ? B.setUp() > > > ? > > > ? > > > ? ksp = PETSc.KSP().create() # create the KSP linear solver object > > > ? ksp.setOperators(B) > > > ? 
ksp.setUp() > > > ? pc = ksp.pc > > > ? pc.setType(pc.Type.PYTHON) > > > ? pc.setPythonContext(MatrixFreePC()) > > > ? ksp.setFromOptions() > > > ? > > > ? solution = delU # the incremental displacement at this iteration > > > ? > > > ? b = as_backend_type(-F).vec() > > > ? delu = solution.vector().vec() > > > ? > > > ? ksp.solve(b, delu) > > > > > > ? self.mem.u.vector().axpy(0.25, self.delU.vector()) # poor man's linesearch > > > ? counter += 1 > > > Here is the corresponding petsc4py code adapted from the firedrake docs: > > > > > > ? class MatrixFreePC(object): > > > ? > > > ? def setUp(self, pc): > > > ? B, P = pc.getOperators() > > > ? # extract the MatrixFreeB object from B > > > ? ctx = B.getPythonContext() > > > ? self.A = ctx.A > > > ? self.u = ctx.u > > > ? self.v = ctx.v > > > ? self.k = ctx.k > > > ? # Here we build the PC object that uses the concrete, > > > ? # assembled matrix A. We will use this to apply the action > > > ? # of A^{-1} > > > ? self.pc = PETSc.PC().create() > > > ? self.pc.setOptionsPrefix("mf_") > > > ? self.pc.setOperators(self.A) > > > ? self.pc.setFromOptions() > > > ? # Since u and v do not change, we can build the denominator > > > ? # and the action of A^{-1} on u only once, in the setup > > > ? # phase. > > > ? tmp = self.A.createVecLeft() > > > ? self.pc.apply(self.u, tmp) > > > ? self._Ainvu = tmp > > > ? self._denom = 1 + self.k*self.v.dot(self._Ainvu) > > > ? > > > ? def apply(self, pc, x, y): > > > ? # y <- A^{-1}x > > > ? self.pc.apply(x, y) > > > ? # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u) > > > ? alpha = (self.k*self.v.dot(y)) / self._denom > > > ? # y <- y - alpha * A^{-1}u > > > ? y.axpy(-alpha, self._Ainvu) > > > ? > > > ? > > > ? class MatrixFreeB(object): > > > ? > > > ? def __init__(self, A, u, v, k): > > > ? self.A = A > > > ? self.u = u > > > ? self.v = v > > > ? self.k = k > > > ? > > > ? def mult(self, mat, x, y): > > > ? # y <- A x > > > ? self.A.mult(x, y) > > > ? > > > ? # alpha <- v^T x > > > ? alpha = self.v.dot(x) > > > ? > > > ? # y <- y + alpha*u > > > ? y.axpy(alpha, self.u) > > > However, this approach is not efficient as it requires many iterations due to the Newton step being fixed, so I would like to implement it using SNES and use line search. Unfortunately, I have not been able to find any documentation/tutorial on how to do so. Provided I have the FEniCS forms for F, K, and a, I'd like to do something along the lines of: > > > solver = PETScSNESSolver() # the FEniCS SNES wrapper > > > snes = solver.snes() # the petsc4py SNES object > > > ## ?? > > > ksp = snes.getKSP() > > > # set ksp option similar to above > > > solver.solve() > > > > > > I would be very grateful if anyone could could help or point me to a reference or demo that does something similar (or maybe a completely different way of solving the problem!). > > > Many thanks in advance! > > > Alex > > > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
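Written out explicitly, the identity used above is the standard Sherman-Morrison formula (the archived text flattens the superscripts and the Delta). The thread applies it with A = K, u = k a, v = a; the sign in front of the k a a^T term differs between messages, so only the plus convention is spelled out here, with u_1, u_2 defined in the last line:

    \[
    (A + u v^{T})^{-1} = A^{-1} - \frac{A^{-1} u\, v^{T} A^{-1}}{1 + v^{T} A^{-1} u},
    \qquad 1 + v^{T} A^{-1} u \neq 0 .
    \]

    With $A = K$, $u = k\,a$, $v = a$ this gives
    \[
    (K + k\,a a^{T})^{-1} = K^{-1} - \frac{k\, K^{-1} a\, a^{T} K^{-1}}{1 + k\, a^{T} K^{-1} a},
    \]
    so the update only ever solves with $K$:
    \[
    \Delta u = (K + k\,a a^{T})^{-1}(-F)
             = u_1 - \frac{k\,(a^{T} u_1)}{1 + k\, a^{T} u_2}\, u_2,
    \qquad u_1 = -K^{-1} F, \quad u_2 = K^{-1} a .
    \]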
URL: From knepley at gmail.com Mon Feb 10 11:13:05 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 10 Feb 2020 09:13:05 -0800 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? In-Reply-To: References: <20E8B18C-F71E-4B10-958B-6CD24DA869A3@mcs.anl.gov> <35929586-4D5D-4B31-A34E-8D8D266FEA0A@mcs.anl.gov> Message-ID: On Mon, Feb 10, 2020 at 9:09 AM Olek Niewiarowski wrote: > Barry, > Thank you for your help and detailed suggestions. I will try to implement > what you proposed and will follow-up with any questions. In the meantime, I > just want to make sure I understand the use of SNESSetPicard: > *r* - vector to store function value > *b* - function evaluation routine - *my F(u) function * > *Amat* - matrix with which A(x) x - b(x) is to be computed -* a > MatCreateLRC() -- what's the best way of passing in scalar k?* > *Pmat* - matrix from which preconditioner is computed (usually the same > as Amat) - * a regular Mat()* > *J* - function to compute matrix value, see SNESJacobianFunction > > for details on its calling sequence -- * computes K + kaa' * > By the way, the manual page states "we do not recommend using this > routine. It is far better to provide the nonlinear function F() and some > approximation to the Jacobian and use an approximate Newton solver :-)" > This is still correct. Barry is suggesting Picard since the implementation for you is a little bit simpler. However, the Picard solver is missing the K' piece of the Jacobian, so it will only ever convergence linearly, as opposed to quadratically for Newton in the convergence basin. However, once you have Picard going, it will be simple to switch to Newton by just providing that extra piece. Thanks, Matt > Thanks again! > > *Alexander (Olek) Niewiarowski* > PhD Candidate, Civil & Environmental Engineering > Princeton University, 2020 > Cell: +1 (610) 393-2978 > ------------------------------ > *From:* Smith, Barry F. > *Sent:* Thursday, February 6, 2020 13:25 > *To:* Olek Niewiarowski > *Cc:* Matthew Knepley ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] Implementing the Sherman Morisson formula > (low rank update) in petsc4py and FEniCS? > > > If I remember your problem is K(u) + kaa' = F(u) > > You should start by creating a SNES and calling SNESSetPicard. Read its > manual page. Your matrix should be a MatCreateLRC() for the Mat argument > to SNESSetPicard and the Peat should be just your K matrix. > > If you run with -ksp_fd_operator -pc_type lu will be using K to > precondition K + kaa' + d F(U)/dU . Newton's method should converge at > quadratic order. You can use -ksp_fd_operator -pc_type anything else to use > an iterative linear solver as the preconditioner of K. 
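The two-solve update listed as steps 1) through 6) elsewhere in this thread translates almost line for line into petsc4py. A minimal sketch for a real-valued problem, assuming K, F, a are already assembled petsc4py objects, k is a Python float, and u is the current iterate (all of these come from the user's own assembly, as in the messages above); note the factor of k in the final correction:

    from petsc4py import PETSc

    ksp = PETSc.KSP().create()
    ksp.setOperators(K, K)            # K: assembled tangent matrix (assumed)
    ksp.setFromOptions()

    u1 = K.createVecLeft()
    u2 = K.createVecLeft()
    rhs = F.copy()
    rhs.scale(-1.0)                   # -F
    ksp.solve(rhs, u1)                # u1 = -K^{-1} F
    ksp.solve(a, u2)                  # u2 =  K^{-1} a

    gamma = a.dot(u2)                 # a^T K^{-1} a
    delta = a.dot(u1)                 # a^T u1
    du = u1.copy()
    du.axpy(-k*delta/(1.0 + k*gamma), u2)   # du = u1 - k (a^T u1)/(1 + k a^T u2) u2
    u.axpy(1.0, du)                   # update the current iterate

As the surrounding discussion points out, wrapping this kind of update inside SNES, whether via SNESSetPicard with a MatCreateLRC operator or via a shell preconditioner, is what gives access to PETSc's line searches.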
> > If you really want to use Sherman Morisson formula then you would > create a PC shell and do > > typedef struct{ > KSP innerksp; > Vec u_1,u_2; > } YourStruct; > > SNESGetKSP(&ksp); > KSPGetPC(&pc); > PCSetType(pc,PCSHELL); > PCShellSetApply(pc,YourPCApply) > PetscMemclear(yourstruct,si > PCShellSetContext(pc,&yourstruct); > > Then > > YourPCApply(PC pc,Vec in, Vec out) > { > YourStruct *yourstruct; > > PCShellGetContext(pc,(void**)&yourstruct) > > if (!yourstruct->ksp) { > PCCreate(comm,&yourstruct->ksp); > KSPSetPrefix(yourstruct->ksp,"yourpc_"); > Mat A,B; > KSPGetOperators(ksp,&A,&B); > KSPSetOperators(yourstruct->ksp,A,B); > create work vectors > } > Apply the solve as you do for the linear case with Sherman Morisson > formula > } > > This you can run with for example -yourpc_pc_type cholesky > > Barry > > Looks complicated, conceptually simple. > > > > 2) Call KSPSetOperators(ksp, K, K,) > > > > 3) Solve the first system KSPSolve(ksp, -F, u_1) > > > > 4) Solve the second system KSPSolve(ksp, a, u_2) > > > > 5) Calculate beta VecDot(a, u_2, &gamma); beta = 1./(1. + k*gamma); > > > > 6) Update the guess VecDot(u_2, F, &delta); VecAXPBYPCZ(u, 1.0, > beta*delta, 1.0, u_1, u_2) > > No > > > On Feb 6, 2020, at 9:02 AM, Olek Niewiarowski > wrote: > > > > Hi Matt, > > > > What you suggested in your last email was exactly what I did on my very > first attempt at the problem, and while it "worked," convergence was not > satisfactory due to the Newton step being fixed in step 6. This is the > reason I would like to use the linesearch in SNES instead. Indeed in your > manual you "recommend most PETSc users work directly with SNES, rather than > using PETSc for the linear problem within a nonlinear solver." Ideally I'd > like to create a SNES solver, pass in the functions to evaluate K, F, a, > and k, and set up the underlying KSP object as in my first message. Is this > possible? > > > > Thanks, > > > > Alexander (Olek) Niewiarowski > > PhD Candidate, Civil & Environmental Engineering > > Princeton University, 2020 > > Cell: +1 (610) 393-2978 > > From: Matthew Knepley > > Sent: Thursday, February 6, 2020 5:33 > > To: Olek Niewiarowski > > Cc: Smith, Barry F. ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > > Subject: Re: [petsc-users] Implementing the Sherman Morisson formula > (low rank update) in petsc4py and FEniCS? > > > > On Wed, Feb 5, 2020 at 8:53 PM Olek Niewiarowski > wrote: > > Hi Barry and Matt, > > > > Thank you for your input and for creating a new issue in the repo. > > My initial question was more basic (how to configure the SNES's KSP > solver as in my first message with a and k), but now I see there's more to > the implementation. To reiterate, for my problem's structure, a good > solution algorithm (on the algebraic level) is the following "double > back-substitution": > > For each nonlinear iteration: > > ? define intermediate vectors u_1 and u_2 > > ? solve Ku_1 = -F --> u_1 = -K^{-1}F (this solve is cheap, don't > actually need K^-1) > > ? solve Ku_2 = -a --> u_2 = -K^{-1}a (ditto) > > ? define \beta = 1/(1 + k a^Tu_2) > > ? \Delta u = u_1 + \beta k u_2^T F u_2 > > ? u = u + \Delta u > > This is very easy to setup: > > > > 1) Create a KSP object KSPCreate(comm, &ksp) > > > > 2) Call KSPSetOperators(ksp, K, K,) > > > > 3) Solve the first system KSPSolve(ksp, -F, u_1) > > > > 4) Solve the second system KSPSolve(ksp, a, u_2) > > > > 5) Calculate beta VecDot(a, u_2, &gamma); beta = 1./(1. 
+ k*gamma); > > > > 6) Update the guess VecDot(u_2, F, &delta); VecAXPBYPCZ(u, 1.0, > beta*delta, 1.0, u_1, u_2) > > > > Thanks, > > > > Matt > > > > I don't need the Jacobian inverse, [K?kaaT]-1 = K-1 - (kK-1 > aaTK-1)/(1+kaTK-1a) just the solution ?u = [K?kaaT]-1F = K-1F - (kK-1 > aFK-1a)/(1 + kaTK-1a) > > = u_1 + beta k u_2^T F u_2 (so I never need to invert K either). (To > Matt's point on gitlab, K is a symmetric sparse matrix arising from a > bilinear form. ) Also, eventually, I want to have more than one low-rank > updates to K, but again, Sherman Morrisson Woodbury should still work. > > > > Being new to PETSc, I don't know if this algorithm directly translates > into an efficient numerical solver. (I'm also not sure if Picard iteration > would be useful here.) What would it take to set up the KSP solver in SNES > like I did below? Is it possible "out of the box"? I looked at > MatCreateLRC() - what would I pass this to? (A pointer to demo/tutorial > would be very appreciated.) If there's a better way to go about all of > this, I'm open to any and all ideas. My only limitation is that I use > petsc4py exclusively since I/future users of my code will not be > comfortable with C. > > > > Thanks again for your help! > > > > > > Alexander (Olek) Niewiarowski > > PhD Candidate, Civil & Environmental Engineering > > Princeton University, 2020 > > Cell: +1 (610) 393-2978 > > From: Smith, Barry F. > > Sent: Wednesday, February 5, 2020 15:46 > > To: Matthew Knepley > > Cc: Olek Niewiarowski ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > > Subject: Re: [petsc-users] Implementing the Sherman Morisson formula > (low rank update) in petsc4py and FEniCS? > > > > > > https://gitlab.com/petsc/petsc/issues/557 > > > > > > > On Feb 5, 2020, at 7:35 AM, Matthew Knepley wrote: > > > > > > Perhaps Barry is right that you want Picard, but suppose you really > want Newton. > > > > > > "This problem can be solved efficiently using the Sherman-Morrison > formula" Well, maybe. The main assumption here is that inverting K is > cheap. I see two things you can do in a straightforward way: > > > > > > 1) Use MatCreateLRC() > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateLRC.html > to create the Jacobian > > > and solve using an iterative method. If you pass just K was the > preconditioning matrix, you can use common PCs. > > > > > > 2) We only implemented MatMult() for LRC, but you could stick your > SMW code in for MatSolve_LRC if you really want to factor K. We would > > > of course help you do this. > > > > > > Thanks, > > > > > > Matt > > > > > > On Wed, Feb 5, 2020 at 1:36 AM Smith, Barry F. via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > > > I am not sure of everything in your email but it sounds like you > want to use a "Picard" iteration to solve [K(u)?kaaT]?u=?F(u). That is solve > > > > > > A(u^{n}) (u^{n+1} - u^{n}) = F(u^{n}) - A(u^{n})u^{n} where A(u) = > K(u) - kaaT > > > > > > PETSc provides code to this with SNESSetPicard() (see the manual > pages) I don't know if Petsc4py has bindings for this. > > > > > > Adding missing python bindings is not terribly difficult and you may > be able to do it yourself if this is the approach you want. > > > > > > Barry > > > > > > > > > > > > > On Feb 4, 2020, at 5:07 PM, Olek Niewiarowski > wrote: > > > > > > > > Hello, > > > > I am a FEniCS user but new to petsc4py. I am trying to modify the > KSP solver through the SNES object to implement the Sherman-Morrison > formula(e.g. 
http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html > ). I am solving a nonlinear system of the form [K(u)?kaaT]?u=?F(u). Here > the jacobian matrix K is modified by the term kaaT, where k is a scalar. > Notably, K is sparse, while the term kaaT results in a full matrix. This > problem can be solved efficiently using the Sherman-Morrison formula : > > > > > > > > [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) > > > > I have managed to successfully implement this at the linear solve > level (by modifying the KSP solver) inside a custom Newton solver in python > by following an incomplete tutorial at > https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner > : > > > > ? while (norm(delU) > alpha): # while not converged > > > > ? > > > > ? self.update_F() # call to method to update r.h.s > form > > > > ? self.update_K() # call to update the jacobian form > > > > ? K = assemble(self.K) # assemble the jacobian > matrix > > > > ? F = assemble(self.F) # assemble the r.h.s vector > > > > ? a = assemble(self.a_form) # assemble the a_form > (see Sherman Morrison formula) > > > > ? > > > > ? for bc in self.mem.bc: # apply boundary conditions > > > > ? bc.apply(K, F) > > > > ? bc.apply(K, a) > > > > ? > > > > ? B = PETSc.Mat().create() > > > > ? > > > > ? # Assemble the bilinear form that defines A and > get the concrete > > > > ? # PETSc matrix > > > > ? A = as_backend_type(K).mat() # get the PETSc > objects for K and a > > > > ? u = as_backend_type(a).vec() > > > > ? > > > > ? # Build the matrix "context" # see firedrake docs > > > > ? Bctx = MatrixFreeB(A, u, u, self.k) > > > > ? > > > > ? # Set up B > > > > ? # B is the same size as A > > > > ? B.setSizes(*A.getSizes()) > > > > ? B.setType(B.Type.PYTHON) > > > > ? B.setPythonContext(Bctx) > > > > ? B.setUp() > > > > ? > > > > ? > > > > ? ksp = PETSc.KSP().create() # create the KSP > linear solver object > > > > ? ksp.setOperators(B) > > > > ? ksp.setUp() > > > > ? pc = ksp.pc > > > > ? pc.setType(pc.Type.PYTHON) > > > > ? pc.setPythonContext(MatrixFreePC()) > > > > ? ksp.setFromOptions() > > > > ? > > > > ? solution = delU # the incremental displacement > at this iteration > > > > ? > > > > ? b = as_backend_type(-F).vec() > > > > ? delu = solution.vector().vec() > > > > ? > > > > ? ksp.solve(b, delu) > > > > > > > > ? self.mem.u.vector().axpy(0.25, > self.delU.vector()) # poor man's linesearch > > > > ? counter += 1 > > > > Here is the corresponding petsc4py code adapted from the firedrake > docs: > > > > > > > > ? class MatrixFreePC(object): > > > > ? > > > > ? def setUp(self, pc): > > > > ? B, P = pc.getOperators() > > > > ? # extract the MatrixFreeB object from B > > > > ? ctx = B.getPythonContext() > > > > ? self.A = ctx.A > > > > ? self.u = ctx.u > > > > ? self.v = ctx.v > > > > ? self.k = ctx.k > > > > ? # Here we build the PC object that uses the > concrete, > > > > ? # assembled matrix A. We will use this to apply the > action > > > > ? # of A^{-1} > > > > ? self.pc = PETSc.PC().create() > > > > ? self.pc.setOptionsPrefix("mf_") > > > > ? self.pc.setOperators(self.A) > > > > ? self.pc.setFromOptions() > > > > ? # Since u and v do not change, we can build the > denominator > > > > ? # and the action of A^{-1} on u only once, in the > setup > > > > ? # phase. > > > > ? tmp = self.A.createVecLeft() > > > > ? self.pc.apply(self.u, tmp) > > > > ? self._Ainvu = tmp > > > > ? self._denom = 1 + self.k*self.v.dot(self._Ainvu) > > > > ? > > > > ? def apply(self, pc, x, y): > > > > ? # y <- A^{-1}x > > > > ? 
self.pc.apply(x, y) > > > > ? # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u) > > > > ? alpha = (self.k*self.v.dot(y)) / self._denom > > > > ? # y <- y - alpha * A^{-1}u > > > > ? y.axpy(-alpha, self._Ainvu) > > > > ? > > > > ? > > > > ? class MatrixFreeB(object): > > > > ? > > > > ? def __init__(self, A, u, v, k): > > > > ? self.A = A > > > > ? self.u = u > > > > ? self.v = v > > > > ? self.k = k > > > > ? > > > > ? def mult(self, mat, x, y): > > > > ? # y <- A x > > > > ? self.A.mult(x, y) > > > > ? > > > > ? # alpha <- v^T x > > > > ? alpha = self.v.dot(x) > > > > ? > > > > ? # y <- y + alpha*u > > > > ? y.axpy(alpha, self.u) > > > > However, this approach is not efficient as it requires many > iterations due to the Newton step being fixed, so I would like to implement > it using SNES and use line search. Unfortunately, I have not been able to > find any documentation/tutorial on how to do so. Provided I have the FEniCS > forms for F, K, and a, I'd like to do something along the lines of: > > > > solver = PETScSNESSolver() # the FEniCS SNES wrapper > > > > snes = solver.snes() # the petsc4py SNES object > > > > ## ?? > > > > ksp = snes.getKSP() > > > > # set ksp option similar to above > > > > solver.solve() > > > > > > > > I would be very grateful if anyone could could help or point me to a > reference or demo that does something similar (or maybe a completely > different way of solving the problem!). > > > > Many thanks in advance! > > > > Alex > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > > -- Norbert Wiener > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Feb 10 11:16:26 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 10 Feb 2020 10:16:26 -0700 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? In-Reply-To: References: <20E8B18C-F71E-4B10-958B-6CD24DA869A3@mcs.anl.gov> <35929586-4D5D-4B31-A34E-8D8D266FEA0A@mcs.anl.gov> Message-ID: <875zge8is5.fsf@jedbrown.org> Olek Niewiarowski writes: > Barry, > Thank you for your help and detailed suggestions. I will try to implement what you proposed and will follow-up with any questions. In the meantime, I just want to make sure I understand the use of SNESSetPicard: > r - vector to store function value > b - function evaluation routine - my F(u) function > Amat - matrix with which A(x) x - b(x) is to be computed - a MatCreateLRC() -- what's the best way of passing in scalar k? Typically via the context argument, similar to any SNES example. > Pmat - matrix from which preconditioner is computed (usually the same as Amat) - a regular Mat() > J - function to compute matrix value, see SNESJacobianFunction for details on its calling sequence -- computes K + kaa' > > By the way, the manual page states "we do not recommend using this routine. 
It is far better to provide the nonlinear function F() and some approximation to the Jacobian and use an approximate Newton solver :-)" Yep, this is mainly for when someone has legacy code to compute a matrix as a nonlinear function of the state U, but not a matrix-free way to compute a residual. Implementing as Newton (with inexact matrix/preconditioner) is more flexible and often enables faster convergence. From bsmith at mcs.anl.gov Mon Feb 10 13:11:29 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Mon, 10 Feb 2020 19:11:29 +0000 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? In-Reply-To: <875zge8is5.fsf@jedbrown.org> References: <20E8B18C-F71E-4B10-958B-6CD24DA869A3@mcs.anl.gov> <35929586-4D5D-4B31-A34E-8D8D266FEA0A@mcs.anl.gov> <875zge8is5.fsf@jedbrown.org> Message-ID: Note that you can add -snes_fd_operator and get Newton's method with a preconditioner built from the Picard matrix. Barry > On Feb 10, 2020, at 11:16 AM, Jed Brown wrote: > > Olek Niewiarowski writes: > >> Barry, >> Thank you for your help and detailed suggestions. I will try to implement what you proposed and will follow-up with any questions. In the meantime, I just want to make sure I understand the use of SNESSetPicard: >> r - vector to store function value >> b - function evaluation routine - my F(u) function >> Amat - matrix with which A(x) x - b(x) is to be computed - a MatCreateLRC() -- what's the best way of passing in scalar k? > > Typically via the context argument, similar to any SNES example. > >> Pmat - matrix from which preconditioner is computed (usually the same as Amat) - a regular Mat() >> J - function to compute matrix value, see SNESJacobianFunction for details on its calling sequence -- computes K + kaa' >> >> By the way, the manual page states "we do not recommend using this routine. It is far better to provide the nonlinear function F() and some approximation to the Jacobian and use an approximate Newton solver :-)" > > Yep, this is mainly for when someone has legacy code to compute a matrix > as a nonlinear function of the state U, but not a matrix-free way to > compute a residual. Implementing as Newton (with inexact > matrix/preconditioner) is more flexible and often enables faster > convergence. From mcdanielbt at ornl.gov Tue Feb 11 15:10:46 2020 From: mcdanielbt at ornl.gov (McDaniel, Tyler) Date: Tue, 11 Feb 2020 21:10:46 +0000 Subject: [petsc-users] Iterative solvers, MPI+GPU Message-ID: Hello, Our team at Oak Ridge National Lab requires a distributed and GPU-enabled (ideally) iterative solver as part of a new, high-dimensional PDE solver. We are exploring options for software packages with this capability vs. rolling our own, i.e. having some of our team members write one. Our code already has a distributed, GPU-enabled matrix-vector multiply that we'd like to use for the core of GMRES or similar technique. I've looked through the PETSc API and found that matrix-free methods are supported, and this: https://www.mcs.anl.gov/petsc/features/gpus.html seems to indicate that GPU acceleration is available for iterative solvers. My question is: does PETSc support all of these things together? E.g. is it possible for me to use a distributed, matrix free iterative solver with a preconditioner shell on the GPU with PETSc? Best, Tyler McDaniel -------------- next part -------------- An HTML attachment was scrubbed... 
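For concreteness, the combination being asked about here (distributed Krylov solve with a user-provided mat-vec and a shell preconditioner) maps onto petsc4py's shell-operator pattern, shown in rough form below. The mult and apply bodies are identity placeholders for the user's own distributed, GPU-resident routines; in a CUDA-enabled PETSc build the vectors themselves can additionally be placed on the GPU with options such as -vec_type cuda, which is not shown here:

    from petsc4py import PETSc

    class UserOperator(object):
        """Matrix-free action y = A x; the identity below is a placeholder
        for the existing distributed, GPU-enabled matrix-vector product."""
        def mult(self, mat, x, y):
            x.copy(y)                          # placeholder: y <- x

    class UserPC(object):
        """Shell preconditioner y = M^{-1} x; identity placeholder."""
        def apply(self, pc, x, y):
            x.copy(y)                          # placeholder: y <- x

    n_local = 1000                             # illustrative local size
    A = PETSc.Mat().create(comm=PETSc.COMM_WORLD)
    A.setSizes(((n_local, PETSc.DECIDE), (n_local, PETSc.DECIDE)))
    A.setType(PETSc.Mat.Type.PYTHON)
    A.setPythonContext(UserOperator())
    A.setUp()

    ksp = PETSc.KSP().create(comm=PETSc.COMM_WORLD)
    ksp.setType(PETSc.KSP.Type.GMRES)
    ksp.setOperators(A)
    pc = ksp.getPC()
    pc.setType(PETSc.PC.Type.PYTHON)
    pc.setPythonContext(UserPC())
    ksp.setFromOptions()

    x, b = A.createVecs()
    b.set(1.0)
    ksp.solve(b, x)                            # trivially converges with these placeholders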
URL: From jed at jedbrown.org Tue Feb 11 15:45:57 2020 From: jed at jedbrown.org (Jed Brown) Date: Tue, 11 Feb 2020 14:45:57 -0700 Subject: [petsc-users] Iterative solvers, MPI+GPU In-Reply-To: References: Message-ID: <87o8u4vluy.fsf@jedbrown.org> The short answer is yes, this works great, and your vectors never need to leave the GPU (except via send/receive buffers that can hit the network directly with GPU-aware MPI). If you have a shell preconditioner, you're all set. If you want to use PETSc preconditioners, we have some that run on GPUs, but not all are well-suited to GPU architectures, and there is ongoing work to improve performance for some important methods, such as algebraic multigrid (for which setup is harder than the solve). "McDaniel, Tyler via petsc-users" writes: > Hello, > > > Our team at Oak Ridge National Lab requires a distributed and GPU-enabled (ideally) iterative solver as part of a new, high-dimensional PDE solver. We are exploring options for software packages with this capability vs. rolling our own, i.e. having some of our team members write one. > > > Our code already has a distributed, GPU-enabled matrix-vector multiply that we'd like to use for the core of GMRES or similar technique. I've looked through the PETSc API and found that matrix-free methods are supported, and this: https://www.mcs.anl.gov/petsc/features/gpus.html seems to indicate that GPU acceleration is available for iterative solvers. > > > My question is: does PETSc support all of these things together? E.g. is it possible for me to use a distributed, matrix free iterative solver with a preconditioner shell on the GPU with PETSc? > > > Best, > > Tyler McDaniel From reuben.hill10 at imperial.ac.uk Wed Feb 12 05:53:52 2020 From: reuben.hill10 at imperial.ac.uk (Hill, Reuben) Date: Wed, 12 Feb 2020 11:53:52 +0000 Subject: [petsc-users] Vertex only unstructured mesh with DMPlex and DMSWARM Message-ID: I'm a new Firedrake developer working on getting my head around PETSc. As far as I'm aware, all our PETSc calls are done via petsc4py. I'm after general help and advise on two fronts: 1: I?m trying to represent a point cloud as a vertex-only mesh in an attempt to play nicely with the firedrake stack. If I try to do this using firedrake I manage to cause a PETSc seg fault error at the point of calling PETSc.DMPlex().createFromCellList(dim, cells, coords, comm=comm) with dim=0, cells=[[0]], coords=[[1., 2.]] and comm=COMM_WORLD. Output: [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run [0]PETSC ERROR: to get more information on the crash. application called MPI_Abort(MPI_COMM_WORLD, 50152059) - process 0 I?m now looking into getting firedrake to make a DMSWARM which seems to have been designed for doing something closer to this and allows nice things such as being able to make the particles (for me - the vertices of the mesh) move and jump between MPI ranks a-la particle in cell. 
I note DMSwarm docs don't suggest there is an equivalent of the plex.distribute() method in petsc4py (which I believe calls DMPlexDistribute) that a DMPlex has. In firedrake we create empty DMPlexes on each rank except 0, then call plex.distribute() to take care of parallel partitioning. How, therefore, am I meant to go about distributing particles across MPI ranks? 2: I'm aware these questions may be very naive. Any advice for learning about relevant bits of PETSc would be very much appreciated. I'm in chapter 2 of the excellent manual (https://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf) and I'm also attempting to understand the DMSwarm example. I presume in order to understand DMSWARM I really ought to understand DMs more generally (i.e. read the whole manual)? Many thanks Reuben Hill 1. -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrick.sanan at gmail.com Wed Feb 12 09:29:13 2020 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Wed, 12 Feb 2020 16:29:13 +0100 Subject: [petsc-users] Vertex only unstructured mesh with DMPlex and DMSWARM In-Reply-To: References: Message-ID: DMSwarm has DMSwarmMigrate() which might be the closest thing to DMPlexDistribute(). https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMSWARM/DMSwarmMigrate.html Of course, it's good to create particles in parallel when possible. On Wed, 12 Feb 2020 at 12:56, Hill, Reuben < reuben.hill10 at imperial.ac.uk> wrote: > I'm a new Firedrake developer working on getting my head around PETSc. As > far as I'm aware, all our PETSc calls are done via petsc4py. > > I'm after general help and advise on two fronts: > > > *1*: > > I'm trying to represent a point cloud as a vertex-only mesh in an attempt > to play nicely with the firedrake stack. If I try to do this using > firedrake I manage to cause a PETSc seg fault error at the point of calling > > PETSc.DMPlex().createFromCellList(dim, cells, coords, comm=comm) > > > with dim=0, cells=[[0]], coords=[[1., 2.]] and comm=COMM_WORLD. > > Output: > > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and > run > [0]PETSC ERROR: to get more information on the crash. > application called MPI_Abort(MPI_COMM_WORLD, 50152059) - process 0 > > > I'm now looking into getting firedrake to make a DMSWARM which seems to > have been designed for doing something closer to this and allows nice > things such as being able to make the particles (for me - the vertices of > the mesh) move and jump between MPI ranks a-la particle in cell. I note > DMSwarm docs don't suggest there is an equivalent of the plex.distribute() > method in petsc4py (which I believe calls DMPlexDistribute) that a DMPlex > has. In firedrake we create empty DMPlexes on each rank except 0, then call > plex.distribute() to take care of parallel partitioning. How, therefore, > am I meant to go about distributing particles across MPI ranks? > > > *2*: > > I'm aware these questions may be very naive.
Any advice for learning about > relevant bits of PETSc for would be very much appreciated. I'm in chapter 2 > of the excellent manual ( > https://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf) and I'm also > attempting to understand the DMSwarm example. I presume in oder to > understand DMSWARM I really ought to understand DMs more generally (i.e. > read the whole manual)? > > > Many thanks > Reuben Hill > > > 1. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Feb 12 10:52:42 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 12 Feb 2020 08:52:42 -0800 Subject: [petsc-users] Vertex only unstructured mesh with DMPlex and DMSWARM In-Reply-To: References: Message-ID: On Wed, Feb 12, 2020 at 3:56 AM Hill, Reuben wrote: > I'm a new Firedrake developer working on getting my head around PETSc. As > far as I'm aware, all our PETSc calls are done via petsc4py. > > I'm after general help and advise on two fronts: > > > *1*: > > I?m trying to represent a point cloud as a vertex-only mesh in an attempt > to play nicely with the firedrake stack. If I try to do this using > firedrake I manage to cause a PETSc seg fault error at the point of calling > > PETSc.DMPlex().createFromCellList(dim, cells, coords, comm=comm) > > > with dim=0, cells=[[0]], coords=[[1., 2.]] and comm=COMM_WORLD. > > Output: > > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and > run > [0]PETSC ERROR: to get more information on the crash. > application called MPI_Abort(MPI_COMM_WORLD, 50152059) - process 0 > > > I?m now looking into getting firedrake to make a DMSWARM which seems to > have been designed for doing something closer to this and allows nice > things such as being able to make the particles (for me - the vertices of > the mesh) move and jump between MPI ranks a-la particle in cell. I note > DMSwarm docks don't suggest there is an equivalent of the plex.distribute() > method in petsc4py (which I believe calls DMPlexDistribute) that a DMPlex > has. In firedrake we create empty DMPlexes on each rank except 0, then call > plex.distribute() to take care of parallel partitioning. How, therefore, > should I meant to go about distributing particles across MPI ranks? > Patrick is right. Here is an example: https://gitlab.com/petsc/petsc/-/blob/master/src/snes/examples/tutorials/ex63.c Thanks Matt > *2*: > > I'm aware these questions may be very naive. Any advice for learning about > relevant bits of PETSc for would be very much appreciated. I'm in chapter 2 > of the excellent manual ( > https://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf) and I'm also > attempting to understand the DMSwarm example. I presume in oder to > understand DMSWARM I really ought to understand DMs more generally (i.e. > read the whole manual)? > > > Many thanks > Reuben Hill > > > 1. > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From perceval.desforges at polytechnique.edu Thu Feb 13 07:53:03 2020 From: perceval.desforges at polytechnique.edu (Perceval Desforges) Date: Thu, 13 Feb 2020 14:53:03 +0100 Subject: [petsc-users] DMUMPS_LOAD_RECV_MSGS Message-ID: <856dbeb7c0711975cc582cfa01f364f3@polytechnique.edu> Hello all, I have been running in a strange issue with petsc, and more specifically I believe Mumps is the problem. In my program, I run a loop where at the beginning of the loop I create an eps object, calculate the eigenvalues in a certain interval using the spectrum slicing method, store them, and then destroy the eps object. For some reason, whatever the problem size, if my loop has too many iterations (over 2040 I believe), the program will crash giving me this error : Internal error 1 in DMUMPS_LOAD_RECV_MSGS 0 application called MPI_Abort(MPI_COMM_WORLD, -99) - process 0 I am running the program in MPI over 20 processes. I don't really understand what this message means, does anybody know? Best regards, Perceval -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Feb 13 08:31:43 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 13 Feb 2020 06:31:43 -0800 Subject: [petsc-users] DMUMPS_LOAD_RECV_MSGS In-Reply-To: <856dbeb7c0711975cc582cfa01f364f3@polytechnique.edu> References: <856dbeb7c0711975cc582cfa01f364f3@polytechnique.edu> Message-ID: On Thu, Feb 13, 2020 at 5:53 AM Perceval Desforges < perceval.desforges at polytechnique.edu> wrote: > Hello all, > > I have been running in a strange issue with petsc, and more specifically I > believe Mumps is the problem. > > In my program, I run a loop where at the beginning of the loop I create an > eps object, calculate the eigenvalues in a certain interval using the > spectrum slicing method, store them, and then destroy the eps object. For > some reason, whatever the problem size, if my loop has too many iterations > (over 2040 I believe), the program will crash giving me this error : > > Internal error 1 in DMUMPS_LOAD_RECV_MSGS 0 > > application called MPI_Abort(MPI_COMM_WORLD, -99) - process 0 > > I am running the program in MPI over 20 processes. > > I don't really understand what this message means, does anybody know? > An easy test would be to replace MUMPS with SuperLU and see if the error persists. Thanks, Matt > Best regards, > > Perceval > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Feb 13 09:09:34 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 13 Feb 2020 15:09:34 +0000 Subject: [petsc-users] DMUMPS_LOAD_RECV_MSGS In-Reply-To: <856dbeb7c0711975cc582cfa01f364f3@polytechnique.edu> References: <856dbeb7c0711975cc582cfa01f364f3@polytechnique.edu> Message-ID: <1AC9C095-472E-45A4-9619-9D5B64F2D1C8@anl.gov> Given the 2040 either you or MUMPS is running out of communicators. Do you use your own communicators in your code and are you freeing them when you don't need them? 
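For illustration of how a communicator pool can be exhausted (a sketch under assumptions, not the poster's code): many MPI implementations have a finite number of communicator context ids, on the order of a couple of thousand for MPICH-derived libraries, which is suggestively close to the roughly 2040 iterations reported. Duplicating a communicator every iteration without freeing it fails in much the same way:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  /* return errors instead of aborting so the failure point is visible */
  MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
  for (int i = 0; i < 100000; i++) {
    MPI_Comm dup;
    if (MPI_Comm_dup(MPI_COMM_WORLD, &dup) != MPI_SUCCESS) {
      printf("MPI_Comm_dup failed after %d leaked duplicates\n", i);
      break;
    }
    /* a correct code would call MPI_Comm_free(&dup) once it is done with it */
  }
  MPI_Finalize();
  return 0;
}

If each pass of the eigenvalue loop leaves one duplicated communicator behind, whether in user code or in a library it calls, the count printed above is roughly the number of loop iterations that can complete before the abort.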
If it is not your code then it is MUMPS that is running out and you should contact them directly:

      RECURSIVE SUBROUTINE DMUMPS_LOAD_RECV_MSGS(COMM)
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INCLUDE 'mumps_tags.h'
      INTEGER IERR, MSGTAG, MSGLEN, MSGSOU,COMM
      INTEGER :: STATUS(MPI_STATUS_SIZE)
      LOGICAL FLAG
 10   CONTINUE
      CALL MPI_IPROBE( MPI_ANY_SOURCE, MPI_ANY_TAG, COMM,
     &     FLAG, STATUS, IERR )
      IF (FLAG) THEN
        KEEP_LOAD(65)=KEEP_LOAD(65)+1
        KEEP_LOAD(267)=KEEP_LOAD(267)-1
        MSGTAG = STATUS( MPI_TAG )
        MSGSOU = STATUS( MPI_SOURCE )
        IF ( MSGTAG .NE. UPDATE_LOAD) THEN
          write(*,*) "Internal error 1 in DMUMPS_LOAD_RECV_MSGS",
     &    MSGTAG
          CALL MUMPS_ABORT()

> On Feb 13, 2020, at 7:53 AM, Perceval Desforges wrote: > > Hello all, > > I have been running in a strange issue with petsc, and more specifically I believe Mumps is the problem. > > In my program, I run a loop where at the beginning of the loop I create an eps object, calculate the eigenvalues in a certain interval using the spectrum slicing method, store them, and then destroy the eps object. For some reason, whatever the problem size, if my loop has too many iterations (over 2040 I believe), the program will crash giving me this error : > > Internal error 1 in DMUMPS_LOAD_RECV_MSGS 0 > > application called MPI_Abort(MPI_COMM_WORLD, -99) - process 0 > > I am running the program in MPI over 20 processes. > > I don't really understand what this message means, does anybody know? > > Best regards, > > Perceval > From pranayreddy865 at gmail.com Thu Feb 13 14:23:33 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Thu, 13 Feb 2020 13:23:33 -0700 Subject: [petsc-users] Question regarding the EPSSetDimensions routine Message-ID: Hello PETSc Users, I am trying to find the lowest 'n' eigenvalues of a hermitian eigenvalue problem. The size of the operator matrix (hamiltonian in my case) is dependent on the mesh spacing provided by the user (which is expected). However I have the following issue: The number of eigenvalues given by the solver is not consistent with what is given as input in the EPSSetDimensions routine. For example, for a 12000x12000 matrix, the solver gives 20 correct eigenvalues if nev=20, but fails to give any eigenvalue if nev=10. I am using the following lines of code to solve the problem:

call EPSCreate(PETSC_COMM_WORLD,eps,ierr)
call EPSSetOperators(eps,ham,PETSC_NULL_MAT,ierr)
call EPSSetProblemType(eps,EPS_HEP,ierr)
call EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE,ierr)
call EPSSetDimensions(eps,n_sub,n_sub*2,PETSC_DEFAULT_INTEGER,ierr)
call EPSSetTolerances(eps,1D-10,5000,ierr)
call EPSSolve(eps,ierr)

After the EPSSolve, I am calling EPSGetEigenPair and other relevant routines to get the eigenvector and eigenvalues. Any lead as to how to solve this problem would be greatly helpful to us. Please let me know if I need to provide any further information. Thank you for your time. Sincerely, Pranay. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Thu Feb 13 14:53:54 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Thu, 13 Feb 2020 21:53:54 +0100 Subject: [petsc-users] Question regarding the EPSSetDimensions routine In-Reply-To: References: Message-ID: For nev=10 you are using a subspace of size 20. This may be too small. Check convergence with a monitor and increase ncv if necessary. Jose > El 13 feb 2020, a las 21:25, baikadi pranay escribió: > > > Hello PETSc Users, > > I am trying to find the lowest 'n' eigenvalues of a hermitian eigenvalue problem.
The size of the operator matrix (hamiltonian in my case) is dependent on the mesh spacing provided by the user (which is expected). However I have the following issue: > > The number of eigenvalues given by the solver is not consistent with what is given as input in the EPSSetDimensions routine. For example, for a 12000x12000 matrix, the solver gives 20 correct eigenvalues if nev=20, but fails to give any eigenvalue if nev=10. > > I am using the following lines of code to solve the problem: > > call EPSCreate(PETSC_COMM_WORLD,eps,ierr) > call EPSSetOperators(eps,ham,PETSC_NULL_MAT,ierr) > call EPSSetProblemType(eps,EPS_HEP,ierr) > call EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE,ierr) > call EPSSetDimensions(eps,n_sub,n_sub*2,PETSC_DEFAULT_INTEGER,ierr) > call EPSSetTolerances(eps,1D-10,5000,ierr) > call EPSSolve(eps,ierr) > > After the EPSSolve, I am calling EPSGetEigenPair and other relevant routines to get the eigenvector and eigenvalues. > > Any lead as to how to solve this problem would be greatly helpful to us. Please let me know if I need to provide any further information. > > Thank you for your time. > > Sincerely, > Pranay. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pranayreddy865 at gmail.com Thu Feb 13 15:03:37 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Thu, 13 Feb 2020 14:03:37 -0700 Subject: [petsc-users] Question regarding the EPSSetDimensions routine In-Reply-To: References: Message-ID: Thank you Jose for the reply. If I set PETSC_DEFAULT_INTEGER for ncv as suggested in the EPSSetDimensions documentation, I am still running into the same problem. Also, could you elaborate on what you mean by checking convergence with a monitor. Do you mean comparing the eigenvalues for ith and (i+1)th iterations and plotting the difference to see convergence? Sincerely, Pranay. ? On Thu, Feb 13, 2020 at 1:54 PM Jose E. Roman wrote: > For nev=10 you are using a subspace of size 20. This may be too small. > Check convergence with a monitor and increase ncv if necessary. > > Jose > > El 13 feb 2020, a las 21:25, baikadi pranay > escribi?: > > ? > Hello PETSc Users, > > I am trying to find the lowest 'n' eigenvalues of a hermitian eigenvalue > problem. The size of the operator matrix (hamiltonian in my case) is > dependent on the mesh spacing provided by the user (which is expected). > However I have the following issue: > > The number of eigenvalues given by the solver is not consistent with what > is given as input in the EPSSetDimensions routine. For example, for a > 12000x12000 matrix, the solver gives 20 correct eigenvalues if nev=20, but > fails to give any eigenvalue if nev=10. > > I am using the following lines of code to solve the problem: > > > *call EPSCreate(PETSC_COMM_WORLD,eps,ierr)* > > > > *call EPSSetOperators(eps,ham,PETSC_NULL_MAT,ierr)call > EPSSetProblemType(eps,EPS_HEP,ierr)call > EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE,ierr)call > EPSSetDimensions(eps,n_sub,n_sub*2,PETSC_DEFAULT_INTEGER,ierr)* > > > *call EPSSetTolerances(eps,1D-10,5000,ierr)call EPSSolve(eps,ierr)* > > After the EPSSolve, I am calling EPSGetEigenPair and other relevant > routines to get the eigenvector and eigenvalues. > > Any lead as to how to solve this problem would be greatly helpful to us. > Please let me know if I need to provide any further information. > > Thank you for your time. > > Sincerely, > Pranay. > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From skavou1 at lsu.edu Thu Feb 13 15:05:34 2020 From: skavou1 at lsu.edu (Sepideh Kavousi) Date: Thu, 13 Feb 2020 21:05:34 +0000 Subject: [petsc-users] TSMonitorSolutionVTK Message-ID: Dear Petsc users, I am trying to write VTK output file every few steps. When I was using the petsc 3.7 I used the following lines: if (user->ts_write % 500 ==0) { ierr= PetscSNPrintf(user->filename,sizeof(user->filename),"one-%03d.vts",user->ts_write);CHKERRQ(ierr); ierr= TSMonitorSolutionVTK(user->ts,user->ts_write,t,user->sol_old,&user->filename);CHKERRQ(ierr);} user->ts_write+=1; } and it worked fine, but when I use the same line in Petsc 3.9 and 3.10, using these lines it still outputs the files but when I open them with Visit and try to visualize the individual field component, it does not show them. I was wondering does anything change in "TSMonitorSolutionVTK" between the petsc 3.7 and the newer versions? Best, Sepideh -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Thu Feb 13 15:17:20 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Thu, 13 Feb 2020 22:17:20 +0100 Subject: [petsc-users] Question regarding the EPSSetDimensions routine In-Reply-To: References: Message-ID: I mean run with -eps_monitor (see section 2.5.3) and you will see if residuals are decreasing. Either increase the maximum number of iterations or the size of the subspace. > El 13 feb 2020, a las 22:03, baikadi pranay escribi?: > > Thank you Jose for the reply. > > If I set PETSC_DEFAULT_INTEGER for ncv as suggested in the EPSSetDimensions documentation, I am still running into the same problem. Also, could you elaborate on what you mean by checking convergence with a monitor. Do you mean comparing the eigenvalues for ith and (i+1)th iterations and plotting the difference to see convergence? > > Sincerely, > Pranay. > ? > > On Thu, Feb 13, 2020 at 1:54 PM Jose E. Roman wrote: > For nev=10 you are using a subspace of size 20. This may be too small. Check convergence with a monitor and increase ncv if necessary. > > Jose > >> El 13 feb 2020, a las 21:25, baikadi pranay escribi?: >> >> ? >> Hello PETSc Users, >> >> I am trying to find the lowest 'n' eigenvalues of a hermitian eigenvalue problem. The size of the operator matrix (hamiltonian in my case) is dependent on the mesh spacing provided by the user (which is expected). However I have the following issue: >> >> The number of eigenvalues given by the solver is not consistent with what is given as input in the EPSSetDimensions routine. For example, for a 12000x12000 matrix, the solver gives 20 correct eigenvalues if nev=20, but fails to give any eigenvalue if nev=10. >> >> I am using the following lines of code to solve the problem: >> >> call EPSCreate(PETSC_COMM_WORLD,eps,ierr) >> call EPSSetOperators(eps,ham,PETSC_NULL_MAT,ierr) >> call EPSSetProblemType(eps,EPS_HEP,ierr) >> call EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE,ierr) >> call EPSSetDimensions(eps,n_sub,n_sub*2,PETSC_DEFAULT_INTEGER,ierr) >> call EPSSetTolerances(eps,1D-10,5000,ierr) >> call EPSSolve(eps,ierr) >> >> After the EPSSolve, I am calling EPSGetEigenPair and other relevant routines to get the eigenvector and eigenvalues. >> >> Any lead as to how to solve this problem would be greatly helpful to us. Please let me know if I need to provide any further information. >> >> Thank you for your time. >> >> Sincerely, >> Pranay. 
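To make the suggestion above concrete, here is a hedged C sketch of the same solve with a larger working subspace and the convergence monitor left to the options database. The matrix A and the requested number of eigenvalues nev are assumed to come from the application (the original code is Fortran; this is only an illustration):

#include <slepceps.h>

PetscErrorCode SolveLowest(Mat A, PetscInt nev)
{
  EPS            eps;
  PetscErrorCode ierr;

  ierr = EPSCreate(PETSC_COMM_WORLD, &eps);CHKERRQ(ierr);
  ierr = EPSSetOperators(eps, A, NULL);CHKERRQ(ierr);
  ierr = EPSSetProblemType(eps, EPS_HEP);CHKERRQ(ierr);
  ierr = EPSSetWhichEigenpairs(eps, EPS_SMALLEST_MAGNITUDE);CHKERRQ(ierr);
  /* nev wanted, ncv = 4*nev working vectors (instead of the 2*nev that was
     too small), mpd left at the default */
  ierr = EPSSetDimensions(eps, nev, 4*nev, PETSC_DEFAULT);CHKERRQ(ierr);
  ierr = EPSSetTolerances(eps, 1e-10, 10000);CHKERRQ(ierr);
  /* picks up -eps_monitor / -eps_monitor_conv and any other run-time options */
  ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);
  ierr = EPSSolve(eps);CHKERRQ(ierr);
  ierr = EPSDestroy(&eps);CHKERRQ(ierr);
  return 0;
}

Running with -eps_monitor (or -eps_monitor_conv) shows the residual norms as the iteration proceeds; if they stall, increasing ncv or the maximum number of iterations, as suggested above, is the first thing to try.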
>> From jed at jedbrown.org Thu Feb 13 16:06:26 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 13 Feb 2020 15:06:26 -0700 Subject: [petsc-users] TSMonitorSolutionVTK In-Reply-To: References: Message-ID: <87imkarvkt.fsf@jedbrown.org> Sepideh Kavousi writes: > Dear Petsc users, > I am trying to write VTK output file every few steps. When I was using the petsc 3.7 I used the following lines: > if (user->ts_write % 500 ==0) { > ierr= PetscSNPrintf(user->filename,sizeof(user->filename),"one-%03d.vts",user->ts_write);CHKERRQ(ierr); > ierr= TSMonitorSolutionVTK(user->ts,user->ts_write,t,user->sol_old,&user->filename);CHKERRQ(ierr);} > user->ts_write+=1; > } > and it worked fine, but when I use the same line in Petsc 3.9 and 3.10, using these lines it still outputs the files but when I open them with Visit and try to visualize the individual field component, it does not show them. Do they not exist or are just unnamed? I believe we had a couple regressions with this output style circa 3.10 (side-effect of better handling vector fields), but I believe it was fixed by 3.11 or certainly 3.12. Would it be possible for you to upgrade? Note that you can (and should) pass the format specifier directly, as in: TSMonitorSolutionVTK(user->ts,user->ts_write,t,user->sol_old,"one-%03D.vts"); Visit won't care about the numbers not being contiguous. If you want to name them manually, just handle the viewer yourself; body of TSMonitorSolutionVTK: ierr = PetscSNPrintf(filename,sizeof(filename),(const char*)filenametemplate,step);CHKERRQ(ierr); ierr = PetscViewerVTKOpen(PetscObjectComm((PetscObject)ts),filename,FILE_MODE_WRITE,&viewer);CHKERRQ(ierr); ierr = VecView(u,viewer);CHKERRQ(ierr); ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr); From jed at jedbrown.org Thu Feb 13 17:28:35 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 13 Feb 2020 16:28:35 -0700 Subject: [petsc-users] TSMonitorSolutionVTK In-Reply-To: References: <87imkarvkt.fsf@jedbrown.org> Message-ID: <87blq2rrrw.fsf@jedbrown.org> Good to hear it works now. And here's the relevant merge request (from April 2019). https://gitlab.com/petsc/petsc/-/merge_requests/1602 Sepideh Kavousi writes: > Dear Dr. Brown, > In the VTS files the field components did not exist. I checked 3.11 and 3.12, and it seems 3.11 still has the problem but for the 3.12 the problem is solved. > Thanks, > Sepideh > ________________________________ > From: Jed Brown > Sent: Thursday, February 13, 2020 4:06 PM > To: Sepideh Kavousi ; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] TSMonitorSolutionVTK > > Sepideh Kavousi writes: > >> Dear Petsc users, >> I am trying to write VTK output file every few steps. When I was using the petsc 3.7 I used the following lines: >> if (user->ts_write % 500 ==0) { >> ierr= PetscSNPrintf(user->filename,sizeof(user->filename),"one-%03d.vts",user->ts_write);CHKERRQ(ierr); >> ierr= TSMonitorSolutionVTK(user->ts,user->ts_write,t,user->sol_old,&user->filename);CHKERRQ(ierr);} >> user->ts_write+=1; >> } >> and it worked fine, but when I use the same line in Petsc 3.9 and 3.10, using these lines it still outputs the files but when I open them with Visit and try to visualize the individual field component, it does not show them. > > Do they not exist or are just unnamed? I believe we had a couple > regressions with this output style circa 3.10 (side-effect of better > handling vector fields), but I believe it was fixed by 3.11 or certainly > 3.12. Would it be possible for you to upgrade? 
> > > Note that you can (and should) pass the format specifier directly, as in: > > TSMonitorSolutionVTK(user->ts,user->ts_write,t,user->sol_old,"one-%03D.vts"); > > Visit won't care about the numbers not being contiguous. If you want to > name them manually, just handle the viewer yourself; body of TSMonitorSolutionVTK: > > ierr = PetscSNPrintf(filename,sizeof(filename),(const char*)filenametemplate,step);CHKERRQ(ierr); > ierr = PetscViewerVTKOpen(PetscObjectComm((PetscObject)ts),filename,FILE_MODE_WRITE,&viewer);CHKERRQ(ierr); > ierr = VecView(u,viewer);CHKERRQ(ierr); > ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr); From richard.beare at monash.edu Thu Feb 13 23:43:46 2020 From: richard.beare at monash.edu (Richard Beare) Date: Fri, 14 Feb 2020 16:43:46 +1100 Subject: [petsc-users] Crash caused by strange error in KSPSetUp Message-ID: Hi Everyone, I am experimenting with the Simlul at trophy tool ( https://github.com/Inria-Asclepios/simul-atrophy) that uses petsc to simulate brain atrophy based on segmented MRI data. I am not the author. I have this running on most of a dataset of about 50 scans, but experience crashes with several that I am trying to track down. However I am out of ideas. The problem images are slightly bigger than some of the successful ones, but not substantially so, and I have experimented on machines with sufficient RAM. The error happens very quickly, as part of setup - see the valgrind report below. I haven't managed to get the sgcheck tool to work yet. I can only guess that the ksp object is somehow becoming corrupted during the setup process, but the array sizes that I can track (which derive from image sizes), appear correct at every point I can check. Any suggestions as to how I can check what might go wrong in the setup of the ksp object? Thankyou. valgrind tells me: ==18175== Argument 'size' of function memalign has a fishy (possibly negative) value: -17152038144 ==18175== at 0x4C320A6: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==18175== by 0x4F0F1F2: PetscMallocAlign(unsigned long, int, char const*, char const*, void**) (mal.c:28) ==18175== by 0x56B43CA: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595) ==18175== by 0x56B39BD: MatSeqAIJSetPreallocation (aij.c:3539) ==18175== by 0x59A9B44: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*) (fdda.c:1085) ==18175== by 0x59A4C71: DMCreateMatrix_DA(_p_DM*, _p_Mat**) (fdda.c:759) ==18175== by 0x58BBD29: DMCreateMatrix (dm.c:956) ==18175== by 0x5E509D5: KSPSetUp (itfunc.c:262) ==18175== by 0x40A3DE: PetscAdLemTaras3D::solveModel(bool) (PetscAdLemTaras3D.hxx:269) ==18175== by 0x42413F: AdLem3D<3u>::solveModel(bool, bool, bool) (AdLem3D.hxx:552) ==18175== by 0x41C25C: main (PetscAdLemMain.cxx:349) ==18175== -- -- A/Prof Richard Beare Imaging and Bioinformatics, Peninsula Clinical School orcid.org/0000-0002-7530-5664 Richard.Beare at monash.edu +61 3 9788 1724 Geospatial Research: https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Feb 14 00:07:24 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 14 Feb 2020 06:07:24 +0000 Subject: [petsc-users] Crash caused by strange error in KSPSetUp In-Reply-To: References: Message-ID: Richard, It is likely that for these problems some of the integers become too large for the int variable to hold them, thus they overflow and become negative. 
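One hedged reading of the 'fishy' size, consistent with this diagnosis but not a proof of it: -17152038144 = -2144004768 * 8, and -2144004768 is what a signed 32-bit count shows after a true value of 2150962528 (= -2144004768 + 2^32) has wrapped around. Since 2150962528 is just past the 32-bit limit of 2147483647, a nonzero count slightly over 2^31 multiplied by sizeof(PetscScalar) = 8 bytes reproduces the negative allocation size exactly. Whether that is the actual failure mode here is an assumption; it is simply the kind of arithmetic that a 64-bit-indices build is meant to make safe.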
You should make a new PETSC_ARCH configuration of PETSc that uses the configure option --with-64-bit-indices, this will change PETSc to use 64 bit integers which will not overflow. Good luck and let us know how it works out Barry Probably the code is built with an older version of PETSc; the later versions should produce a more useful error message. > On Feb 13, 2020, at 11:43 PM, Richard Beare via petsc-users wrote: > > Hi Everyone, > I am experimenting with the Simlul at trophy tool (https://github.com/Inria-Asclepios/simul-atrophy) that uses petsc to simulate brain atrophy based on segmented MRI data. I am not the author. I have this running on most of a dataset of about 50 scans, but experience crashes with several that I am trying to track down. However I am out of ideas. The problem images are slightly bigger than some of the successful ones, but not substantially so, and I have experimented on machines with sufficient RAM. The error happens very quickly, as part of setup - see the valgrind report below. I haven't managed to get the sgcheck tool to work yet. I can only guess that the ksp object is somehow becoming corrupted during the setup process, but the array sizes that I can track (which derive from image sizes), appear correct at every point I can check. Any suggestions as to how I can check what might go wrong in the setup of the ksp object? > Thankyou. > > valgrind tells me: > > ==18175== Argument 'size' of function memalign has a fishy (possibly negative) value: -17152038144 > ==18175== at 0x4C320A6: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) > ==18175== by 0x4F0F1F2: PetscMallocAlign(unsigned long, int, char const*, char const*, void**) (mal.c:28) > ==18175== by 0x56B43CA: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595) > ==18175== by 0x56B39BD: MatSeqAIJSetPreallocation (aij.c:3539) > ==18175== by 0x59A9B44: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*) (fdda.c:1085) > ==18175== by 0x59A4C71: DMCreateMatrix_DA(_p_DM*, _p_Mat**) (fdda.c:759) > ==18175== by 0x58BBD29: DMCreateMatrix (dm.c:956) > ==18175== by 0x5E509D5: KSPSetUp (itfunc.c:262) > ==18175== by 0x40A3DE: PetscAdLemTaras3D::solveModel(bool) (PetscAdLemTaras3D.hxx:269) > ==18175== by 0x42413F: AdLem3D<3u>::solveModel(bool, bool, bool) (AdLem3D.hxx:552) > ==18175== by 0x41C25C: main (PetscAdLemMain.cxx:349) > ==18175== > > -- > -- > A/Prof Richard Beare > Imaging and Bioinformatics, Peninsula Clinical School > orcid.org/0000-0002-7530-5664 > Richard.Beare at monash.edu > +61 3 9788 1724 > > > > Geospatial Research: https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis From richard.beare at monash.edu Fri Feb 14 05:10:53 2020 From: richard.beare at monash.edu (Richard Beare) Date: Fri, 14 Feb 2020 22:10:53 +1100 Subject: [petsc-users] Crash caused by strange error in KSPSetUp In-Reply-To: References: Message-ID: No luck - exactly the same error after including the --with-64-bit-indicies=yes --download-mpich=yes options ==8674== Argument 'size' of function memalign has a fishy (possibly negative) value: -17152036540 ==8674== at 0x4C320A6: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==8674== by 0x4F0CFF2: PetscMallocAlign(unsigned long, int, char const*, char const*, void**) (mal.c:28) ==8674== by 0x4F0F716: PetscTrMallocDefault(unsigned long, int, char const*, char const*, void**) (mtr.c:188) ==8674== by 0x569AF3E: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595) ==8674== by 0x569A531: MatSeqAIJSetPreallocation (aij.c:3539) ==8674== by 0x599080A: 
DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*) (fdda.c:1085) ==8674== by 0x598B937: DMCreateMatrix_DA(_p_DM*, _p_Mat**) (fdda.c:759) ==8674== by 0x58A2BF2: DMCreateMatrix (dm.c:956) ==8674== by 0x5E377B3: KSPSetUp (itfunc.c:262) ==8674== by 0x409FFC: PetscAdLemTaras3D::solveModel(bool) (PetscAdLemTaras3D.hxx:255) ==8674== by 0x4239FB: AdLem3D<3u>::solveModel(bool, bool, bool) (AdLem3D.hxx:551) ==8674== by 0x41BD17: main (PetscAdLemMain.cxx:344) ==8674== On Fri, 14 Feb 2020 at 17:07, Smith, Barry F. wrote: > > Richard, > > It is likely that for these problems some of the integers become too > large for the int variable to hold them, thus they overflow and become > negative. > > You should make a new PETSC_ARCH configuration of PETSc that uses the > configure option --with-64-bit-indices, this will change PETSc to use 64 > bit integers which will not overflow. > > Good luck and let us know how it works out > > Barry > > Probably the code is built with an older version of PETSc; the later > versions should produce a more useful error message. > > > On Feb 13, 2020, at 11:43 PM, Richard Beare via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > Hi Everyone, > > I am experimenting with the Simlul at trophy tool ( > https://github.com/Inria-Asclepios/simul-atrophy) that uses petsc to > simulate brain atrophy based on segmented MRI data. I am not the author. I > have this running on most of a dataset of about 50 scans, but experience > crashes with several that I am trying to track down. However I am out of > ideas. The problem images are slightly bigger than some of the successful > ones, but not substantially so, and I have experimented on machines with > sufficient RAM. The error happens very quickly, as part of setup - see the > valgrind report below. I haven't managed to get the sgcheck tool to work > yet. I can only guess that the ksp object is somehow becoming corrupted > during the setup process, but the array sizes that I can track (which > derive from image sizes), appear correct at every point I can check. Any > suggestions as to how I can check what might go wrong in the setup of the > ksp object? > > Thankyou. 
> > > > valgrind tells me: > > > > ==18175== Argument 'size' of function memalign has a fishy (possibly > negative) value: -17152038144 > > ==18175== at 0x4C320A6: memalign (in > /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) > > ==18175== by 0x4F0F1F2: PetscMallocAlign(unsigned long, int, char > const*, char const*, void**) (mal.c:28) > > ==18175== by 0x56B43CA: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595) > > ==18175== by 0x56B39BD: MatSeqAIJSetPreallocation (aij.c:3539) > > ==18175== by 0x59A9B44: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*) > (fdda.c:1085) > > ==18175== by 0x59A4C71: DMCreateMatrix_DA(_p_DM*, _p_Mat**) > (fdda.c:759) > > ==18175== by 0x58BBD29: DMCreateMatrix (dm.c:956) > > ==18175== by 0x5E509D5: KSPSetUp (itfunc.c:262) > > ==18175== by 0x40A3DE: PetscAdLemTaras3D::solveModel(bool) > (PetscAdLemTaras3D.hxx:269) > > ==18175== by 0x42413F: AdLem3D<3u>::solveModel(bool, bool, bool) > (AdLem3D.hxx:552) > > ==18175== by 0x41C25C: main (PetscAdLemMain.cxx:349) > > ==18175== > > > > -- > > -- > > A/Prof Richard Beare > > Imaging and Bioinformatics, Peninsula Clinical School > > orcid.org/0000-0002-7530-5664 > > Richard.Beare at monash.edu > > +61 3 9788 1724 > > > > > > > > Geospatial Research: > https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis > > -- -- A/Prof Richard Beare Imaging and Bioinformatics, Peninsula Clinical School orcid.org/0000-0002-7530-5664 Richard.Beare at monash.edu +61 3 9788 1724 Geospatial Research: https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis -------------- next part -------------- An HTML attachment was scrubbed... URL: From jczhang at mcs.anl.gov Fri Feb 14 09:47:04 2020 From: jczhang at mcs.anl.gov (Junchao Zhang) Date: Fri, 14 Feb 2020 09:47:04 -0600 Subject: [petsc-users] Crash caused by strange error in KSPSetUp In-Reply-To: References: Message-ID: Which petsc version do you use? In aij.c of the master branch, I saw Barry recently added a useful check to catch number of nonzero overflow, ierr = PetscIntCast(nz64,&nz);CHKERRQ(ierr); But you mentioned using 64-bit indices did not solve the problem, it might not be the reason. You should try the master branch if feasible. Also, vary number of MPI ranks to see if error stack changes. 
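The guard mentioned here can be sketched as follows; this is a schematic of the idea, not the actual aij.c code, and the helper name and arguments are invented for illustration:

#include <petscsys.h>

/* Accumulate a nonzero count in 64 bits and only narrow it with PetscIntCast(),
   which raises a PETSc error instead of silently wrapping negative. */
PetscErrorCode CheckNonzeroCount(PetscInt m, const PetscInt nnz[], PetscInt *nz)
{
  PetscInt64     nz64 = 0;
  PetscInt       i;
  PetscErrorCode ierr;

  for (i = 0; i < m; i++) nz64 += nnz[i];
  ierr = PetscIntCast(nz64, nz);CHKERRQ(ierr); /* fails cleanly if nz64 > PETSC_MAX_INT */
  return 0;
}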
--Junchao Zhang On Fri, Feb 14, 2020 at 5:12 AM Richard Beare via petsc-users < petsc-users at mcs.anl.gov> wrote: > No luck - exactly the same error after including the > --with-64-bit-indicies=yes --download-mpich=yes options > > ==8674== Argument 'size' of function memalign has a fishy (possibly > negative) value: -17152036540 > ==8674== at 0x4C320A6: memalign (in > /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) > ==8674== by 0x4F0CFF2: PetscMallocAlign(unsigned long, int, char > const*, char const*, void**) (mal.c:28) > ==8674== by 0x4F0F716: PetscTrMallocDefault(unsigned long, int, char > const*, char const*, void**) (mtr.c:188) > ==8674== by 0x569AF3E: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595) > ==8674== by 0x569A531: MatSeqAIJSetPreallocation (aij.c:3539) > ==8674== by 0x599080A: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*) > (fdda.c:1085) > ==8674== by 0x598B937: DMCreateMatrix_DA(_p_DM*, _p_Mat**) (fdda.c:759) > ==8674== by 0x58A2BF2: DMCreateMatrix (dm.c:956) > ==8674== by 0x5E377B3: KSPSetUp (itfunc.c:262) > ==8674== by 0x409FFC: PetscAdLemTaras3D::solveModel(bool) > (PetscAdLemTaras3D.hxx:255) > ==8674== by 0x4239FB: AdLem3D<3u>::solveModel(bool, bool, bool) > (AdLem3D.hxx:551) > ==8674== by 0x41BD17: main (PetscAdLemMain.cxx:344) > ==8674== > On Fri, 14 Feb 2020 at 17:07, Smith, Barry F. wrote: > >> >> Richard, >> >> It is likely that for these problems some of the integers become too >> large for the int variable to hold them, thus they overflow and become >> negative. >> >> You should make a new PETSC_ARCH configuration of PETSc that uses >> the configure option --with-64-bit-indices, this will change PETSc to use >> 64 bit integers which will not overflow. >> >> Good luck and let us know how it works out >> >> Barry >> >> Probably the code is built with an older version of PETSc; the later >> versions should produce a more useful error message. >> >> > On Feb 13, 2020, at 11:43 PM, Richard Beare via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> > >> > Hi Everyone, >> > I am experimenting with the Simlul at trophy tool ( >> https://github.com/Inria-Asclepios/simul-atrophy) that uses petsc to >> simulate brain atrophy based on segmented MRI data. I am not the author. I >> have this running on most of a dataset of about 50 scans, but experience >> crashes with several that I am trying to track down. However I am out of >> ideas. The problem images are slightly bigger than some of the successful >> ones, but not substantially so, and I have experimented on machines with >> sufficient RAM. The error happens very quickly, as part of setup - see the >> valgrind report below. I haven't managed to get the sgcheck tool to work >> yet. I can only guess that the ksp object is somehow becoming corrupted >> during the setup process, but the array sizes that I can track (which >> derive from image sizes), appear correct at every point I can check. Any >> suggestions as to how I can check what might go wrong in the setup of the >> ksp object? >> > Thankyou. 
>> > >> > valgrind tells me: >> > >> > ==18175== Argument 'size' of function memalign has a fishy (possibly >> negative) value: -17152038144 >> > ==18175== at 0x4C320A6: memalign (in >> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >> > ==18175== by 0x4F0F1F2: PetscMallocAlign(unsigned long, int, char >> const*, char const*, void**) (mal.c:28) >> > ==18175== by 0x56B43CA: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595) >> > ==18175== by 0x56B39BD: MatSeqAIJSetPreallocation (aij.c:3539) >> > ==18175== by 0x59A9B44: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*) >> (fdda.c:1085) >> > ==18175== by 0x59A4C71: DMCreateMatrix_DA(_p_DM*, _p_Mat**) >> (fdda.c:759) >> > ==18175== by 0x58BBD29: DMCreateMatrix (dm.c:956) >> > ==18175== by 0x5E509D5: KSPSetUp (itfunc.c:262) >> > ==18175== by 0x40A3DE: PetscAdLemTaras3D::solveModel(bool) >> (PetscAdLemTaras3D.hxx:269) >> > ==18175== by 0x42413F: AdLem3D<3u>::solveModel(bool, bool, bool) >> (AdLem3D.hxx:552) >> > ==18175== by 0x41C25C: main (PetscAdLemMain.cxx:349) >> > ==18175== >> > >> > -- >> > -- >> > A/Prof Richard Beare >> > Imaging and Bioinformatics, Peninsula Clinical School >> > orcid.org/0000-0002-7530-5664 >> > Richard.Beare at monash.edu >> > +61 3 9788 1724 >> > >> > >> > >> > Geospatial Research: >> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >> >> > > -- > -- > A/Prof Richard Beare > Imaging and Bioinformatics, Peninsula Clinical School > orcid.org/0000-0002-7530-5664 > Richard.Beare at monash.edu > +61 3 9788 1724 > > > > Geospatial Research: > https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eijkhout at tacc.utexas.edu Fri Feb 14 09:49:12 2020 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Fri, 14 Feb 2020 15:49:12 +0000 Subject: [petsc-users] Crash caused by strange error in KSPSetUp In-Reply-To: References: Message-ID: On , 2020Feb14, at 09:47, Junchao Zhang via petsc-users > wrote: Barry recently added a useful check to catch number of nonzero overflow, ierr = PetscIntCast(nz64,&nz); Is that only activated in debug versions of the installation? Victor. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jczhang at mcs.anl.gov Fri Feb 14 09:52:32 2020 From: jczhang at mcs.anl.gov (Junchao Zhang) Date: Fri, 14 Feb 2020 09:52:32 -0600 Subject: [petsc-users] Crash caused by strange error in KSPSetUp In-Reply-To: References: Message-ID: On Fri, Feb 14, 2020 at 9:49 AM Victor Eijkhout wrote: > > > On , 2020Feb14, at 09:47, Junchao Zhang via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Barry recently added a useful check to catch number of nonzero overflow, > ierr = PetscIntCast(nz64,&nz); > > > Is that only activated in debug versions of the installation? > No. It is activated for all builds. > > Victor. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.beare at monash.edu Fri Feb 14 15:03:49 2020 From: richard.beare at monash.edu (Richard Beare) Date: Sat, 15 Feb 2020 08:03:49 +1100 Subject: [petsc-users] Crash caused by strange error in KSPSetUp In-Reply-To: References: Message-ID: I will see if I can build with master. The docs for simulatrophy say 3.6.3.1. On Sat, 15 Feb 2020 at 02:47, Junchao Zhang wrote: > Which petsc version do you use? 
In aij.c of the master branch, I saw Barry > recently added a useful check to catch number of nonzero overflow, ierr = > PetscIntCast(nz64,&nz);CHKERRQ(ierr); But you mentioned using 64-bit > indices did not solve the problem, it might not be the reason. You should > try the master branch if feasible. Also, vary number of MPI ranks to see if > error stack changes. > > --Junchao Zhang > > > On Fri, Feb 14, 2020 at 5:12 AM Richard Beare via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> No luck - exactly the same error after including the >> --with-64-bit-indicies=yes --download-mpich=yes options >> >> ==8674== Argument 'size' of function memalign has a fishy (possibly >> negative) value: -17152036540 >> ==8674== at 0x4C320A6: memalign (in >> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==8674== by 0x4F0CFF2: PetscMallocAlign(unsigned long, int, char >> const*, char const*, void**) (mal.c:28) >> ==8674== by 0x4F0F716: PetscTrMallocDefault(unsigned long, int, char >> const*, char const*, void**) (mtr.c:188) >> ==8674== by 0x569AF3E: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595) >> ==8674== by 0x569A531: MatSeqAIJSetPreallocation (aij.c:3539) >> ==8674== by 0x599080A: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*) >> (fdda.c:1085) >> ==8674== by 0x598B937: DMCreateMatrix_DA(_p_DM*, _p_Mat**) (fdda.c:759) >> ==8674== by 0x58A2BF2: DMCreateMatrix (dm.c:956) >> ==8674== by 0x5E377B3: KSPSetUp (itfunc.c:262) >> ==8674== by 0x409FFC: PetscAdLemTaras3D::solveModel(bool) >> (PetscAdLemTaras3D.hxx:255) >> ==8674== by 0x4239FB: AdLem3D<3u>::solveModel(bool, bool, bool) >> (AdLem3D.hxx:551) >> ==8674== by 0x41BD17: main (PetscAdLemMain.cxx:344) >> ==8674== >> On Fri, 14 Feb 2020 at 17:07, Smith, Barry F. wrote: >> >>> >>> Richard, >>> >>> It is likely that for these problems some of the integers become >>> too large for the int variable to hold them, thus they overflow and become >>> negative. >>> >>> You should make a new PETSC_ARCH configuration of PETSc that uses >>> the configure option --with-64-bit-indices, this will change PETSc to use >>> 64 bit integers which will not overflow. >>> >>> Good luck and let us know how it works out >>> >>> Barry >>> >>> Probably the code is built with an older version of PETSc; the >>> later versions should produce a more useful error message. >>> >>> > On Feb 13, 2020, at 11:43 PM, Richard Beare via petsc-users < >>> petsc-users at mcs.anl.gov> wrote: >>> > >>> > Hi Everyone, >>> > I am experimenting with the Simlul at trophy tool ( >>> https://github.com/Inria-Asclepios/simul-atrophy) that uses petsc to >>> simulate brain atrophy based on segmented MRI data. I am not the author. I >>> have this running on most of a dataset of about 50 scans, but experience >>> crashes with several that I am trying to track down. However I am out of >>> ideas. The problem images are slightly bigger than some of the successful >>> ones, but not substantially so, and I have experimented on machines with >>> sufficient RAM. The error happens very quickly, as part of setup - see the >>> valgrind report below. I haven't managed to get the sgcheck tool to work >>> yet. I can only guess that the ksp object is somehow becoming corrupted >>> during the setup process, but the array sizes that I can track (which >>> derive from image sizes), appear correct at every point I can check. Any >>> suggestions as to how I can check what might go wrong in the setup of the >>> ksp object? >>> > Thankyou. 
>>> > >>> > valgrind tells me: >>> > >>> > ==18175== Argument 'size' of function memalign has a fishy (possibly >>> negative) value: -17152038144 >>> > ==18175== at 0x4C320A6: memalign (in >>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >>> > ==18175== by 0x4F0F1F2: PetscMallocAlign(unsigned long, int, char >>> const*, char const*, void**) (mal.c:28) >>> > ==18175== by 0x56B43CA: MatSeqAIJSetPreallocation_SeqAIJ >>> (aij.c:3595) >>> > ==18175== by 0x56B39BD: MatSeqAIJSetPreallocation (aij.c:3539) >>> > ==18175== by 0x59A9B44: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, >>> _p_Mat*) (fdda.c:1085) >>> > ==18175== by 0x59A4C71: DMCreateMatrix_DA(_p_DM*, _p_Mat**) >>> (fdda.c:759) >>> > ==18175== by 0x58BBD29: DMCreateMatrix (dm.c:956) >>> > ==18175== by 0x5E509D5: KSPSetUp (itfunc.c:262) >>> > ==18175== by 0x40A3DE: PetscAdLemTaras3D::solveModel(bool) >>> (PetscAdLemTaras3D.hxx:269) >>> > ==18175== by 0x42413F: AdLem3D<3u>::solveModel(bool, bool, bool) >>> (AdLem3D.hxx:552) >>> > ==18175== by 0x41C25C: main (PetscAdLemMain.cxx:349) >>> > ==18175== >>> > >>> > -- >>> > -- >>> > A/Prof Richard Beare >>> > Imaging and Bioinformatics, Peninsula Clinical School >>> > orcid.org/0000-0002-7530-5664 >>> > Richard.Beare at monash.edu >>> > +61 3 9788 1724 >>> > >>> > >>> > >>> > Geospatial Research: >>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>> >>> >> >> -- >> -- >> A/Prof Richard Beare >> Imaging and Bioinformatics, Peninsula Clinical School >> orcid.org/0000-0002-7530-5664 >> Richard.Beare at monash.edu >> +61 3 9788 1724 >> >> >> >> Geospatial Research: >> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >> > -- -- A/Prof Richard Beare Imaging and Bioinformatics, Peninsula Clinical School orcid.org/0000-0002-7530-5664 Richard.Beare at monash.edu +61 3 9788 1724 Geospatial Research: https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.beare at monash.edu Fri Feb 14 20:32:57 2020 From: richard.beare at monash.edu (Richard Beare) Date: Sat, 15 Feb 2020 13:32:57 +1100 Subject: [petsc-users] Crash caused by strange error in KSPSetUp In-Reply-To: References: Message-ID: It doesn't compile out of the box with master. singularity def file attached. On Sat, 15 Feb 2020 at 08:03, Richard Beare wrote: > I will see if I can build with master. The docs for simulatrophy say > 3.6.3.1. > > On Sat, 15 Feb 2020 at 02:47, Junchao Zhang wrote: > >> Which petsc version do you use? In aij.c of the master branch, I saw >> Barry recently added a useful check to catch number of nonzero overflow, >> ierr = PetscIntCast(nz64,&nz);CHKERRQ(ierr); But you mentioned using >> 64-bit indices did not solve the problem, it might not be the reason. You >> should try the master branch if feasible. Also, vary number of MPI ranks to >> see if error stack changes. 
>> >> --Junchao Zhang >> >> >> On Fri, Feb 14, 2020 at 5:12 AM Richard Beare via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >>> No luck - exactly the same error after including the >>> --with-64-bit-indicies=yes --download-mpich=yes options >>> >>> ==8674== Argument 'size' of function memalign has a fishy (possibly >>> negative) value: -17152036540 >>> ==8674== at 0x4C320A6: memalign (in >>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==8674== by 0x4F0CFF2: PetscMallocAlign(unsigned long, int, char >>> const*, char const*, void**) (mal.c:28) >>> ==8674== by 0x4F0F716: PetscTrMallocDefault(unsigned long, int, char >>> const*, char const*, void**) (mtr.c:188) >>> ==8674== by 0x569AF3E: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595) >>> ==8674== by 0x569A531: MatSeqAIJSetPreallocation (aij.c:3539) >>> ==8674== by 0x599080A: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*) >>> (fdda.c:1085) >>> ==8674== by 0x598B937: DMCreateMatrix_DA(_p_DM*, _p_Mat**) >>> (fdda.c:759) >>> ==8674== by 0x58A2BF2: DMCreateMatrix (dm.c:956) >>> ==8674== by 0x5E377B3: KSPSetUp (itfunc.c:262) >>> ==8674== by 0x409FFC: PetscAdLemTaras3D::solveModel(bool) >>> (PetscAdLemTaras3D.hxx:255) >>> ==8674== by 0x4239FB: AdLem3D<3u>::solveModel(bool, bool, bool) >>> (AdLem3D.hxx:551) >>> ==8674== by 0x41BD17: main (PetscAdLemMain.cxx:344) >>> ==8674== >>> On Fri, 14 Feb 2020 at 17:07, Smith, Barry F. >>> wrote: >>> >>>> >>>> Richard, >>>> >>>> It is likely that for these problems some of the integers become >>>> too large for the int variable to hold them, thus they overflow and become >>>> negative. >>>> >>>> You should make a new PETSC_ARCH configuration of PETSc that uses >>>> the configure option --with-64-bit-indices, this will change PETSc to use >>>> 64 bit integers which will not overflow. >>>> >>>> Good luck and let us know how it works out >>>> >>>> Barry >>>> >>>> Probably the code is built with an older version of PETSc; the >>>> later versions should produce a more useful error message. >>>> >>>> > On Feb 13, 2020, at 11:43 PM, Richard Beare via petsc-users < >>>> petsc-users at mcs.anl.gov> wrote: >>>> > >>>> > Hi Everyone, >>>> > I am experimenting with the Simlul at trophy tool ( >>>> https://github.com/Inria-Asclepios/simul-atrophy) that uses petsc to >>>> simulate brain atrophy based on segmented MRI data. I am not the author. I >>>> have this running on most of a dataset of about 50 scans, but experience >>>> crashes with several that I am trying to track down. However I am out of >>>> ideas. The problem images are slightly bigger than some of the successful >>>> ones, but not substantially so, and I have experimented on machines with >>>> sufficient RAM. The error happens very quickly, as part of setup - see the >>>> valgrind report below. I haven't managed to get the sgcheck tool to work >>>> yet. I can only guess that the ksp object is somehow becoming corrupted >>>> during the setup process, but the array sizes that I can track (which >>>> derive from image sizes), appear correct at every point I can check. Any >>>> suggestions as to how I can check what might go wrong in the setup of the >>>> ksp object? >>>> > Thankyou. 
>>>> > >>>> > valgrind tells me: >>>> > >>>> > ==18175== Argument 'size' of function memalign has a fishy (possibly >>>> negative) value: -17152038144 >>>> > ==18175== at 0x4C320A6: memalign (in >>>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> > ==18175== by 0x4F0F1F2: PetscMallocAlign(unsigned long, int, char >>>> const*, char const*, void**) (mal.c:28) >>>> > ==18175== by 0x56B43CA: MatSeqAIJSetPreallocation_SeqAIJ >>>> (aij.c:3595) >>>> > ==18175== by 0x56B39BD: MatSeqAIJSetPreallocation (aij.c:3539) >>>> > ==18175== by 0x59A9B44: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, >>>> _p_Mat*) (fdda.c:1085) >>>> > ==18175== by 0x59A4C71: DMCreateMatrix_DA(_p_DM*, _p_Mat**) >>>> (fdda.c:759) >>>> > ==18175== by 0x58BBD29: DMCreateMatrix (dm.c:956) >>>> > ==18175== by 0x5E509D5: KSPSetUp (itfunc.c:262) >>>> > ==18175== by 0x40A3DE: PetscAdLemTaras3D::solveModel(bool) >>>> (PetscAdLemTaras3D.hxx:269) >>>> > ==18175== by 0x42413F: AdLem3D<3u>::solveModel(bool, bool, bool) >>>> (AdLem3D.hxx:552) >>>> > ==18175== by 0x41C25C: main (PetscAdLemMain.cxx:349) >>>> > ==18175== >>>> > >>>> > -- >>>> > -- >>>> > A/Prof Richard Beare >>>> > Imaging and Bioinformatics, Peninsula Clinical School >>>> > orcid.org/0000-0002-7530-5664 >>>> > Richard.Beare at monash.edu >>>> > +61 3 9788 1724 >>>> > >>>> > >>>> > >>>> > Geospatial Research: >>>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>>> >>>> >>> >>> -- >>> -- >>> A/Prof Richard Beare >>> Imaging and Bioinformatics, Peninsula Clinical School >>> orcid.org/0000-0002-7530-5664 >>> Richard.Beare at monash.edu >>> +61 3 9788 1724 >>> >>> >>> >>> Geospatial Research: >>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>> >> > > -- > -- > A/Prof Richard Beare > Imaging and Bioinformatics, Peninsula Clinical School > orcid.org/0000-0002-7530-5664 > Richard.Beare at monash.edu > +61 3 9788 1724 > > > > Geospatial Research: > https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis > -- -- A/Prof Richard Beare Imaging and Bioinformatics, Peninsula Clinical School orcid.org/0000-0002-7530-5664 Richard.Beare at monash.edu +61 3 9788 1724 Geospatial Research: https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: sa.def Type: application/octet-stream Size: 5804 bytes Desc: not available URL: From hongzhang at anl.gov Sat Feb 15 11:14:35 2020 From: hongzhang at anl.gov (Zhang, Hong) Date: Sat, 15 Feb 2020 17:14:35 +0000 Subject: [petsc-users] Flagging the solver to restart In-Reply-To: References: <4246EF6F-6B7A-4202-806B-6D334E5B9427@anl.gov> Message-ID: <9071818E-8A2A-4206-88B9-1C383CC4238B@anl.gov> Please make sure your replies go to the maillist. On Feb 15, 2020, at 4:35 AM, Mohammed Ashour > wrote: Dear Mr. Hong, Thanks for your reply and clarification. I have a follow-up question. If TSRollBack() is to be called within a TSPostStep, that would set the ts->steprollback to PETSC_TRUE. And since there is a falsity test on TSPreStep in TSSolve, that would prevent TSPreStep from being engaged as long as ts->steprollback is PETSC_TRUE, which it is after being set so in the TSPostStep. So I'm wondering, why is there a falsity test on TSPreStep knowing that it would not be accessible if TSRollBack is called within TSPostStep. This guarantees TSPreStep is called only once before each successful step. 
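A minimal C sketch of the pattern being discussed, assuming hypothetical application routines UpdateVariables() and CriteriaMet(): the variables are updated in a pre-step hook (which, as noted above, runs only once per accepted step because of the steprollback test), and the acceptance test plus TSRollBack() live in a post-step hook.

#include <petscts.h>

extern PetscErrorCode UpdateVariables(Vec); /* assumed to exist in the application */
extern PetscBool      CriteriaMet(Vec);     /* assumed to exist in the application */

static PetscErrorCode MyPreStep(TS ts)
{
  Vec            u;
  PetscErrorCode ierr;
  ierr = TSGetSolution(ts, &u);CHKERRQ(ierr);
  ierr = UpdateVariables(u);CHKERRQ(ierr); /* runs once per accepted step */
  return 0;
}

static PetscErrorCode MyPostStep(TS ts)
{
  Vec            u;
  PetscErrorCode ierr;
  ierr = TSGetSolution(ts, &u);CHKERRQ(ierr);
  if (!CriteriaMet(u)) {
    ierr = TSRollBack(ts);CHKERRQ(ierr);   /* reject and redo the current step */
  }
  return 0;
}

/* registration: TSSetPreStep(ts, MyPreStep); TSSetPostStep(ts, MyPostStep); */

If the post-step hook rolls back without changing anything that affects the residual, the same step will simply be repeated, so the update logic has to make the retry different from the rejected attempt.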
Hong Thanks in advance. Yours sincerely On Tue, Feb 4, 2020 at 5:32 PM Zhang, Hong > wrote: > On Feb 2, 2020, at 11:24 AM, Mohammed Ashour > wrote: > > Dear All, > I'm solving a constraint phase-field problem using PetIGA. This question i'm having is more relevant to PETSc, so I'm posting here. > > I have an algorithm involving iterating on the solution vector until certain criteria are met before moving forward for the next time step. The sequence inside the TSSolve is to call TSMonitor first, to print a user-defined set of values and the move to solve at TSStep and then call TSPostEvaluate. > > So I'm using the TSMonitor to update some variables at time n , those variables are used the in the residual and jacobian calculations at time n+1, and then solving and then check if those criteria are met or not in a function assigned to TS via TSSetPostEvaluate, if the criteria are met, it'll move forward, if not, it'll engaged the routine TSRollBack(), which based on my understanding is the proper way yo flag the solver to recalculate n+1. My question is, is this the proper way to do it? what is the difference between TSRollBack and TSRestart? You are right that TSRollBack() recalculates the current time step. But I would not suggest to use TSPostEvaluate in your case. Presumably you are not using the PETSc adaptor (e.g. via -ts_adapt_type none) and want to control the stepsize yourself. You can check the criteria in TSPostStep, call TSRollBack() if the criteria are not met and update the variables accordingly. The variables can also be updated in TSPreStep(), but TSMonitor should not be used since it is designed for read-only operations. TSRestart may be needed when you are using non-self-starting integration methods such as multiple step methods and FSAL RK methods (-ts_rk_type <3bs,5dp,5bs,6vr,7vr,8vr>). These methods rely on solutions or residuals from previous time steps, thus need a flag to hard restart the time integration whenever discontinuity is introduced (e.g. a parameter in the RHS function is changed). So TSRestart sets the flag to tell the integrator to treat the next time step like the first time step in a time integration. Hong (Mr.) > Thanks a lot > > -- -- Mohammed Ashour, M.Sc. PhD Scholar Bauhaus-Universit?t Weimar Institute of Structural Mechanics (ISM) Marienstra?e 7 99423 Weimar, Germany Mobile: +(49) 176 58834667 -------------- next part -------------- An HTML attachment was scrubbed... URL: From shrirang.abhyankar at pnnl.gov Sat Feb 15 12:20:14 2020 From: shrirang.abhyankar at pnnl.gov (Abhyankar, Shrirang G) Date: Sat, 15 Feb 2020 18:20:14 +0000 Subject: [petsc-users] Flagging the solver to restart In-Reply-To: <9071818E-8A2A-4206-88B9-1C383CC4238B@anl.gov> References: <4246EF6F-6B7A-4202-806B-6D334E5B9427@anl.gov> <9071818E-8A2A-4206-88B9-1C383CC4238B@anl.gov> Message-ID: <86076CCE-DF62-4805-9F3C-FBE441CDEAFF@pnnl.gov> if the criteria are met, it'll move forward, if not, it'll engaged the routine TSRollBack(), which based on my understanding is the proper way yo flag the solver to recalculate n+1. Are you trying to do some kind of zero crossing event or root-finding here? If so, using TSSetEventHandler would be a better way than to write your own code. You merely have to define the criteria/condition in an event function and how to handle it in a posteventfunction. PETSc will manage the event location and rollback part for you. See an example here https://www.mcs.anl.gov/petsc/petsc-current/src/ts/examples/tutorials/ex40.c.html. 
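A condensed sketch of the event-handler setup along the lines of ex40.c. The norm-threshold criterion and the EventCtx struct are only illustrative stand-ins for the user's condition; the callback signatures and TSSetEventHandler are as in PETSc.

typedef struct { PetscReal threshold; } EventCtx;   /* hypothetical user context */

static PetscErrorCode EventFunction(TS ts,PetscReal t,Vec U,PetscScalar fvalue[],void *ctx)
{
  EventCtx       *ectx = (EventCtx*)ctx;
  PetscReal      unorm;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = VecNorm(U,NORM_2,&unorm);CHKERRQ(ierr);
  fvalue[0] = unorm - ectx->threshold;   /* a sign change here marks the event */
  PetscFunctionReturn(0);
}

static PetscErrorCode PostEventFunction(TS ts,PetscInt nevents,PetscInt event_list[],PetscReal t,Vec U,PetscBool forwardsolve,void *ctx)
{
  PetscFunctionBeginUser;
  /* react to the located event, e.g. modify parameters stored in ctx */
  PetscFunctionReturn(0);
}

/* in main(): one event, detected on zero crossings in either direction, non-terminating */
PetscInt  direction[1] = {0};
PetscBool terminate[1] = {PETSC_FALSE};
ierr = TSSetEventHandler(ts,1,direction,terminate,EventFunction,PostEventFunction,&ectx);CHKERRQ(ierr);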
Thanks, Shri From: petsc-users on behalf of "Zhang, Hong via petsc-users" Reply-To: "Zhang, Hong" Date: Saturday, February 15, 2020 at 11:14 AM To: Mohammed Ashour Cc: PETSc users list Subject: Re: [petsc-users] Flagging the solver to restart Please make sure your replies go to the maillist. On Feb 15, 2020, at 4:35 AM, Mohammed Ashour > wrote: Dear Mr. Hong, Thanks for your reply and clarification. I have a follow-up question. If TSRollBack() is to be called within a TSPostStep, that would set the ts->steprollback to PETSC_TRUE. And since there is a falsity test on TSPreStep in TSSolve, that would prevent TSPreStep from being engaged as long as ts->steprollback is PETSC_TRUE, which it is after being set so in the TSPostStep. So I'm wondering, why is there a falsity test on TSPreStep knowing that it would not be accessible if TSRollBack is called within TSPostStep. This guarantees TSPreStep is called only once before each successful step. Hong Thanks in advance. Yours sincerely On Tue, Feb 4, 2020 at 5:32 PM Zhang, Hong > wrote: > On Feb 2, 2020, at 11:24 AM, Mohammed Ashour > wrote: > > Dear All, > I'm solving a constraint phase-field problem using PetIGA. This question i'm having is more relevant to PETSc, so I'm posting here. > > I have an algorithm involving iterating on the solution vector until certain criteria are met before moving forward for the next time step. The sequence inside the TSSolve is to call TSMonitor first, to print a user-defined set of values and the move to solve at TSStep and then call TSPostEvaluate. > > So I'm using the TSMonitor to update some variables at time n , those variables are used the in the residual and jacobian calculations at time n+1, and then solving and then check if those criteria are met or not in a function assigned to TS via TSSetPostEvaluate, if the criteria are met, it'll move forward, if not, it'll engaged the routine TSRollBack(), which based on my understanding is the proper way yo flag the solver to recalculate n+1. My question is, is this the proper way to do it? what is the difference between TSRollBack and TSRestart? You are right that TSRollBack() recalculates the current time step. But I would not suggest to use TSPostEvaluate in your case. Presumably you are not using the PETSc adaptor (e.g. via -ts_adapt_type none) and want to control the stepsize yourself. You can check the criteria in TSPostStep, call TSRollBack() if the criteria are not met and update the variables accordingly. The variables can also be updated in TSPreStep(), but TSMonitor should not be used since it is designed for read-only operations. TSRestart may be needed when you are using non-self-starting integration methods such as multiple step methods and FSAL RK methods (-ts_rk_type <3bs,5dp,5bs,6vr,7vr,8vr>). These methods rely on solutions or residuals from previous time steps, thus need a flag to hard restart the time integration whenever discontinuity is introduced (e.g. a parameter in the RHS function is changed). So TSRestart sets the flag to tell the integrator to treat the next time step like the first time step in a time integration. Hong (Mr.) > Thanks a lot > > -- -- Mohammed Ashour, M.Sc. PhD Scholar Bauhaus-Universit?t Weimar Institute of Structural Mechanics (ISM) Marienstra?e 7 99423 Weimar, Germany Mobile: +(49) 176 58834667 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yyang85 at stanford.edu Sat Feb 15 21:42:10 2020 From: yyang85 at stanford.edu (Yuyun Yang) Date: Sun, 16 Feb 2020 03:42:10 +0000 Subject: [petsc-users] Matrix-free method in PETSc Message-ID: Hello team, I wanted to apply the Krylov subspace method to a matrix-free implementation of a stencil, such that the iterative method acts on the operation without ever constructing the matrix explicitly (for example, when doing backward Euler). I'm not sure whether there is already an example for that somewhere. If so, could you point me to a relevant example? Thank you! Best regards, Yuyun -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Sat Feb 15 22:51:40 2020 From: jed at jedbrown.org (Jed Brown) Date: Sat, 15 Feb 2020 21:51:40 -0700 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: References: Message-ID: <878sl3w2w3.fsf@jedbrown.org> Take most any example, say in src/ts/examples/tutorials/, and run with -ts_type beuler -snes_mf The -snes_mf says to run the nonlinear solver (Newton by default) with an unpreconditioned Krylov method for which the action of the Jacobian is given by matrix-free finite differencing of the residual (which is nonlinear when the dynamical system is). Yuyun Yang writes: > Hello team, > > I wanted to apply the Krylov subspace method to a matrix-free implementation of a stencil, such that the iterative method acts on the operation without ever constructing the matrix explicitly (for example, when doing backward Euler). > > I'm not sure whether there is already an example for that somewhere. If so, could you point me to a relevant example? > > Thank you! > > Best regards, > Yuyun From bsmith at mcs.anl.gov Sun Feb 16 00:02:10 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Sun, 16 Feb 2020 06:02:10 +0000 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: References: Message-ID: Yuyun, If you are speaking about using a finite difference stencil on a structured grid where you provide the Jacobian vector products yourself by looping over the grid doing the stencil operation we unfortunately do not have exactly that kind of example. But it is actually not difficult. I suggest starting with src/ts/examples/tests/ex22.c It computes the sparse matrix explicitly with FormIJacobian() What you need to do is instead in main() use MatCreateShell() and MatShellSetOperation(,MATOP_MULT,(void (*)(void))MyMatMult) then provide the routine MyMatMult() to do your stencil based matrix free product; note that you can create this new routine by taking the structure of IFunction() and reorganizing it to do the Jacobian product instead. You will need to get the information about the shell matrix size on each process by calling DMDAGetCorners(). You will then remove the explicit computation of the Jacobian, and also remove the Event stuff since you don't need it. Extending to 2 and 3d is straight forward. Any questions let us know. Barry If you like this would make a great merge request with your code to improve our examples. > On Feb 15, 2020, at 9:42 PM, Yuyun Yang wrote: > > Hello team, > > I wanted to apply the Krylov subspace method to a matrix-free implementation of a stencil, such that the iterative method acts on the operation without ever constructing the matrix explicitly (for example, when doing backward Euler). > > I'm not sure whether there is already an example for that somewhere. If so, could you point me to a relevant example? > > Thank you! 
> > Best regards, > Yuyun From yyang85 at stanford.edu Sun Feb 16 05:12:13 2020 From: yyang85 at stanford.edu (Yuyun Yang) Date: Sun, 16 Feb 2020 11:12:13 +0000 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: References: , Message-ID: Thank you, that is very helpful information indeed! I will try it and send you my code when it works. Best regards, Yuyun ________________________________ From: Smith, Barry F. Sent: Saturday, February 15, 2020 10:02 PM To: Yuyun Yang Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Matrix-free method in PETSc Yuyun, If you are speaking about using a finite difference stencil on a structured grid where you provide the Jacobian vector products yourself by looping over the grid doing the stencil operation we unfortunately do not have exactly that kind of example. But it is actually not difficult. I suggest starting with src/ts/examples/tests/ex22.c It computes the sparse matrix explicitly with FormIJacobian() What you need to do is instead in main() use MatCreateShell() and MatShellSetOperation(,MATOP_MULT,(void (*)(void))MyMatMult) then provide the routine MyMatMult() to do your stencil based matrix free product; note that you can create this new routine by taking the structure of IFunction() and reorganizing it to do the Jacobian product instead. You will need to get the information about the shell matrix size on each process by calling DMDAGetCorners(). You will then remove the explicit computation of the Jacobian, and also remove the Event stuff since you don't need it. Extending to 2 and 3d is straight forward. Any questions let us know. Barry If you like this would make a great merge request with your code to improve our examples. > On Feb 15, 2020, at 9:42 PM, Yuyun Yang wrote: > > Hello team, > > I wanted to apply the Krylov subspace method to a matrix-free implementation of a stencil, such that the iterative method acts on the operation without ever constructing the matrix explicitly (for example, when doing backward Euler). > > I'm not sure whether there is already an example for that somewhere. If so, could you point me to a relevant example? > > Thank you! > > Best regards, > Yuyun -------------- next part -------------- An HTML attachment was scrubbed... URL: From jczhang at mcs.anl.gov Sun Feb 16 23:37:02 2020 From: jczhang at mcs.anl.gov (Junchao Zhang) Date: Sun, 16 Feb 2020 23:37:02 -0600 Subject: [petsc-users] Crash caused by strange error in KSPSetUp In-Reply-To: References: Message-ID: Richard, I managed to get the code Simlul at trophy built. Could you tell me how to run your test? I want to see if I can reproduce the error. Thanks --Junchao Zhang On Fri, Feb 14, 2020 at 8:34 PM Richard Beare wrote: > It doesn't compile out of the box with master. > > singularity def file attached. > > On Sat, 15 Feb 2020 at 08:03, Richard Beare > wrote: > >> I will see if I can build with master. The docs for simulatrophy say >> 3.6.3.1. >> >> On Sat, 15 Feb 2020 at 02:47, Junchao Zhang wrote: >> >>> Which petsc version do you use? In aij.c of the master branch, I saw >>> Barry recently added a useful check to catch number of nonzero overflow, >>> ierr = PetscIntCast(nz64,&nz);CHKERRQ(ierr); But you mentioned using >>> 64-bit indices did not solve the problem, it might not be the reason. You >>> should try the master branch if feasible. Also, vary number of MPI ranks to >>> see if error stack changes. 
>>> >>> --Junchao Zhang >>> >>> >>> On Fri, Feb 14, 2020 at 5:12 AM Richard Beare via petsc-users < >>> petsc-users at mcs.anl.gov> wrote: >>> >>>> No luck - exactly the same error after including the >>>> --with-64-bit-indicies=yes --download-mpich=yes options >>>> >>>> ==8674== Argument 'size' of function memalign has a fishy (possibly >>>> negative) value: -17152036540 >>>> ==8674== at 0x4C320A6: memalign (in >>>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==8674== by 0x4F0CFF2: PetscMallocAlign(unsigned long, int, char >>>> const*, char const*, void**) (mal.c:28) >>>> ==8674== by 0x4F0F716: PetscTrMallocDefault(unsigned long, int, char >>>> const*, char const*, void**) (mtr.c:188) >>>> ==8674== by 0x569AF3E: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595) >>>> ==8674== by 0x569A531: MatSeqAIJSetPreallocation (aij.c:3539) >>>> ==8674== by 0x599080A: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*) >>>> (fdda.c:1085) >>>> ==8674== by 0x598B937: DMCreateMatrix_DA(_p_DM*, _p_Mat**) >>>> (fdda.c:759) >>>> ==8674== by 0x58A2BF2: DMCreateMatrix (dm.c:956) >>>> ==8674== by 0x5E377B3: KSPSetUp (itfunc.c:262) >>>> ==8674== by 0x409FFC: PetscAdLemTaras3D::solveModel(bool) >>>> (PetscAdLemTaras3D.hxx:255) >>>> ==8674== by 0x4239FB: AdLem3D<3u>::solveModel(bool, bool, bool) >>>> (AdLem3D.hxx:551) >>>> ==8674== by 0x41BD17: main (PetscAdLemMain.cxx:344) >>>> ==8674== >>>> On Fri, 14 Feb 2020 at 17:07, Smith, Barry F. >>>> wrote: >>>> >>>>> >>>>> Richard, >>>>> >>>>> It is likely that for these problems some of the integers become >>>>> too large for the int variable to hold them, thus they overflow and become >>>>> negative. >>>>> >>>>> You should make a new PETSC_ARCH configuration of PETSc that uses >>>>> the configure option --with-64-bit-indices, this will change PETSc to use >>>>> 64 bit integers which will not overflow. >>>>> >>>>> Good luck and let us know how it works out >>>>> >>>>> Barry >>>>> >>>>> Probably the code is built with an older version of PETSc; the >>>>> later versions should produce a more useful error message. >>>>> >>>>> > On Feb 13, 2020, at 11:43 PM, Richard Beare via petsc-users < >>>>> petsc-users at mcs.anl.gov> wrote: >>>>> > >>>>> > Hi Everyone, >>>>> > I am experimenting with the Simlul at trophy tool ( >>>>> https://github.com/Inria-Asclepios/simul-atrophy) that uses petsc to >>>>> simulate brain atrophy based on segmented MRI data. I am not the author. I >>>>> have this running on most of a dataset of about 50 scans, but experience >>>>> crashes with several that I am trying to track down. However I am out of >>>>> ideas. The problem images are slightly bigger than some of the successful >>>>> ones, but not substantially so, and I have experimented on machines with >>>>> sufficient RAM. The error happens very quickly, as part of setup - see the >>>>> valgrind report below. I haven't managed to get the sgcheck tool to work >>>>> yet. I can only guess that the ksp object is somehow becoming corrupted >>>>> during the setup process, but the array sizes that I can track (which >>>>> derive from image sizes), appear correct at every point I can check. Any >>>>> suggestions as to how I can check what might go wrong in the setup of the >>>>> ksp object? >>>>> > Thankyou. 
>>>>> > >>>>> > valgrind tells me: >>>>> > >>>>> > ==18175== Argument 'size' of function memalign has a fishy (possibly >>>>> negative) value: -17152038144 >>>>> > ==18175== at 0x4C320A6: memalign (in >>>>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >>>>> > ==18175== by 0x4F0F1F2: PetscMallocAlign(unsigned long, int, char >>>>> const*, char const*, void**) (mal.c:28) >>>>> > ==18175== by 0x56B43CA: MatSeqAIJSetPreallocation_SeqAIJ >>>>> (aij.c:3595) >>>>> > ==18175== by 0x56B39BD: MatSeqAIJSetPreallocation (aij.c:3539) >>>>> > ==18175== by 0x59A9B44: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, >>>>> _p_Mat*) (fdda.c:1085) >>>>> > ==18175== by 0x59A4C71: DMCreateMatrix_DA(_p_DM*, _p_Mat**) >>>>> (fdda.c:759) >>>>> > ==18175== by 0x58BBD29: DMCreateMatrix (dm.c:956) >>>>> > ==18175== by 0x5E509D5: KSPSetUp (itfunc.c:262) >>>>> > ==18175== by 0x40A3DE: PetscAdLemTaras3D::solveModel(bool) >>>>> (PetscAdLemTaras3D.hxx:269) >>>>> > ==18175== by 0x42413F: AdLem3D<3u>::solveModel(bool, bool, bool) >>>>> (AdLem3D.hxx:552) >>>>> > ==18175== by 0x41C25C: main (PetscAdLemMain.cxx:349) >>>>> > ==18175== >>>>> > >>>>> > -- >>>>> > -- >>>>> > A/Prof Richard Beare >>>>> > Imaging and Bioinformatics, Peninsula Clinical School >>>>> > orcid.org/0000-0002-7530-5664 >>>>> > Richard.Beare at monash.edu >>>>> > +61 3 9788 1724 >>>>> > >>>>> > >>>>> > >>>>> > Geospatial Research: >>>>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>>>> >>>>> >>>> >>>> -- >>>> -- >>>> A/Prof Richard Beare >>>> Imaging and Bioinformatics, Peninsula Clinical School >>>> orcid.org/0000-0002-7530-5664 >>>> Richard.Beare at monash.edu >>>> +61 3 9788 1724 >>>> >>>> >>>> >>>> Geospatial Research: >>>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>>> >>> >> >> -- >> -- >> A/Prof Richard Beare >> Imaging and Bioinformatics, Peninsula Clinical School >> orcid.org/0000-0002-7530-5664 >> Richard.Beare at monash.edu >> +61 3 9788 1724 >> >> >> >> Geospatial Research: >> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >> > > > -- > -- > A/Prof Richard Beare > Imaging and Bioinformatics, Peninsula Clinical School > orcid.org/0000-0002-7530-5664 > Richard.Beare at monash.edu > +61 3 9788 1724 > > > > Geospatial Research: > https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eda.oktay at metu.edu.tr Mon Feb 17 02:35:59 2020 From: eda.oktay at metu.edu.tr (Eda Oktay) Date: Mon, 17 Feb 2020 11:35:59 +0300 Subject: [petsc-users] Forming a matrix from vectors Message-ID: Hello all, I am trying to form a matrix whose columns are eigenvectors I have calculated before U = [v1,v2,v3]. Is there any easy way of forming this matrix? My matrix should be parallel and I have created vectors as below, where nev i s the number of requested eigenvalues. So each V[i] represents an eigenvector and I should form a matrix by using V. Vec *V; VecDuplicateVecs(vr,nev,&V); for (i=0; i From jroman at dsic.upv.es Mon Feb 17 03:24:30 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 17 Feb 2020 10:24:30 +0100 Subject: [petsc-users] Forming a matrix from vectors In-Reply-To: References: Message-ID: I would use MatDenseGetColumn() and VecGetArrayRead() to get the two pointers and then copy the values with a loop. 
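A possible sketch of that loop, using the V[0..nev-1] vectors from the question and letting the dense matrix take its row layout from V[0]; the variable names are illustrative only.

Mat               U;
PetscInt          i,j,nlocal;
PetscScalar       *col;
const PetscScalar *va;

ierr = VecGetLocalSize(V[0],&nlocal);CHKERRQ(ierr);
ierr = MatCreateDense(PETSC_COMM_WORLD,nlocal,PETSC_DECIDE,PETSC_DETERMINE,nev,NULL,&U);CHKERRQ(ierr);
for (j=0; j<nev; j++) {
  ierr = MatDenseGetColumn(U,j,&col);CHKERRQ(ierr);
  ierr = VecGetArrayRead(V[j],&va);CHKERRQ(ierr);
  for (i=0; i<nlocal; i++) col[i] = va[i];   /* copy the locally owned rows of eigenvector j */
  ierr = VecRestoreArrayRead(V[j],&va);CHKERRQ(ierr);
  ierr = MatDenseRestoreColumn(U,&col);CHKERRQ(ierr);
}
ierr = MatAssemblyBegin(U,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
ierr = MatAssemblyEnd(U,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

Taking the local row size from V[0] keeps the parallel layout of U consistent with the eigenvectors, so the plain element-by-element copy over the local rows is valid on each process.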
Jose > El 17 feb 2020, a las 9:35, Eda Oktay escribi?: > > Hello all, > > I am trying to form a matrix whose columns are eigenvectors I have calculated before U = [v1,v2,v3]. Is there any easy way of forming this matrix? My matrix should be parallel and I have created vectors as below, where nev i s the number of requested eigenvalues. So each V[i] represents an eigenvector and I should form a matrix by using V. > > Vec *V; > VecDuplicateVecs(vr,nev,&V); > for (i=0; i ierr = EPSGetEigenpair(eps,i,&kr,NULL,V[i],NULL); > } > > Thanks! > > Eda From knepley at gmail.com Mon Feb 17 06:01:42 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 17 Feb 2020 07:01:42 -0500 Subject: [petsc-users] Forming a matrix from vectors In-Reply-To: References: Message-ID: On Mon, Feb 17, 2020 at 4:24 AM Jose E. Roman wrote: > I would use MatDenseGetColumn() and VecGetArrayRead() to get the two > pointers and then copy the values with a loop. > I would do as Jose says to get it working. After you verify it, we could show you how to avoid a copy. Thanks, Matt > Jose > > > El 17 feb 2020, a las 9:35, Eda Oktay escribi?: > > > > Hello all, > > > > I am trying to form a matrix whose columns are eigenvectors I have > calculated before U = [v1,v2,v3]. Is there any easy way of forming this > matrix? My matrix should be parallel and I have created vectors as below, > where nev i s the number of requested eigenvalues. So each V[i] represents > an eigenvector and I should form a matrix by using V. > > > > Vec *V; > > VecDuplicateVecs(vr,nev,&V); > > for (i=0; i > ierr = EPSGetEigenpair(eps,i,&kr,NULL,V[i],NULL); > > } > > > > Thanks! > > > > Eda > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yyang85 at stanford.edu Mon Feb 17 07:56:43 2020 From: yyang85 at stanford.edu (Yuyun Yang) Date: Mon, 17 Feb 2020 13:56:43 +0000 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: References: , , Message-ID: Hello, I actually have a question about the usage of DMDA since I'm quite new to this. I wonder if the DMDA suite of functions can be directly called on vectors created from VecCreate? Or the vectors have to be formed by DMDACreateGlobalVector? I'm also not sure about what the dof and stencil width arguments do. I'm still unsure about the usage of MatCreateShell and MatShellSetOperation, since it seems that MyMatMult should still have 3 inputs just like MatMult (the matrix and two vectors). Since I'm not forming the matrix, does that mean the matrix input is meaningless but still needs to exist for the sake of this format? After I create such a shell matrix, can I use it like a regular matrix in KSP and utilize preconditioners? Thanks! Yuyun ________________________________ From: petsc-users on behalf of Yuyun Yang Sent: Sunday, February 16, 2020 3:12 AM To: Smith, Barry F. Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Matrix-free method in PETSc Thank you, that is very helpful information indeed! I will try it and send you my code when it works. Best regards, Yuyun ________________________________ From: Smith, Barry F. 
Sent: Saturday, February 15, 2020 10:02 PM To: Yuyun Yang Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Matrix-free method in PETSc Yuyun, If you are speaking about using a finite difference stencil on a structured grid where you provide the Jacobian vector products yourself by looping over the grid doing the stencil operation we unfortunately do not have exactly that kind of example. But it is actually not difficult. I suggest starting with src/ts/examples/tests/ex22.c It computes the sparse matrix explicitly with FormIJacobian() What you need to do is instead in main() use MatCreateShell() and MatShellSetOperation(,MATOP_MULT,(void (*)(void))MyMatMult) then provide the routine MyMatMult() to do your stencil based matrix free product; note that you can create this new routine by taking the structure of IFunction() and reorganizing it to do the Jacobian product instead. You will need to get the information about the shell matrix size on each process by calling DMDAGetCorners(). You will then remove the explicit computation of the Jacobian, and also remove the Event stuff since you don't need it. Extending to 2 and 3d is straight forward. Any questions let us know. Barry If you like this would make a great merge request with your code to improve our examples. > On Feb 15, 2020, at 9:42 PM, Yuyun Yang wrote: > > Hello team, > > I wanted to apply the Krylov subspace method to a matrix-free implementation of a stencil, such that the iterative method acts on the operation without ever constructing the matrix explicitly (for example, when doing backward Euler). > > I'm not sure whether there is already an example for that somewhere. If so, could you point me to a relevant example? > > Thank you! > > Best regards, > Yuyun -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 17 09:19:21 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 17 Feb 2020 10:19:21 -0500 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: References: Message-ID: On Mon, Feb 17, 2020 at 8:56 AM Yuyun Yang wrote: > Hello, > > I actually have a question about the usage of DMDA since I'm quite new to > this. I wonder if the DMDA suite of functions can be directly called on > vectors created from VecCreate? Or the vectors have to be formed by > DMDACreateGlobalVector? > Most things work the same. About the only thing that is different is that we set a special viewer for vectors from DMDACreateGlobalVector() which puts it in lexicographic order on output. > I'm also not sure about what the dof and stencil width arguments do. > 'dof' is how many unknowns lie at each vertex. 'sw' is the width of the ghost region for local vectors. > I'm still unsure about the usage of MatCreateShell and > MatShellSetOperation, since it seems that MyMatMult should still have 3 > inputs just like MatMult (the matrix and two vectors). > MatShell is a type where you provide your own function implementations, rather than using those for a particular storage format, like AIJ. > Since I'm not forming the matrix, does that mean the matrix input is > meaningless but still needs to exist for the sake of this format? > It means you calculate the output yourself, using the input. > After I create such a shell matrix, can I use it like a regular matrix in > KSP and utilize preconditioners? 
> Many preconditioners want access to individual elements of the matrix, which usually will not work with shell matrices, since the user just wants to provide the multiply routine. Thanks, Matt > Thanks! > Yuyun > ------------------------------ > *From:* petsc-users on behalf of Yuyun > Yang > *Sent:* Sunday, February 16, 2020 3:12 AM > *To:* Smith, Barry F. > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Matrix-free method in PETSc > > Thank you, that is very helpful information indeed! I will try it and send > you my code when it works. > > Best regards, > Yuyun > ------------------------------ > *From:* Smith, Barry F. > *Sent:* Saturday, February 15, 2020 10:02 PM > *To:* Yuyun Yang > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Matrix-free method in PETSc > > Yuyun, > > If you are speaking about using a finite difference stencil on a > structured grid where you provide the Jacobian vector products yourself by > looping over the grid doing the stencil operation we unfortunately do not > have exactly that kind of example. > > But it is actually not difficult. I suggest starting with > src/ts/examples/tests/ex22.c It computes the sparse matrix explicitly with > FormIJacobian() > > What you need to do is instead in main() use MatCreateShell() and > MatShellSetOperation(,MATOP_MULT,(void (*)(void))MyMatMult) then provide > the routine MyMatMult() to do your stencil based matrix free product; note > that you can create this new routine by taking the structure of IFunction() > and reorganizing it to do the Jacobian product instead. You will need to > get the information about the shell matrix size on each process by calling > DMDAGetCorners(). > > You will then remove the explicit computation of the Jacobian, and > also remove the Event stuff since you don't need it. > > Extending to 2 and 3d is straight forward. > > Any questions let us know. > > Barry > > If you like this would make a great merge request with your code to > improve our examples. > > > > On Feb 15, 2020, at 9:42 PM, Yuyun Yang wrote: > > > > Hello team, > > > > I wanted to apply the Krylov subspace method to a matrix-free > implementation of a stencil, such that the iterative method acts on the > operation without ever constructing the matrix explicitly (for example, > when doing backward Euler). > > > > I'm not sure whether there is already an example for that somewhere. If > so, could you point me to a relevant example? > > > > Thank you! > > > > Best regards, > > Yuyun > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jczhang at mcs.anl.gov Mon Feb 17 10:19:15 2020 From: jczhang at mcs.anl.gov (Junchao Zhang) Date: Mon, 17 Feb 2020 10:19:15 -0600 Subject: [petsc-users] Crash caused by strange error in KSPSetUp In-Reply-To: References: Message-ID: Hi, Richard, I tested the case you sent over and found it did fail due to the 32-bit overflow on number of non-zeros, and with a 64-bit built petsc it passed. You had a typo when you reported that --with-64-bit-indicies=yes failed. It should be --with-64-bit-indices=yes. 
You can go with a 64-bit built petsc, or you can go with parallel computing and run with multiple MPI ranks so that each rank has less non-zeros and it is faster (but you need to make sure that code is correctly parallelized). Barry's recent fix ierr = PetscIntCast(nz64,&nz);CHKERRQ(ierr); would print more useful error messages in this case. Barry, should we patch it back to 3.6.3? --Junchao Zhang On Sun, Feb 16, 2020 at 11:37 PM Junchao Zhang wrote: > Richard, > I managed to get the code Simlul at trophy built. Could you tell me how to > run your test? I want to see if I can reproduce the error. Thanks > > --Junchao Zhang > > > On Fri, Feb 14, 2020 at 8:34 PM Richard Beare > wrote: > >> It doesn't compile out of the box with master. >> >> singularity def file attached. >> >> On Sat, 15 Feb 2020 at 08:03, Richard Beare >> wrote: >> >>> I will see if I can build with master. The docs for simulatrophy say >>> 3.6.3.1. >>> >>> On Sat, 15 Feb 2020 at 02:47, Junchao Zhang wrote: >>> >>>> Which petsc version do you use? In aij.c of the master branch, I saw >>>> Barry recently added a useful check to catch number of nonzero overflow, >>>> ierr = PetscIntCast(nz64,&nz);CHKERRQ(ierr); But you mentioned using >>>> 64-bit indices did not solve the problem, it might not be the reason. You >>>> should try the master branch if feasible. Also, vary number of MPI ranks to >>>> see if error stack changes. >>>> >>>> --Junchao Zhang >>>> >>>> >>>> On Fri, Feb 14, 2020 at 5:12 AM Richard Beare via petsc-users < >>>> petsc-users at mcs.anl.gov> wrote: >>>> >>>>> No luck - exactly the same error after including the >>>>> --with-64-bit-indicies=yes --download-mpich=yes options >>>>> >>>>> ==8674== Argument 'size' of function memalign has a fishy (possibly >>>>> negative) value: -17152036540 >>>>> ==8674== at 0x4C320A6: memalign (in >>>>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >>>>> ==8674== by 0x4F0CFF2: PetscMallocAlign(unsigned long, int, char >>>>> const*, char const*, void**) (mal.c:28) >>>>> ==8674== by 0x4F0F716: PetscTrMallocDefault(unsigned long, int, >>>>> char const*, char const*, void**) (mtr.c:188) >>>>> ==8674== by 0x569AF3E: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595) >>>>> ==8674== by 0x569A531: MatSeqAIJSetPreallocation (aij.c:3539) >>>>> ==8674== by 0x599080A: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*) >>>>> (fdda.c:1085) >>>>> ==8674== by 0x598B937: DMCreateMatrix_DA(_p_DM*, _p_Mat**) >>>>> (fdda.c:759) >>>>> ==8674== by 0x58A2BF2: DMCreateMatrix (dm.c:956) >>>>> ==8674== by 0x5E377B3: KSPSetUp (itfunc.c:262) >>>>> ==8674== by 0x409FFC: PetscAdLemTaras3D::solveModel(bool) >>>>> (PetscAdLemTaras3D.hxx:255) >>>>> ==8674== by 0x4239FB: AdLem3D<3u>::solveModel(bool, bool, bool) >>>>> (AdLem3D.hxx:551) >>>>> ==8674== by 0x41BD17: main (PetscAdLemMain.cxx:344) >>>>> ==8674== >>>>> On Fri, 14 Feb 2020 at 17:07, Smith, Barry F. >>>>> wrote: >>>>> >>>>>> >>>>>> Richard, >>>>>> >>>>>> It is likely that for these problems some of the integers become >>>>>> too large for the int variable to hold them, thus they overflow and become >>>>>> negative. >>>>>> >>>>>> You should make a new PETSC_ARCH configuration of PETSc that >>>>>> uses the configure option --with-64-bit-indices, this will change PETSc to >>>>>> use 64 bit integers which will not overflow. >>>>>> >>>>>> Good luck and let us know how it works out >>>>>> >>>>>> Barry >>>>>> >>>>>> Probably the code is built with an older version of PETSc; the >>>>>> later versions should produce a more useful error message. 
>>>>>> >>>>>> > On Feb 13, 2020, at 11:43 PM, Richard Beare via petsc-users < >>>>>> petsc-users at mcs.anl.gov> wrote: >>>>>> > >>>>>> > Hi Everyone, >>>>>> > I am experimenting with the Simlul at trophy tool ( >>>>>> https://github.com/Inria-Asclepios/simul-atrophy) that uses petsc to >>>>>> simulate brain atrophy based on segmented MRI data. I am not the author. I >>>>>> have this running on most of a dataset of about 50 scans, but experience >>>>>> crashes with several that I am trying to track down. However I am out of >>>>>> ideas. The problem images are slightly bigger than some of the successful >>>>>> ones, but not substantially so, and I have experimented on machines with >>>>>> sufficient RAM. The error happens very quickly, as part of setup - see the >>>>>> valgrind report below. I haven't managed to get the sgcheck tool to work >>>>>> yet. I can only guess that the ksp object is somehow becoming corrupted >>>>>> during the setup process, but the array sizes that I can track (which >>>>>> derive from image sizes), appear correct at every point I can check. Any >>>>>> suggestions as to how I can check what might go wrong in the setup of the >>>>>> ksp object? >>>>>> > Thankyou. >>>>>> > >>>>>> > valgrind tells me: >>>>>> > >>>>>> > ==18175== Argument 'size' of function memalign has a fishy >>>>>> (possibly negative) value: -17152038144 >>>>>> > ==18175== at 0x4C320A6: memalign (in >>>>>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >>>>>> > ==18175== by 0x4F0F1F2: PetscMallocAlign(unsigned long, int, >>>>>> char const*, char const*, void**) (mal.c:28) >>>>>> > ==18175== by 0x56B43CA: MatSeqAIJSetPreallocation_SeqAIJ >>>>>> (aij.c:3595) >>>>>> > ==18175== by 0x56B39BD: MatSeqAIJSetPreallocation (aij.c:3539) >>>>>> > ==18175== by 0x59A9B44: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, >>>>>> _p_Mat*) (fdda.c:1085) >>>>>> > ==18175== by 0x59A4C71: DMCreateMatrix_DA(_p_DM*, _p_Mat**) >>>>>> (fdda.c:759) >>>>>> > ==18175== by 0x58BBD29: DMCreateMatrix (dm.c:956) >>>>>> > ==18175== by 0x5E509D5: KSPSetUp (itfunc.c:262) >>>>>> > ==18175== by 0x40A3DE: PetscAdLemTaras3D::solveModel(bool) >>>>>> (PetscAdLemTaras3D.hxx:269) >>>>>> > ==18175== by 0x42413F: AdLem3D<3u>::solveModel(bool, bool, bool) >>>>>> (AdLem3D.hxx:552) >>>>>> > ==18175== by 0x41C25C: main (PetscAdLemMain.cxx:349) >>>>>> > ==18175== >>>>>> > >>>>>> > -- >>>>>> > -- >>>>>> > A/Prof Richard Beare >>>>>> > Imaging and Bioinformatics, Peninsula Clinical School >>>>>> > orcid.org/0000-0002-7530-5664 >>>>>> > Richard.Beare at monash.edu >>>>>> > +61 3 9788 1724 >>>>>> > >>>>>> > >>>>>> > >>>>>> > Geospatial Research: >>>>>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>>>>> >>>>>> >>>>> >>>>> -- >>>>> -- >>>>> A/Prof Richard Beare >>>>> Imaging and Bioinformatics, Peninsula Clinical School >>>>> orcid.org/0000-0002-7530-5664 >>>>> Richard.Beare at monash.edu >>>>> +61 3 9788 1724 >>>>> >>>>> >>>>> >>>>> Geospatial Research: >>>>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>>>> >>>> >>> >>> -- >>> -- >>> A/Prof Richard Beare >>> Imaging and Bioinformatics, Peninsula Clinical School >>> orcid.org/0000-0002-7530-5664 >>> Richard.Beare at monash.edu >>> +61 3 9788 1724 >>> >>> >>> >>> Geospatial Research: >>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>> >> >> >> -- >> -- >> A/Prof Richard Beare >> Imaging and Bioinformatics, Peninsula Clinical School >> orcid.org/0000-0002-7530-5664 >> Richard.Beare at monash.edu >> +61 3 9788 1724 >> >> 
>> >> Geospatial Research: >> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From juaneah at gmail.com Mon Feb 17 12:19:07 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Mon, 17 Feb 2020 12:19:07 -0600 Subject: [petsc-users] SLEPc: The inner product is not well defined In-Reply-To: References: Message-ID: Thank you very much for the answer. This error appears when computing the B-norm of a vector x, as > sqrt(x'*B*x). Probably your B matrix is semi-definite, and due to > floating-point error the value x'*B*x becomes negative for a certain vector > x. The code uses a tolerance of 10*PETSC_MACHINE_EPSILON, but it seems the > rounding errors are larger in your case. Or maybe your B-matrix is > indefinite, in which case you should solve the problem as non-symmetric (or > as symmetric-indefinite GHIEP). > > Do you get the same problem with the Krylov-Schur solver? > > After check the input matrices, the problem was solved using GHIEP. > A workaround is to edit the source code and remove the check or increase > the tolerance, but this may be catastrophic if your B is indefinite. A > better solution is to reformulate the problem, solving the matrix pair > (A,C) where C=alpha*A+beta*B is positive definite (note that then the > eigenvalues become lambda/(beta+alpha*lambda)). > > Ok, there is a rule to choose the values for alpha and beta? Kind regards. Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From juaneah at gmail.com Mon Feb 17 12:35:38 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Mon, 17 Feb 2020 12:35:38 -0600 Subject: [petsc-users] BCs for a EPS solver Message-ID: Hi everyone, I have an eigenvalue problem where I need to apply BCs to the stiffness and mass matrix. Usually, for KSP solver, it is enough to set to zero the rows and columns related to the boundary conditions. I used to apply it with MatZeroRowsColumns, with a 1s on the diagonal. Then the solver works well. There is something similar to KSP for EPS solver ? I already used MatZeroRowsColumns (for EPS solver), with a 1s on the diagonal, and I got wrong result. Kind regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeremy at seamplex.com Mon Feb 17 12:39:47 2020 From: jeremy at seamplex.com (Jeremy Theler) Date: Mon, 17 Feb 2020 15:39:47 -0300 Subject: [petsc-users] BCs for a EPS solver In-Reply-To: References: Message-ID: The usual trick is to set ones in one matrix and zeros in the other one. On Mon, 2020-02-17 at 12:35 -0600, Emmanuel Ayala wrote: > Hi everyone, > > I have an eigenvalue problem where I need to apply BCs to the > stiffness and mass matrix. > > Usually, for KSP solver, it is enough to set to zero the rows and > columns related to the boundary conditions. I used to apply it with > MatZeroRowsColumns, with a 1s on the diagonal. Then the solver works > well. > > There is something similar to KSP for EPS solver ? > > I already used MatZeroRowsColumns (for EPS solver), with a 1s on the > diagonal, and I got wrong result. > > Kind regards. > > > > From juaneah at gmail.com Mon Feb 17 12:57:50 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Mon, 17 Feb 2020 12:57:50 -0600 Subject: [petsc-users] BCs for a EPS solver In-Reply-To: References: Message-ID: Hi, thanks for the quick answer. I just did it, and it does not work. My problem is GNHEP and I use the default solver (Krylov-Schur). 
Moreover I run the code with the options: -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_type mumps Any other suggestions? Kind regards. El lun., 17 de feb. de 2020 a la(s) 12:39, Jeremy Theler ( jeremy at seamplex.com) escribi?: > The usual trick is to set ones in one matrix and zeros in the other > one. > > > On Mon, 2020-02-17 at 12:35 -0600, Emmanuel Ayala wrote: > > Hi everyone, > > > > I have an eigenvalue problem where I need to apply BCs to the > > stiffness and mass matrix. > > > > Usually, for KSP solver, it is enough to set to zero the rows and > > columns related to the boundary conditions. I used to apply it with > > MatZeroRowsColumns, with a 1s on the diagonal. Then the solver works > > well. > > > > There is something similar to KSP for EPS solver ? > > > > I already used MatZeroRowsColumns (for EPS solver), with a 1s on the > > diagonal, and I got wrong result. > > > > Kind regards. > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 17 13:20:26 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 17 Feb 2020 14:20:26 -0500 Subject: [petsc-users] BCs for a EPS solver In-Reply-To: References: Message-ID: On Mon, Feb 17, 2020 at 1:59 PM Emmanuel Ayala wrote: > Hi, thanks for the quick answer. > > I just did it, and it does not work. My problem is GNHEP and I use the > default solver (Krylov-Schur). Moreover I run the code with the options: > -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_type mumps > I guess a better question is: What do you expect to work? For a linear solve, A x = b if a row i is 0 except for a one on the diagonal, then I get x_i = b_i so hopefully you put the correct boundary value in b_i. For the generalized eigenproblem A x = \lambda B x if you set row i to the identity in A, and zero in B, we get x_i = 0 and you must put the boundary values into x after you have finished the solve. Is this what you did? Thanks, Matt > Any other suggestions? > Kind regards. > > El lun., 17 de feb. de 2020 a la(s) 12:39, Jeremy Theler ( > jeremy at seamplex.com) escribi?: > >> The usual trick is to set ones in one matrix and zeros in the other >> one. >> >> >> On Mon, 2020-02-17 at 12:35 -0600, Emmanuel Ayala wrote: >> > Hi everyone, >> > >> > I have an eigenvalue problem where I need to apply BCs to the >> > stiffness and mass matrix. >> > >> > Usually, for KSP solver, it is enough to set to zero the rows and >> > columns related to the boundary conditions. I used to apply it with >> > MatZeroRowsColumns, with a 1s on the diagonal. Then the solver works >> > well. >> > >> > There is something similar to KSP for EPS solver ? >> > >> > I already used MatZeroRowsColumns (for EPS solver), with a 1s on the >> > diagonal, and I got wrong result. >> > >> > Kind regards. >> > >> > >> > >> > >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Mon Feb 17 13:44:14 2020 From: hongzhang at anl.gov (Zhang, Hong) Date: Mon, 17 Feb 2020 19:44:14 +0000 Subject: [petsc-users] Forming a matrix from vectors In-Reply-To: References: Message-ID: <74DB1A51-307D-4A44-9FD5-196FCAB435D0@anl.gov> You can create a dense matrix and use VecPlaceArray() to take a column out of the matrix as a vector. 
For example, MatCreateDense() MatDenseGetColumn(A,0,col) VecPlaceArray(v,col) ? // fill in the vector with values VecResetArray(v) MatDenseRestoreColumn(A,&col) Hong (Mr.) > On Feb 17, 2020, at 2:35 AM, Eda Oktay wrote: > > Hello all, > > I am trying to form a matrix whose columns are eigenvectors I have calculated before U = [v1,v2,v3]. Is there any easy way of forming this matrix? My matrix should be parallel and I have created vectors as below, where nev i s the number of requested eigenvalues. So each V[i] represents an eigenvector and I should form a matrix by using V. > > Vec *V; > VecDuplicateVecs(vr,nev,&V); > for (i=0; i ierr = EPSGetEigenpair(eps,i,&kr,NULL,V[i],NULL); > } > > Thanks! > > Eda From richard.beare at monash.edu Mon Feb 17 14:16:04 2020 From: richard.beare at monash.edu (Richard Beare) Date: Tue, 18 Feb 2020 07:16:04 +1100 Subject: [petsc-users] Crash caused by strange error in KSPSetUp In-Reply-To: References: Message-ID: Awesome - thanks for that. I will check it out. I will also look at what needs to be done to bring simulatrophy to a more recent version of petsc. On Tue, 18 Feb 2020 at 03:19, Junchao Zhang wrote: > Hi, Richard, > I tested the case you sent over and found it did fail due to the 32-bit > overflow on number of non-zeros, and with a 64-bit built petsc it passed. > You had a typo when you reported that --with-64-bit-indicies=yes failed. It > should be --with-64-bit-indices=yes. > You can go with a 64-bit built petsc, or you can go with parallel > computing and run with multiple MPI ranks so that each rank has less > non-zeros and it is faster (but you need to make sure that code is > correctly parallelized). > Barry's recent fix ierr = PetscIntCast(nz64,&nz);CHKERRQ(ierr); would > print more useful error messages in this case. Barry, should we patch it > back to 3.6.3? > > --Junchao Zhang > > > On Sun, Feb 16, 2020 at 11:37 PM Junchao Zhang > wrote: > >> Richard, >> I managed to get the code Simlul at trophy built. Could you tell me how >> to run your test? I want to see if I can reproduce the error. Thanks >> >> --Junchao Zhang >> >> >> On Fri, Feb 14, 2020 at 8:34 PM Richard Beare >> wrote: >> >>> It doesn't compile out of the box with master. >>> >>> singularity def file attached. >>> >>> On Sat, 15 Feb 2020 at 08:03, Richard Beare >>> wrote: >>> >>>> I will see if I can build with master. The docs for simulatrophy say >>>> 3.6.3.1. >>>> >>>> On Sat, 15 Feb 2020 at 02:47, Junchao Zhang >>>> wrote: >>>> >>>>> Which petsc version do you use? In aij.c of the master branch, I saw >>>>> Barry recently added a useful check to catch number of nonzero overflow, >>>>> ierr = PetscIntCast(nz64,&nz);CHKERRQ(ierr); But you mentioned using >>>>> 64-bit indices did not solve the problem, it might not be the reason. You >>>>> should try the master branch if feasible. Also, vary number of MPI ranks to >>>>> see if error stack changes. 
>>>>> >>>>> --Junchao Zhang >>>>> >>>>> >>>>> On Fri, Feb 14, 2020 at 5:12 AM Richard Beare via petsc-users < >>>>> petsc-users at mcs.anl.gov> wrote: >>>>> >>>>>> No luck - exactly the same error after including the >>>>>> --with-64-bit-indicies=yes --download-mpich=yes options >>>>>> >>>>>> ==8674== Argument 'size' of function memalign has a fishy (possibly >>>>>> negative) value: -17152036540 >>>>>> ==8674== at 0x4C320A6: memalign (in >>>>>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >>>>>> ==8674== by 0x4F0CFF2: PetscMallocAlign(unsigned long, int, char >>>>>> const*, char const*, void**) (mal.c:28) >>>>>> ==8674== by 0x4F0F716: PetscTrMallocDefault(unsigned long, int, >>>>>> char const*, char const*, void**) (mtr.c:188) >>>>>> ==8674== by 0x569AF3E: MatSeqAIJSetPreallocation_SeqAIJ >>>>>> (aij.c:3595) >>>>>> ==8674== by 0x569A531: MatSeqAIJSetPreallocation (aij.c:3539) >>>>>> ==8674== by 0x599080A: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, >>>>>> _p_Mat*) (fdda.c:1085) >>>>>> ==8674== by 0x598B937: DMCreateMatrix_DA(_p_DM*, _p_Mat**) >>>>>> (fdda.c:759) >>>>>> ==8674== by 0x58A2BF2: DMCreateMatrix (dm.c:956) >>>>>> ==8674== by 0x5E377B3: KSPSetUp (itfunc.c:262) >>>>>> ==8674== by 0x409FFC: PetscAdLemTaras3D::solveModel(bool) >>>>>> (PetscAdLemTaras3D.hxx:255) >>>>>> ==8674== by 0x4239FB: AdLem3D<3u>::solveModel(bool, bool, bool) >>>>>> (AdLem3D.hxx:551) >>>>>> ==8674== by 0x41BD17: main (PetscAdLemMain.cxx:344) >>>>>> ==8674== >>>>>> On Fri, 14 Feb 2020 at 17:07, Smith, Barry F. >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> Richard, >>>>>>> >>>>>>> It is likely that for these problems some of the integers >>>>>>> become too large for the int variable to hold them, thus they overflow and >>>>>>> become negative. >>>>>>> >>>>>>> You should make a new PETSC_ARCH configuration of PETSc that >>>>>>> uses the configure option --with-64-bit-indices, this will change PETSc to >>>>>>> use 64 bit integers which will not overflow. >>>>>>> >>>>>>> Good luck and let us know how it works out >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> Probably the code is built with an older version of PETSc; the >>>>>>> later versions should produce a more useful error message. >>>>>>> >>>>>>> > On Feb 13, 2020, at 11:43 PM, Richard Beare via petsc-users < >>>>>>> petsc-users at mcs.anl.gov> wrote: >>>>>>> > >>>>>>> > Hi Everyone, >>>>>>> > I am experimenting with the Simlul at trophy tool ( >>>>>>> https://github.com/Inria-Asclepios/simul-atrophy) that uses petsc >>>>>>> to simulate brain atrophy based on segmented MRI data. I am not the author. >>>>>>> I have this running on most of a dataset of about 50 scans, but experience >>>>>>> crashes with several that I am trying to track down. However I am out of >>>>>>> ideas. The problem images are slightly bigger than some of the successful >>>>>>> ones, but not substantially so, and I have experimented on machines with >>>>>>> sufficient RAM. The error happens very quickly, as part of setup - see the >>>>>>> valgrind report below. I haven't managed to get the sgcheck tool to work >>>>>>> yet. I can only guess that the ksp object is somehow becoming corrupted >>>>>>> during the setup process, but the array sizes that I can track (which >>>>>>> derive from image sizes), appear correct at every point I can check. Any >>>>>>> suggestions as to how I can check what might go wrong in the setup of the >>>>>>> ksp object? >>>>>>> > Thankyou. 
>>>>>>> > >>>>>>> > valgrind tells me: >>>>>>> > >>>>>>> > ==18175== Argument 'size' of function memalign has a fishy >>>>>>> (possibly negative) value: -17152038144 >>>>>>> > ==18175== at 0x4C320A6: memalign (in >>>>>>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >>>>>>> > ==18175== by 0x4F0F1F2: PetscMallocAlign(unsigned long, int, >>>>>>> char const*, char const*, void**) (mal.c:28) >>>>>>> > ==18175== by 0x56B43CA: MatSeqAIJSetPreallocation_SeqAIJ >>>>>>> (aij.c:3595) >>>>>>> > ==18175== by 0x56B39BD: MatSeqAIJSetPreallocation (aij.c:3539) >>>>>>> > ==18175== by 0x59A9B44: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, >>>>>>> _p_Mat*) (fdda.c:1085) >>>>>>> > ==18175== by 0x59A4C71: DMCreateMatrix_DA(_p_DM*, _p_Mat**) >>>>>>> (fdda.c:759) >>>>>>> > ==18175== by 0x58BBD29: DMCreateMatrix (dm.c:956) >>>>>>> > ==18175== by 0x5E509D5: KSPSetUp (itfunc.c:262) >>>>>>> > ==18175== by 0x40A3DE: PetscAdLemTaras3D::solveModel(bool) >>>>>>> (PetscAdLemTaras3D.hxx:269) >>>>>>> > ==18175== by 0x42413F: AdLem3D<3u>::solveModel(bool, bool, >>>>>>> bool) (AdLem3D.hxx:552) >>>>>>> > ==18175== by 0x41C25C: main (PetscAdLemMain.cxx:349) >>>>>>> > ==18175== >>>>>>> > >>>>>>> > -- >>>>>>> > -- >>>>>>> > A/Prof Richard Beare >>>>>>> > Imaging and Bioinformatics, Peninsula Clinical School >>>>>>> > orcid.org/0000-0002-7530-5664 >>>>>>> > Richard.Beare at monash.edu >>>>>>> > +61 3 9788 1724 >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > Geospatial Research: >>>>>>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> -- >>>>>> A/Prof Richard Beare >>>>>> Imaging and Bioinformatics, Peninsula Clinical School >>>>>> orcid.org/0000-0002-7530-5664 >>>>>> Richard.Beare at monash.edu >>>>>> +61 3 9788 1724 >>>>>> >>>>>> >>>>>> >>>>>> Geospatial Research: >>>>>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>>>>> >>>>> >>>> >>>> -- >>>> -- >>>> A/Prof Richard Beare >>>> Imaging and Bioinformatics, Peninsula Clinical School >>>> orcid.org/0000-0002-7530-5664 >>>> Richard.Beare at monash.edu >>>> +61 3 9788 1724 >>>> >>>> >>>> >>>> Geospatial Research: >>>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>>> >>> >>> >>> -- >>> -- >>> A/Prof Richard Beare >>> Imaging and Bioinformatics, Peninsula Clinical School >>> orcid.org/0000-0002-7530-5664 >>> Richard.Beare at monash.edu >>> +61 3 9788 1724 >>> >>> >>> >>> Geospatial Research: >>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>> >> -- -- A/Prof Richard Beare Imaging and Bioinformatics, Peninsula Clinical School orcid.org/0000-0002-7530-5664 Richard.Beare at monash.edu +61 3 9788 1724 Geospatial Research: https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis -------------- next part -------------- An HTML attachment was scrubbed... URL: From juaneah at gmail.com Mon Feb 17 15:33:26 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Mon, 17 Feb 2020 15:33:26 -0600 Subject: [petsc-users] BCs for a EPS solver In-Reply-To: References: Message-ID: Hi, Thank you for the clarification, now I understand what means change those values, and I tried to do that. 
But if I put the row i to the identity in A, and zero in B, the solver crash: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Error in external library [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: Error in external library [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-10, INFO(2)=0 [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [1]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 [1]PETSC ERROR: [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [2]PETSC ERROR: Error in external library [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-10, INFO(2)=0 [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [2]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 [2]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala Mon Feb 17 15:28:01 2020 [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [3]PETSC ERROR: Error in external library [3]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-10, INFO(2)=0 [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [3]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 [3]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala Mon Feb 17 15:28:01 2020 [3]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-mumps --download-scalapack --download-parmetis --download-metis --download-superlu_dist --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-10, INFO(2)=33 [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 [0]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala Mon Feb 17 15:28:01 2020 [0]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-mumps --download-scalapack --download-parmetis --download-metis --download-superlu_dist --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala Mon Feb 17 15:28:01 2020 [1]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-mumps --download-scalapack --download-parmetis --download-metis --download-superlu_dist --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c [1]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c [2]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-mumps --download-scalapack --download-parmetis --download-metis --download-superlu_dist --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c [2]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c [2]PETSC ERROR: [3]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c [3]PETSC ERROR: #3 PCSetUp_LU() line 126 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c [3]PETSC ERROR: #4 PCSetUp() line 894 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c [0]PETSC ERROR: #3 PCSetUp_LU() line 126 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c [0]PETSC ERROR: [1]PETSC ERROR: #3 PCSetUp_LU() line 126 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c [1]PETSC ERROR: #4 PCSetUp() line 894 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c [1]PETSC ERROR: #3 PCSetUp_LU() line 126 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c [2]PETSC ERROR: #4 PCSetUp() line 894 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c [2]PETSC ERROR: #5 KSPSetUp() line 376 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c [3]PETSC ERROR: #5 KSPSetUp() line 376 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c [3]PETSC ERROR: #6 STSetUp_Shift() line 120 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c #4 PCSetUp() line 894 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #5 
KSPSetUp() line 376 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #6 STSetUp_Shift() line 120 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c [0]PETSC ERROR: #5 KSPSetUp() line 376 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c [1]PETSC ERROR: #6 STSetUp_Shift() line 120 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c [1]PETSC ERROR: #7 STSetUp() line 271 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/interface/stsolve.c [1]PETSC ERROR: #8 EPSSetUp() line 273 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/interface/epssetup.c [2]PETSC ERROR: #6 STSetUp_Shift() line 120 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c [2]PETSC ERROR: #7 STSetUp() line 271 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/interface/stsolve.c [2]PETSC ERROR: #8 EPSSetUp() line 273 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/interface/epssetup.c [2]PETSC ERROR: #9 FourBar_NaturalPulsation() line 3937 in /home/ayala/Nextcloud/cpp_projects/2020-02-13-muboto-balancing-v17-mma/Multibody.cc El lun., 17 de feb. de 2020 a la(s) 13:20, Matthew Knepley ( knepley at gmail.com) escribi?: > On Mon, Feb 17, 2020 at 1:59 PM Emmanuel Ayala wrote: > >> Hi, thanks for the quick answer. >> >> I just did it, and it does not work. My problem is GNHEP and I use the >> default solver (Krylov-Schur). Moreover I run the code with the options: >> -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_type mumps >> > > I guess a better question is: What do you expect to work? > > For a linear solve, > > A x = b > > if a row i is 0 except for a one on the diagonal, then I get > > x_i = b_i > > so hopefully you put the correct boundary value in b_i. For the > generalized eigenproblem > > A x = \lambda B x > > if you set row i to the identity in A, and zero in B, we get > > x_i = 0 > > and you must put the boundary values into x after you have finished the > solve. Is this what you did? > > Thanks, > > Matt > > >> Any other suggestions? >> Kind regards. >> >> El lun., 17 de feb. de 2020 a la(s) 12:39, Jeremy Theler ( >> jeremy at seamplex.com) escribi?: >> >>> The usual trick is to set ones in one matrix and zeros in the other >>> one. >>> >>> >>> On Mon, 2020-02-17 at 12:35 -0600, Emmanuel Ayala wrote: >>> > Hi everyone, >>> > >>> > I have an eigenvalue problem where I need to apply BCs to the >>> > stiffness and mass matrix. >>> > >>> > Usually, for KSP solver, it is enough to set to zero the rows and >>> > columns related to the boundary conditions. I used to apply it with >>> > MatZeroRowsColumns, with a 1s on the diagonal. Then the solver works >>> > well. >>> > >>> > There is something similar to KSP for EPS solver ? >>> > >>> > I already used MatZeroRowsColumns (for EPS solver), with a 1s on the >>> > diagonal, and I got wrong result. >>> > >>> > Kind regards. >>> > >>> > >>> > >>> > >>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Feb 17 17:34:32 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) 
Date: Mon, 17 Feb 2020 23:34:32 +0000 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: References: Message-ID: <624EDFBF-E105-4F70-87C6-58BE940D73C9@mcs.anl.gov> > On Feb 17, 2020, at 7:56 AM, Yuyun Yang wrote: > > Hello, > > I actually have a question about the usage of DMDA since I'm quite new to this. I wonder if the DMDA suite of functions can be directly called on vectors created from VecCreate? Yes, but you have to make sure the ones you create have the same sizes and parallel layouts. Generally best to get them from the DMDA or VecDuplicate() than the hassle of figuring out sizes. > Or the vectors have to be formed by DMDACreateGlobalVector? I'm also not sure about what the dof and stencil width arguments do. > > I'm still unsure about the usage of MatCreateShell and MatShellSetOperation, since it seems that MyMatMult should still have 3 inputs just like MatMult (the matrix and two vectors). Since I'm not forming the matrix, does that mean the matrix input is meaningless but still needs to exist for the sake of this format? Well the matrix input is your shell matrix so it likely has information you need to do your multiply routine. MatShellGetContext() (No you do not want to put your information about the matrix stencil inside global variables!) > > After I create such a shell matrix, can I use it like a regular matrix in KSP and utilize preconditioners? > > Thanks! > Yuyun > From: petsc-users on behalf of Yuyun Yang > Sent: Sunday, February 16, 2020 3:12 AM > To: Smith, Barry F. > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Matrix-free method in PETSc > > Thank you, that is very helpful information indeed! I will try it and send you my code when it works. > > Best regards, > Yuyun > From: Smith, Barry F. > Sent: Saturday, February 15, 2020 10:02 PM > To: Yuyun Yang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Matrix-free method in PETSc > > Yuyun, > > If you are speaking about using a finite difference stencil on a structured grid where you provide the Jacobian vector products yourself by looping over the grid doing the stencil operation we unfortunately do not have exactly that kind of example. > > But it is actually not difficult. I suggest starting with src/ts/examples/tests/ex22.c It computes the sparse matrix explicitly with FormIJacobian() > > What you need to do is instead in main() use MatCreateShell() and MatShellSetOperation(,MATOP_MULT,(void (*)(void))MyMatMult) then provide the routine MyMatMult() to do your stencil based matrix free product; note that you can create this new routine by taking the structure of IFunction() and reorganizing it to do the Jacobian product instead. You will need to get the information about the shell matrix size on each process by calling DMDAGetCorners(). > > You will then remove the explicit computation of the Jacobian, and also remove the Event stuff since you don't need it. > > Extending to 2 and 3d is straight forward. > > Any questions let us know. > > Barry > > If you like this would make a great merge request with your code to improve our examples. > > > > On Feb 15, 2020, at 9:42 PM, Yuyun Yang wrote: > > > > Hello team, > > > > I wanted to apply the Krylov subspace method to a matrix-free implementation of a stencil, such that the iterative method acts on the operation without ever constructing the matrix explicitly (for example, when doing backward Euler). > > > > I'm not sure whether there is already an example for that somewhere. If so, could you point me to a relevant example? 
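A minimal sketch of the MatCreateShell()/MatShellSetOperation() setup described above, assuming a 1d DMDA with dof=1 and stencil width 1. The names AppCtx, MyMatMult, user and the coefficient dx are illustrative, not anything PETSc defines, and error checking (ierr/CHKERRQ) is left out to keep it short:

  typedef struct {        /* user context carried by the shell matrix (illustrative) */
    DM        da;         /* DMDA describing the structured grid                     */
    PetscReal dx;         /* grid spacing used by the stencil                        */
  } AppCtx;

  /* y = A*x, applying the stencil on the fly instead of storing A */
  PetscErrorCode MyMatMult(Mat A, Vec x, Vec y)
  {
    AppCtx      *ctx;
    Vec         xloc;
    PetscScalar *xa, *ya;
    PetscInt    i, xs, xm, Mx;

    MatShellGetContext(A, &ctx);
    DMGetLocalVector(ctx->da, &xloc);
    DMGlobalToLocalBegin(ctx->da, x, INSERT_VALUES, xloc);
    DMGlobalToLocalEnd(ctx->da, x, INSERT_VALUES, xloc);
    DMDAGetInfo(ctx->da, NULL, &Mx, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL);
    DMDAGetCorners(ctx->da, &xs, NULL, NULL, &xm, NULL, NULL);
    DMDAVecGetArray(ctx->da, xloc, &xa);
    DMDAVecGetArray(ctx->da, y, &ya);
    for (i = xs; i < xs + xm; i++) {
      if (i == 0 || i == Mx-1) ya[i] = xa[i];                            /* boundary rows: identity        */
      else ya[i] = (-xa[i-1] + 2.0*xa[i] - xa[i+1])/(ctx->dx*ctx->dx);   /* interior: 1d Laplacian stencil */
    }
    DMDAVecRestoreArray(ctx->da, xloc, &xa);
    DMDAVecRestoreArray(ctx->da, y, &ya);
    DMRestoreLocalVector(ctx->da, &xloc);
    return 0;
  }

  /* in main(), once the DMDA exists and user.da/user.dx are filled: */
  Mat      A;
  PetscInt xs, xm;
  DMDAGetCorners(user.da, &xs, NULL, NULL, &xm, NULL, NULL);
  MatCreateShell(PETSC_COMM_WORLD, xm, xm, PETSC_DETERMINE, PETSC_DETERMINE, &user, &A);
  MatShellSetOperation(A, MATOP_MULT, (void (*)(void))MyMatMult);

The shell matrix A can then be handed to KSPSetOperators() like any assembled matrix; since it has no stored entries, the natural starting point is -pc_type none or a PCSHELL of your own.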
> > > > Thank you! > > > > Best regards, > > Yuyun From knepley at gmail.com Mon Feb 17 17:41:15 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 17 Feb 2020 18:41:15 -0500 Subject: [petsc-users] BCs for a EPS solver In-Reply-To: References: Message-ID: On Mon, Feb 17, 2020 at 4:33 PM Emmanuel Ayala wrote: > Hi, > > Thank you for the clarification, now I understand what means change those > values, and I tried to do that. > > But if I put the row i to the identity in A, and zero in B, the solver > crash: > So if you need to factor B, maybe reverse it? Thanks, Matt > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Error in external library > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: Error in external library > [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: > INFOG(1)=-10, INFO(2)=0 > > [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 > [1]PETSC ERROR: [2]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [2]PETSC ERROR: Error in external library > [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: > INFOG(1)=-10, INFO(2)=0 > > [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [2]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 > [2]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala > Mon Feb 17 15:28:01 2020 > [3]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [3]PETSC ERROR: Error in external library > [3]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: > INFOG(1)=-10, INFO(2)=0 > > [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [3]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 > [3]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala > Mon Feb 17 15:28:01 2020 > [3]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 > -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" > FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich > --download-mumps --download-scalapack --download-parmetis --download-metis > --download-superlu_dist --download-cmake --download-fblaslapack=1 > --with-cxx-dialect=C++11 > [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c > [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: > INFOG(1)=-10, INFO(2)=33 > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 > [0]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala > Mon Feb 17 15:28:01 2020 > [0]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 > -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" > FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich > --download-mumps --download-scalapack --download-parmetis --download-metis > --download-superlu_dist --download-cmake --download-fblaslapack=1 > --with-cxx-dialect=C++11 > [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c > ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala Mon Feb 17 > 15:28:01 2020 > [1]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 > -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" > FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich > --download-mumps --download-scalapack --download-parmetis --download-metis > --download-superlu_dist --download-cmake --download-fblaslapack=1 > --with-cxx-dialect=C++11 > [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c > [1]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c > [2]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 > -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" > FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich > --download-mumps --download-scalapack --download-parmetis --download-metis > --download-superlu_dist --download-cmake --download-fblaslapack=1 > --with-cxx-dialect=C++11 > [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c > [2]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c > [2]PETSC ERROR: [3]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c > [3]PETSC ERROR: #3 PCSetUp_LU() line 126 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c > [3]PETSC ERROR: #4 PCSetUp() line 894 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c > [0]PETSC ERROR: #3 PCSetUp_LU() line 126 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c > [0]PETSC ERROR: [1]PETSC ERROR: #3 PCSetUp_LU() line 126 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c > [1]PETSC ERROR: #4 PCSetUp() line 894 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c > [1]PETSC ERROR: #3 PCSetUp_LU() line 126 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c > [2]PETSC ERROR: #4 PCSetUp() line 894 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c > [2]PETSC ERROR: #5 KSPSetUp() line 376 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: #5 KSPSetUp() line 376 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: #6 STSetUp_Shift() line 120 in > /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c > #4 
PCSetUp() line 894 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #5 KSPSetUp() line 376 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #6 STSetUp_Shift() line 120 in > /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c > [0]PETSC ERROR: #5 KSPSetUp() line 376 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c > [1]PETSC ERROR: #6 STSetUp_Shift() line 120 in > /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c > [1]PETSC ERROR: #7 STSetUp() line 271 in > /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/interface/stsolve.c > [1]PETSC ERROR: #8 EPSSetUp() line 273 in > /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/interface/epssetup.c > [2]PETSC ERROR: #6 STSetUp_Shift() line 120 in > /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c > [2]PETSC ERROR: #7 STSetUp() line 271 in > /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/interface/stsolve.c > [2]PETSC ERROR: #8 EPSSetUp() line 273 in > /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/interface/epssetup.c > [2]PETSC ERROR: #9 FourBar_NaturalPulsation() line 3937 in > /home/ayala/Nextcloud/cpp_projects/2020-02-13-muboto-balancing-v17-mma/Multibody.cc > > El lun., 17 de feb. de 2020 a la(s) 13:20, Matthew Knepley ( > knepley at gmail.com) escribi?: > >> On Mon, Feb 17, 2020 at 1:59 PM Emmanuel Ayala wrote: >> >>> Hi, thanks for the quick answer. >>> >>> I just did it, and it does not work. My problem is GNHEP and I use the >>> default solver (Krylov-Schur). Moreover I run the code with the options: >>> -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_type mumps >>> >> >> I guess a better question is: What do you expect to work? >> >> For a linear solve, >> >> A x = b >> >> if a row i is 0 except for a one on the diagonal, then I get >> >> x_i = b_i >> >> so hopefully you put the correct boundary value in b_i. For the >> generalized eigenproblem >> >> A x = \lambda B x >> >> if you set row i to the identity in A, and zero in B, we get >> >> x_i = 0 >> >> and you must put the boundary values into x after you have finished the >> solve. Is this what you did? >> >> Thanks, >> >> Matt >> >> >>> Any other suggestions? >>> Kind regards. >>> >>> El lun., 17 de feb. de 2020 a la(s) 12:39, Jeremy Theler ( >>> jeremy at seamplex.com) escribi?: >>> >>>> The usual trick is to set ones in one matrix and zeros in the other >>>> one. >>>> >>>> >>>> On Mon, 2020-02-17 at 12:35 -0600, Emmanuel Ayala wrote: >>>> > Hi everyone, >>>> > >>>> > I have an eigenvalue problem where I need to apply BCs to the >>>> > stiffness and mass matrix. >>>> > >>>> > Usually, for KSP solver, it is enough to set to zero the rows and >>>> > columns related to the boundary conditions. I used to apply it with >>>> > MatZeroRowsColumns, with a 1s on the diagonal. Then the solver works >>>> > well. >>>> > >>>> > There is something similar to KSP for EPS solver ? >>>> > >>>> > I already used MatZeroRowsColumns (for EPS solver), with a 1s on the >>>> > diagonal, and I got wrong result. >>>> > >>>> > Kind regards. >>>> > >>>> > >>>> > >>>> > >>>> >>>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. 
>> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Tue Feb 18 03:17:46 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 18 Feb 2020 10:17:46 +0100 Subject: [petsc-users] SLEPc: The inner product is not well defined In-Reply-To: References: Message-ID: <5772B337-D031-434F-B2B9-AC6EA7D19783@dsic.upv.es> > El 17 feb 2020, a las 19:19, Emmanuel Ayala escribi?: > > Thank you very much for the answer. > > This error appears when computing the B-norm of a vector x, as sqrt(x'*B*x). Probably your B matrix is semi-definite, and due to floating-point error the value x'*B*x becomes negative for a certain vector x. The code uses a tolerance of 10*PETSC_MACHINE_EPSILON, but it seems the rounding errors are larger in your case. Or maybe your B-matrix is indefinite, in which case you should solve the problem as non-symmetric (or as symmetric-indefinite GHIEP). > > Do you get the same problem with the Krylov-Schur solver? > > > After check the input matrices, the problem was solved using GHIEP. > > A workaround is to edit the source code and remove the check or increase the tolerance, but this may be catastrophic if your B is indefinite. A better solution is to reformulate the problem, solving the matrix pair (A,C) where C=alpha*A+beta*B is positive definite (note that then the eigenvalues become lambda/(beta+alpha*lambda)). > > > Ok, there is a rule to choose the values for alpha and beta? For instance take alpha=1 and beta=-sigma, where sigma is a lower bound of the leftmost eigenvalue of B (the most negative one). This assumes that A is positive definite. Jose > > Kind regards. > Thanks. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Tue Feb 18 03:51:36 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 18 Feb 2020 10:51:36 +0100 Subject: [petsc-users] BCs for a EPS solver In-Reply-To: References: Message-ID: <9286DD37-1487-4F54-BBC9-4E6133FEC916@dsic.upv.es> You put alpha on the diagonal of A and beta on the diagonal of B to get an eigenvalue lambda=alpha/beta. If you set beta=0 then lambda=Inf. The choice depends on where your wanted eigenvalues are and how you are solving the eigenproblem. The choice of lambda=Inf suggested by Jeremy avoids inserting eigenvalues that may interfere with the problem's eigenvalues, but this is good for shift-and-invert, not for the case where you solve linear systems with B. Anyway, this kind of manipulation may have an impact on convergence of the eigensolver or on conditioning of the linear solves. A possibly better approach is just to get rid of the BC unknowns by creating smaller A, B matrices, e.g. with MatGetSubMatrix(). Jose > El 18 feb 2020, a las 0:41, Matthew Knepley escribi?: > > On Mon, Feb 17, 2020 at 4:33 PM Emmanuel Ayala > wrote: > Hi, > > Thank you for the clarification, now I understand what means change those values, and I tried to do that. > > But if I put the row i to the identity in A, and zero in B, the solver crash: > > So if you need to factor B, maybe reverse it? 
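A minimal sketch of the submatrix approach suggested above, assuming nfree and freeIdx hold the number and global indices of the unconstrained dofs owned by this process, and that A, B and eps are the objects already in the discussion (note that MatGetSubMatrix() is spelled MatCreateSubMatrix() in recent PETSc releases):

  IS  isfree;
  Mat Ared, Bred;
  ISCreateGeneral(PETSC_COMM_WORLD, nfree, freeIdx, PETSC_COPY_VALUES, &isfree);

  /* extract the free-free blocks of the stiffness and mass matrices */
  MatCreateSubMatrix(A, isfree, isfree, MAT_INITIAL_MATRIX, &Ared);
  MatCreateSubMatrix(B, isfree, isfree, MAT_INITIAL_MATRIX, &Bred);

  EPSSetOperators(eps, Ared, Bred);
  EPSSetProblemType(eps, EPS_GHEP);  /* the reduced pair stays symmetric definite if A and B were */
  EPSSolve(eps);

The computed eigenvectors then live in the reduced numbering, so they have to be scattered back into full-length vectors (for instance with a VecScatter built from the same index set) before postprocessing.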
> > Thanks, > > Matt > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Error in external library > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: Error in external library > [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-10, INFO(2)=0 > > [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 > [1]PETSC ERROR: [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [2]PETSC ERROR: Error in external library > [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-10, INFO(2)=0 > > [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [2]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 > [2]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala Mon Feb 17 15:28:01 2020 > [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [3]PETSC ERROR: Error in external library > [3]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-10, INFO(2)=0 > > [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [3]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 > [3]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala Mon Feb 17 15:28:01 2020 > [3]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-mumps --download-scalapack --download-parmetis --download-metis --download-superlu_dist --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 > [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c > [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-10, INFO(2)=33 > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 > [0]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala Mon Feb 17 15:28:01 2020 > [0]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-mumps --download-scalapack --download-parmetis --download-metis --download-superlu_dist --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 > [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c > ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala Mon Feb 17 15:28:01 2020 > [1]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-mumps --download-scalapack --download-parmetis --download-metis --download-superlu_dist --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 > [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c > [1]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c > [2]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-mumps --download-scalapack --download-parmetis --download-metis --download-superlu_dist --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 > [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c > [2]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c > [2]PETSC ERROR: [3]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c > [3]PETSC ERROR: #3 PCSetUp_LU() line 126 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c > [3]PETSC ERROR: #4 PCSetUp() line 894 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c > [0]PETSC ERROR: #3 PCSetUp_LU() line 126 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c > [0]PETSC ERROR: [1]PETSC ERROR: #3 PCSetUp_LU() line 126 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c > [1]PETSC ERROR: #4 PCSetUp() line 894 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c > [1]PETSC ERROR: #3 PCSetUp_LU() line 126 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c > [2]PETSC ERROR: #4 PCSetUp() line 894 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c > [2]PETSC ERROR: #5 KSPSetUp() line 376 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: #5 KSPSetUp() line 376 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: #6 STSetUp_Shift() line 120 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c > #4 PCSetUp() line 894 in 
/home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #5 KSPSetUp() line 376 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #6 STSetUp_Shift() line 120 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c > [0]PETSC ERROR: #5 KSPSetUp() line 376 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c > [1]PETSC ERROR: #6 STSetUp_Shift() line 120 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c > [1]PETSC ERROR: #7 STSetUp() line 271 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/interface/stsolve.c > [1]PETSC ERROR: #8 EPSSetUp() line 273 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/interface/epssetup.c > [2]PETSC ERROR: #6 STSetUp_Shift() line 120 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c > [2]PETSC ERROR: #7 STSetUp() line 271 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/interface/stsolve.c > [2]PETSC ERROR: #8 EPSSetUp() line 273 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/interface/epssetup.c > [2]PETSC ERROR: #9 FourBar_NaturalPulsation() line 3937 in /home/ayala/Nextcloud/cpp_projects/2020-02-13-muboto-balancing-v17-mma/Multibody.cc > > El lun., 17 de feb. de 2020 a la(s) 13:20, Matthew Knepley (knepley at gmail.com ) escribi?: > On Mon, Feb 17, 2020 at 1:59 PM Emmanuel Ayala > wrote: > Hi, thanks for the quick answer. > > I just did it, and it does not work. My problem is GNHEP and I use the default solver (Krylov-Schur). Moreover I run the code with the options: -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_type mumps > > I guess a better question is: What do you expect to work? > > For a linear solve, > > A x = b > > if a row i is 0 except for a one on the diagonal, then I get > > x_i = b_i > > so hopefully you put the correct boundary value in b_i. For the generalized eigenproblem > > A x = \lambda B x > > if you set row i to the identity in A, and zero in B, we get > > x_i = 0 > > and you must put the boundary values into x after you have finished the solve. Is this what you did? > > Thanks, > > Matt > > Any other suggestions? > Kind regards. > > El lun., 17 de feb. de 2020 a la(s) 12:39, Jeremy Theler (jeremy at seamplex.com ) escribi?: > The usual trick is to set ones in one matrix and zeros in the other > one. > > > On Mon, 2020-02-17 at 12:35 -0600, Emmanuel Ayala wrote: > > Hi everyone, > > > > I have an eigenvalue problem where I need to apply BCs to the > > stiffness and mass matrix. > > > > Usually, for KSP solver, it is enough to set to zero the rows and > > columns related to the boundary conditions. I used to apply it with > > MatZeroRowsColumns, with a 1s on the diagonal. Then the solver works > > well. > > > > There is something similar to KSP for EPS solver ? > > > > I already used MatZeroRowsColumns (for EPS solver), with a 1s on the > > diagonal, and I got wrong result. > > > > Kind regards. > > > > > > > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.scott at epcc.ed.ac.uk Tue Feb 18 05:03:07 2020 From: d.scott at epcc.ed.ac.uk (David Scott) Date: Tue, 18 Feb 2020 11:03:07 +0000 Subject: [petsc-users] DM_BOUNDARY_GHOSTED Message-ID: <0fb1334e-13f7-23f3-1c42-79286ed164c9@epcc.ed.ac.uk> Hello, I wish to solve a channel flow problem with different boundary conditions. In the streamwise direction I may have periodic or inlet/outlet BCs. I would like to make my code for the two cases as similar as possible. If I use DM_BOUNDARY_PERIODIC then when performing a linear solve the ghost values will be set automatically. For the inlet/outlet case can I use DM_BOUNDARY_GHOSTED instead and somehow arrange for values that I specify to be placed in the ghost locations? Thanks, David The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From knepley at gmail.com Tue Feb 18 06:42:22 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 18 Feb 2020 07:42:22 -0500 Subject: [petsc-users] DM_BOUNDARY_GHOSTED In-Reply-To: <0fb1334e-13f7-23f3-1c42-79286ed164c9@epcc.ed.ac.uk> References: <0fb1334e-13f7-23f3-1c42-79286ed164c9@epcc.ed.ac.uk> Message-ID: On Tue, Feb 18, 2020 at 6:03 AM David Scott wrote: > Hello, > > I wish to solve a channel flow problem with different boundary > conditions. In the streamwise direction I may have periodic or > inlet/outlet BCs. I would like to make my code for the two cases as > similar as possible. If I use DM_BOUNDARY_PERIODIC then when performing > a linear solve the ghost values will be set automatically. For the > inlet/outlet case can I use DM_BOUNDARY_GHOSTED instead and somehow > arrange for values that I specify to be placed in the ghost locations? > Yes, that is the intent. Thanks, Matt > Thanks, > > David > > The University of Edinburgh is a charitable body, registered in Scotland, > with registration number SC005336. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yyang85 at stanford.edu Tue Feb 18 07:19:52 2020 From: yyang85 at stanford.edu (Yuyun Yang) Date: Tue, 18 Feb 2020 13:19:52 +0000 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: <624EDFBF-E105-4F70-87C6-58BE940D73C9@mcs.anl.gov> References: <624EDFBF-E105-4F70-87C6-58BE940D73C9@mcs.anl.gov> Message-ID: <1D793649-96CD-4807-8330-53D9CA176348@stanford.edu> Thanks for the clarification. Got one more question: if I have variable coefficients, my stencil will be updated at every time step, so will the coefficients in myMatMult. In that case, is it necessary to destroy the shell matrix and create it all over again, or can I use it as it is, only calling the stencil update function, assuming the result will be passed into the matrix operation automatically? Thanks, Yuyun ?On 2/18/20, 7:34 AM, "Smith, Barry F." wrote: > On Feb 17, 2020, at 7:56 AM, Yuyun Yang wrote: > > Hello, > > I actually have a question about the usage of DMDA since I'm quite new to this. I wonder if the DMDA suite of functions can be directly called on vectors created from VecCreate? Yes, but you have to make sure the ones you create have the same sizes and parallel layouts. 
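For the DM_BOUNDARY_GHOSTED question above, a minimal 1d sketch of supplying the ghost values by hand; Mx, uin and uout are user-supplied and purely illustrative, and dof=1 with stencil width 1 is assumed:

  DM          da;
  Vec         uglobal, uloc;
  PetscScalar *u;
  PetscInt    xs, xm;

  DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_GHOSTED, Mx, 1, 1, NULL, &da);
  DMSetFromOptions(da);
  DMSetUp(da);
  DMCreateGlobalVector(da, &uglobal);
  /* ... fill uglobal with the current solution ... */

  DMGetLocalVector(da, &uloc);
  DMGlobalToLocalBegin(da, uglobal, INSERT_VALUES, uloc);
  DMGlobalToLocalEnd(da, uglobal, INSERT_VALUES, uloc);

  DMDAVecGetArray(da, uloc, &u);
  DMDAGetCorners(da, &xs, NULL, NULL, &xm, NULL, NULL);
  if (xs == 0)       u[-1] = uin;    /* ghost slot left of the inlet   */
  if (xs + xm == Mx) u[Mx] = uout;   /* ghost slot right of the outlet */
  DMDAVecRestoreArray(da, uloc, &u);
  /* the stencil code can now read u[i-1] and u[i+1] uniformly, just as in
     the periodic case, using the ghost values supplied above             */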
Generally best to get them from the DMDA or VecDuplicate() than the hassle of figuring out sizes. > Or the vectors have to be formed by DMDACreateGlobalVector? I'm also not sure about what the dof and stencil width arguments do. > > I'm still unsure about the usage of MatCreateShell and MatShellSetOperation, since it seems that MyMatMult should still have 3 inputs just like MatMult (the matrix and two vectors). Since I'm not forming the matrix, does that mean the matrix input is meaningless but still needs to exist for the sake of this format? Well the matrix input is your shell matrix so it likely has information you need to do your multiply routine. MatShellGetContext() (No you do not want to put your information about the matrix stencil inside global variables!) > > After I create such a shell matrix, can I use it like a regular matrix in KSP and utilize preconditioners? > > Thanks! > Yuyun > From: petsc-users on behalf of Yuyun Yang > Sent: Sunday, February 16, 2020 3:12 AM > To: Smith, Barry F. > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Matrix-free method in PETSc > > Thank you, that is very helpful information indeed! I will try it and send you my code when it works. > > Best regards, > Yuyun > From: Smith, Barry F. > Sent: Saturday, February 15, 2020 10:02 PM > To: Yuyun Yang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Matrix-free method in PETSc > > Yuyun, > > If you are speaking about using a finite difference stencil on a structured grid where you provide the Jacobian vector products yourself by looping over the grid doing the stencil operation we unfortunately do not have exactly that kind of example. > > But it is actually not difficult. I suggest starting with src/ts/examples/tests/ex22.c It computes the sparse matrix explicitly with FormIJacobian() > > What you need to do is instead in main() use MatCreateShell() and MatShellSetOperation(,MATOP_MULT,(void (*)(void))MyMatMult) then provide the routine MyMatMult() to do your stencil based matrix free product; note that you can create this new routine by taking the structure of IFunction() and reorganizing it to do the Jacobian product instead. You will need to get the information about the shell matrix size on each process by calling DMDAGetCorners(). > > You will then remove the explicit computation of the Jacobian, and also remove the Event stuff since you don't need it. > > Extending to 2 and 3d is straight forward. > > Any questions let us know. > > Barry > > If you like this would make a great merge request with your code to improve our examples. > > > > On Feb 15, 2020, at 9:42 PM, Yuyun Yang wrote: > > > > Hello team, > > > > I wanted to apply the Krylov subspace method to a matrix-free implementation of a stencil, such that the iterative method acts on the operation without ever constructing the matrix explicitly (for example, when doing backward Euler). > > > > I'm not sure whether there is already an example for that somewhere. If so, could you point me to a relevant example? > > > > Thank you! > > > > Best regards, > > Yuyun From knepley at gmail.com Tue Feb 18 07:23:03 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 18 Feb 2020 08:23:03 -0500 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: <1D793649-96CD-4807-8330-53D9CA176348@stanford.edu> References: <624EDFBF-E105-4F70-87C6-58BE940D73C9@mcs.anl.gov> <1D793649-96CD-4807-8330-53D9CA176348@stanford.edu> Message-ID: On Tue, Feb 18, 2020 at 8:20 AM Yuyun Yang wrote: > Thanks for the clarification. 
> > Got one more question: if I have variable coefficients, my stencil will be > updated at every time step, so will the coefficients in myMatMult. In that > case, is it necessary to destroy the shell matrix and create it all over > again, or can I use it as it is, only calling the stencil update function, > assuming the result will be passed into the matrix operation automatically? > You update the information in the context associated with the shell matrix. No need to destroy it. Thanks, Matt > Thanks, > Yuyun > > ?On 2/18/20, 7:34 AM, "Smith, Barry F." wrote: > > > > > On Feb 17, 2020, at 7:56 AM, Yuyun Yang > wrote: > > > > Hello, > > > > I actually have a question about the usage of DMDA since I'm quite > new to this. I wonder if the DMDA suite of functions can be directly called > on vectors created from VecCreate? > > Yes, but you have to make sure the ones you create have the same > sizes and parallel layouts. Generally best to get them from the DMDA or > VecDuplicate() than the hassle of figuring out sizes. > > > Or the vectors have to be formed by DMDACreateGlobalVector? I'm also > not sure about what the dof and stencil width arguments do. > > > > I'm still unsure about the usage of MatCreateShell and > MatShellSetOperation, since it seems that MyMatMult should still have 3 > inputs just like MatMult (the matrix and two vectors). Since I'm not > forming the matrix, does that mean the matrix input is meaningless but > still needs to exist for the sake of this format? > > Well the matrix input is your shell matrix so it likely has > information you need to do your multiply routine. MatShellGetContext() (No > you do not want to put your information about the matrix stencil inside > global variables!) > > > > > > After I create such a shell matrix, can I use it like a regular > matrix in KSP and utilize preconditioners? > > > > Thanks! > > Yuyun > > From: petsc-users on behalf of > Yuyun Yang > > Sent: Sunday, February 16, 2020 3:12 AM > > To: Smith, Barry F. > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix-free method in PETSc > > > > Thank you, that is very helpful information indeed! I will try it > and send you my code when it works. > > > > Best regards, > > Yuyun > > From: Smith, Barry F. > > Sent: Saturday, February 15, 2020 10:02 PM > > To: Yuyun Yang > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix-free method in PETSc > > > > Yuyun, > > > > If you are speaking about using a finite difference stencil on a > structured grid where you provide the Jacobian vector products yourself by > looping over the grid doing the stencil operation we unfortunately do not > have exactly that kind of example. > > > > But it is actually not difficult. I suggest starting with > src/ts/examples/tests/ex22.c It computes the sparse matrix explicitly with > FormIJacobian() > > > > What you need to do is instead in main() use MatCreateShell() > and MatShellSetOperation(,MATOP_MULT,(void (*)(void))MyMatMult) then > provide the routine MyMatMult() to do your stencil based matrix free > product; note that you can create this new routine by taking the structure > of IFunction() and reorganizing it to do the Jacobian product instead. You > will need to get the information about the shell matrix size on each > process by calling DMDAGetCorners(). > > > > You will then remove the explicit computation of the Jacobian, > and also remove the Event stuff since you don't need it. > > > > Extending to 2 and 3d is straight forward. 
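Concretely, updating the context could look like the following fragment; AppCtx, A, t and UpdateStencilCoefficients() are illustrative user code and data, not PETSc symbols:

  /* ctx is the same user struct that was handed to MatCreateShell(A,...) */
  AppCtx *ctx;
  MatShellGetContext(A, &ctx);
  UpdateStencilCoefficients(ctx, t);   /* illustrative user routine, not a PETSc call */
  /* nothing to destroy or recreate: the next MatMult(A,x,y), i.e. MyMatMult(),
     sees the refreshed coefficients through the same context pointer          */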
> > > > Any questions let us know. > > > > Barry > > > > If you like this would make a great merge request with your code > to improve our examples. > > > > > > > On Feb 15, 2020, at 9:42 PM, Yuyun Yang > wrote: > > > > > > Hello team, > > > > > > I wanted to apply the Krylov subspace method to a matrix-free > implementation of a stencil, such that the iterative method acts on the > operation without ever constructing the matrix explicitly (for example, > when doing backward Euler). > > > > > > I'm not sure whether there is already an example for that > somewhere. If so, could you point me to a relevant example? > > > > > > Thank you! > > > > > > Best regards, > > > Yuyun > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yyang85 at stanford.edu Tue Feb 18 08:26:11 2020 From: yyang85 at stanford.edu (Yuyun Yang) Date: Tue, 18 Feb 2020 14:26:11 +0000 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: References: <624EDFBF-E105-4F70-87C6-58BE940D73C9@mcs.anl.gov> <1D793649-96CD-4807-8330-53D9CA176348@stanford.edu> Message-ID: Thanks. Also, when using KSP, would the syntax be KSPSetOperators(ksp,A,A)? Since you mentioned preconditioners are not generally used for matrix-free operators, I wasn?t sure whether I should still put ?A? in the Pmat field. Is it still possible to use TS in conjunction with the matrix-free operator? I?d like to create a simple test case that solves the 1d heat equation implicitly with variable coefficients, but didn?t know how the time stepping can be set up. Thanks, Yuyun From: Matthew Knepley Date: Tuesday, February 18, 2020 at 9:23 PM To: Yuyun Yang Cc: "Smith, Barry F." , "petsc-users at mcs.anl.gov" Subject: Re: [petsc-users] Matrix-free method in PETSc On Tue, Feb 18, 2020 at 8:20 AM Yuyun Yang > wrote: Thanks for the clarification. Got one more question: if I have variable coefficients, my stencil will be updated at every time step, so will the coefficients in myMatMult. In that case, is it necessary to destroy the shell matrix and create it all over again, or can I use it as it is, only calling the stencil update function, assuming the result will be passed into the matrix operation automatically? You update the information in the context associated with the shell matrix. No need to destroy it. Thanks, Matt Thanks, Yuyun On 2/18/20, 7:34 AM, "Smith, Barry F." > wrote: > On Feb 17, 2020, at 7:56 AM, Yuyun Yang > wrote: > > Hello, > > I actually have a question about the usage of DMDA since I'm quite new to this. I wonder if the DMDA suite of functions can be directly called on vectors created from VecCreate? Yes, but you have to make sure the ones you create have the same sizes and parallel layouts. Generally best to get them from the DMDA or VecDuplicate() than the hassle of figuring out sizes. > Or the vectors have to be formed by DMDACreateGlobalVector? I'm also not sure about what the dof and stencil width arguments do. > > I'm still unsure about the usage of MatCreateShell and MatShellSetOperation, since it seems that MyMatMult should still have 3 inputs just like MatMult (the matrix and two vectors). Since I'm not forming the matrix, does that mean the matrix input is meaningless but still needs to exist for the sake of this format? 
Well the matrix input is your shell matrix so it likely has information you need to do your multiply routine. MatShellGetContext() (No you do not want to put your information about the matrix stencil inside global variables!) > > After I create such a shell matrix, can I use it like a regular matrix in KSP and utilize preconditioners? > > Thanks! > Yuyun > From: petsc-users > on behalf of Yuyun Yang > > Sent: Sunday, February 16, 2020 3:12 AM > To: Smith, Barry F. > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix-free method in PETSc > > Thank you, that is very helpful information indeed! I will try it and send you my code when it works. > > Best regards, > Yuyun > From: Smith, Barry F. > > Sent: Saturday, February 15, 2020 10:02 PM > To: Yuyun Yang > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix-free method in PETSc > > Yuyun, > > If you are speaking about using a finite difference stencil on a structured grid where you provide the Jacobian vector products yourself by looping over the grid doing the stencil operation we unfortunately do not have exactly that kind of example. > > But it is actually not difficult. I suggest starting with src/ts/examples/tests/ex22.c It computes the sparse matrix explicitly with FormIJacobian() > > What you need to do is instead in main() use MatCreateShell() and MatShellSetOperation(,MATOP_MULT,(void (*)(void))MyMatMult) then provide the routine MyMatMult() to do your stencil based matrix free product; note that you can create this new routine by taking the structure of IFunction() and reorganizing it to do the Jacobian product instead. You will need to get the information about the shell matrix size on each process by calling DMDAGetCorners(). > > You will then remove the explicit computation of the Jacobian, and also remove the Event stuff since you don't need it. > > Extending to 2 and 3d is straight forward. > > Any questions let us know. > > Barry > > If you like this would make a great merge request with your code to improve our examples. > > > > On Feb 15, 2020, at 9:42 PM, Yuyun Yang > wrote: > > > > Hello team, > > > > I wanted to apply the Krylov subspace method to a matrix-free implementation of a stencil, such that the iterative method acts on the operation without ever constructing the matrix explicitly (for example, when doing backward Euler). > > > > I'm not sure whether there is already an example for that somewhere. If so, could you point me to a relevant example? > > > > Thank you! > > > > Best regards, > > Yuyun -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Tue Feb 18 09:31:57 2020 From: hongzhang at anl.gov (Zhang, Hong) Date: Tue, 18 Feb 2020 15:31:57 +0000 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: References: <624EDFBF-E105-4F70-87C6-58BE940D73C9@mcs.anl.gov> <1D793649-96CD-4807-8330-53D9CA176348@stanford.edu> Message-ID: Here is an TS example using DMDA and matrix-free Jacobians. Though the matrix-free part is faked, it demonstrates the workflow. https://gitlab.com/petsc/petsc/-/blob/hongzh/ts-matshell-example/src/ts/examples/tutorials/advection-diffusion-reaction/ex5_mf.c Hong (Mr.) On Feb 18, 2020, at 8:26 AM, Yuyun Yang > wrote: Thanks. 
Also, when using KSP, would the syntax be KSPSetOperators(ksp,A,A)? Since you mentioned preconditioners are not generally used for matrix-free operators, I wasn?t sure whether I should still put ?A? in the Pmat field. Is it still possible to use TS in conjunction with the matrix-free operator? I?d like to create a simple test case that solves the 1d heat equation implicitly with variable coefficients, but didn?t know how the time stepping can be set up. Thanks, Yuyun From: Matthew Knepley > Date: Tuesday, February 18, 2020 at 9:23 PM To: Yuyun Yang > Cc: "Smith, Barry F." >, "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] Matrix-free method in PETSc On Tue, Feb 18, 2020 at 8:20 AM Yuyun Yang > wrote: Thanks for the clarification. Got one more question: if I have variable coefficients, my stencil will be updated at every time step, so will the coefficients in myMatMult. In that case, is it necessary to destroy the shell matrix and create it all over again, or can I use it as it is, only calling the stencil update function, assuming the result will be passed into the matrix operation automatically? You update the information in the context associated with the shell matrix. No need to destroy it. Thanks, Matt Thanks, Yuyun On 2/18/20, 7:34 AM, "Smith, Barry F." > wrote: > On Feb 17, 2020, at 7:56 AM, Yuyun Yang > wrote: > > Hello, > > I actually have a question about the usage of DMDA since I'm quite new to this. I wonder if the DMDA suite of functions can be directly called on vectors created from VecCreate? Yes, but you have to make sure the ones you create have the same sizes and parallel layouts. Generally best to get them from the DMDA or VecDuplicate() than the hassle of figuring out sizes. > Or the vectors have to be formed by DMDACreateGlobalVector? I'm also not sure about what the dof and stencil width arguments do. > > I'm still unsure about the usage of MatCreateShell and MatShellSetOperation, since it seems that MyMatMult should still have 3 inputs just like MatMult (the matrix and two vectors). Since I'm not forming the matrix, does that mean the matrix input is meaningless but still needs to exist for the sake of this format? Well the matrix input is your shell matrix so it likely has information you need to do your multiply routine. MatShellGetContext() (No you do not want to put your information about the matrix stencil inside global variables!) > > After I create such a shell matrix, can I use it like a regular matrix in KSP and utilize preconditioners? > > Thanks! > Yuyun > From: petsc-users > on behalf of Yuyun Yang > > Sent: Sunday, February 16, 2020 3:12 AM > To: Smith, Barry F. > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix-free method in PETSc > > Thank you, that is very helpful information indeed! I will try it and send you my code when it works. > > Best regards, > Yuyun > From: Smith, Barry F. > > Sent: Saturday, February 15, 2020 10:02 PM > To: Yuyun Yang > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix-free method in PETSc > > Yuyun, > > If you are speaking about using a finite difference stencil on a structured grid where you provide the Jacobian vector products yourself by looping over the grid doing the stencil operation we unfortunately do not have exactly that kind of example. > > But it is actually not difficult. 
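For the TS question above, a sketch along the lines of the ex5_mf.c example linked earlier; MyIFunction, MyIJacobian, ctx, da and u0 are user code and data rather than PETSc symbols, and with a shell matrix in both Jacobian slots the run needs -pc_type none (or a PCSHELL), since standard preconditioners expect assembled entries:

  TS ts;
  TSCreate(PETSC_COMM_WORLD, &ts);
  TSSetDM(ts, da);
  TSSetType(ts, TSBEULER);                      /* backward Euler for the 1d heat equation    */
  TSSetIFunction(ts, NULL, MyIFunction, &ctx);  /* residual F(t,u,u_t) evaluated stencil-wise */
  TSSetIJacobian(ts, A, A, MyIJacobian, &ctx);  /* A is the MatShell; MyIJacobian mainly
                                                   records the shift parameter in the context */
  TSSetFromOptions(ts);
  TSSolve(ts, u0);

For a plain linear solve the analogous call is KSPSetOperators(ksp, A, A), again with -pc_type none, since the shell matrix can sit in the Pmat slot as long as no factorization-based preconditioner tries to read its entries.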
I suggest starting with src/ts/examples/tests/ex22.c It computes the sparse matrix explicitly with FormIJacobian() > > What you need to do is instead in main() use MatCreateShell() and MatShellSetOperation(,MATOP_MULT,(void (*)(void))MyMatMult) then provide the routine MyMatMult() to do your stencil based matrix free product; note that you can create this new routine by taking the structure of IFunction() and reorganizing it to do the Jacobian product instead. You will need to get the information about the shell matrix size on each process by calling DMDAGetCorners(). > > You will then remove the explicit computation of the Jacobian, and also remove the Event stuff since you don't need it. > > Extending to 2 and 3d is straight forward. > > Any questions let us know. > > Barry > > If you like this would make a great merge request with your code to improve our examples. > > > > On Feb 15, 2020, at 9:42 PM, Yuyun Yang > wrote: > > > > Hello team, > > > > I wanted to apply the Krylov subspace method to a matrix-free implementation of a stencil, such that the iterative method acts on the operation without ever constructing the matrix explicitly (for example, when doing backward Euler). > > > > I'm not sure whether there is already an example for that somewhere. If so, could you point me to a relevant example? > > > > Thank you! > > > > Best regards, > > Yuyun -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From juaneah at gmail.com Tue Feb 18 13:15:57 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Tue, 18 Feb 2020 13:15:57 -0600 Subject: [petsc-users] SLEPc: The inner product is not well defined In-Reply-To: <5772B337-D031-434F-B2B9-AC6EA7D19783@dsic.upv.es> References: <5772B337-D031-434F-B2B9-AC6EA7D19783@dsic.upv.es> Message-ID: Ok, thank you! Kind regards. El mar., 18 de feb. de 2020 a la(s) 03:17, Jose E. Roman (jroman at dsic.upv.es) escribi?: > > > El 17 feb 2020, a las 19:19, Emmanuel Ayala escribi?: > > Thank you very much for the answer. > > This error appears when computing the B-norm of a vector x, as >> sqrt(x'*B*x). Probably your B matrix is semi-definite, and due to >> floating-point error the value x'*B*x becomes negative for a certain vector >> x. The code uses a tolerance of 10*PETSC_MACHINE_EPSILON, but it seems the >> rounding errors are larger in your case. Or maybe your B-matrix is >> indefinite, in which case you should solve the problem as non-symmetric (or >> as symmetric-indefinite GHIEP). >> >> Do you get the same problem with the Krylov-Schur solver? >> >> > After check the input matrices, the problem was solved using GHIEP. > > >> A workaround is to edit the source code and remove the check or increase >> the tolerance, but this may be catastrophic if your B is indefinite. A >> better solution is to reformulate the problem, solving the matrix pair >> (A,C) where C=alpha*A+beta*B is positive definite (note that then the >> eigenvalues become lambda/(beta+alpha*lambda)). >> >> > Ok, there is a rule to choose the values for alpha and beta? > > > For instance take alpha=1 and beta=-sigma, where sigma is a lower bound of > the leftmost eigenvalue of B (the most negative one). This assumes that A > is positive definite. > > Jose > > > > Kind regards. > Thanks. 
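A minimal sketch of the reformulation described above, with alpha=1 and beta=-sigma; sigma is a user-chosen lower bound on the most negative eigenvalue of B, and A, B and eps are the objects already being discussed:

  Mat C;
  MatDuplicate(B, MAT_COPY_VALUES, &C);
  MatScale(C, -sigma);                            /* C = -sigma*B    */
  MatAXPY(C, 1.0, A, DIFFERENT_NONZERO_PATTERN);  /* C = A - sigma*B */

  EPSSetOperators(eps, A, C);        /* solve A x = theta * C x            */
  EPSSetProblemType(eps, EPS_GHEP);  /* C should now be positive definite  */
  EPSSolve(eps);
  /* each converged theta maps back to an eigenvalue of (A,B) via
     lambda = sigma*theta/(theta - 1)                                */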
> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From juaneah at gmail.com Tue Feb 18 13:18:35 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Tue, 18 Feb 2020 13:18:35 -0600 Subject: [petsc-users] BCs for a EPS solver In-Reply-To: <9286DD37-1487-4F54-BBC9-4E6133FEC916@dsic.upv.es> References: <9286DD37-1487-4F54-BBC9-4E6133FEC916@dsic.upv.es> Message-ID: Thanks for the answer. Finally I generate a submatrix and It worked. Kind regards. El mar., 18 de feb. de 2020 a la(s) 03:51, Jose E. Roman (jroman at dsic.upv.es) escribi?: > You put alpha on the diagonal of A and beta on the diagonal of B to get an > eigenvalue lambda=alpha/beta. If you set beta=0 then lambda=Inf. The choice > depends on where your wanted eigenvalues are and how you are solving the > eigenproblem. The choice of lambda=Inf suggested by Jeremy avoids inserting > eigenvalues that may interfere with the problem's eigenvalues, but this is > good for shift-and-invert, not for the case where you solve linear systems > with B. > > Anyway, this kind of manipulation may have an impact on convergence of the > eigensolver or on conditioning of the linear solves. A possibly better > approach is just to get rid of the BC unknowns by creating smaller A, B > matrices, e.g. with MatGetSubMatrix(). > > Jose > > > El 18 feb 2020, a las 0:41, Matthew Knepley escribi?: > > On Mon, Feb 17, 2020 at 4:33 PM Emmanuel Ayala wrote: > >> Hi, >> >> Thank you for the clarification, now I understand what means change those >> values, and I tried to do that. >> >> But if I put the row i to the identity in A, and zero in B, the solver >> crash: >> > > So if you need to factor B, maybe reverse it? > > Thanks, > > Matt > > >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: Error in external library >> [1]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [1]PETSC ERROR: Error in external library >> [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: >> INFOG(1)=-10, INFO(2)=0 >> >> [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for >> trouble shooting. >> [1]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 >> [1]PETSC ERROR: [2]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [2]PETSC ERROR: Error in external library >> [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: >> INFOG(1)=-10, INFO(2)=0 >> >> [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for >> trouble shooting. >> [2]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 >> [2]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by >> ayala Mon Feb 17 15:28:01 2020 >> [3]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [3]PETSC ERROR: Error in external library >> [3]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: >> INFOG(1)=-10, INFO(2)=0 >> >> [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for >> trouble shooting. 
>> [3]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 >> [3]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by >> ayala Mon Feb 17 15:28:01 2020 >> [3]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 >> -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" >> FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich >> --download-mumps --download-scalapack --download-parmetis --download-metis >> --download-superlu_dist --download-cmake --download-fblaslapack=1 >> --with-cxx-dialect=C++11 >> [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c >> [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: >> INFOG(1)=-10, INFO(2)=33 >> >> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for >> trouble shooting. >> [0]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 >> [0]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by >> ayala Mon Feb 17 15:28:01 2020 >> [0]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 >> -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" >> FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich >> --download-mumps --download-scalapack --download-parmetis --download-metis >> --download-superlu_dist --download-cmake --download-fblaslapack=1 >> --with-cxx-dialect=C++11 >> [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c >> ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala Mon Feb 17 >> 15:28:01 2020 >> [1]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 >> -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" >> FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich >> --download-mumps --download-scalapack --download-parmetis --download-metis >> --download-superlu_dist --download-cmake --download-fblaslapack=1 >> --with-cxx-dialect=C++11 >> [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c >> [1]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c >> [2]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 >> -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" >> FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich >> --download-mumps --download-scalapack --download-parmetis --download-metis >> --download-superlu_dist --download-cmake --download-fblaslapack=1 >> --with-cxx-dialect=C++11 >> [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c >> [2]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c >> [2]PETSC ERROR: [3]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c >> [3]PETSC ERROR: #3 PCSetUp_LU() line 126 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c >> [3]PETSC ERROR: #4 PCSetUp() line 894 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c >> [0]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c >> [0]PETSC ERROR: #3 PCSetUp_LU() line 
126 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c >> [0]PETSC ERROR: [1]PETSC ERROR: #3 PCSetUp_LU() line 126 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c >> [1]PETSC ERROR: #4 PCSetUp() line 894 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c >> [1]PETSC ERROR: #3 PCSetUp_LU() line 126 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c >> [2]PETSC ERROR: #4 PCSetUp() line 894 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c >> [2]PETSC ERROR: #5 KSPSetUp() line 376 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c >> [3]PETSC ERROR: #5 KSPSetUp() line 376 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c >> [3]PETSC ERROR: #6 STSetUp_Shift() line 120 in >> /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c >> #4 PCSetUp() line 894 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c >> [0]PETSC ERROR: #5 KSPSetUp() line 376 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c >> [0]PETSC ERROR: #6 STSetUp_Shift() line 120 in >> /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c >> [0]PETSC ERROR: #5 KSPSetUp() line 376 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c >> [1]PETSC ERROR: #6 STSetUp_Shift() line 120 in >> /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c >> [1]PETSC ERROR: #7 STSetUp() line 271 in >> /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/interface/stsolve.c >> [1]PETSC ERROR: #8 EPSSetUp() line 273 in >> /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/interface/epssetup.c >> [2]PETSC ERROR: #6 STSetUp_Shift() line 120 in >> /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c >> [2]PETSC ERROR: #7 STSetUp() line 271 in >> /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/interface/stsolve.c >> [2]PETSC ERROR: #8 EPSSetUp() line 273 in >> /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/interface/epssetup.c >> [2]PETSC ERROR: #9 FourBar_NaturalPulsation() line 3937 in >> /home/ayala/Nextcloud/cpp_projects/2020-02-13-muboto-balancing-v17-mma/ >> Multibody.cc >> >> El lun., 17 de feb. de 2020 a la(s) 13:20, Matthew Knepley ( >> knepley at gmail.com) escribi?: >> >>> On Mon, Feb 17, 2020 at 1:59 PM Emmanuel Ayala >>> wrote: >>> >>>> Hi, thanks for the quick answer. >>>> >>>> I just did it, and it does not work. My problem is GNHEP and I use the >>>> default solver (Krylov-Schur). Moreover I run the code with the options: >>>> -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_type mumps >>>> >>> >>> I guess a better question is: What do you expect to work? >>> >>> For a linear solve, >>> >>> A x = b >>> >>> if a row i is 0 except for a one on the diagonal, then I get >>> >>> x_i = b_i >>> >>> so hopefully you put the correct boundary value in b_i. For the >>> generalized eigenproblem >>> >>> A x = \lambda B x >>> >>> if you set row i to the identity in A, and zero in B, we get >>> >>> x_i = 0 >>> >>> and you must put the boundary values into x after you have finished the >>> solve. Is this what you did? >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Any other suggestions? >>>> Kind regards. >>>> >>>> El lun., 17 de feb. 
de 2020 a la(s) 12:39, Jeremy Theler ( >>>> jeremy at seamplex.com) escribi?: >>>> >>>>> The usual trick is to set ones in one matrix and zeros in the other >>>>> one. >>>>> >>>>> >>>>> On Mon, 2020-02-17 at 12:35 -0600, Emmanuel Ayala wrote: >>>>> > Hi everyone, >>>>> > >>>>> > I have an eigenvalue problem where I need to apply BCs to the >>>>> > stiffness and mass matrix. >>>>> > >>>>> > Usually, for KSP solver, it is enough to set to zero the rows and >>>>> > columns related to the boundary conditions. I used to apply it with >>>>> > MatZeroRowsColumns, with a 1s on the diagonal. Then the solver works >>>>> > well. >>>>> > >>>>> > There is something similar to KSP for EPS solver ? >>>>> > >>>>> > I already used MatZeroRowsColumns (for EPS solver), with a 1s on the >>>>> > diagonal, and I got wrong result. >>>>> > >>>>> > Kind regards. >>>>> > >>>>> > >>>>> > >>>>> > >>>>> >>>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hectorb at utexas.edu Tue Feb 18 14:30:44 2020 From: hectorb at utexas.edu (Hector E Barrios Molano) Date: Tue, 18 Feb 2020 14:30:44 -0600 Subject: [petsc-users] Efficiently move matrix from single processor to multiple Message-ID: Dear PETSc Experts! Do you know if there is an efficient way to move a matrix from a single processor (MatCreateSeqBAIJ) to a matrix contained in all processors? As a little bit of context, I have a code in which only one processor creates a matrix and a vector for a linear system of equations. Then we want to use a parallel solver to get the solution and give it back to a single processor I tried MatView to create a binary file and MatLoad to load the matrix in parallel. This seems to work but performance is significantly decreased independent of the number of processors used. I have some questions: Can I share the matrix without having to write it to a file, for example, through a buffer? Is there a way to efficiently avoid the overhead of writing, reading loading matrices to and from processors? Thanks for your comments, Hector -- *Hector Barrios* PhD Student, Graduate Research Assistant Hildebrand Department of Petroleum and Geosystems Engineering The University of Texas at Austin hectorb at utexas.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Feb 18 14:49:32 2020 From: jed at jedbrown.org (Jed Brown) Date: Tue, 18 Feb 2020 13:49:32 -0700 Subject: [petsc-users] Efficiently move matrix from single processor to multiple In-Reply-To: References: Message-ID: <87blpv39k3.fsf@jedbrown.org> Hector E Barrios Molano writes: > Dear PETSc Experts! > > Do you know if there is an efficient way to move a matrix from a single > processor (MatCreateSeqBAIJ) to a matrix contained in all processors? How did you create the original SeqBAIJ? Could you just call MatSetValues on a parallel matrix instead (even if only setting values from rank 0)? Many of our tutorials suggest this mode if you can't parallelize your assembly. 
> As a little bit of context, I have a code in which only one processor > creates a matrix and a vector for a linear system of equations. Then we > want to use a parallel solver to get the solution and give it back to a > single processor > > I tried MatView to create a binary file and MatLoad to load the matrix > in parallel. This seems to work but performance is significantly > decreased independent of the number of processors used. > > I have some questions: > > Can I share the matrix without having to write it to a file, for > example, through a buffer? > Is there a way to efficiently avoid the overhead of writing, reading > loading matrices to and from processors? > > Thanks for your comments, > > Hector > -- > *Hector Barrios* > PhD Student, Graduate Research Assistant > Hildebrand Department of Petroleum and Geosystems Engineering > The University of Texas at Austin > hectorb at utexas.edu From yyang85 at stanford.edu Tue Feb 18 17:56:47 2020 From: yyang85 at stanford.edu (Yuyun Yang) Date: Tue, 18 Feb 2020 23:56:47 +0000 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: References: <624EDFBF-E105-4F70-87C6-58BE940D73C9@mcs.anl.gov> <1D793649-96CD-4807-8330-53D9CA176348@stanford.edu> Message-ID: <5F6B7FE1-AF86-4F37-ACDD-84408ADA9A9C@stanford.edu> Thanks a lot for the example! From: "Zhang, Hong" Date: Tuesday, February 18, 2020 at 11:32 PM To: Yuyun Yang Cc: Matthew Knepley , "petsc-users at mcs.anl.gov" Subject: Re: [petsc-users] Matrix-free method in PETSc Here is an TS example using DMDA and matrix-free Jacobians. Though the matrix-free part is faked, it demonstrates the workflow. https://gitlab.com/petsc/petsc/-/blob/hongzh/ts-matshell-example/src/ts/examples/tutorials/advection-diffusion-reaction/ex5_mf.c Hong (Mr.) On Feb 18, 2020, at 8:26 AM, Yuyun Yang > wrote: Thanks. Also, when using KSP, would the syntax be KSPSetOperators(ksp,A,A)? Since you mentioned preconditioners are not generally used for matrix-free operators, I wasn?t sure whether I should still put ?A? in the Pmat field. Is it still possible to use TS in conjunction with the matrix-free operator? I?d like to create a simple test case that solves the 1d heat equation implicitly with variable coefficients, but didn?t know how the time stepping can be set up. Thanks, Yuyun From: Matthew Knepley > Date: Tuesday, February 18, 2020 at 9:23 PM To: Yuyun Yang > Cc: "Smith, Barry F." >, "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] Matrix-free method in PETSc On Tue, Feb 18, 2020 at 8:20 AM Yuyun Yang > wrote: Thanks for the clarification. Got one more question: if I have variable coefficients, my stencil will be updated at every time step, so will the coefficients in myMatMult. In that case, is it necessary to destroy the shell matrix and create it all over again, or can I use it as it is, only calling the stencil update function, assuming the result will be passed into the matrix operation automatically? You update the information in the context associated with the shell matrix. No need to destroy it. Thanks, Matt Thanks, Yuyun On 2/18/20, 7:34 AM, "Smith, Barry F." > wrote: > On Feb 17, 2020, at 7:56 AM, Yuyun Yang > wrote: > > Hello, > > I actually have a question about the usage of DMDA since I'm quite new to this. I wonder if the DMDA suite of functions can be directly called on vectors created from VecCreate? Yes, but you have to make sure the ones you create have the same sizes and parallel layouts. 
Generally best to get them from the DMDA or VecDuplicate() than the hassle of figuring out sizes. > Or the vectors have to be formed by DMDACreateGlobalVector? I'm also not sure about what the dof and stencil width arguments do. > > I'm still unsure about the usage of MatCreateShell and MatShellSetOperation, since it seems that MyMatMult should still have 3 inputs just like MatMult (the matrix and two vectors). Since I'm not forming the matrix, does that mean the matrix input is meaningless but still needs to exist for the sake of this format? Well the matrix input is your shell matrix so it likely has information you need to do your multiply routine. MatShellGetContext() (No you do not want to put your information about the matrix stencil inside global variables!) > > After I create such a shell matrix, can I use it like a regular matrix in KSP and utilize preconditioners? > > Thanks! > Yuyun > From: petsc-users > on behalf of Yuyun Yang > > Sent: Sunday, February 16, 2020 3:12 AM > To: Smith, Barry F. > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix-free method in PETSc > > Thank you, that is very helpful information indeed! I will try it and send you my code when it works. > > Best regards, > Yuyun > From: Smith, Barry F. > > Sent: Saturday, February 15, 2020 10:02 PM > To: Yuyun Yang > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix-free method in PETSc > > Yuyun, > > If you are speaking about using a finite difference stencil on a structured grid where you provide the Jacobian vector products yourself by looping over the grid doing the stencil operation we unfortunately do not have exactly that kind of example. > > But it is actually not difficult. I suggest starting with src/ts/examples/tests/ex22.c It computes the sparse matrix explicitly with FormIJacobian() > > What you need to do is instead in main() use MatCreateShell() and MatShellSetOperation(,MATOP_MULT,(void (*)(void))MyMatMult) then provide the routine MyMatMult() to do your stencil based matrix free product; note that you can create this new routine by taking the structure of IFunction() and reorganizing it to do the Jacobian product instead. You will need to get the information about the shell matrix size on each process by calling DMDAGetCorners(). > > You will then remove the explicit computation of the Jacobian, and also remove the Event stuff since you don't need it. > > Extending to 2 and 3d is straight forward. > > Any questions let us know. > > Barry > > If you like this would make a great merge request with your code to improve our examples. > > > > On Feb 15, 2020, at 9:42 PM, Yuyun Yang > wrote: > > > > Hello team, > > > > I wanted to apply the Krylov subspace method to a matrix-free implementation of a stencil, such that the iterative method acts on the operation without ever constructing the matrix explicitly (for example, when doing backward Euler). > > > > I'm not sure whether there is already an example for that somewhere. If so, could you point me to a relevant example? > > > > Thank you! > > > > Best regards, > > Yuyun -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Feb 18 21:09:04 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) 
Date: Wed, 19 Feb 2020 03:09:04 +0000 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: References: <624EDFBF-E105-4F70-87C6-58BE940D73C9@mcs.anl.gov> <1D793649-96CD-4807-8330-53D9CA176348@stanford.edu> Message-ID: <264F2C78-2D0D-46CC-A161-6161D97B433E@mcs.anl.gov> > On Feb 18, 2020, at 8:26 AM, Yuyun Yang wrote: > > Thanks. Also, when using KSP, would the syntax be KSPSetOperators(ksp,A,A)? Since you mentioned preconditioners are not generally used for matrix-free operators, I wasn?t sure whether I should still put ?A? in the Pmat field. > > Is it still possible to use TS in conjunction with the matrix-free operator? I?d like to create a simple test case that solves the 1d heat equation implicitly with variable coefficients, but didn?t know how the time stepping can be set up. On Feb 15, 2020, at 9:42 PM you asked about "(for example, when doing backward Euler)" on Saturday, February 15, 2020 10:02 PM I suggested you start with the example src/ts/examples/tests/ex22.c I outlined how you could change it to be matrix free. The example clearly uses TS Now three days later you are asking about how time-stepping can be set up with a matrix-free operator? If you are going to ignore answers we provide to your questions maybe we won't bother answering in the future. > > Thanks, > Yuyun > > From: Matthew Knepley > Date: Tuesday, February 18, 2020 at 9:23 PM > To: Yuyun Yang > Cc: "Smith, Barry F." , "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] Matrix-free method in PETSc > > On Tue, Feb 18, 2020 at 8:20 AM Yuyun Yang wrote: > Thanks for the clarification. > > Got one more question: if I have variable coefficients, my stencil will be updated at every time step, so will the coefficients in myMatMult. In that case, is it necessary to destroy the shell matrix and create it all over again, or can I use it as it is, only calling the stencil update function, assuming the result will be passed into the matrix operation automatically? > > You update the information in the context associated with the shell matrix. No need to destroy it. > > Thanks, > > Matt > > Thanks, > Yuyun > > On 2/18/20, 7:34 AM, "Smith, Barry F." wrote: > > > > > On Feb 17, 2020, at 7:56 AM, Yuyun Yang wrote: > > > > Hello, > > > > I actually have a question about the usage of DMDA since I'm quite new to this. I wonder if the DMDA suite of functions can be directly called on vectors created from VecCreate? > > Yes, but you have to make sure the ones you create have the same sizes and parallel layouts. Generally best to get them from the DMDA or VecDuplicate() than the hassle of figuring out sizes. > > > Or the vectors have to be formed by DMDACreateGlobalVector? I'm also not sure about what the dof and stencil width arguments do. > > > > I'm still unsure about the usage of MatCreateShell and MatShellSetOperation, since it seems that MyMatMult should still have 3 inputs just like MatMult (the matrix and two vectors). Since I'm not forming the matrix, does that mean the matrix input is meaningless but still needs to exist for the sake of this format? > > Well the matrix input is your shell matrix so it likely has information you need to do your multiply routine. MatShellGetContext() (No you do not want to put your information about the matrix stencil inside global variables!) > > > > > > After I create such a shell matrix, can I use it like a regular matrix in KSP and utilize preconditioners? > > > > Thanks! 
> > Yuyun > > From: petsc-users on behalf of Yuyun Yang > > Sent: Sunday, February 16, 2020 3:12 AM > > To: Smith, Barry F. > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix-free method in PETSc > > > > Thank you, that is very helpful information indeed! I will try it and send you my code when it works. > > > > Best regards, > > Yuyun > > From: Smith, Barry F. > > Sent: Saturday, February 15, 2020 10:02 PM > > To: Yuyun Yang > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix-free method in PETSc > > > > Yuyun, > > > > If you are speaking about using a finite difference stencil on a structured grid where you provide the Jacobian vector products yourself by looping over the grid doing the stencil operation we unfortunately do not have exactly that kind of example. > > > > But it is actually not difficult. I suggest starting with src/ts/examples/tests/ex22.c It computes the sparse matrix explicitly with FormIJacobian() > > > > What you need to do is instead in main() use MatCreateShell() and MatShellSetOperation(,MATOP_MULT,(void (*)(void))MyMatMult) then provide the routine MyMatMult() to do your stencil based matrix free product; note that you can create this new routine by taking the structure of IFunction() and reorganizing it to do the Jacobian product instead. You will need to get the information about the shell matrix size on each process by calling DMDAGetCorners(). > > > > You will then remove the explicit computation of the Jacobian, and also remove the Event stuff since you don't need it. > > > > Extending to 2 and 3d is straight forward. > > > > Any questions let us know. > > > > Barry > > > > If you like this would make a great merge request with your code to improve our examples. > > > > > > > On Feb 15, 2020, at 9:42 PM, Yuyun Yang wrote: > > > > > > Hello team, > > > > > > I wanted to apply the Krylov subspace method to a matrix-free implementation of a stencil, such that the iterative method acts on the operation without ever constructing the matrix explicitly (for example, when doing backward Euler). > > > > > > I'm not sure whether there is already an example for that somewhere. If so, could you point me to a relevant example? > > > > > > Thank you! > > > > > > Best regards, > > > Yuyun > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From mfadams at lbl.gov Wed Feb 19 16:07:58 2020 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 19 Feb 2020 17:07:58 -0500 Subject: [petsc-users] ParMetis error Message-ID: We have a code that works with v3.7.7 but with newer versions we get what looks like an internal ParMetis error ('Failed during initial partitioning'). See attached output. I've never seen this message ... any ideas? Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Petsc3-11_xgc_28263517_.log Type: application/octet-stream Size: 473109 bytes Desc: not available URL: From bsmith at mcs.anl.gov Wed Feb 19 17:58:17 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 19 Feb 2020 23:58:17 +0000 Subject: [petsc-users] ParMetis error In-Reply-To: References: Message-ID: <37372288-90BA-4020-99F3-4E8541F0A5E2@anl.gov> Mark, It may be best to try jumping to the latest PETSc 3.12. 
ParMETIS had some difficult issues with matrices we started to provide to it in the last year and the code to handle the problems may not be in 3.11 If the problem persists in 3.12 then I would start with checking with valgrind. Barry > On Feb 19, 2020, at 4:07 PM, Mark Adams wrote: > > We have a code that works with v3.7.7 but with newer versions we get what looks like an internal ParMetis error ('Failed during initial partitioning'). See attached output. > > I've never seen this message ... any ideas? > > Thanks, > Mark > From jourdon_anthony at hotmail.fr Thu Feb 20 02:03:18 2020 From: jourdon_anthony at hotmail.fr (Anthony Jourdon) Date: Thu, 20 Feb 2020 08:03:18 +0000 Subject: [petsc-users] DMDA Error In-Reply-To: References: , Message-ID: Hello, After tests and discussions with the computer admins the problem is solved ! It appears that the bug indeed comes from intel mpi 2019 and all of its updates. For reasons that I do not understand it seems that intel mpi 2019 gives strange MPI errors when inter-nodes communication is required for computers using infiniband. Apparently this is a known error and indeed I found topics on forums talking about that. I switch to intel mpi 2018 Update 3 and no problem, code runs normally on 1024 mpi ranks. Thank you for your attention and your time ! Sincerly, Anthony Jourdon ________________________________ De : Zhang, Junchao Envoy? : vendredi 24 janvier 2020 16:52 ? : Anthony Jourdon Cc : petsc-users at mcs.anl.gov Objet : Re: [petsc-users] DMDA Error Hello, Anthony I tried petsc-3.8.4 + icc/gcc + Intel MPI 2019 update 5 + optimized/debug build, and ran with 1024 ranks, but I could not reproduce the error. Maybe you can try these: * Use the latest petsc + your test example, run with AND without -vecscatter_type mpi1, to see if they can report useful messages. * Or, use Intel MPI 2019 update 6 to see if this is an Intel MPI bug. 
$ cat ex50.c #include #include int main(int argc,char **argv) { PetscErrorCode ierr; PetscInt size; PetscInt X = 1024,Y = 128,Z=512; //PetscInt X = 512,Y = 64, Z=256; DM da; ierr = PetscInitialize(&argc,&argv,(char*)0,NULL);if (ierr) return ierr; ierr = MPI_Comm_size(PETSC_COMM_WORLD,&size);CHKERRQ(ierr); ierr = DMDACreate3d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DMDA_STENCIL_BOX,2*X+1,2*Y+1,2*Z+1,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,3,2,NULL,NULL,NULL,&da);CHKERRQ(ierr); ierr = DMSetFromOptions(da);CHKERRQ(ierr); ierr = DMSetUp(da);CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD,"Running with %D MPI ranks\n",size);CHKERRQ(ierr); ierr = DMDestroy(&da);CHKERRQ(ierr); ierr = PetscFinalize(); return ierr; } $ldd ex50 linux-vdso.so.1 => (0x00007ffdbcd43000) libpetsc.so.3.8 => /home/jczhang/petsc/linux-intel-opt/lib/libpetsc.so.3.8 (0x00002afd27e51000) libX11.so.6 => /lib64/libX11.so.6 (0x00002afd2a811000) libifport.so.5 => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/compiler/lib/intel64_lin/libifport.so.5 (0x00002afd2ab4f000) libmpicxx.so.12 => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/mpi/intel64/lib/libmpicxx.so.12 (0x00002afd2ad7d000) libdl.so.2 => /lib64/libdl.so.2 (0x00002afd2af9d000) libmpifort.so.12 => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/mpi/intel64/lib/libmpifort.so.12 (0x00002afd2b1a1000) libmpi.so.12 => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/mpi/intel64/lib/release/libmpi.so.12 (0x00002afd2b55f000) librt.so.1 => /lib64/librt.so.1 (0x00002afd2c564000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00002afd2c76c000) libimf.so => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/compiler/lib/intel64_lin/libimf.so (0x00002afd2c988000) libsvml.so => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/compiler/lib/intel64_lin/libsvml.so (0x00002afd2d00d000) libirng.so => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/compiler/lib/intel64_lin/libirng.so (0x00002afd2ea99000) libm.so.6 => /lib64/libm.so.6 (0x00002afd2ee04000) libcilkrts.so.5 => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/compiler/lib/intel64_lin/libcilkrts.so.5 (0x00002afd2f106000) libstdc++.so.6 => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/clck/2019.5/lib/intel64/libstdc++.so.6 (0x00002afd2f343000) libgcc_s.so.1 => 
/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/clck/2019.5/lib/intel64/libgcc_s.so.1 (0x00002afd2f655000) libirc.so => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/compiler/lib/intel64_lin/libirc.so (0x00002afd2f86b000) libc.so.6 => /lib64/libc.so.6 (0x00002afd2fadd000) libintlc.so.5 => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/compiler/lib/intel64_lin/libintlc.so.5 (0x00002afd2feaa000) libxcb.so.1 => /lib64/libxcb.so.1 (0x00002afd3011c000) /lib64/ld-linux-x86-64.so.2 (0x00002afd27c2d000) libfabric.so.1 => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/mpi/intel64/libfabric/lib/libfabric.so.1 (0x00002afd30344000) libXau.so.6 => /lib64/libXau.so.6 (0x00002afd3057c000) --Junchao Zhang On Tue, Jan 21, 2020 at 2:25 AM Anthony Jourdon > wrote: Hello, I made a test to try to reproduce the error. To do so I modified the file $PETSC_DIR/src/dm/examples/tests/ex35.c I attach the file in case of need. The same error is reproduced for 1024 mpi ranks. I tested two problem sizes (2*512+1x2*64+1x2*256+1 and 2*1024+1x2*128+1x2*512+1) and the error occured for both cases, the first case is also the one I used to run before the OS and mpi updates. I also run the code with -malloc_debug and nothing more appeared. I attached the configure command I used to build a debug version of petsc. Thank you for your time, Sincerly. Anthony Jourdon ________________________________ De : Zhang, Junchao > Envoy? : jeudi 16 janvier 2020 16:49 ? : Anthony Jourdon > Cc : petsc-users at mcs.anl.gov > Objet : Re: [petsc-users] DMDA Error It seems the problem is triggered by DMSetUp. You can write a small test creating the DMDA with the same size as your code, to see if you can reproduce the problem. If yes, it would be much easier for us to debug it. --Junchao Zhang On Thu, Jan 16, 2020 at 7:38 AM Anthony Jourdon > wrote: Dear Petsc developer, I need assistance with an error. I run a code that uses the DMDA related functions. I'm using petsc-3.8.4. This code used to run very well on a super computer with the OS SLES11. Petsc was built using an intel mpi 5.1.3.223 module and intel mkl version 2016.0.2.181 The code was running with no problem on 1024 and more mpi ranks. Recently, the OS of the computer has been updated to RHEL7 I rebuilt Petsc using new available versions of intel mpi (2019U5) and mkl (2019.0.5.281) which are the same versions for compilers and mkl. Since then I tested to run the exact same code on 8, 16, 24, 48, 512 and 1024 mpi ranks. Until 1024 mpi ranks no problem, but for 1024 an error related to DMDA appeared. I snip the first lines of the error stack here and the full error stack is attached. 
[534]PETSC ERROR: #1 PetscGatherMessageLengths() line 120 in /scratch2/dlp/appli_local/SCR/OROGEN/petsc3.8.4_MPI/petsc-3.8.4/src/sys/utils/mpimesg.c [534]PETSC ERROR: #2 VecScatterCreate_PtoS() line 2288 in /scratch2/dlp/appli_local/SCR/OROGEN/petsc3.8.4_MPI/petsc-3.8.4/src/vec/vec/utils/vpscat.c [534]PETSC ERROR: #3 VecScatterCreate() line 1462 in /scratch2/dlp/appli_local/SCR/OROGEN/petsc3.8.4_MPI/petsc-3.8.4/src/vec/vec/utils/vscat.c [534]PETSC ERROR: #4 DMSetUp_DA_3D() line 1042 in /scratch2/dlp/appli_local/SCR/OROGEN/petsc3.8.4_MPI/petsc-3.8.4/src/dm/impls/da/da3.c [534]PETSC ERROR: #5 DMSetUp_DA() line 25 in /scratch2/dlp/appli_local/SCR/OROGEN/petsc3.8.4_MPI/petsc-3.8.4/src/dm/impls/da/dareg.c [534]PETSC ERROR: #6 DMSetUp() line 720 in /scratch2/dlp/appli_local/SCR/OROGEN/petsc3.8.4_MPI/petsc-3.8.4/src/dm/interface/dm.c Thank you for your time, Sincerly, Anthony Jourdon -------------- next part -------------- An HTML attachment was scrubbed... URL: From eijkhout at tacc.utexas.edu Thu Feb 20 08:47:22 2020 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Thu, 20 Feb 2020 14:47:22 +0000 Subject: [petsc-users] DMDA Error In-Reply-To: References: Message-ID: On , 2020Feb20, at 02:03, Anthony Jourdon > wrote: It appears that the bug indeed comes from intel mpi 2019 and all of its updates. Try export I_MPI_ADJUST_ALLREDUCE=1 (Please check spelling) For some reason the default allreduce algorithm is broken. This setting with 19.0.6 has solved many problems. Don?t use 19.0.5 or earlier at all. Victor. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Feb 20 13:22:07 2020 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 20 Feb 2020 14:22:07 -0500 Subject: [petsc-users] static libs Message-ID: We are having problems linking with at Cray static library environment variable, that is required to link Adios, and IO package. How does one build with static PETSc libs? Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu Feb 20 13:24:14 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 20 Feb 2020 13:24:14 -0600 Subject: [petsc-users] static libs In-Reply-To: References: Message-ID: You can build PETSc statically with configure option: --with-shared-libraries=0 Satish On Thu, 20 Feb 2020, Mark Adams wrote: > We are having problems linking with at Cray static library environment > variable, that is required to link Adios, and IO package. How does one > build with static PETSc libs? > Thanks, > Mark > From balay at mcs.anl.gov Thu Feb 20 13:29:49 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 20 Feb 2020 13:29:49 -0600 Subject: [petsc-users] static libs In-Reply-To: References: Message-ID: BTW: What do you mean by 'Cray static library environment variable'? Is it CRAYPE_LINK_TYPE? What is set to? What problems are you having? One can get shared library build of PETSc working with: export CRAYPE_LINK_TYPE=dynamic Satish On Thu, 20 Feb 2020, Satish Balay via petsc-users wrote: > You can build PETSc statically with configure option: > > --with-shared-libraries=0 > > Satish > > On Thu, 20 Feb 2020, Mark Adams wrote: > > > We are having problems linking with at Cray static library environment > > variable, that is required to link Adios, and IO package. How does one > > build with static PETSc libs? 
> > Thanks, > > Mark > > > From asharma at pppl.gov Thu Feb 20 13:52:04 2020 From: asharma at pppl.gov (Amil Sharma) Date: Thu, 20 Feb 2020 14:52:04 -0500 Subject: [petsc-users] static libs In-Reply-To: References: Message-ID: We need static linking in order to link an existing static IO library, but we did not know the PETSc static build configure option. On Thu, Feb 20, 2020 at 2:30 PM Satish Balay wrote: > BTW: What do you mean by 'Cray static library environment variable'? Is it > CRAYPE_LINK_TYPE? What is set to? What problems are you having? > > One can get shared library build of PETSc working with: > > export CRAYPE_LINK_TYPE=dynamic > > Satish > > On Thu, 20 Feb 2020, Satish Balay via petsc-users wrote: > > > You can build PETSc statically with configure option: > > > > --with-shared-libraries=0 > > > > Satish > > > > On Thu, 20 Feb 2020, Mark Adams wrote: > > > > > We are having problems linking with at Cray static library environment > > > variable, that is required to link Adios, and IO package. How does one > > > build with static PETSc libs? > > > Thanks, > > > Mark > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From asharma at pppl.gov Thu Feb 20 14:01:57 2020 From: asharma at pppl.gov (Amil Sharma) Date: Thu, 20 Feb 2020 15:01:57 -0500 Subject: [petsc-users] static libs In-Reply-To: References: Message-ID: Just wondering if static linking is better for performance? On Thu, Feb 20, 2020 at 2:58 PM Albert Mollen wrote: > Hi Mark, > I'm trying to rebuild Adios2 with dynamic linking on cori. Hopefully we > can move over to that. > > Best regards > ---------- > Albert Moll?n > Associate Research Physicist > > Theory Department > Princeton Plasma Physics Laboratory > P.O. Box 451 > Princeton, NJ 08543-0451 > USA > > Tel. +1 609-243-3909 > E-mail: amollen at pppl.gov > > > On Thu, Feb 20, 2020 at 2:52 PM Amil Sharma wrote: > >> We need static linking in order to link an existing static IO library, >> but we did not know the PETSc static build configure option. >> >> On Thu, Feb 20, 2020 at 2:30 PM Satish Balay wrote: >> >>> BTW: What do you mean by 'Cray static library environment variable'? Is >>> it CRAYPE_LINK_TYPE? What is set to? What problems are you having? >>> >>> One can get shared library build of PETSc working with: >>> >>> export CRAYPE_LINK_TYPE=dynamic >>> >>> Satish >>> >>> On Thu, 20 Feb 2020, Satish Balay via petsc-users wrote: >>> >>> > You can build PETSc statically with configure option: >>> > >>> > --with-shared-libraries=0 >>> > >>> > Satish >>> > >>> > On Thu, 20 Feb 2020, Mark Adams wrote: >>> > >>> > > We are having problems linking with at Cray static library >>> environment >>> > > variable, that is required to link Adios, and IO package. How does >>> one >>> > > build with static PETSc libs? >>> > > Thanks, >>> > > Mark >>> > > >>> > >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From amollen at pppl.gov Thu Feb 20 13:57:48 2020 From: amollen at pppl.gov (Albert Mollen) Date: Thu, 20 Feb 2020 14:57:48 -0500 Subject: [petsc-users] static libs In-Reply-To: References: Message-ID: Hi Mark, I'm trying to rebuild Adios2 with dynamic linking on cori. Hopefully we can move over to that. Best regards ---------- Albert Moll?n Associate Research Physicist Theory Department Princeton Plasma Physics Laboratory P.O. Box 451 Princeton, NJ 08543-0451 USA Tel. 
+1 609-243-3909 E-mail: amollen at pppl.gov On Thu, Feb 20, 2020 at 2:52 PM Amil Sharma wrote: > We need static linking in order to link an existing static IO library, but > we did not know the PETSc static build configure option. > > On Thu, Feb 20, 2020 at 2:30 PM Satish Balay wrote: > >> BTW: What do you mean by 'Cray static library environment variable'? Is >> it CRAYPE_LINK_TYPE? What is set to? What problems are you having? >> >> One can get shared library build of PETSc working with: >> >> export CRAYPE_LINK_TYPE=dynamic >> >> Satish >> >> On Thu, 20 Feb 2020, Satish Balay via petsc-users wrote: >> >> > You can build PETSc statically with configure option: >> > >> > --with-shared-libraries=0 >> > >> > Satish >> > >> > On Thu, 20 Feb 2020, Mark Adams wrote: >> > >> > > We are having problems linking with at Cray static library environment >> > > variable, that is required to link Adios, and IO package. How does one >> > > build with static PETSc libs? >> > > Thanks, >> > > Mark >> > > >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From amollen at pppl.gov Thu Feb 20 13:57:48 2020 From: amollen at pppl.gov (Albert Mollen) Date: Thu, 20 Feb 2020 14:57:48 -0500 Subject: [petsc-users] static libs In-Reply-To: References: Message-ID: Hi Mark, I'm trying to rebuild Adios2 with dynamic linking on cori. Hopefully we can move over to that. Best regards ---------- Albert Moll?n Associate Research Physicist Theory Department Princeton Plasma Physics Laboratory P.O. Box 451 Princeton, NJ 08543-0451 USA Tel. +1 609-243-3909 E-mail: amollen at pppl.gov On Thu, Feb 20, 2020 at 2:52 PM Amil Sharma wrote: > We need static linking in order to link an existing static IO library, but > we did not know the PETSc static build configure option. > > On Thu, Feb 20, 2020 at 2:30 PM Satish Balay wrote: > >> BTW: What do you mean by 'Cray static library environment variable'? Is >> it CRAYPE_LINK_TYPE? What is set to? What problems are you having? >> >> One can get shared library build of PETSc working with: >> >> export CRAYPE_LINK_TYPE=dynamic >> >> Satish >> >> On Thu, 20 Feb 2020, Satish Balay via petsc-users wrote: >> >> > You can build PETSc statically with configure option: >> > >> > --with-shared-libraries=0 >> > >> > Satish >> > >> > On Thu, 20 Feb 2020, Mark Adams wrote: >> > >> > > We are having problems linking with at Cray static library environment >> > > variable, that is required to link Adios, and IO package. How does one >> > > build with static PETSc libs? >> > > Thanks, >> > > Mark >> > > >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jczhang at mcs.anl.gov Thu Feb 20 15:26:28 2020 From: jczhang at mcs.anl.gov (Junchao Zhang) Date: Thu, 20 Feb 2020 15:26:28 -0600 Subject: [petsc-users] static libs In-Reply-To: References: Message-ID: Copy & paste from a Cray paper: "The main disadvantage of dynamic shared libraries is the runtime performance costs of dynamic linking. Every time the program is executed it has to perform a large part of its linking process. The lookup of symbols in a dynamic shared library is much less efficient than in static libraries. The loading of a dynamic shared library during an application?s execution may result in a ?jitter? effect where a single process holds up the forward progress of other processes of the application while it is loading a library. " BTW, Cori's default is changed from static to dynamic. I heard Frontier will also use dynamic. 
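As a rough sketch, the two knobs discussed in this thread are set like this on a Cray system (all remaining configure options are site specific and omitted):

  # shared/dynamic build (the current default on Cori):
  export CRAYPE_LINK_TYPE=dynamic
  ./configure --with-shared-libraries=1 ...

  # fully static build, e.g. to link against a static-only IO library:
  export CRAYPE_LINK_TYPE=static
  ./configure --with-shared-libraries=0 ...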
--Junchao Zhang On Thu, Feb 20, 2020 at 2:03 PM Amil Sharma via petsc-users < petsc-users at mcs.anl.gov> wrote: > Just wondering if static linking is better for performance? > > On Thu, Feb 20, 2020 at 2:58 PM Albert Mollen wrote: > >> Hi Mark, >> I'm trying to rebuild Adios2 with dynamic linking on cori. Hopefully we >> can move over to that. >> >> Best regards >> ---------- >> Albert Moll?n >> Associate Research Physicist >> >> Theory Department >> Princeton Plasma Physics Laboratory >> P.O. Box 451 >> Princeton, NJ 08543-0451 >> USA >> >> Tel. +1 609-243-3909 >> E-mail: amollen at pppl.gov >> >> >> On Thu, Feb 20, 2020 at 2:52 PM Amil Sharma wrote: >> >>> We need static linking in order to link an existing static IO library, >>> but we did not know the PETSc static build configure option. >>> >>> On Thu, Feb 20, 2020 at 2:30 PM Satish Balay wrote: >>> >>>> BTW: What do you mean by 'Cray static library environment variable'? Is >>>> it CRAYPE_LINK_TYPE? What is set to? What problems are you having? >>>> >>>> One can get shared library build of PETSc working with: >>>> >>>> export CRAYPE_LINK_TYPE=dynamic >>>> >>>> Satish >>>> >>>> On Thu, 20 Feb 2020, Satish Balay via petsc-users wrote: >>>> >>>> > You can build PETSc statically with configure option: >>>> > >>>> > --with-shared-libraries=0 >>>> > >>>> > Satish >>>> > >>>> > On Thu, 20 Feb 2020, Mark Adams wrote: >>>> > >>>> > > We are having problems linking with at Cray static library >>>> environment >>>> > > variable, that is required to link Adios, and IO package. How does >>>> one >>>> > > build with static PETSc libs? >>>> > > Thanks, >>>> > > Mark >>>> > > >>>> > >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Feb 20 15:46:36 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 20 Feb 2020 14:46:36 -0700 Subject: [petsc-users] static libs In-Reply-To: References: Message-ID: <87o8ttvsn7.fsf@jedbrown.org> Yeah, this startup is typically not bad for normal compiled programs, but can be substantial if you have many libraries or are using a language like Python, which may search hundreds or thousands of paths. In any case, it's mainly a filesystem metadata scalability stress, and one reason some people like to use containers. Making private symbols private helps improve the performance of symbol relocation (a sequential operation). People rarely care about this in scientific software since the actual symbol relocation rarely takes more than a second or two. Those interested in symbol visibility and optimizing startup time for shared libraries can check out this classic guid. https://www.akkadia.org/drepper/dsohowto.pdf Junchao Zhang via petsc-users writes: > Copy & paste from a Cray paper: > "The main disadvantage of dynamic shared libraries is the runtime > performance costs of dynamic linking. Every time the program is executed it > has to perform a large part of its linking process. The lookup of symbols > in a dynamic shared library is much less efficient than in static > libraries. The loading of a dynamic shared library during an application?s > execution may result in a ?jitter? effect where a single process holds up > the forward progress of other processes of the application while it is > loading a library. " > > BTW, Cori's default is changed from static to dynamic. I heard Frontier > will also use dynamic. 
> > --Junchao Zhang > > > On Thu, Feb 20, 2020 at 2:03 PM Amil Sharma via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Just wondering if static linking is better for performance? >> >> On Thu, Feb 20, 2020 at 2:58 PM Albert Mollen wrote: >> >>> Hi Mark, >>> I'm trying to rebuild Adios2 with dynamic linking on cori. Hopefully we >>> can move over to that. >>> >>> Best regards >>> ---------- >>> Albert Moll?n >>> Associate Research Physicist >>> >>> Theory Department >>> Princeton Plasma Physics Laboratory >>> P.O. Box 451 >>> Princeton, NJ 08543-0451 >>> USA >>> >>> Tel. +1 609-243-3909 >>> E-mail: amollen at pppl.gov >>> >>> >>> On Thu, Feb 20, 2020 at 2:52 PM Amil Sharma wrote: >>> >>>> We need static linking in order to link an existing static IO library, >>>> but we did not know the PETSc static build configure option. >>>> >>>> On Thu, Feb 20, 2020 at 2:30 PM Satish Balay wrote: >>>> >>>>> BTW: What do you mean by 'Cray static library environment variable'? Is >>>>> it CRAYPE_LINK_TYPE? What is set to? What problems are you having? >>>>> >>>>> One can get shared library build of PETSc working with: >>>>> >>>>> export CRAYPE_LINK_TYPE=dynamic >>>>> >>>>> Satish >>>>> >>>>> On Thu, 20 Feb 2020, Satish Balay via petsc-users wrote: >>>>> >>>>> > You can build PETSc statically with configure option: >>>>> > >>>>> > --with-shared-libraries=0 >>>>> > >>>>> > Satish >>>>> > >>>>> > On Thu, 20 Feb 2020, Mark Adams wrote: >>>>> > >>>>> > > We are having problems linking with at Cray static library >>>>> environment >>>>> > > variable, that is required to link Adios, and IO package. How does >>>>> one >>>>> > > build with static PETSc libs? >>>>> > > Thanks, >>>>> > > Mark >>>>> > > >>>>> > >>>>> >>>>> From mfadams at lbl.gov Thu Feb 20 16:47:37 2020 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 20 Feb 2020 17:47:37 -0500 Subject: [petsc-users] static libs In-Reply-To: References: Message-ID: On Thu, Feb 20, 2020 at 2:24 PM Satish Balay wrote: > You can build PETSc statically with configure option: > > --with-shared-libraries=0 > Thanks, I had forgotten this, was searching for 'static' > > Satish > > On Thu, 20 Feb 2020, Mark Adams wrote: > > > We are having problems linking with at Cray static library environment > > variable, that is required to link Adios, and IO package. How does one > > build with static PETSc libs? > > Thanks, > > Mark > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Feb 20 16:51:26 2020 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 20 Feb 2020 17:51:26 -0500 Subject: [petsc-users] static libs In-Reply-To: References: Message-ID: On Thu, Feb 20, 2020 at 2:30 PM Satish Balay wrote: > BTW: What do you mean by 'Cray static library environment variable'? Is it > CRAYPE_LINK_TYPE? Yes, that was it. > What is set to? What problems are you having? > I think they were using 'static' for Adios but are going to try to make it work with dynamic. Otherwise I can configure with static libs. Thanks, > > One can get shared library build of PETSc working with: > > export CRAYPE_LINK_TYPE=dynamic > > Satish > > On Thu, 20 Feb 2020, Satish Balay via petsc-users wrote: > > > You can build PETSc statically with configure option: > > > > --with-shared-libraries=0 > > > > Satish > > > > On Thu, 20 Feb 2020, Mark Adams wrote: > > > > > We are having problems linking with at Cray static library environment > > > variable, that is required to link Adios, and IO package. 
How does one > > > build with static PETSc libs? > > > Thanks, > > > Mark > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eijkhout at tacc.utexas.edu Thu Feb 20 16:53:27 2020 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Thu, 20 Feb 2020 22:53:27 +0000 Subject: [petsc-users] static libs In-Reply-To: References: Message-ID: <1147D371-7873-417E-9E31-A6DAE09465B7@tacc.utexas.edu> On , 2020Feb20, at 15:26, Junchao Zhang via petsc-users > wrote: The main disadvantage of dynamic shared libraries is the runtime performance costs of dynamic linking. Every time the program is executed it has to perform a large part of its linking process The main disadvantage of static linked libraries is the program load time. Each processor that executes the program has to load the executable from disk. Static => large executables => disk hit. Victor. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Feb 20 18:29:29 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 20 Feb 2020 17:29:29 -0700 Subject: [petsc-users] static libs In-Reply-To: <1147D371-7873-417E-9E31-A6DAE09465B7@tacc.utexas.edu> References: <1147D371-7873-417E-9E31-A6DAE09465B7@tacc.utexas.edu> Message-ID: <87d0a8wzo6.fsf@jedbrown.org> Victor Eijkhout writes: > On , 2020Feb20, at 15:26, Junchao Zhang via petsc-users > wrote: > > The main disadvantage of dynamic shared libraries is the runtime performance costs of dynamic linking. Every time the program is executed it has to perform a large part of its linking process > > The main disadvantage of static linked libraries is the program load time. Each processor that executes the program has to load the executable from disk. > > Static => large executables => disk hit. I mean, that code is loaded one way or another, be it in a shared library or a static executable. One advantage of shared libraries is that code and read-only data can be shared between processes. So when you mpiexec -n 64 on your fat node, only one copy of the code and read-only data needs to be resident in memory. From mfadams at lbl.gov Fri Feb 21 13:09:17 2020 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 21 Feb 2020 14:09:17 -0500 Subject: [petsc-users] [petsc-maint] "make ... all" failure In-Reply-To: References: Message-ID: Please send the error that you see with --with-64-bit-indices=0. This should not be hard to fix. And yes, always start with a clean environment when you get a strange error. Compile time errors are very rare. On Fri, Feb 21, 2020 at 1:26 PM Jin Chen wrote: > I tried and found > > --with-64-bit-indices=1 > > works. > > But mumps and superlu don't have 64-bit support. > > -- Jin > > On Fri, Feb 21, 2020 at 12:07 PM Mark Adams wrote: > >> Cool, a compiler error. >> >> First, delete the "arch" directory and configure again. This a deep 'make >> clean'. Something you need to do this when you switch branches. >> >> On Fri, Feb 21, 2020 at 11:30 AM Jin Chen via petsc-maint < >> petsc-maint at mcs.anl.gov> wrote: >> >>> Hi, >>> >>> I'm installing petsc branch >>> >>> barry/fix-superlu_dist-py-for-gpus >>> >>> on another computer for testing. It passed configure, but failed at >>> "make .... all". >>> >>> Would you please take a look? Both configure.log and make.log are >>> attached. >>> >>> Thanks, >>> >>> -- Jin >>> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jchen at pppl.gov Fri Feb 21 13:20:22 2020 From: jchen at pppl.gov (Jin Chen) Date: Fri, 21 Feb 2020 14:20:22 -0500 Subject: [petsc-users] [petsc-maint] "make ... all" failure In-Reply-To: References: Message-ID: errors from setting --with-64-bit-indices=0 : /opt/pgi/19.5/linux86-64-llvm/19.5/include/edg/xmmintrin.h(2514): internal error: assertion failed at: "/dvs/p4/build/sw/rel/gpu_drv/r440/TC440_70/drivers/compiler/edg/EDG_5.0/src/sys_predef.c", line 574 1 catastrophic error detected in the compilation of "/tmp/tmpxft_00004945_00000000-4_aijcusparse.cpp4.ii". Compilation aborted. nvcc error : 'cudafe++' died due to signal 6 gmake[2]: *** [tigergpu-pgi195-openmpi/obj/mat/impls/aij/seq/seqcusparse/aijcusparse.o] Error 6 On Fri, Feb 21, 2020 at 2:09 PM Mark Adams wrote: > Please send the error that you see with --with-64-bit-indices=0. This > should not be hard to fix. > > And yes, always start with a clean environment when you get a strange > error. Compile time errors are very rare. > > On Fri, Feb 21, 2020 at 1:26 PM Jin Chen wrote: > >> I tried and found >> >> --with-64-bit-indices=1 >> >> works. >> >> But mumps and superlu don't have 64-bit support. >> >> -- Jin >> >> On Fri, Feb 21, 2020 at 12:07 PM Mark Adams wrote: >> >>> Cool, a compiler error. >>> >>> First, delete the "arch" directory and configure again. This a deep >>> 'make clean'. Something you need to do this when you switch branches. >>> >>> On Fri, Feb 21, 2020 at 11:30 AM Jin Chen via petsc-maint < >>> petsc-maint at mcs.anl.gov> wrote: >>> >>>> Hi, >>>> >>>> I'm installing petsc branch >>>> >>>> barry/fix-superlu_dist-py-for-gpus >>>> >>>> on another computer for testing. It passed configure, but failed at >>>> "make .... all". >>>> >>>> Would you please take a look? Both configure.log and make.log are >>>> attached. >>>> >>>> Thanks, >>>> >>>> -- Jin >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Feb 21 13:36:08 2020 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 21 Feb 2020 14:36:08 -0500 Subject: [petsc-users] [petsc-maint] "make ... all" failure In-Reply-To: References: Message-ID: OK, the compiler is really failing on 64 bit integers. This is out of my expertise. It is possible that when we were doing this last year that we used 64 bit integers and never encountered this. On Fri, Feb 21, 2020 at 2:20 PM Jin Chen wrote: > errors from setting --with-64-bit-indices=0 : > > > /opt/pgi/19.5/linux86-64-llvm/19.5/include/edg/xmmintrin.h(2514): internal > error: assertion failed at: > "/dvs/p4/build/sw/rel/gpu_drv/r440/TC440_70/drivers/compiler/edg/EDG_5.0/src/sys_predef.c", > line 574 > 1 catastrophic error detected in the compilation of > "/tmp/tmpxft_00004945_00000000-4_aijcusparse.cpp4.ii". > Compilation aborted. > nvcc error : 'cudafe++' died due to signal 6 > gmake[2]: *** > [tigergpu-pgi195-openmpi/obj/mat/impls/aij/seq/seqcusparse/aijcusparse.o] > Error 6 > > On Fri, Feb 21, 2020 at 2:09 PM Mark Adams wrote: > >> Please send the error that you see with --with-64-bit-indices=0. This >> should not be hard to fix. >> >> And yes, always start with a clean environment when you get a strange >> error. Compile time errors are very rare. >> >> On Fri, Feb 21, 2020 at 1:26 PM Jin Chen wrote: >> >>> I tried and found >>> >>> --with-64-bit-indices=1 >>> >>> works. >>> >>> But mumps and superlu don't have 64-bit support. >>> >>> -- Jin >>> >>> On Fri, Feb 21, 2020 at 12:07 PM Mark Adams wrote: >>> >>>> Cool, a compiler error. 
>>>> >>>> First, delete the "arch" directory and configure again. This a deep >>>> 'make clean'. Something you need to do this when you switch branches. >>>> >>>> On Fri, Feb 21, 2020 at 11:30 AM Jin Chen via petsc-maint < >>>> petsc-maint at mcs.anl.gov> wrote: >>>> >>>>> Hi, >>>>> >>>>> I'm installing petsc branch >>>>> >>>>> barry/fix-superlu_dist-py-for-gpus >>>>> >>>>> on another computer for testing. It passed configure, but failed at >>>>> "make .... all". >>>>> >>>>> Would you please take a look? Both configure.log and make.log are >>>>> attached. >>>>> >>>>> Thanks, >>>>> >>>>> -- Jin >>>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From shrirang.abhyankar at pnnl.gov Sat Feb 22 08:18:57 2020 From: shrirang.abhyankar at pnnl.gov (Abhyankar, Shrirang G) Date: Sat, 22 Feb 2020 14:18:57 +0000 Subject: [petsc-users] Using PETSc with GPU supported SuperLU_Dist Message-ID: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> Hi, I want to install PETSc with GPU supported SuperLU_Dist. What are the configure options I should be using? Thanks, Shri -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Sat Feb 22 08:25:01 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Sat, 22 Feb 2020 08:25:01 -0600 Subject: [petsc-users] Using PETSc with GPU supported SuperLU_Dist In-Reply-To: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> References: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> Message-ID: On Sat, 22 Feb 2020, Abhyankar, Shrirang G via petsc-users wrote: > Hi, > I want to install PETSc with GPU supported SuperLU_Dist. What are the configure options I should be using? Shri, >>>>> if self.framework.argDB['download-superlu_dist-gpu']: self.cuda = framework.require('config.packages.cuda',self) self.openmp = framework.require('config.packages.openmp',self) self.deps = [self.mpi,self.blasLapack,self.cuda,self.openmp] <<<<< So try: --with-cuda=1 --download-superlu_dist=1 --download-superlu_dist-gpu=1 --with-openmp=1 [and usual MPI, blaslapack] Satish From shrirang.abhyankar at pnnl.gov Sat Feb 22 11:41:46 2020 From: shrirang.abhyankar at pnnl.gov (Abhyankar, Shrirang G) Date: Sat, 22 Feb 2020 17:41:46 +0000 Subject: [petsc-users] Using PETSc with GPU supported SuperLU_Dist In-Reply-To: References: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> Message-ID: <4BDB7C51-7452-45CC-A118-4D3F4F5D03D1@pnnl.gov> Thanks, Satish. Configure and make go through fine. Getting an undefined reference error for VecGetArrayWrite_SeqCUDA. Shri From: Satish Balay Reply-To: petsc-users Date: Saturday, February 22, 2020 at 8:25 AM To: "Abhyankar, Shrirang G" Cc: "petsc-users at mcs.anl.gov" Subject: Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist On Sat, 22 Feb 2020, Abhyankar, Shrirang G via petsc-users wrote: Hi, I want to install PETSc with GPU supported SuperLU_Dist. What are the configure options I should be using? Shri, if self.framework.argDB['download-superlu_dist-gpu']: self.cuda = framework.require('config.packages.cuda',self) self.openmp = framework.require('config.packages.openmp',self) self.deps = [self.mpi,self.blasLapack,self.cuda,self.openmp] <<<<< So try: --with-cuda=1 --download-superlu_dist=1 --download-superlu_dist-gpu=1 --with-openmp=1 [and usual MPI, blaslapack] Satish -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
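Collecting Satish's options above into one command line, a sketch of a configure invocation for a CUDA-enabled SuperLU_DIST build could look like the following. The compiler wrappers and the extra --download packages are taken from elsewhere in this thread, not required verbatim; adapt them to the local toolchain.

    ./configure --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 \
        --with-cuda=1 --with-openmp=1 \
        --download-superlu_dist=1 --download-superlu_dist-gpu=1 \
        --download-metis --download-parmetis --download-fblaslapack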
Name: test.log Type: application/octet-stream Size: 4994 bytes Desc: test.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log Type: application/octet-stream Size: 106830 bytes Desc: make.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 2546435 bytes Desc: configure.log URL: From balay at mcs.anl.gov Sat Feb 22 12:27:48 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Sat, 22 Feb 2020 12:27:48 -0600 Subject: [petsc-users] Using PETSc with GPU supported SuperLU_Dist In-Reply-To: <4BDB7C51-7452-45CC-A118-4D3F4F5D03D1@pnnl.gov> References: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> <4BDB7C51-7452-45CC-A118-4D3F4F5D03D1@pnnl.gov> Message-ID: Looks like a bug in petsc that needs fixing. However - you shouldn't need options '--with-cxx-dialect=C++11 --with-clanguage=c++' Satish On Sat, 22 Feb 2020, Abhyankar, Shrirang G via petsc-users wrote: > Thanks, Satish. Configure and make go through fine. Getting an undefined reference error for VecGetArrayWrite_SeqCUDA. > > Shri > From: Satish Balay > Reply-To: petsc-users > Date: Saturday, February 22, 2020 at 8:25 AM > To: "Abhyankar, Shrirang G" > Cc: "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist > > On Sat, 22 Feb 2020, Abhyankar, Shrirang G via petsc-users wrote: > > Hi, > I want to install PETSc with GPU supported SuperLU_Dist. What are the configure options I should be using? > > > Shri, > > > if self.framework.argDB['download-superlu_dist-gpu']: > self.cuda = framework.require('config.packages.cuda',self) > self.openmp = framework.require('config.packages.openmp',self) > self.deps = [self.mpi,self.blasLapack,self.cuda,self.openmp] > <<<<< > > So try: > > --with-cuda=1 --download-superlu_dist=1 --download-superlu_dist-gpu=1 --with-openmp=1 [and usual MPI, blaslapack] > > Satish > > > From jczhang at mcs.anl.gov Sat Feb 22 20:53:46 2020 From: jczhang at mcs.anl.gov (Junchao Zhang) Date: Sat, 22 Feb 2020 20:53:46 -0600 Subject: [petsc-users] Using PETSc with GPU supported SuperLU_Dist In-Reply-To: <4BDB7C51-7452-45CC-A118-4D3F4F5D03D1@pnnl.gov> References: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> <4BDB7C51-7452-45CC-A118-4D3F4F5D03D1@pnnl.gov> Message-ID: We met the error before and knew why. Will fix it soon. --Junchao Zhang On Sat, Feb 22, 2020 at 11:43 AM Abhyankar, Shrirang G via petsc-users < petsc-users at mcs.anl.gov> wrote: > Thanks, Satish. Configure and make go through fine. Getting an undefined > reference error for VecGetArrayWrite_SeqCUDA. > > > > Shri > > *From: *Satish Balay > *Reply-To: *petsc-users > *Date: *Saturday, February 22, 2020 at 8:25 AM > *To: *"Abhyankar, Shrirang G" > *Cc: *"petsc-users at mcs.anl.gov" > *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist > > > > On Sat, 22 Feb 2020, Abhyankar, Shrirang G via petsc-users wrote: > > > > Hi, > > I want to install PETSc with GPU supported SuperLU_Dist. What are the > configure options I should be using? 
> > > > > > Shri, > > > > > > if self.framework.argDB['download-superlu_dist-gpu']: > > self.cuda = framework.require('config.packages.cuda',self) > > self.openmp = > framework.require('config.packages.openmp',self) > > self.deps = > [self.mpi,self.blasLapack,self.cuda,self.openmp] > > <<<<< > > > > So try: > > > > --with-cuda=1 --download-superlu_dist=1 --download-superlu_dist-gpu=1 > --with-openmp=1 [and usual MPI, blaslapack] > > > > Satish > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Sat Feb 22 20:59:14 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Sat, 22 Feb 2020 20:59:14 -0600 Subject: [petsc-users] Using PETSc with GPU supported SuperLU_Dist In-Reply-To: References: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> <4BDB7C51-7452-45CC-A118-4D3F4F5D03D1@pnnl.gov> Message-ID: The fix is now in both maint and master https://gitlab.com/petsc/petsc/-/merge_requests/2555 Satish On Sat, 22 Feb 2020, Junchao Zhang via petsc-users wrote: > We met the error before and knew why. Will fix it soon. > --Junchao Zhang > > > On Sat, Feb 22, 2020 at 11:43 AM Abhyankar, Shrirang G via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > Thanks, Satish. Configure and make go through fine. Getting an undefined > > reference error for VecGetArrayWrite_SeqCUDA. > > > > > > > > Shri > > > > *From: *Satish Balay > > *Reply-To: *petsc-users > > *Date: *Saturday, February 22, 2020 at 8:25 AM > > *To: *"Abhyankar, Shrirang G" > > *Cc: *"petsc-users at mcs.anl.gov" > > *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist > > > > > > > > On Sat, 22 Feb 2020, Abhyankar, Shrirang G via petsc-users wrote: > > > > > > > > Hi, > > > > I want to install PETSc with GPU supported SuperLU_Dist. What are the > > configure options I should be using? > > > > > > > > > > > > Shri, > > > > > > > > > > > > if self.framework.argDB['download-superlu_dist-gpu']: > > > > self.cuda = framework.require('config.packages.cuda',self) > > > > self.openmp = > > framework.require('config.packages.openmp',self) > > > > self.deps = > > [self.mpi,self.blasLapack,self.cuda,self.openmp] > > > > <<<<< > > > > > > > > So try: > > > > > > > > --with-cuda=1 --download-superlu_dist=1 --download-superlu_dist-gpu=1 > > --with-openmp=1 [and usual MPI, blaslapack] > > > > > > > > Satish > > > > > > > > > > > From jczhang at mcs.anl.gov Sat Feb 22 21:02:11 2020 From: jczhang at mcs.anl.gov (Junchao Zhang) Date: Sat, 22 Feb 2020 21:02:11 -0600 Subject: [petsc-users] Using PETSc with GPU supported SuperLU_Dist In-Reply-To: <5efd582f8f36424bab7e5604b33efcbf@BY5PR09MB5585.namprd09.prod.outlook.com> References: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> <4BDB7C51-7452-45CC-A118-4D3F4F5D03D1@pnnl.gov> <5efd582f8f36424bab7e5604b33efcbf@BY5PR09MB5585.namprd09.prod.outlook.com> Message-ID: Great. Thanks. On Sat, Feb 22, 2020 at 8:59 PM Balay, Satish wrote: > The fix is now in both maint and master > > https://gitlab.com/petsc/petsc/-/merge_requests/2555 > > Satish > > On Sat, 22 Feb 2020, Junchao Zhang via petsc-users wrote: > > > We met the error before and knew why. Will fix it soon. > > --Junchao Zhang > > > > > > On Sat, Feb 22, 2020 at 11:43 AM Abhyankar, Shrirang G via petsc-users < > > petsc-users at mcs.anl.gov> wrote: > > > > > Thanks, Satish. Configure and make go through fine. Getting an > undefined > > > reference error for VecGetArrayWrite_SeqCUDA. 
> > > > > > > > > > > > Shri > > > > > > *From: *Satish Balay > > > *Reply-To: *petsc-users > > > *Date: *Saturday, February 22, 2020 at 8:25 AM > > > *To: *"Abhyankar, Shrirang G" > > > *Cc: *"petsc-users at mcs.anl.gov" > > > *Subject: *Re: [petsc-users] Using PETSc with GPU supported > SuperLU_Dist > > > > > > > > > > > > On Sat, 22 Feb 2020, Abhyankar, Shrirang G via petsc-users wrote: > > > > > > > > > > > > Hi, > > > > > > I want to install PETSc with GPU supported SuperLU_Dist. What are > the > > > configure options I should be using? > > > > > > > > > > > > > > > > > > Shri, > > > > > > > > > > > > > > > > > > if self.framework.argDB['download-superlu_dist-gpu']: > > > > > > self.cuda = > framework.require('config.packages.cuda',self) > > > > > > self.openmp = > > > framework.require('config.packages.openmp',self) > > > > > > self.deps = > > > [self.mpi,self.blasLapack,self.cuda,self.openmp] > > > > > > <<<<< > > > > > > > > > > > > So try: > > > > > > > > > > > > --with-cuda=1 --download-superlu_dist=1 --download-superlu_dist-gpu=1 > > > --with-openmp=1 [and usual MPI, blaslapack] > > > > > > > > > > > > Satish > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barrydog505 at gmail.com Sun Feb 23 01:59:19 2020 From: barrydog505 at gmail.com (Tsung-Hsing Chen) Date: Sun, 23 Feb 2020 15:59:19 +0800 Subject: [petsc-users] Error - Out of memory. This could be due to allocating too large an object or bleeding by not properly destroying unneeded objects. Message-ID: Hi all, I have written a simple code to solve the FEM problem, and I want to use LU to solve the Ax=b. My problem(error) won't happen at the beginning until M & N in A_matrix is getting larger. (Can also be understood as mesh vertex increase.) All the error output seems to relate to LU, but I don't know what should be done. The followings are the code I wrote(section) and the error output. Here's the code (section) : /* code ... */ ierr = MatCreate(PETSC_COMM_WORLD, &A_matrix);CHKERRQ(ierr); ierr = MatSetSizes(A_matrix, PETSC_DECIDE, PETSC_DECIDE, M, N);CHKERRQ(ierr); ierr = MatSetType(A_matrix, MATSEQAIJ);CHKERRQ(ierr); // setting nnz ... ierr = MatSeqAIJSetPreallocation(A_matrix, 0, nnz);CHKERRQ(ierr); /* MatSetValues(); ... MatAssemblyBegin(); MatAssemblyEnd(); */ ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr); ierr = KSPSetOperators(ksp, A_matrix, A_matrix);CHKERRQ(ierr); ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr); ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr); ierr = PCSetType(pc, PCLU);CHKERRQ(ierr); ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); ierr = KSPSetUp(ksp);CHKERRQ(ierr); /* code ... 
*/ Here's the error (run with valgrind --tool=memcheck --leak-check=full) : ==6371== Warning: set address range perms: large range [0x7c84a040, 0xb4e9a598) (undefined) ==6371== Warning: set address range perms: large range [0xb4e9b040, 0x2b4e9aeac) (undefined) ==6371== Warning: set address range perms: large range [0x2b4e9b040, 0x4b4e9aeac) (undefined) ==6371== Argument 'size' of function memalign has a fishy (possibly negative) value: -5187484888 ==6371== at 0x4C320A6: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==6371== by 0x501B4B0: PetscMallocAlign (mal.c:49) ==6371== by 0x501CE37: PetscMallocA (mal.c:422) ==6371== by 0x5ACFF0C: MatLUFactorSymbolic_SeqAIJ (aijfact.c:366) ==6371== by 0x561D8B3: MatLUFactorSymbolic (matrix.c:3005) ==6371== by 0x644ED9C: PCSetUp_LU (lu.c:90) ==6371== by 0x65A2C32: PCSetUp (precon.c:894) ==6371== by 0x6707E71: KSPSetUp (itfunc.c:376) ==6371== by 0x13AB09: Calculate (taylor_hood.c:1780) ==6371== by 0x10CB85: main (taylor_hood.c:228) ==6371== [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Out of memory. This could be due to allocating [0]PETSC ERROR: too large an object or bleeding by not properly [0]PETSC ERROR: destroying unneeded objects. [0]PETSC ERROR: Memory allocated 0 Memory used by process 15258234880 [0]PETSC ERROR: Try running with -malloc_dump or -malloc_view for info. [0]PETSC ERROR: Memory requested 18446744068522065920 [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown [0]PETSC ERROR: ./taylor_hood on a arch-linux2-c-debug named e2-120 by barry Sun Feb 23 14:18:46 2020 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack --download-triangle [0]PETSC ERROR: #1 MatLUFactorSymbolic_SeqAIJ() line 366 in /home/barry/petsc/src/mat/impls/aij/seq/aijfact.c [0]PETSC ERROR: #2 PetscMallocA() line 422 in /home/barry/petsc/src/sys/memory/mal.c [0]PETSC ERROR: #3 MatLUFactorSymbolic_SeqAIJ() line 366 in /home/barry/petsc/src/mat/impls/aij/seq/aijfact.c [0]PETSC ERROR: #4 MatLUFactorSymbolic() line 3005 in /home/barry/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: #5 PCSetUp_LU() line 90 in /home/barry/petsc/src/ksp/pc/impls/factor/lu/lu.c [0]PETSC ERROR: #6 PCSetUp() line 894 in /home/barry/petsc/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #7 KSPSetUp() line 376 in /home/barry/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #8 Calculate() line 1780 in /home/barry/brain/brain/3D/taylor_hood.c [0]PETSC ERROR: #9 main() line 230 in /home/barry/brain/brain/3D/taylor_hood.c [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -dm_view [0]PETSC ERROR: -f mesh/ellipsoid.msh [0]PETSC ERROR: -matload_block_size 1 [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- Is there any setting that should be done but I ignore? Thanks in advance, Tsung-Hsing Chen -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Sun Feb 23 02:33:26 2020 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Sun, 23 Feb 2020 11:33:26 +0300 Subject: [petsc-users] Error - Out of memory. This could be due to allocating too large an object or bleeding by not properly destroying unneeded objects. 
In-Reply-To: References: Message-ID: This seems integer overflow when computing the factors. How large is the matrix when you encounter the error? Note that LU is not memory optimal and you can easily encounter out-of-memory issues with large matrices. Assuming sparsity, the memory requirements for LU are N log(N) in 2D and N^4/3 in 3D. Il giorno dom 23 feb 2020 alle ore 11:01 Tsung-Hsing Chen < barrydog505 at gmail.com> ha scritto: > Hi all, > > I have written a simple code to solve the FEM problem, and I want to use > LU to solve the Ax=b. > My problem(error) won't happen at the beginning until M & N in A_matrix is > getting larger. (Can also be understood as mesh vertex increase.) > All the error output seems to relate to LU, but I don't know what should > be done. > The followings are the code I wrote(section) and the error output. > > Here's the code (section) : > /* > code ... > */ > ierr = MatCreate(PETSC_COMM_WORLD, &A_matrix);CHKERRQ(ierr); > ierr = MatSetSizes(A_matrix, PETSC_DECIDE, PETSC_DECIDE, M, > N);CHKERRQ(ierr); > ierr = MatSetType(A_matrix, MATSEQAIJ);CHKERRQ(ierr); > // setting nnz ... > ierr = MatSeqAIJSetPreallocation(A_matrix, 0, nnz);CHKERRQ(ierr); > /* > MatSetValues(); ... > MatAssemblyBegin(); > MatAssemblyEnd(); > */ > ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr); > ierr = KSPSetOperators(ksp, A_matrix, A_matrix);CHKERRQ(ierr); > ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr); > ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr); > ierr = PCSetType(pc, PCLU);CHKERRQ(ierr); > ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); > ierr = KSPSetUp(ksp);CHKERRQ(ierr); > /* > code ... > */ > > Here's the error (run with valgrind --tool=memcheck --leak-check=full) : > ==6371== Warning: set address range perms: large range [0x7c84a040, > 0xb4e9a598) (undefined) > ==6371== Warning: set address range perms: large range [0xb4e9b040, > 0x2b4e9aeac) (undefined) > ==6371== Warning: set address range perms: large range [0x2b4e9b040, > 0x4b4e9aeac) (undefined) > ==6371== Argument 'size' of function memalign has a fishy (possibly > negative) value: -5187484888 > ==6371== at 0x4C320A6: memalign (in > /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) > ==6371== by 0x501B4B0: PetscMallocAlign (mal.c:49) > ==6371== by 0x501CE37: PetscMallocA (mal.c:422) > ==6371== by 0x5ACFF0C: MatLUFactorSymbolic_SeqAIJ (aijfact.c:366) > ==6371== by 0x561D8B3: MatLUFactorSymbolic (matrix.c:3005) > ==6371== by 0x644ED9C: PCSetUp_LU (lu.c:90) > ==6371== by 0x65A2C32: PCSetUp (precon.c:894) > ==6371== by 0x6707E71: KSPSetUp (itfunc.c:376) > ==6371== by 0x13AB09: Calculate (taylor_hood.c:1780) > ==6371== by 0x10CB85: main (taylor_hood.c:228) > ==6371== > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Out of memory. This could be due to allocating > [0]PETSC ERROR: too large an object or bleeding by not properly > [0]PETSC ERROR: destroying unneeded objects. > [0]PETSC ERROR: Memory allocated 0 Memory used by process 15258234880 > [0]PETSC ERROR: Try running with -malloc_dump or -malloc_view for info. > [0]PETSC ERROR: Memory requested 18446744068522065920 > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown > [0]PETSC ERROR: ./taylor_hood on a arch-linux2-c-debug named e2-120 by > barry Sun Feb 23 14:18:46 2020 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ > --with-fc=gfortran --download-mpich --download-fblaslapack > --download-triangle > [0]PETSC ERROR: #1 MatLUFactorSymbolic_SeqAIJ() line 366 in > /home/barry/petsc/src/mat/impls/aij/seq/aijfact.c > [0]PETSC ERROR: #2 PetscMallocA() line 422 in > /home/barry/petsc/src/sys/memory/mal.c > [0]PETSC ERROR: #3 MatLUFactorSymbolic_SeqAIJ() line 366 in > /home/barry/petsc/src/mat/impls/aij/seq/aijfact.c > [0]PETSC ERROR: #4 MatLUFactorSymbolic() line 3005 in > /home/barry/petsc/src/mat/interface/matrix.c > [0]PETSC ERROR: #5 PCSetUp_LU() line 90 in > /home/barry/petsc/src/ksp/pc/impls/factor/lu/lu.c > [0]PETSC ERROR: #6 PCSetUp() line 894 in > /home/barry/petsc/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #7 KSPSetUp() line 376 in > /home/barry/petsc/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #8 Calculate() line 1780 in > /home/barry/brain/brain/3D/taylor_hood.c > [0]PETSC ERROR: #9 main() line 230 in > /home/barry/brain/brain/3D/taylor_hood.c > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -dm_view > [0]PETSC ERROR: -f mesh/ellipsoid.msh > [0]PETSC ERROR: -matload_block_size 1 > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > > Is there any setting that should be done but I ignore? > > Thanks in advance, > > Tsung-Hsing Chen > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From barrydog505 at gmail.com Sun Feb 23 02:52:44 2020 From: barrydog505 at gmail.com (Tsung-Hsing Chen) Date: Sun, 23 Feb 2020 16:52:44 +0800 Subject: [petsc-users] Error - Out of memory. This could be due to allocating too large an object or bleeding by not properly destroying unneeded objects. In-Reply-To: References: Message-ID: This error came with a matrix approximate 300,000*300,000, and I was solving a 3D model. " the memory requirements for LU are N log(N) in 2D and N^4/3 in 3D. " What unit is it? Byte? Stefano Zampini ? 2020?2?23? ?? ??4:33??? > This seems integer overflow when computing the factors. > > How large is the matrix when you encounter the error? > Note that LU is not memory optimal and you can easily encounter > out-of-memory issues with large matrices. > Assuming sparsity, the memory requirements for LU are N log(N) in 2D and > N^4/3 in 3D. > > > Il giorno dom 23 feb 2020 alle ore 11:01 Tsung-Hsing Chen < > barrydog505 at gmail.com> ha scritto: > >> Hi all, >> >> I have written a simple code to solve the FEM problem, and I want to use >> LU to solve the Ax=b. >> My problem(error) won't happen at the beginning until M & N in A_matrix >> is getting larger. (Can also be understood as mesh vertex increase.) >> All the error output seems to relate to LU, but I don't know what should >> be done. >> The followings are the code I wrote(section) and the error output. >> >> Here's the code (section) : >> /* >> code ... >> */ >> ierr = MatCreate(PETSC_COMM_WORLD, &A_matrix);CHKERRQ(ierr); >> ierr = MatSetSizes(A_matrix, PETSC_DECIDE, PETSC_DECIDE, M, >> N);CHKERRQ(ierr); >> ierr = MatSetType(A_matrix, MATSEQAIJ);CHKERRQ(ierr); >> // setting nnz ... >> ierr = MatSeqAIJSetPreallocation(A_matrix, 0, nnz);CHKERRQ(ierr); >> /* >> MatSetValues(); ... 
>> MatAssemblyBegin(); >> MatAssemblyEnd(); >> */ >> ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr); >> ierr = KSPSetOperators(ksp, A_matrix, A_matrix);CHKERRQ(ierr); >> ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr); >> ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr); >> ierr = PCSetType(pc, PCLU);CHKERRQ(ierr); >> ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); >> ierr = KSPSetUp(ksp);CHKERRQ(ierr); >> /* >> code ... >> */ >> >> Here's the error (run with valgrind --tool=memcheck --leak-check=full) : >> ==6371== Warning: set address range perms: large range [0x7c84a040, >> 0xb4e9a598) (undefined) >> ==6371== Warning: set address range perms: large range [0xb4e9b040, >> 0x2b4e9aeac) (undefined) >> ==6371== Warning: set address range perms: large range [0x2b4e9b040, >> 0x4b4e9aeac) (undefined) >> ==6371== Argument 'size' of function memalign has a fishy (possibly >> negative) value: -5187484888 >> ==6371== at 0x4C320A6: memalign (in >> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==6371== by 0x501B4B0: PetscMallocAlign (mal.c:49) >> ==6371== by 0x501CE37: PetscMallocA (mal.c:422) >> ==6371== by 0x5ACFF0C: MatLUFactorSymbolic_SeqAIJ (aijfact.c:366) >> ==6371== by 0x561D8B3: MatLUFactorSymbolic (matrix.c:3005) >> ==6371== by 0x644ED9C: PCSetUp_LU (lu.c:90) >> ==6371== by 0x65A2C32: PCSetUp (precon.c:894) >> ==6371== by 0x6707E71: KSPSetUp (itfunc.c:376) >> ==6371== by 0x13AB09: Calculate (taylor_hood.c:1780) >> ==6371== by 0x10CB85: main (taylor_hood.c:228) >> ==6371== >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: Out of memory. This could be due to allocating >> [0]PETSC ERROR: too large an object or bleeding by not properly >> [0]PETSC ERROR: destroying unneeded objects. >> [0]PETSC ERROR: Memory allocated 0 Memory used by process 15258234880 >> [0]PETSC ERROR: Try running with -malloc_dump or -malloc_view for info. >> [0]PETSC ERROR: Memory requested 18446744068522065920 >> [0]PETSC ERROR: See >> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >> shooting. 
>> [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown >> [0]PETSC ERROR: ./taylor_hood on a arch-linux2-c-debug named e2-120 by >> barry Sun Feb 23 14:18:46 2020 >> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ >> --with-fc=gfortran --download-mpich --download-fblaslapack >> --download-triangle >> [0]PETSC ERROR: #1 MatLUFactorSymbolic_SeqAIJ() line 366 in >> /home/barry/petsc/src/mat/impls/aij/seq/aijfact.c >> [0]PETSC ERROR: #2 PetscMallocA() line 422 in >> /home/barry/petsc/src/sys/memory/mal.c >> [0]PETSC ERROR: #3 MatLUFactorSymbolic_SeqAIJ() line 366 in >> /home/barry/petsc/src/mat/impls/aij/seq/aijfact.c >> [0]PETSC ERROR: #4 MatLUFactorSymbolic() line 3005 in >> /home/barry/petsc/src/mat/interface/matrix.c >> [0]PETSC ERROR: #5 PCSetUp_LU() line 90 in >> /home/barry/petsc/src/ksp/pc/impls/factor/lu/lu.c >> [0]PETSC ERROR: #6 PCSetUp() line 894 in >> /home/barry/petsc/src/ksp/pc/interface/precon.c >> [0]PETSC ERROR: #7 KSPSetUp() line 376 in >> /home/barry/petsc/src/ksp/ksp/interface/itfunc.c >> [0]PETSC ERROR: #8 Calculate() line 1780 in >> /home/barry/brain/brain/3D/taylor_hood.c >> [0]PETSC ERROR: #9 main() line 230 in >> /home/barry/brain/brain/3D/taylor_hood.c >> [0]PETSC ERROR: PETSc Option Table entries: >> [0]PETSC ERROR: -dm_view >> [0]PETSC ERROR: -f mesh/ellipsoid.msh >> [0]PETSC ERROR: -matload_block_size 1 >> [0]PETSC ERROR: ----------------End of Error Message -------send entire >> error message to petsc-maint at mcs.anl.gov---------- >> >> Is there any setting that should be done but I ignore? >> >> Thanks in advance, >> >> Tsung-Hsing Chen >> > > > -- > Stefano > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Sun Feb 23 03:35:09 2020 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Sun, 23 Feb 2020 12:35:09 +0300 Subject: [petsc-users] Error - Out of memory. This could be due to allocating too large an object or bleeding by not properly destroying unneeded objects. In-Reply-To: References: Message-ID: Il giorno dom 23 feb 2020 alle ore 11:53 Tsung-Hsing Chen < barrydog505 at gmail.com> ha scritto: > This error came with a matrix approximate 300,000*300,000, and I was > solving a 3D model. > " the memory requirements for LU are N log(N) in 2D and N^4/3 in 3D. " > What unit is it? Byte? > Number of floating-point entries. Assuming an optimal nested dissection ordering can be found (i.e. no "dense" rows), the largest front is asymptotically as large as N^(2/3) (N the size of the matrix) Storing it in memory requires (N^(2/3))^2 entries, thus N^4/3 entries. What problem are you solving? If you plan to use direct methods, you may want to experiment with parallel factorization packages like MUMPS or SUPERLU_DIST > > Stefano Zampini ? 2020?2?23? ?? ??4:33??? > >> This seems integer overflow when computing the factors. >> >> How large is the matrix when you encounter the error? >> Note that LU is not memory optimal and you can easily encounter >> out-of-memory issues with large matrices. >> Assuming sparsity, the memory requirements for LU are N log(N) in 2D and >> N^4/3 in 3D. >> >> >> Il giorno dom 23 feb 2020 alle ore 11:01 Tsung-Hsing Chen < >> barrydog505 at gmail.com> ha scritto: >> >>> Hi all, >>> >>> I have written a simple code to solve the FEM problem, and I want to use >>> LU to solve the Ax=b. >>> My problem(error) won't happen at the beginning until M & N in A_matrix >>> is getting larger. (Can also be understood as mesh vertex increase.) 
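Following up on the suggestion in Stefano's reply above to try a parallel factorization package: a minimal sketch of how MUMPS (or SuperLU_DIST) is typically selected for the LU step, assuming PETSc was configured with --download-mumps --download-scalapack (or --download-superlu_dist). This is illustrative only, not the poster's code.

    ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);
    ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
    /* hand the factorization to MUMPS instead of PETSc's built-in LU */
    ierr = PCFactorSetMatSolverType(pc, MATSOLVERMUMPS);CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);

The same choice can be made at run time with
-ksp_type preonly -pc_type lu -pc_factor_mat_solver_type mumps (or superlu_dist).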
>>> All the error output seems to relate to LU, but I don't know what should >>> be done. >>> The followings are the code I wrote(section) and the error output. >>> >>> Here's the code (section) : >>> /* >>> code ... >>> */ >>> ierr = MatCreate(PETSC_COMM_WORLD, &A_matrix);CHKERRQ(ierr); >>> ierr = MatSetSizes(A_matrix, PETSC_DECIDE, PETSC_DECIDE, M, >>> N);CHKERRQ(ierr); >>> ierr = MatSetType(A_matrix, MATSEQAIJ);CHKERRQ(ierr); >>> // setting nnz ... >>> ierr = MatSeqAIJSetPreallocation(A_matrix, 0, nnz);CHKERRQ(ierr); >>> /* >>> MatSetValues(); ... >>> MatAssemblyBegin(); >>> MatAssemblyEnd(); >>> */ >>> ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr); >>> ierr = KSPSetOperators(ksp, A_matrix, A_matrix);CHKERRQ(ierr); >>> ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr); >>> ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr); >>> ierr = PCSetType(pc, PCLU);CHKERRQ(ierr); >>> ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); >>> ierr = KSPSetUp(ksp);CHKERRQ(ierr); >>> /* >>> code ... >>> */ >>> >>> Here's the error (run with valgrind --tool=memcheck --leak-check=full) : >>> ==6371== Warning: set address range perms: large range [0x7c84a040, >>> 0xb4e9a598) (undefined) >>> ==6371== Warning: set address range perms: large range [0xb4e9b040, >>> 0x2b4e9aeac) (undefined) >>> ==6371== Warning: set address range perms: large range [0x2b4e9b040, >>> 0x4b4e9aeac) (undefined) >>> ==6371== Argument 'size' of function memalign has a fishy (possibly >>> negative) value: -5187484888 >>> ==6371== at 0x4C320A6: memalign (in >>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==6371== by 0x501B4B0: PetscMallocAlign (mal.c:49) >>> ==6371== by 0x501CE37: PetscMallocA (mal.c:422) >>> ==6371== by 0x5ACFF0C: MatLUFactorSymbolic_SeqAIJ (aijfact.c:366) >>> ==6371== by 0x561D8B3: MatLUFactorSymbolic (matrix.c:3005) >>> ==6371== by 0x644ED9C: PCSetUp_LU (lu.c:90) >>> ==6371== by 0x65A2C32: PCSetUp (precon.c:894) >>> ==6371== by 0x6707E71: KSPSetUp (itfunc.c:376) >>> ==6371== by 0x13AB09: Calculate (taylor_hood.c:1780) >>> ==6371== by 0x10CB85: main (taylor_hood.c:228) >>> ==6371== >>> [0]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> [0]PETSC ERROR: Out of memory. This could be due to allocating >>> [0]PETSC ERROR: too large an object or bleeding by not properly >>> [0]PETSC ERROR: destroying unneeded objects. >>> [0]PETSC ERROR: Memory allocated 0 Memory used by process 15258234880 >>> [0]PETSC ERROR: Try running with -malloc_dump or -malloc_view for info. >>> [0]PETSC ERROR: Memory requested 18446744068522065920 >>> [0]PETSC ERROR: See >>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>> shooting. 
>>> [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown >>> [0]PETSC ERROR: ./taylor_hood on a arch-linux2-c-debug named e2-120 by >>> barry Sun Feb 23 14:18:46 2020 >>> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ >>> --with-fc=gfortran --download-mpich --download-fblaslapack >>> --download-triangle >>> [0]PETSC ERROR: #1 MatLUFactorSymbolic_SeqAIJ() line 366 in >>> /home/barry/petsc/src/mat/impls/aij/seq/aijfact.c >>> [0]PETSC ERROR: #2 PetscMallocA() line 422 in >>> /home/barry/petsc/src/sys/memory/mal.c >>> [0]PETSC ERROR: #3 MatLUFactorSymbolic_SeqAIJ() line 366 in >>> /home/barry/petsc/src/mat/impls/aij/seq/aijfact.c >>> [0]PETSC ERROR: #4 MatLUFactorSymbolic() line 3005 in >>> /home/barry/petsc/src/mat/interface/matrix.c >>> [0]PETSC ERROR: #5 PCSetUp_LU() line 90 in >>> /home/barry/petsc/src/ksp/pc/impls/factor/lu/lu.c >>> [0]PETSC ERROR: #6 PCSetUp() line 894 in >>> /home/barry/petsc/src/ksp/pc/interface/precon.c >>> [0]PETSC ERROR: #7 KSPSetUp() line 376 in >>> /home/barry/petsc/src/ksp/ksp/interface/itfunc.c >>> [0]PETSC ERROR: #8 Calculate() line 1780 in >>> /home/barry/brain/brain/3D/taylor_hood.c >>> [0]PETSC ERROR: #9 main() line 230 in >>> /home/barry/brain/brain/3D/taylor_hood.c >>> [0]PETSC ERROR: PETSc Option Table entries: >>> [0]PETSC ERROR: -dm_view >>> [0]PETSC ERROR: -f mesh/ellipsoid.msh >>> [0]PETSC ERROR: -matload_block_size 1 >>> [0]PETSC ERROR: ----------------End of Error Message -------send >>> entire error message to petsc-maint at mcs.anl.gov---------- >>> >>> Is there any setting that should be done but I ignore? >>> >>> Thanks in advance, >>> >>> Tsung-Hsing Chen >>> >> >> >> -- >> Stefano >> > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From barrydog505 at gmail.com Sun Feb 23 04:22:44 2020 From: barrydog505 at gmail.com (Tsung-Hsing Chen) Date: Sun, 23 Feb 2020 18:22:44 +0800 Subject: [petsc-users] Error - Out of memory. This could be due to allocating too large an object or bleeding by not properly destroying unneeded objects. In-Reply-To: References: Message-ID: A problem related to elasticity. I think I will try those external packages. Thanks for your assistance. Stefano Zampini ? 2020?2?23? ?? ??5:35??? > > > Il giorno dom 23 feb 2020 alle ore 11:53 Tsung-Hsing Chen < > barrydog505 at gmail.com> ha scritto: > >> This error came with a matrix approximate 300,000*300,000, and I was >> solving a 3D model. >> " the memory requirements for LU are N log(N) in 2D and N^4/3 in 3D. " >> What unit is it? Byte? >> > > Number of floating-point entries. > > Assuming an optimal nested dissection ordering can be found (i.e. no > "dense" rows), the largest front is asymptotically as large as N^(2/3) (N > the size of the matrix) > Storing it in memory requires (N^(2/3))^2 entries, thus N^4/3 entries. > What problem are you solving? > If you plan to use direct methods, you may want to experiment with > parallel factorization packages like MUMPS or SUPERLU_DIST > > >> >> Stefano Zampini ? 2020?2?23? ?? ??4:33??? >> >>> This seems integer overflow when computing the factors. >>> >>> How large is the matrix when you encounter the error? >>> Note that LU is not memory optimal and you can easily encounter >>> out-of-memory issues with large matrices. >>> Assuming sparsity, the memory requirements for LU are N log(N) in 2D and >>> N^4/3 in 3D. 
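The estimate quoted just above, written out. It is an asymptotic count of stored factor entries (numbers, not bytes), with constants omitted: under nested dissection the top-level separator of a 3D mesh with N unknowns has O(N^{2/3}) vertices, and the dense front on that separator dominates the fill,

    \[ \mathrm{fill}_{3D}(N) \sim \bigl(N^{2/3}\bigr)^{2} = N^{4/3},
       \qquad \mathrm{fill}_{2D}(N) \sim N \log N . \]

Multiplying by 8 bytes per double (16 for complex) gives a rough idea of the memory the factors need.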
>>> >>> >>> Il giorno dom 23 feb 2020 alle ore 11:01 Tsung-Hsing Chen < >>> barrydog505 at gmail.com> ha scritto: >>> >>>> Hi all, >>>> >>>> I have written a simple code to solve the FEM problem, and I want to >>>> use LU to solve the Ax=b. >>>> My problem(error) won't happen at the beginning until M & N in A_matrix >>>> is getting larger. (Can also be understood as mesh vertex increase.) >>>> All the error output seems to relate to LU, but I don't know what >>>> should be done. >>>> The followings are the code I wrote(section) and the error output. >>>> >>>> Here's the code (section) : >>>> /* >>>> code ... >>>> */ >>>> ierr = MatCreate(PETSC_COMM_WORLD, &A_matrix);CHKERRQ(ierr); >>>> ierr = MatSetSizes(A_matrix, PETSC_DECIDE, PETSC_DECIDE, M, >>>> N);CHKERRQ(ierr); >>>> ierr = MatSetType(A_matrix, MATSEQAIJ);CHKERRQ(ierr); >>>> // setting nnz ... >>>> ierr = MatSeqAIJSetPreallocation(A_matrix, 0, nnz);CHKERRQ(ierr); >>>> /* >>>> MatSetValues(); ... >>>> MatAssemblyBegin(); >>>> MatAssemblyEnd(); >>>> */ >>>> ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr); >>>> ierr = KSPSetOperators(ksp, A_matrix, A_matrix);CHKERRQ(ierr); >>>> ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr); >>>> ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr); >>>> ierr = PCSetType(pc, PCLU);CHKERRQ(ierr); >>>> ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); >>>> ierr = KSPSetUp(ksp);CHKERRQ(ierr); >>>> /* >>>> code ... >>>> */ >>>> >>>> Here's the error (run with valgrind --tool=memcheck --leak-check=full) : >>>> ==6371== Warning: set address range perms: large range [0x7c84a040, >>>> 0xb4e9a598) (undefined) >>>> ==6371== Warning: set address range perms: large range [0xb4e9b040, >>>> 0x2b4e9aeac) (undefined) >>>> ==6371== Warning: set address range perms: large range [0x2b4e9b040, >>>> 0x4b4e9aeac) (undefined) >>>> ==6371== Argument 'size' of function memalign has a fishy (possibly >>>> negative) value: -5187484888 >>>> ==6371== at 0x4C320A6: memalign (in >>>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==6371== by 0x501B4B0: PetscMallocAlign (mal.c:49) >>>> ==6371== by 0x501CE37: PetscMallocA (mal.c:422) >>>> ==6371== by 0x5ACFF0C: MatLUFactorSymbolic_SeqAIJ (aijfact.c:366) >>>> ==6371== by 0x561D8B3: MatLUFactorSymbolic (matrix.c:3005) >>>> ==6371== by 0x644ED9C: PCSetUp_LU (lu.c:90) >>>> ==6371== by 0x65A2C32: PCSetUp (precon.c:894) >>>> ==6371== by 0x6707E71: KSPSetUp (itfunc.c:376) >>>> ==6371== by 0x13AB09: Calculate (taylor_hood.c:1780) >>>> ==6371== by 0x10CB85: main (taylor_hood.c:228) >>>> ==6371== >>>> [0]PETSC ERROR: --------------------- Error Message >>>> -------------------------------------------------------------- >>>> [0]PETSC ERROR: Out of memory. This could be due to allocating >>>> [0]PETSC ERROR: too large an object or bleeding by not properly >>>> [0]PETSC ERROR: destroying unneeded objects. >>>> [0]PETSC ERROR: Memory allocated 0 Memory used by process 15258234880 >>>> [0]PETSC ERROR: Try running with -malloc_dump or -malloc_view for >>>> info. >>>> [0]PETSC ERROR: Memory requested 18446744068522065920 >>>> [0]PETSC ERROR: See >>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>> shooting. 
>>>> [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown >>>> [0]PETSC ERROR: ./taylor_hood on a arch-linux2-c-debug named e2-120 >>>> by barry Sun Feb 23 14:18:46 2020 >>>> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ >>>> --with-fc=gfortran --download-mpich --download-fblaslapack >>>> --download-triangle >>>> [0]PETSC ERROR: #1 MatLUFactorSymbolic_SeqAIJ() line 366 in >>>> /home/barry/petsc/src/mat/impls/aij/seq/aijfact.c >>>> [0]PETSC ERROR: #2 PetscMallocA() line 422 in >>>> /home/barry/petsc/src/sys/memory/mal.c >>>> [0]PETSC ERROR: #3 MatLUFactorSymbolic_SeqAIJ() line 366 in >>>> /home/barry/petsc/src/mat/impls/aij/seq/aijfact.c >>>> [0]PETSC ERROR: #4 MatLUFactorSymbolic() line 3005 in >>>> /home/barry/petsc/src/mat/interface/matrix.c >>>> [0]PETSC ERROR: #5 PCSetUp_LU() line 90 in >>>> /home/barry/petsc/src/ksp/pc/impls/factor/lu/lu.c >>>> [0]PETSC ERROR: #6 PCSetUp() line 894 in >>>> /home/barry/petsc/src/ksp/pc/interface/precon.c >>>> [0]PETSC ERROR: #7 KSPSetUp() line 376 in >>>> /home/barry/petsc/src/ksp/ksp/interface/itfunc.c >>>> [0]PETSC ERROR: #8 Calculate() line 1780 in >>>> /home/barry/brain/brain/3D/taylor_hood.c >>>> [0]PETSC ERROR: #9 main() line 230 in >>>> /home/barry/brain/brain/3D/taylor_hood.c >>>> [0]PETSC ERROR: PETSc Option Table entries: >>>> [0]PETSC ERROR: -dm_view >>>> [0]PETSC ERROR: -f mesh/ellipsoid.msh >>>> [0]PETSC ERROR: -matload_block_size 1 >>>> [0]PETSC ERROR: ----------------End of Error Message -------send >>>> entire error message to petsc-maint at mcs.anl.gov---------- >>>> >>>> Is there any setting that should be done but I ignore? >>>> >>>> Thanks in advance, >>>> >>>> Tsung-Hsing Chen >>>> >>> >>> >>> -- >>> Stefano >>> >> > > -- > Stefano > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shrirang.abhyankar at pnnl.gov Sun Feb 23 15:08:54 2020 From: shrirang.abhyankar at pnnl.gov (Abhyankar, Shrirang G) Date: Sun, 23 Feb 2020 21:08:54 +0000 Subject: [petsc-users] Using PETSc with GPU supported SuperLU_Dist In-Reply-To: References: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> <4BDB7C51-7452-45CC-A118-4D3F4F5D03D1@pnnl.gov> Message-ID: <4C14C2B3-0CB1-4E5F-A414-D5FBC10F6F18@pnnl.gov> I am getting an error now for CUDA driver version. Any suggestions? petsc:maint$ make test Running test examples to verify correct installation Using PETSC_DIR=/people/abhy245/software/petsc and PETSC_ARCH=debug-mode-newell Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI process See http://www.mcs.anl.gov/petsc/documentation/faq.html [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Error in system call [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient for CUDA runtime version [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.12.4, unknown [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by abhy245 Sun Feb 23 12:49:55 2020 [0]PETSC ERROR: Configure options --download-fblaslapack --download-make --download-metis --download-parmetis --download-scalapack --download-suitesparse --download-superlu_dist-gpu=1 --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 PETSC_ARCH=debug-mode-newell [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in /qfs/people/abhy245/software/petsc/src/sys/objects/init.c [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in /qfs/people/abhy245/software/petsc/src/sys/objects/init.c [0]PETSC ERROR: #3 PetscInitialize() line 1010 in /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c -------------------------------------------------------------------------- Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpiexec detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was: Process name: [[46518,1],0] Exit code: 88 -------------------------------------------------------------------------- Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 MPI processes See http://www.mcs.anl.gov/petsc/documentation/faq.html [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: Error in system call [1]PETSC ERROR: [0]PETSC ERROR: Error in system call [0]PETSC ERROR: error in cudaGetDeviceCount CUDA driver version is insufficient for CUDA runtime version [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. error in cudaGetDeviceCount CUDA driver version is insufficient for CUDA runtime version [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[1]PETSC ERROR: [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by abhy245 Sun Feb 23 12:49:57 2020 [0]PETSC ERROR: Configure options --download-fblaslapack --download-make --download-metis --download-parmetis --download-scalapack --download-suitesparse --download-superlu_dist-gpu=1 --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 PETSC_ARCH=debug-mode-newell [0]PETSC ERROR: #1 PetscCUDAInitialize() line 254 in /qfs/people/abhy245/software/petsc/src/sys/objects/init.c [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in /qfs/people/abhy245/software/petsc/src/sys/objects/init.c [0]PETSC ERROR: #3 PetscInitialize() line 1010 in /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c Petsc Release Version 3.12.4, unknown [1]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by abhy245 Sun Feb 23 12:49:57 2020 [1]PETSC ERROR: Configure options --download-fblaslapack --download-make --download-metis --download-parmetis --download-scalapack --download-suitesparse --download-superlu_dist-gpu=1 --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 PETSC_ARCH=debug-mode-newell [1]PETSC ERROR: #1 PetscCUDAInitialize() line 254 in /qfs/people/abhy245/software/petsc/src/sys/objects/init.c [1]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in /qfs/people/abhy245/software/petsc/src/sys/objects/init.c [1]PETSC ERROR: #3 PetscInitialize() line 1010 in /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c -------------------------------------------------------------------------- Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpiexec detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was: Process name: [[46522,1],0] Exit code: 88 -------------------------------------------------------------------------- 1,2c1,21 < lid velocity = 0.0025, prandtl # = 1., grashof # = 1. < Number of SNES iterations = 2 --- > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Error in system call > [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient for CUDA runtime version > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown > [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by abhy245 Sun Feb 23 12:50:00 2020 > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make --download-metis --download-parmetis --download-scalapack --download-suitesparse --download-superlu_dist-gpu=1 --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 PETSC_ARCH=debug-mode-newell > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > [0]PETSC ERROR: #3 PetscInitialize() line 1010 in /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c > -------------------------------------------------------------------------- > Primary job terminated normally, but 1 process returned > a non-zero exit code. Per user-direction, the job has been aborted. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpiexec detected that one or more processes exited with non-zero status, thus causing > the job to be terminated. The first process to do so was: > > Process name: [[46545,1],0] > Exit code: 88 > -------------------------------------------------------------------------- /people/abhy245/software/petsc/src/snes/examples/tutorials Possible problem with ex19 running with superlu_dist, diffs above ========================================= Possible error running Fortran example src/snes/examples/tutorials/ex5f with 1 MPI process See http://www.mcs.anl.gov/petsc/documentation/faq.html [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Error in system call [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient for CUDA runtime version [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown [0]PETSC ERROR: ./ex5f on a debug-mode-newell named newell01.pnl.gov by abhy245 Sun Feb 23 12:50:04 2020 [0]PETSC ERROR: Configure options --download-fblaslapack --download-make --download-metis --download-parmetis --download-scalapack --download-suitesparse --download-superlu_dist-gpu=1 --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 PETSC_ARCH=debug-mode-newell [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in /qfs/people/abhy245/software/petsc/src/sys/objects/init.c [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in /qfs/people/abhy245/software/petsc/src/sys/objects/init.c [0]PETSC ERROR: PetscInitialize:Checking initial options Unable to initialize PETSc -------------------------------------------------------------------------- mpiexec has exited due to process rank 0 with PID 0 on node newell01 exiting improperly. There are three reasons this could occur: 1. this process did not call "init" before exiting, but others in the job did. This can cause a job to hang indefinitely while it waits for all processes to call "init". 
By rule, if one process calls "init", then ALL processes must call "init" prior to termination. 2. this process called "init", but exited without calling "finalize". By rule, all processes that call "init" MUST call "finalize" prior to exiting or it will be considered an "abnormal termination" 3. this process called "MPI_Abort" or "orte_abort" and the mca parameter orte_create_session_dirs is set to false. In this case, the run-time cannot detect that the abort call was an abnormal termination. Hence, the only error message you will receive is this one. This may have caused other processes in the application to be terminated by signals sent by mpiexec (as reported here). You can avoid this message by specifying -quiet on the mpiexec command line. -------------------------------------------------------------------------- Completed test examples From: Satish Balay Reply-To: petsc-users Date: Saturday, February 22, 2020 at 9:00 PM To: Junchao Zhang Cc: "Abhyankar, Shrirang G" , petsc-users Subject: Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist The fix is now in both maint and master https://gitlab.com/petsc/petsc/-/merge_requests/2555 Satish On Sat, 22 Feb 2020, Junchao Zhang via petsc-users wrote: We met the error before and knew why. Will fix it soon. --Junchao Zhang On Sat, Feb 22, 2020 at 11:43 AM Abhyankar, Shrirang G via petsc-users < petsc-users at mcs.anl.gov> wrote: > Thanks, Satish. Configure and make go through fine. Getting an undefined > reference error for VecGetArrayWrite_SeqCUDA. > > > > Shri > > *From: *Satish Balay > > *Reply-To: *petsc-users > > *Date: *Saturday, February 22, 2020 at 8:25 AM > *To: *"Abhyankar, Shrirang G" > > *Cc: *"petsc-users at mcs.anl.gov" > > *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist > > > > On Sat, 22 Feb 2020, Abhyankar, Shrirang G via petsc-users wrote: > > > > Hi, > > I want to install PETSc with GPU supported SuperLU_Dist. What are the > configure options I should be using? > > > > > > Shri, > > > > > > if self.framework.argDB['download-superlu_dist-gpu']: > > self.cuda = framework.require('config.packages.cuda',self) > > self.openmp = > framework.require('config.packages.openmp',self) > > self.deps = > [self.mpi,self.blasLapack,self.cuda,self.openmp] > > <<<<< > > > > So try: > > > > --with-cuda=1 --download-superlu_dist=1 --download-superlu_dist-gpu=1 > --with-openmp=1 [and usual MPI, blaslapack] > > > > Satish > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log Type: application/octet-stream Size: 106174 bytes Desc: make.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 2406311 bytes Desc: configure.log URL: From shrirang.abhyankar at pnnl.gov Sun Feb 23 15:33:11 2020 From: shrirang.abhyankar at pnnl.gov (Abhyankar, Shrirang G) Date: Sun, 23 Feb 2020 21:33:11 +0000 Subject: [petsc-users] Using PETSc with GPU supported SuperLU_Dist In-Reply-To: <4C14C2B3-0CB1-4E5F-A414-D5FBC10F6F18@pnnl.gov> References: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> <4BDB7C51-7452-45CC-A118-4D3F4F5D03D1@pnnl.gov> <4C14C2B3-0CB1-4E5F-A414-D5FBC10F6F18@pnnl.gov> Message-ID: I was using CUDA v10.2. Switching to 9.2 gives a clean make test. 
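For reference on the "CUDA driver version is insufficient for CUDA runtime version" failure above: the installed driver has to support the CUDA toolkit (runtime) that PETSc was configured against (here /share/apps/cuda/10.2). A quick way to check, assuming the NVIDIA tools are on the PATH:

    nvidia-smi       # reports the driver version; recent drivers also print the newest CUDA runtime they support
    nvcc --version   # reports the toolkit release of the nvcc on PATH; it should match the --with-cuda-dir given to configure

If the toolkit is newer than the driver supports, either update the driver or point --with-cuda-dir at an older toolkit, which is presumably what the switch from 10.2 to 9.2 above amounts to.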
Thanks, Shri
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From richard.beare at monash.edu  Sun Feb 23 17:24:02 2020
From: richard.beare at monash.edu (Richard Beare)
Date: Mon, 24 Feb 2020 10:24:02 +1100
Subject: [petsc-users] Correct approach for updating deprecated code
Message-ID:
Thanks PetscViewer viewer1; ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fileName.c_str (),FILE_MODE_WRITE,&viewer1);CHKERRQ(ierr); ierr = PetscViewerSetFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ (ierr); ierr = PetscObjectSetName((PetscObject)mX,"x");CHKERRQ(ierr); ierr = PetscObjectSetName((PetscObject)mB,"b");CHKERRQ(ierr); ierr = VecView(mX,viewer1);CHKERRQ(ierr); ierr = VecView(mB,viewer1);CHKERRQ(ierr); -- -- A/Prof Richard Beare Imaging and Bioinformatics, Peninsula Clinical School orcid.org/0000-0002-7530-5664 Richard.Beare at monash.edu +61 3 9788 1724 Geospatial Research: https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Feb 23 17:43:02 2020 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 23 Feb 2020 18:43:02 -0500 Subject: [petsc-users] Correct approach for updating deprecated code In-Reply-To: References: Message-ID: On Sun, Feb 23, 2020 at 6:25 PM Richard Beare via petsc-users < petsc-users at mcs.anl.gov> wrote: > > Hi, > The following code gives a deprecation warning. What is the correct way of > updating the use of ViewerSetFormat to ViewerPushFormat (which I presume is > the preferred replacement). My first attempt gave errors concerning > ordering. > You can't just change SetFormat to PushFormat here? Matt > Thanks > > PetscViewer viewer1; > ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fileName.c_str > (),FILE_MODE_WRITE,&viewer1);CHKERRQ(ierr); > ierr = PetscViewerSetFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ > (ierr); > > ierr = PetscObjectSetName((PetscObject)mX,"x");CHKERRQ(ierr); > ierr = PetscObjectSetName((PetscObject)mB,"b");CHKERRQ(ierr); > > ierr = VecView(mX,viewer1);CHKERRQ(ierr); > ierr = VecView(mB,viewer1);CHKERRQ(ierr); > > > -- > -- > A/Prof Richard Beare > Imaging and Bioinformatics, Peninsula Clinical School > orcid.org/0000-0002-7530-5664 > Richard.Beare at monash.edu > +61 3 9788 1724 > > > > Geospatial Research: > https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.beare at monash.edu Sun Feb 23 17:45:29 2020 From: richard.beare at monash.edu (Richard Beare) Date: Mon, 24 Feb 2020 10:45:29 +1100 Subject: [petsc-users] Correct approach for updating deprecated code In-Reply-To: References: Message-ID: That's what I did (see below), but I got ordering errors (unfortunately deleted those logs too soon). I'll rerun if no one recognises what I've done wrong. PetscViewer viewer1; ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fileName.c_str (),FILE_MODE_WRITE,&viewer1);CHKERRQ(ierr); //ierr = PetscViewerSetFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ(ierr); ierr = PetscViewerPushFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ (ierr); ierr = PetscObjectSetName((PetscObject)mX,"x");CHKERRQ(ierr); ierr = PetscObjectSetName((PetscObject)mB,"b");CHKERRQ(ierr); On Mon, 24 Feb 2020 at 10:43, Matthew Knepley wrote: > On Sun, Feb 23, 2020 at 6:25 PM Richard Beare via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> >> Hi, >> The following code gives a deprecation warning. 
What is the correct way >> of updating the use of ViewerSetFormat to ViewerPushFormat (which I presume >> is the preferred replacement). My first attempt gave errors concerning >> ordering. >> > > You can't just change SetFormat to PushFormat here? > > Matt > > >> Thanks >> >> PetscViewer viewer1; >> ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fileName.c_str >> (),FILE_MODE_WRITE,&viewer1);CHKERRQ(ierr); >> ierr = PetscViewerSetFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ >> (ierr); >> >> ierr = PetscObjectSetName((PetscObject)mX,"x");CHKERRQ(ierr); >> ierr = PetscObjectSetName((PetscObject)mB,"b");CHKERRQ(ierr); >> >> ierr = VecView(mX,viewer1);CHKERRQ(ierr); >> ierr = VecView(mB,viewer1);CHKERRQ(ierr); >> >> >> -- >> -- >> A/Prof Richard Beare >> Imaging and Bioinformatics, Peninsula Clinical School >> orcid.org/0000-0002-7530-5664 >> Richard.Beare at monash.edu >> +61 3 9788 1724 >> >> >> >> Geospatial Research: >> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- -- A/Prof Richard Beare Imaging and Bioinformatics, Peninsula Clinical School orcid.org/0000-0002-7530-5664 Richard.Beare at monash.edu +61 3 9788 1724 Geospatial Research: https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Feb 23 19:35:47 2020 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 23 Feb 2020 20:35:47 -0500 Subject: [petsc-users] Correct approach for updating deprecated code In-Reply-To: References: Message-ID: I think you are going to have to send the error logs. Thanks, MAtt On Sun, Feb 23, 2020 at 6:45 PM Richard Beare wrote: > That's what I did (see below), but I got ordering errors (unfortunately > deleted those logs too soon). I'll rerun if no one recognises what I've > done wrong. > > PetscViewer viewer1; > ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fileName.c_str > (),FILE_MODE_WRITE,&viewer1);CHKERRQ(ierr); > //ierr = > PetscViewerSetFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ(ierr); > ierr = PetscViewerPushFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ > (ierr); > > ierr = PetscObjectSetName((PetscObject)mX,"x");CHKERRQ(ierr); > ierr = PetscObjectSetName((PetscObject)mB,"b");CHKERRQ(ierr); > > On Mon, 24 Feb 2020 at 10:43, Matthew Knepley wrote: > >> On Sun, Feb 23, 2020 at 6:25 PM Richard Beare via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >>> >>> Hi, >>> The following code gives a deprecation warning. What is the correct way >>> of updating the use of ViewerSetFormat to ViewerPushFormat (which I presume >>> is the preferred replacement). My first attempt gave errors concerning >>> ordering. >>> >> >> You can't just change SetFormat to PushFormat here? 
>>
>> Matt
>>
>>> Thanks
>>>
>>> PetscViewer viewer1;
>>> ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fileName.c_str(),FILE_MODE_WRITE,&viewer1);CHKERRQ(ierr);
>>> ierr = PetscViewerSetFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ(ierr);
>>>
>>> ierr = PetscObjectSetName((PetscObject)mX,"x");CHKERRQ(ierr);
>>> ierr = PetscObjectSetName((PetscObject)mB,"b");CHKERRQ(ierr);
>>>
>>> ierr = VecView(mX,viewer1);CHKERRQ(ierr);
>>> ierr = VecView(mB,viewer1);CHKERRQ(ierr);
>>>
>>> --
>>> A/Prof Richard Beare
>>> Imaging and Bioinformatics, Peninsula Clinical School
>>> orcid.org/0000-0002-7530-5664
>>> Richard.Beare at monash.edu
>>> +61 3 9788 1724
>>>
>>> Geospatial Research:
>>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>
> --
> A/Prof Richard Beare
> Imaging and Bioinformatics, Peninsula Clinical School
> orcid.org/0000-0002-7530-5664
> Richard.Beare at monash.edu
> +61 3 9788 1724
>
> Geospatial Research:
> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
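For anyone hitting the same deprecation warning: the usual replacement is to pair PetscViewerPushFormat with a matching PetscViewerPopFormat once the viewer is no longer needed in that format. The sketch below is not the resolution reached in this thread (see Satish's pointer to the relevant PETSc commit further down); it is only a minimal illustration of the pairing, reusing the viewer1/mX/mB/fileName names from Richard's snippet:

  PetscViewer viewer1;
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fileName.c_str(),FILE_MODE_WRITE,&viewer1);CHKERRQ(ierr);
  /* was PetscViewerSetFormat(); push the format instead */
  ierr = PetscViewerPushFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ(ierr);

  ierr = PetscObjectSetName((PetscObject)mX,"x");CHKERRQ(ierr);
  ierr = PetscObjectSetName((PetscObject)mB,"b");CHKERRQ(ierr);

  ierr = VecView(mX,viewer1);CHKERRQ(ierr);
  ierr = VecView(mB,viewer1);CHKERRQ(ierr);

  /* undo the push before the viewer is destroyed */
  ierr = PetscViewerPopFormat(viewer1);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer1);CHKERRQ(ierr);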
From pierpaolo.minelli at cnr.it Mon Feb 24 04:30:24 2020
From: pierpaolo.minelli at cnr.it (Pierpaolo Minelli)
Date: Mon, 24 Feb 2020 11:30:24 +0100
Subject: [petsc-users] Using DMDAs with an assigned domain decomposition to Solve Poisson Equation
Message-ID: <37B03CED-D8A0-4736-9F5C-2994200BF2D5@cnr.it>

Hi,
I'm developing a 3D code in Fortran to study the space-time evolution of charged particles within a Cartesian domain.
The domain decomposition has been made by me taking into account symmetry and load balancing reasons related to my specific problem. In this first draft, it will remain constant throughout my simulation.

Is there a way, using DMDAs, to solve Poisson's equation, using the domain decomposition above, obtaining as a result the local solution including its ghost cells values?

As input data at each time-step I know the electric charge density in each local subdomain (RHS), including the ghost cells, even if I don't think they are useful for the calculation of the equation.
Matrix coefficients (LHS) and boundary conditions are constant during my simulation.

As an output I would need to know the local electrical potential in each local subdomain, including the values of the ghost cells in each dimension (X,Y,Z).

Is there an example that I can use in Fortran to solve this kind of problem?

Thanks in advance

Pierpaolo Minelli

From mfadams at lbl.gov Mon Feb 24 05:08:31 2020
From: mfadams at lbl.gov (Mark Adams)
Date: Mon, 24 Feb 2020 06:08:31 -0500
Subject: [petsc-users] Using DMDAs with an assigned domain decomposition to Solve Poisson Equation
In-Reply-To: <37B03CED-D8A0-4736-9F5C-2994200BF2D5@cnr.it>
References: <37B03CED-D8A0-4736-9F5C-2994200BF2D5@cnr.it>
Message-ID:

On Mon, Feb 24, 2020 at 5:30 AM Pierpaolo Minelli wrote:
> Hi,
> I'm developing a 3D code in Fortran to study the space-time evolution of
> charged particles within a Cartesian domain.
> The domain decomposition has been made by me taking into account symmetry
> and load balancing reasons related to my specific problem. In this first
> draft, it will remain constant throughout my simulation.
>
> Is there a way, using DMDAs, to solve Poisson's equation, using the domain
> decomposition above, obtaining as a result the local solution including its
> ghost cells values?
>

https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMGlobalToLocalBegin.html#DMGlobalToLocalBegin

> As input data at each time-step I know the electric charge density in each
> local subdomain (RHS), including the ghost cells, even if I don't think
> they are useful for the calculation of the equation.
> Matrix coefficients (LHS) and boundary conditions are constant during my
> simulation.
>
> As an output I would need to know the local electrical potential in each
> local subdomain, including the values of the ghost cells in each
> dimension (X,Y,Z).
>
> Is there an example that I can use in Fortran to solve this kind of
> problem?
>

I see one, but it is not hard to convert a C example:

https://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex14f.F90.html

> Thanks in advance
>
> Pierpaolo Minelli

From knepley at gmail.com Mon Feb 24 05:24:24 2020
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 24 Feb 2020 06:24:24 -0500
Subject: [petsc-users] Using DMDAs with an assigned domain decomposition to Solve Poisson Equation
In-Reply-To: <37B03CED-D8A0-4736-9F5C-2994200BF2D5@cnr.it>
References: <37B03CED-D8A0-4736-9F5C-2994200BF2D5@cnr.it>
Message-ID:

On Mon, Feb 24, 2020 at 5:30 AM Pierpaolo Minelli wrote:
> Hi,
> I'm developing a 3D code in Fortran to study the space-time evolution of
> charged particles within a Cartesian domain.
> The domain decomposition has been made by me taking into account symmetry
> and load balancing reasons related to my specific problem.

That may be a problem. DMDA can only decompose itself along straight lines
through the domain. Is that how your decomposition looks?

> In this first draft, it will remain constant throughout my simulation.
>
> Is there a way, using DMDAs, to solve Poisson's equation, using the domain
> decomposition above, obtaining as a result the local solution including its
> ghost cells values?
>

How do you discretize the Poisson equation?

  Thanks,

    Matt

> As input data at each time-step I know the electric charge density in each
> local subdomain (RHS), including the ghost cells, even if I don't think
> they are useful for the calculation of the equation.
> Matrix coefficients (LHS) and boundary conditions are constant during my
> simulation.
>
> As an output I would need to know the local electrical potential in each
> local subdomain, including the values of the ghost cells in each
> dimension (X,Y,Z).
>
> Is there an example that I can use in Fortran to solve this kind of
> problem?
>
> Thanks in advance
>
> Pierpaolo Minelli

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
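To make the suggestions above concrete, here is a minimal C sketch (the thread's code is Fortran, but the same calls exist in the F90 interface used by ex14f.F90/ex22f.F90) of handing an already-chosen process grid and ownership ranges to a DMDA, solving with KSP, and then pulling the ghosted local solution with DMGlobalToLocal, as Mark suggests. The grid sizes, the 4x6x1 process grid, the lx/ly/lz splits, the crude Dirichlet rows, and the placeholder RHS are illustrative assumptions only; a real code would fill them from its own decomposition, boundary conditions, and charge density. It also anticipates the 7-point stencil mentioned later in the thread, and must be run on exactly m*n*p ranks.

#include <petscdmda.h>
#include <petscksp.h>

/* Hypothetical callbacks in the spirit of ex22: placeholder RHS and a 7-point Laplacian */
static PetscErrorCode ComputeRHS(KSP ksp, Vec b, void *ctx)
{
  PetscErrorCode ierr;
  PetscFunctionBeginUser;
  ierr = VecSet(b, 1.0);CHKERRQ(ierr);   /* stand-in for the local charge density */
  PetscFunctionReturn(0);
}

static PetscErrorCode ComputeMatrix(KSP ksp, Mat J, Mat A, void *ctx)
{
  DM             da;
  PetscInt       i, j, k, mx, my, mz, xs, ys, zs, xm, ym, zm;
  PetscScalar    v[7];
  MatStencil     row, col[7];
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPGetDM(ksp, &da);CHKERRQ(ierr);
  ierr = DMDAGetInfo(da, NULL, &mx, &my, &mz, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL);CHKERRQ(ierr);
  ierr = DMDAGetCorners(da, &xs, &ys, &zs, &xm, &ym, &zm);CHKERRQ(ierr);
  for (k = zs; k < zs + zm; k++) {
    for (j = ys; j < ys + ym; j++) {
      for (i = xs; i < xs + xm; i++) {
        row.i = i; row.j = j; row.k = k;
        if (i == 0 || j == 0 || k == 0 || i == mx-1 || j == my-1 || k == mz-1) {
          v[0] = 6.0;   /* crude Dirichlet row on the physical boundary */
          ierr = MatSetValuesStencil(A, 1, &row, 1, &row, v, INSERT_VALUES);CHKERRQ(ierr);
        } else {
          /* interior 7-point stencil, unit grid spacing assumed */
          v[0] = -1.0; col[0].i = i;   col[0].j = j;   col[0].k = k-1;
          v[1] = -1.0; col[1].i = i;   col[1].j = j-1; col[1].k = k;
          v[2] = -1.0; col[2].i = i-1; col[2].j = j;   col[2].k = k;
          v[3] =  6.0; col[3].i = i;   col[3].j = j;   col[3].k = k;
          v[4] = -1.0; col[4].i = i+1; col[4].j = j;   col[4].k = k;
          v[5] = -1.0; col[5].i = i;   col[5].j = j+1; col[5].k = k;
          v[6] = -1.0; col[6].i = i;   col[6].j = j;   col[6].k = k+1;
          ierr = MatSetValuesStencil(A, 1, &row, 7, col, v, INSERT_VALUES);CHKERRQ(ierr);
        }
      }
    }
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

int main(int argc, char **argv)
{
  DM             da;
  KSP            ksp;
  Vec            phi, philocal;
  PetscInt       M = 251, N = 341, P = 161;  /* global grid, values quoted later in the thread */
  PetscInt       m = 4, n = 6, p = 1;        /* pre-existing 24-rank process grid              */
  PetscInt       lx[4], ly[6], lz[1];        /* cells already owned by each rank per direction */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  /* Illustrative splits; each array must sum to the corresponding global size
     and must reproduce the decomposition the rest of the code uses. */
  lx[0] = 63; lx[1] = 63; lx[2] = 63; lx[3] = 62;
  ly[0] = 57; ly[1] = 57; ly[2] = 57; ly[3] = 57; ly[4] = 57; ly[5] = 56;
  lz[0] = 161;

  ierr = DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                      DMDA_STENCIL_STAR, M, N, P, m, n, p, 1, 1, lx, ly, lz, &da);CHKERRQ(ierr);
  ierr = DMSetFromOptions(da);CHKERRQ(ierr);
  ierr = DMSetUp(da);CHKERRQ(ierr);

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetDM(ksp, da);CHKERRQ(ierr);
  ierr = KSPSetComputeRHS(ksp, ComputeRHS, NULL);CHKERRQ(ierr);
  ierr = KSPSetComputeOperators(ksp, ComputeMatrix, NULL);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);   /* e.g. -ksp_type cg -pc_type mg */
  ierr = KSPSolve(ksp, NULL, NULL);CHKERRQ(ierr);
  ierr = KSPGetSolution(ksp, &phi);CHKERRQ(ierr);

  /* Ghosted local view of the potential: owned cells plus ghost cells in X, Y, Z */
  ierr = DMCreateLocalVector(da, &philocal);CHKERRQ(ierr);
  ierr = DMGlobalToLocalBegin(da, phi, INSERT_VALUES, philocal);CHKERRQ(ierr);
  ierr = DMGlobalToLocalEnd(da, phi, INSERT_VALUES, philocal);CHKERRQ(ierr);
  /* DMDAVecGetArray(da, philocal, &array) would expose it with global (i,j,k) indexing */

  ierr = VecDestroy(&philocal);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = DMDestroy(&da);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}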
From pierpaolo.minelli at cnr.it Mon Feb 24 05:29:39 2020
From: pierpaolo.minelli at cnr.it (Pierpaolo Minelli)
Date: Mon, 24 Feb 2020 12:29:39 +0100
Subject: [petsc-users] Using DMDAs with an assigned domain decomposition to Solve Poisson Equation
In-Reply-To: References: <37B03CED-D8A0-4736-9F5C-2994200BF2D5@cnr.it>
Message-ID: <00B8AC11-E70D-41F5-BEA5-96788B59715D@cnr.it>

> Il giorno 24 feb 2020, alle ore 12:08, Mark Adams ha scritto:
>
> On Mon, Feb 24, 2020 at 5:30 AM Pierpaolo Minelli wrote:
> Hi,
> I'm developing a 3D code in Fortran to study the space-time evolution of charged particles within a Cartesian domain.
> The domain decomposition has been made by me taking into account symmetry and load balancing reasons related to my specific problem. In this first draft, it will remain constant throughout my simulation.
>
> Is there a way, using DMDAs, to solve Poisson's equation, using the domain decomposition above, obtaining as a result the local solution including its ghost cells values?
>
> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMGlobalToLocalBegin.html#DMGlobalToLocalBegin
>
> As input data at each time-step I know the electric charge density in each local subdomain (RHS), including the ghost cells, even if I don't think they are useful for the calculation of the equation.
> Matrix coefficients (LHS) and boundary conditions are constant during my simulation.
>
> As an output I would need to know the local electrical potential in each local subdomain, including the values of the ghost cells in each dimension (X,Y,Z).
>
> Is there an example that I can use in Fortran to solve this kind of problem?
>
> I see one, but it is not hard to convert a C example:
>
> https://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex14f.F90.html

Thanks, I will take a look at this example, and I will also check the C examples in that directory.

> Thanks in advance
>
> Pierpaolo Minelli

From pierpaolo.minelli at cnr.it Mon Feb 24 05:35:23 2020
From: pierpaolo.minelli at cnr.it (Pierpaolo Minelli)
Date: Mon, 24 Feb 2020 12:35:23 +0100
Subject: [petsc-users] Using DMDAs with an assigned domain decomposition to Solve Poisson Equation
In-Reply-To: References: <37B03CED-D8A0-4736-9F5C-2994200BF2D5@cnr.it>
Message-ID: <22A77097-9BFB-4FA5-A152-BF86D4D40D1F@cnr.it>

> Il giorno 24 feb 2020, alle ore 12:24, Matthew Knepley ha scritto:
>
> On Mon, Feb 24, 2020 at 5:30 AM Pierpaolo Minelli wrote:
> Hi,
> I'm developing a 3D code in Fortran to study the space-time evolution of charged particles within a Cartesian domain.
> The domain decomposition has been made by me taking into account symmetry and load balancing reasons related to my specific problem.
>
> That may be a problem. DMDA can only decompose itself along straight lines through the domain. Is that how your decomposition looks?

My decomposition at the moment is practically a 2D decomposition, because I have:

M = 251 (X)
N = 341 (Y)
P = 161 (Z)

and if I use 24 MPI processes, I divided my domain into a 3D Cartesian topology with:

m = 4
n = 6
p = 1

> In this first draft, it will remain constant throughout my simulation.
>
> Is there a way, using DMDAs, to solve Poisson's equation, using the domain decomposition above, obtaining as a result the local solution including its ghost cells values?
>
> How do you discretize the Poisson equation?
I intend to use a 7 point stencil like that in this example: https://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex22f.F90.html > > Thanks, > > Matt > > As input data at each time-step I know the electric charge density in each local subdomain (RHS), including the ghost cells, even if I don't think they are useful for the calculation of the equation. > Matrix coefficients (LHS) and boundary conditions are constant during my simulation. > > As an output I would need to know the local electrical potential in each local subdomain, including the values of the ghost cells in each dimension(X,Y,Z). > > Is there an example that I can use in Fortran to solve this kind of problem? > > Thanks in advance > > Pierpaolo Minelli > > Thanks Pierpaolo > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 24 05:58:43 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 24 Feb 2020 06:58:43 -0500 Subject: [petsc-users] Using DMDAs with an assigned domain decomposition to Solve Poisson Equation In-Reply-To: <22A77097-9BFB-4FA5-A152-BF86D4D40D1F@cnr.it> References: <37B03CED-D8A0-4736-9F5C-2994200BF2D5@cnr.it> <22A77097-9BFB-4FA5-A152-BF86D4D40D1F@cnr.it> Message-ID: On Mon, Feb 24, 2020 at 6:35 AM Pierpaolo Minelli wrote: > > > Il giorno 24 feb 2020, alle ore 12:24, Matthew Knepley > ha scritto: > > On Mon, Feb 24, 2020 at 5:30 AM Pierpaolo Minelli < > pierpaolo.minelli at cnr.it> wrote: > >> Hi, >> I'm developing a 3D code in Fortran to study the space-time evolution of >> charged particles within a Cartesian domain. >> The domain decomposition has been made by me taking into account symmetry >> and load balancing reasons related to my specific problem. > > > That may be a problem. DMDA can only decompose itself along straight lines > through the domain. Is that how your decomposition looks? > > > My decomposition at the moment is paractically a 2D decomposition because > i have: > > M = 251 (X) > N = 341 (Y) > P = 161 (Z) > > and if i use 24 MPI procs, i divided my domain in a 3D Cartesian Topology > with: > > m = 4 > n = 6 > p = 1 > > > > >> In this first draft, it will remain constant throughout my simulation. >> >> Is there a way, using DMDAs, to solve Poisson's equation, using the >> domain decomposition above, obtaining as a result the local solution >> including its ghost cells values? >> > > How do you discretize the Poisson equation? > > > I intend to use a 7 point stencil like that in this example: > > > https://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex22f.F90.html > Okay, then you can do exactly as Mark says and use that example. This will allow you to use geometric multigrid for the Poisson problem. I don't think it can be beaten speed-wise. Thanks, Matt > > Thanks, > > Matt > > >> As input data at each time-step I know the electric charge density in >> each local subdomain (RHS), including the ghost cells, even if I don't >> think they are useful for the calculation of the equation. >> Matrix coefficients (LHS) and boundary conditions are constant during my >> simulation. >> >> As an output I would need to know the local electrical potential in each >> local subdomain, including the values of the ghost cells in each >> dimension(X,Y,Z). 
>> >> Is there an example that I can use in Fortran to solve this kind of >> problem? >> >> Thanks in advance >> >> Pierpaolo Minelli >> >> > > > Thanks > Pierpaolo > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierpaolo.minelli at cnr.it Mon Feb 24 06:07:09 2020 From: pierpaolo.minelli at cnr.it (Pierpaolo Minelli) Date: Mon, 24 Feb 2020 13:07:09 +0100 Subject: [petsc-users] Using DMDAs with an assigned domain decomposition to Solve Poisson Equation In-Reply-To: References: <37B03CED-D8A0-4736-9F5C-2994200BF2D5@cnr.it> <22A77097-9BFB-4FA5-A152-BF86D4D40D1F@cnr.it> Message-ID: > Il giorno 24 feb 2020, alle ore 12:58, Matthew Knepley ha scritto: > > On Mon, Feb 24, 2020 at 6:35 AM Pierpaolo Minelli > wrote: > > >> Il giorno 24 feb 2020, alle ore 12:24, Matthew Knepley > ha scritto: >> >> On Mon, Feb 24, 2020 at 5:30 AM Pierpaolo Minelli > wrote: >> Hi, >> I'm developing a 3D code in Fortran to study the space-time evolution of charged particles within a Cartesian domain. >> The domain decomposition has been made by me taking into account symmetry and load balancing reasons related to my specific problem. >> >> That may be a problem. DMDA can only decompose itself along straight lines through the domain. Is that how your decomposition looks? > > My decomposition at the moment is paractically a 2D decomposition because i have: > > M = 251 (X) > N = 341 (Y) > P = 161 (Z) > > and if i use 24 MPI procs, i divided my domain in a 3D Cartesian Topology with: > > m = 4 > n = 6 > p = 1 > > >> >> In this first draft, it will remain constant throughout my simulation. >> >> Is there a way, using DMDAs, to solve Poisson's equation, using the domain decomposition above, obtaining as a result the local solution including its ghost cells values? >> >> How do you discretize the Poisson equation? > > I intend to use a 7 point stencil like that in this example: > > https://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex22f.F90.html > > Okay, then you can do exactly as Mark says and use that example. This will allow you to use geometric multigrid > for the Poisson problem. I don't think it can be beaten speed-wise. > > Thanks, > > Matt > Ok, i will try this approach and let you know. Thanks again Pierpaolo >> >> Thanks, >> >> Matt >> >> As input data at each time-step I know the electric charge density in each local subdomain (RHS), including the ghost cells, even if I don't think they are useful for the calculation of the equation. >> Matrix coefficients (LHS) and boundary conditions are constant during my simulation. >> >> As an output I would need to know the local electrical potential in each local subdomain, including the values of the ghost cells in each dimension(X,Y,Z). >> >> Is there an example that I can use in Fortran to solve this kind of problem? 
>> >> Thanks in advance >> >> Pierpaolo Minelli >> >> > > > Thanks > Pierpaolo > >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Mon Feb 24 07:20:38 2020 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 24 Feb 2020 08:20:38 -0500 Subject: [petsc-users] Using DMDAs with an assigned domain decomposition to Solve Poisson Equation In-Reply-To: <00B8AC11-E70D-41F5-BEA5-96788B59715D@cnr.it> References: <37B03CED-D8A0-4736-9F5C-2994200BF2D5@cnr.it> <00B8AC11-E70D-41F5-BEA5-96788B59715D@cnr.it> Message-ID: > > > > https://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex14f.F90.html > > > Thanks, i will try to give a look to this example and i will try to check > also C examples in that directory. > There are a lot of places with examples (this is actually not an obvious place). The web page for each method lists examples that use it and similar methods. You can find examples by following these links too. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jczhang at mcs.anl.gov Mon Feb 24 09:01:05 2020 From: jczhang at mcs.anl.gov (Junchao Zhang) Date: Mon, 24 Feb 2020 09:01:05 -0600 Subject: [petsc-users] Using PETSc with GPU supported SuperLU_Dist In-Reply-To: References: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> <4BDB7C51-7452-45CC-A118-4D3F4F5D03D1@pnnl.gov> <4C14C2B3-0CB1-4E5F-A414-D5FBC10F6F18@pnnl.gov> Message-ID: [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient for CUDA runtime version That means you need to update your cuda driver for CUDA 10.2. See minimum requirement in Table 1 at https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#major-components --Junchao Zhang On Sun, Feb 23, 2020 at 3:33 PM Abhyankar, Shrirang G < shrirang.abhyankar at pnnl.gov> wrote: > I was using CUDA v10.2. Switching to 9.2 gives a clean make test. > > > > Thanks, > > Shri > > > > > > *From: *petsc-users on behalf of > "Abhyankar, Shrirang G via petsc-users" > *Reply-To: *"Abhyankar, Shrirang G" > *Date: *Sunday, February 23, 2020 at 3:10 PM > *To: *petsc-users , Junchao Zhang < > jczhang at mcs.anl.gov> > *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist > > > > I am getting an error now for CUDA driver version. Any suggestions? > > > > petsc:maint$ make test > > Running test examples to verify correct installation > > Using PETSC_DIR=/people/abhy245/software/petsc and > PETSC_ARCH=debug-mode-newell > > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI > process > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: Error in system call > > [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient > for CUDA runtime version > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown > > [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by > abhy245 Sun Feb 23 12:49:55 2020 > > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make > --download-metis --download-parmetis --download-scalapack > --download-suitesparse --download-superlu_dist-gpu=1 > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 > PETSC_ARCH=debug-mode-newell > > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > [0]PETSC ERROR: #3 PetscInitialize() line 1010 in > /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c > > -------------------------------------------------------------------------- > > Primary job terminated normally, but 1 process returned > > a non-zero exit code. Per user-direction, the job has been aborted. > > -------------------------------------------------------------------------- > > -------------------------------------------------------------------------- > > mpiexec detected that one or more processes exited with non-zero status, > thus causing > > the job to be terminated. The first process to do so was: > > > > Process name: [[46518,1],0] > > Exit code: 88 > > -------------------------------------------------------------------------- > > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 MPI > processes > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [1]PETSC ERROR: Error in system call > > [1]PETSC ERROR: [0]PETSC ERROR: Error in system call > > [0]PETSC ERROR: error in cudaGetDeviceCount CUDA driver version is > insufficient for CUDA runtime version > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > error in cudaGetDeviceCount CUDA driver version is insufficient for CUDA > runtime version > > [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [1]PETSC ERROR: [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown > > [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by > abhy245 Sun Feb 23 12:49:57 2020 > > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make > --download-metis --download-parmetis --download-scalapack > --download-suitesparse --download-superlu_dist-gpu=1 > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 > PETSC_ARCH=debug-mode-newell > > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 254 in > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > [0]PETSC ERROR: #3 PetscInitialize() line 1010 in > /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c > > Petsc Release Version 3.12.4, unknown > > [1]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by > abhy245 Sun Feb 23 12:49:57 2020 > > [1]PETSC ERROR: Configure options --download-fblaslapack --download-make > --download-metis --download-parmetis --download-scalapack > --download-suitesparse --download-superlu_dist-gpu=1 > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 > PETSC_ARCH=debug-mode-newell > > [1]PETSC ERROR: #1 PetscCUDAInitialize() line 254 in > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > [1]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > [1]PETSC ERROR: #3 PetscInitialize() line 1010 in > /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c > > -------------------------------------------------------------------------- > > Primary job terminated normally, but 1 process returned > > a non-zero exit code. Per user-direction, the job has been aborted. > > -------------------------------------------------------------------------- > > -------------------------------------------------------------------------- > > mpiexec detected that one or more processes exited with non-zero status, > thus causing > > the job to be terminated. The first process to do so was: > > > > Process name: [[46522,1],0] > > Exit code: 88 > > -------------------------------------------------------------------------- > > 1,2c1,21 > > < lid velocity = 0.0025, prandtl # = 1., grashof # = 1. > > < Number of SNES iterations = 2 > > --- > > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > > [0]PETSC ERROR: Error in system call > > > [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is > insufficient for CUDA runtime version > > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > > [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown > > > [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by > abhy245 Sun Feb 23 12:50:00 2020 > > > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make > --download-metis --download-parmetis --download-scalapack > --download-suitesparse --download-superlu_dist-gpu=1 > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 > PETSC_ARCH=debug-mode-newell > > > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > [0]PETSC ERROR: #3 PetscInitialize() line 1010 in > /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c > > > > -------------------------------------------------------------------------- > > > Primary job terminated normally, but 1 process returned > > > a non-zero exit code. Per user-direction, the job has been aborted. > > > > -------------------------------------------------------------------------- > > > > -------------------------------------------------------------------------- > > > mpiexec detected that one or more processes exited with non-zero status, > thus causing > > > the job to be terminated. The first process to do so was: > > > > > > Process name: [[46545,1],0] > > > Exit code: 88 > > > > -------------------------------------------------------------------------- > > /people/abhy245/software/petsc/src/snes/examples/tutorials > > Possible problem with ex19 running with superlu_dist, diffs above > > ========================================= > > Possible error running Fortran example src/snes/examples/tutorials/ex5f > with 1 MPI process > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: Error in system call > > [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient > for CUDA runtime version > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown > > [0]PETSC ERROR: ./ex5f on a debug-mode-newell named newell01.pnl.gov by > abhy245 Sun Feb 23 12:50:04 2020 > > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make > --download-metis --download-parmetis --download-scalapack > --download-suitesparse --download-superlu_dist-gpu=1 > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 > PETSC_ARCH=debug-mode-newell > > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > [0]PETSC ERROR: PetscInitialize:Checking initial options > > Unable to initialize PETSc > > -------------------------------------------------------------------------- > > mpiexec has exited due to process rank 0 with PID 0 on > > node newell01 exiting improperly. There are three reasons this could occur: > > > > 1. 
this process did not call "init" before exiting, but others in > > the job did. This can cause a job to hang indefinitely while it waits > > for all processes to call "init". By rule, if one process calls "init", > > then ALL processes must call "init" prior to termination. > > > > 2. this process called "init", but exited without calling "finalize". > > By rule, all processes that call "init" MUST call "finalize" prior to > > exiting or it will be considered an "abnormal termination" > > > > 3. this process called "MPI_Abort" or "orte_abort" and the mca parameter > > orte_create_session_dirs is set to false. In this case, the run-time cannot > > detect that the abort call was an abnormal termination. Hence, the only > > error message you will receive is this one. > > > > This may have caused other processes in the application to be > > terminated by signals sent by mpiexec (as reported here). > > > > You can avoid this message by specifying -quiet on the mpiexec command > line. > > -------------------------------------------------------------------------- > > Completed test examples > > *From: *Satish Balay > *Reply-To: *petsc-users > *Date: *Saturday, February 22, 2020 at 9:00 PM > *To: *Junchao Zhang > *Cc: *"Abhyankar, Shrirang G" , petsc-users < > petsc-users at mcs.anl.gov> > *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist > > > > The fix is now in both maint and master > > > > https://gitlab.com/petsc/petsc/-/merge_requests/2555 > > > > Satish > > > > On Sat, 22 Feb 2020, Junchao Zhang via petsc-users wrote: > > > > We met the error before and knew why. Will fix it soon. > > --Junchao Zhang > > On Sat, Feb 22, 2020 at 11:43 AM Abhyankar, Shrirang G via petsc-users < > > petsc-users at mcs.anl.gov> wrote: > > > Thanks, Satish. Configure and make go through fine. Getting an undefined > > > reference error for VecGetArrayWrite_SeqCUDA. > > > > > > > > > > > > Shri > > > > > > *From: *Satish Balay > > > *Reply-To: *petsc-users > > > *Date: *Saturday, February 22, 2020 at 8:25 AM > > > *To: *"Abhyankar, Shrirang G" > > > *Cc: *"petsc-users at mcs.anl.gov" > > > *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist > > > > > > > > > > > > On Sat, 22 Feb 2020, Abhyankar, Shrirang G via petsc-users wrote: > > > > > > > > > > > > Hi, > > > > > > I want to install PETSc with GPU supported SuperLU_Dist. What are the > > > configure options I should be using? > > > > > > > > > > > > > > > > > > Shri, > > > > > > > > > > > > > > > > > > if self.framework.argDB['download-superlu_dist-gpu']: > > > > > > self.cuda = > framework.require('config.packages.cuda',self) > > > > > > self.openmp = > > > framework.require('config.packages.openmp',self) > > > > > > self.deps = > > > [self.mpi,self.blasLapack,self.cuda,self.openmp] > > > > > > <<<<< > > > > > > > > > > > > So try: > > > > > > > > > > > > --with-cuda=1 --download-superlu_dist=1 --download-superlu_dist-gpu=1 > > > --with-openmp=1 [and usual MPI, blaslapack] > > > > > > > > > > > > Satish > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
From balay at mcs.anl.gov Mon Feb 24 09:18:51 2020
From: balay at mcs.anl.gov (Satish Balay)
Date: Mon, 24 Feb 2020 09:18:51 -0600
Subject: Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist
In-Reply-To: References: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> <4BDB7C51-7452-45CC-A118-4D3F4F5D03D1@pnnl.gov> <4C14C2B3-0CB1-4E5F-A414-D5FBC10F6F18@pnnl.gov>
Message-ID:

nvidia-smi gives some relevant info. I'm not sure what exactly the CUDA version listed here refers to (is it the maximum CUDA version this driver is compatible with?).

Satish

-----
[balay at p1 ~]$ nvidia-smi
Mon Feb 24 09:15:26 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro T2000        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   45C    P8     4W /  N/A |    182MiB /  3911MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1372      G   /usr/libexec/Xorg                            180MiB |
+-----------------------------------------------------------------------------+
[balay at p1 ~]$

On Mon, 24 Feb 2020, Junchao Zhang via petsc-users wrote:

> [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient
> for CUDA runtime version
>
> That means you need to update your cuda driver for CUDA 10.2. See minimum
> requirement in Table 1 at
> https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#major-components
>
> --Junchao Zhang
>
> On Sun, Feb 23, 2020 at 3:33 PM Abhyankar, Shrirang G <
> shrirang.abhyankar at pnnl.gov> wrote:
>
> > I was using CUDA v10.2. Switching to 9.2 gives a clean make test.
> >
> > Thanks,
> >
> > Shri
> >
> > *From: *petsc-users on behalf of
> > "Abhyankar, Shrirang G via petsc-users"
> > *Reply-To: *"Abhyankar, Shrirang G"
> > *Date: *Sunday, February 23, 2020 at 3:10 PM
> > *To: *petsc-users , Junchao Zhang <
> > jczhang at mcs.anl.gov>
> > *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist
> >
> > I am getting an error now for CUDA driver version. Any suggestions?
> >
> > petsc:maint$ make test
> >
> > Running test examples to verify correct installation
> >
> > Using PETSC_DIR=/people/abhy245/software/petsc and
> > PETSC_ARCH=debug-mode-newell
> >
> > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI
> > process
> >
> > See http://www.mcs.anl.gov/petsc/documentation/faq.html
> >
> > [0]PETSC ERROR: --------------------- Error Message
> > --------------------------------------------------------------
> >
> > [0]PETSC ERROR: Error in system call
> >
> > [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient
> > for CUDA runtime version
> >
> > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
> > for trouble shooting.
> > > > [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown > > > > [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by > > abhy245 Sun Feb 23 12:49:55 2020 > > > > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make > > --download-metis --download-parmetis --download-scalapack > > --download-suitesparse --download-superlu_dist-gpu=1 > > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ > > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 > > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 > > PETSC_ARCH=debug-mode-newell > > > > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > > [0]PETSC ERROR: #3 PetscInitialize() line 1010 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c > > > > -------------------------------------------------------------------------- > > > > Primary job terminated normally, but 1 process returned > > > > a non-zero exit code. Per user-direction, the job has been aborted. > > > > -------------------------------------------------------------------------- > > > > -------------------------------------------------------------------------- > > > > mpiexec detected that one or more processes exited with non-zero status, > > thus causing > > > > the job to be terminated. The first process to do so was: > > > > > > > > Process name: [[46518,1],0] > > > > Exit code: 88 > > > > -------------------------------------------------------------------------- > > > > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 MPI > > processes > > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > > [0]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > > > [1]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > > > [1]PETSC ERROR: Error in system call > > > > [1]PETSC ERROR: [0]PETSC ERROR: Error in system call > > > > [0]PETSC ERROR: error in cudaGetDeviceCount CUDA driver version is > > insufficient for CUDA runtime version > > > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > > for trouble shooting. > > > > error in cudaGetDeviceCount CUDA driver version is insufficient for CUDA > > runtime version > > > > [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > > for trouble shooting. 
> > > > [1]PETSC ERROR: [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown > > > > [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by > > abhy245 Sun Feb 23 12:49:57 2020 > > > > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make > > --download-metis --download-parmetis --download-scalapack > > --download-suitesparse --download-superlu_dist-gpu=1 > > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ > > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 > > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 > > PETSC_ARCH=debug-mode-newell > > > > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 254 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > > [0]PETSC ERROR: #3 PetscInitialize() line 1010 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c > > > > Petsc Release Version 3.12.4, unknown > > > > [1]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by > > abhy245 Sun Feb 23 12:49:57 2020 > > > > [1]PETSC ERROR: Configure options --download-fblaslapack --download-make > > --download-metis --download-parmetis --download-scalapack > > --download-suitesparse --download-superlu_dist-gpu=1 > > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ > > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 > > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 > > PETSC_ARCH=debug-mode-newell > > > > [1]PETSC ERROR: #1 PetscCUDAInitialize() line 254 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > > [1]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > > [1]PETSC ERROR: #3 PetscInitialize() line 1010 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c > > > > -------------------------------------------------------------------------- > > > > Primary job terminated normally, but 1 process returned > > > > a non-zero exit code. Per user-direction, the job has been aborted. > > > > -------------------------------------------------------------------------- > > > > -------------------------------------------------------------------------- > > > > mpiexec detected that one or more processes exited with non-zero status, > > thus causing > > > > the job to be terminated. The first process to do so was: > > > > > > > > Process name: [[46522,1],0] > > > > Exit code: 88 > > > > -------------------------------------------------------------------------- > > > > 1,2c1,21 > > > > < lid velocity = 0.0025, prandtl # = 1., grashof # = 1. > > > > < Number of SNES iterations = 2 > > > > --- > > > > > [0]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > > > > [0]PETSC ERROR: Error in system call > > > > > [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is > > insufficient for CUDA runtime version > > > > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > > for trouble shooting. 
> > > > > [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown > > > > > [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by > > abhy245 Sun Feb 23 12:50:00 2020 > > > > > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make > > --download-metis --download-parmetis --download-scalapack > > --download-suitesparse --download-superlu_dist-gpu=1 > > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ > > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 > > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 > > PETSC_ARCH=debug-mode-newell > > > > > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > > > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > > > [0]PETSC ERROR: #3 PetscInitialize() line 1010 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c > > > > > > > -------------------------------------------------------------------------- > > > > > Primary job terminated normally, but 1 process returned > > > > > a non-zero exit code. Per user-direction, the job has been aborted. > > > > > > > -------------------------------------------------------------------------- > > > > > > > -------------------------------------------------------------------------- > > > > > mpiexec detected that one or more processes exited with non-zero status, > > thus causing > > > > > the job to be terminated. The first process to do so was: > > > > > > > > > > Process name: [[46545,1],0] > > > > > Exit code: 88 > > > > > > > -------------------------------------------------------------------------- > > > > /people/abhy245/software/petsc/src/snes/examples/tutorials > > > > Possible problem with ex19 running with superlu_dist, diffs above > > > > ========================================= > > > > Possible error running Fortran example src/snes/examples/tutorials/ex5f > > with 1 MPI process > > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > > [0]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > > > [0]PETSC ERROR: Error in system call > > > > [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient > > for CUDA runtime version > > > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > > for trouble shooting. 
> > > > [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown > > > > [0]PETSC ERROR: ./ex5f on a debug-mode-newell named newell01.pnl.gov by > > abhy245 Sun Feb 23 12:50:04 2020 > > > > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make > > --download-metis --download-parmetis --download-scalapack > > --download-suitesparse --download-superlu_dist-gpu=1 > > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ > > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 > > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 > > PETSC_ARCH=debug-mode-newell > > > > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > > [0]PETSC ERROR: PetscInitialize:Checking initial options > > > > Unable to initialize PETSc > > > > -------------------------------------------------------------------------- > > > > mpiexec has exited due to process rank 0 with PID 0 on > > > > node newell01 exiting improperly. There are three reasons this could occur: > > > > > > > > 1. this process did not call "init" before exiting, but others in > > > > the job did. This can cause a job to hang indefinitely while it waits > > > > for all processes to call "init". By rule, if one process calls "init", > > > > then ALL processes must call "init" prior to termination. > > > > > > > > 2. this process called "init", but exited without calling "finalize". > > > > By rule, all processes that call "init" MUST call "finalize" prior to > > > > exiting or it will be considered an "abnormal termination" > > > > > > > > 3. this process called "MPI_Abort" or "orte_abort" and the mca parameter > > > > orte_create_session_dirs is set to false. In this case, the run-time cannot > > > > detect that the abort call was an abnormal termination. Hence, the only > > > > error message you will receive is this one. > > > > > > > > This may have caused other processes in the application to be > > > > terminated by signals sent by mpiexec (as reported here). > > > > > > > > You can avoid this message by specifying -quiet on the mpiexec command > > line. > > > > -------------------------------------------------------------------------- > > > > Completed test examples > > > > *From: *Satish Balay > > *Reply-To: *petsc-users > > *Date: *Saturday, February 22, 2020 at 9:00 PM > > *To: *Junchao Zhang > > *Cc: *"Abhyankar, Shrirang G" , petsc-users < > > petsc-users at mcs.anl.gov> > > *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist > > > > > > > > The fix is now in both maint and master > > > > > > > > https://gitlab.com/petsc/petsc/-/merge_requests/2555 > > > > > > > > Satish > > > > > > > > On Sat, 22 Feb 2020, Junchao Zhang via petsc-users wrote: > > > > > > > > We met the error before and knew why. Will fix it soon. > > > > --Junchao Zhang > > > > On Sat, Feb 22, 2020 at 11:43 AM Abhyankar, Shrirang G via petsc-users < > > > > petsc-users at mcs.anl.gov> wrote: > > > > > Thanks, Satish. Configure and make go through fine. Getting an undefined > > > > > reference error for VecGetArrayWrite_SeqCUDA. 
> > > > > > > > > > > > > > > > > > > > Shri > > > > > > > > > > *From: *Satish Balay > > > > > *Reply-To: *petsc-users > > > > > *Date: *Saturday, February 22, 2020 at 8:25 AM > > > > > *To: *"Abhyankar, Shrirang G" > > > > > *Cc: *"petsc-users at mcs.anl.gov" > > > > > *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist > > > > > > > > > > > > > > > > > > > > On Sat, 22 Feb 2020, Abhyankar, Shrirang G via petsc-users wrote: > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > I want to install PETSc with GPU supported SuperLU_Dist. What are the > > > > > configure options I should be using? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Shri, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > if self.framework.argDB['download-superlu_dist-gpu']: > > > > > > > > > > self.cuda = > > framework.require('config.packages.cuda',self) > > > > > > > > > > self.openmp = > > > > > framework.require('config.packages.openmp',self) > > > > > > > > > > self.deps = > > > > > [self.mpi,self.blasLapack,self.cuda,self.openmp] > > > > > > > > > > <<<<< > > > > > > > > > > > > > > > > > > > > So try: > > > > > > > > > > > > > > > > > > > > --with-cuda=1 --download-superlu_dist=1 --download-superlu_dist-gpu=1 > > > > > --with-openmp=1 [and usual MPI, blaslapack] > > > > > > > > > > > > > > > > > > > > Satish > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From balay at mcs.anl.gov Mon Feb 24 09:35:36 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 24 Feb 2020 09:35:36 -0600 Subject: [petsc-users] Correct approach for updating deprecated code In-Reply-To: References: Message-ID: Perhaps this is helpful. https://gitlab.com/petsc/petsc/commits/6a9046bcf1dc7e213a87d3843bfa02f323786ad4 Satish On Sun, 23 Feb 2020, Matthew Knepley wrote: > I think you are going to have to send the error logs. > > Thanks, > > MAtt > > On Sun, Feb 23, 2020 at 6:45 PM Richard Beare > wrote: > > > That's what I did (see below), but I got ordering errors (unfortunately > > deleted those logs too soon). I'll rerun if no one recognises what I've > > done wrong. > > > > PetscViewer viewer1; > > ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fileName.c_str > > (),FILE_MODE_WRITE,&viewer1);CHKERRQ(ierr); > > //ierr = > > PetscViewerSetFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ(ierr); > > ierr = PetscViewerPushFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ > > (ierr); > > > > ierr = PetscObjectSetName((PetscObject)mX,"x");CHKERRQ(ierr); > > ierr = PetscObjectSetName((PetscObject)mB,"b");CHKERRQ(ierr); > > > > On Mon, 24 Feb 2020 at 10:43, Matthew Knepley wrote: > > > >> On Sun, Feb 23, 2020 at 6:25 PM Richard Beare via petsc-users < > >> petsc-users at mcs.anl.gov> wrote: > >> > >>> > >>> Hi, > >>> The following code gives a deprecation warning. What is the correct way > >>> of updating the use of ViewerSetFormat to ViewerPushFormat (which I presume > >>> is the preferred replacement). My first attempt gave errors concerning > >>> ordering. > >>> > >> > >> You can't just change SetFormat to PushFormat here? 
> >> > >> Matt > >> > >> > >>> Thanks > >>> > >>> PetscViewer viewer1; > >>> ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fileName.c_str > >>> (),FILE_MODE_WRITE,&viewer1);CHKERRQ(ierr); > >>> ierr = PetscViewerSetFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ > >>> (ierr); > >>> > >>> ierr = PetscObjectSetName((PetscObject)mX,"x");CHKERRQ(ierr); > >>> ierr = PetscObjectSetName((PetscObject)mB,"b");CHKERRQ(ierr); > >>> > >>> ierr = VecView(mX,viewer1);CHKERRQ(ierr); > >>> ierr = VecView(mB,viewer1);CHKERRQ(ierr); > >>> > >>> > >>> -- > >>> -- > >>> A/Prof Richard Beare > >>> Imaging and Bioinformatics, Peninsula Clinical School > >>> orcid.org/0000-0002-7530-5664 > >>> Richard.Beare at monash.edu > >>> +61 3 9788 1724 > >>> > >>> > >>> > >>> Geospatial Research: > >>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis > >>> > >> > >> > >> -- > >> What most experimenters take for granted before they begin their > >> experiments is infinitely more interesting than any results to which their > >> experiments lead. > >> -- Norbert Wiener > >> > >> https://www.cse.buffalo.edu/~knepley/ > >> > >> > > > > > > -- > > -- > > A/Prof Richard Beare > > Imaging and Bioinformatics, Peninsula Clinical School > > orcid.org/0000-0002-7530-5664 > > Richard.Beare at monash.edu > > +61 3 9788 1724 > > > > > > > > Geospatial Research: > > https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis > > > > > From knepley at gmail.com Mon Feb 24 10:04:52 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 24 Feb 2020 11:04:52 -0500 Subject: [petsc-users] Correct approach for updating deprecated code In-Reply-To: References: Message-ID: On Sun, Feb 23, 2020 at 6:45 PM Richard Beare wrote: > That's what I did (see below), but I got ordering errors (unfortunately > deleted those logs too soon). I'll rerun if no one recognises what I've > done wrong. > > PetscViewer viewer1; > ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fileName.c_str > (),FILE_MODE_WRITE,&viewer1);CHKERRQ(ierr); > //ierr = > PetscViewerSetFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ(ierr); > ierr = PetscViewerPushFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ > (ierr); > This should not cause problems. However, is it possible that somewhere you are pushing a format again and again without popping? This could exceed the stack size. Thanks, Matt > ierr = PetscObjectSetName((PetscObject)mX,"x");CHKERRQ(ierr); > ierr = PetscObjectSetName((PetscObject)mB,"b");CHKERRQ(ierr); > > On Mon, 24 Feb 2020 at 10:43, Matthew Knepley wrote: > >> On Sun, Feb 23, 2020 at 6:25 PM Richard Beare via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >>> >>> Hi, >>> The following code gives a deprecation warning. What is the correct way >>> of updating the use of ViewerSetFormat to ViewerPushFormat (which I presume >>> is the preferred replacement). My first attempt gave errors concerning >>> ordering. >>> >> >> You can't just change SetFormat to PushFormat here? 
>> >> Matt >> >> >>> Thanks >>> >>> PetscViewer viewer1; >>> ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fileName.c_str >>> (),FILE_MODE_WRITE,&viewer1);CHKERRQ(ierr); >>> ierr = PetscViewerSetFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ >>> (ierr); >>> >>> ierr = PetscObjectSetName((PetscObject)mX,"x");CHKERRQ(ierr); >>> ierr = PetscObjectSetName((PetscObject)mB,"b");CHKERRQ(ierr); >>> >>> ierr = VecView(mX,viewer1);CHKERRQ(ierr); >>> ierr = VecView(mB,viewer1);CHKERRQ(ierr); >>> >>> >>> -- >>> -- >>> A/Prof Richard Beare >>> Imaging and Bioinformatics, Peninsula Clinical School >>> orcid.org/0000-0002-7530-5664 >>> Richard.Beare at monash.edu >>> +61 3 9788 1724 >>> >>> >>> >>> Geospatial Research: >>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > -- > A/Prof Richard Beare > Imaging and Bioinformatics, Peninsula Clinical School > orcid.org/0000-0002-7530-5664 > Richard.Beare at monash.edu > +61 3 9788 1724 > > > > Geospatial Research: > https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From juaneah at gmail.com Mon Feb 24 16:50:53 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Mon, 24 Feb 2020 16:50:53 -0600 Subject: [petsc-users] Behavior of PetscViewerVTKOpen Message-ID: Hi everyone, I think VTK is not the best option to save files, but I just want a quick way to visualize a structured grid generated with DMDACreate3d. I visualize the vts file with ParaView. The situation is, If I change the number global dimension in each direction of the array (*M,N,P *parameters for DMDACreate3d) it is supposed that the size of the mesh won't changes, ONLY the number of division in the affected direction, but it does NOT happen ( I used DMDASetUniformCoordinates to have an uniform grid ). When I save the file, and then check it with ParaView, the size changes. (I checked the coordinates with DMGetCoordinates everything is OK, size and division), the problem is in the visualization using VTK. I use the next set of functions to save the file. ierr = PetscViewerVTKOpen(PETSC_COMM_WORLD,"nodes_coord.vts",FILE_MODE_WRITE,&view); CHKERRQ(ierr); ierr = DMGetCoordinates(da_nodes,&v1); CHKERRQ(ierr); // borrowed reference ierr = VecView(coord,view);CHKERRQ(ierr); Then, I realize that if I create a vector with DMCreateGlobalVector and then copy in it the coordinate from DMGetCoordinates, the size remains unchanged (in the visualization) and just the number of elements along the direction change (when I modify M,N,P). ierr = PetscViewerVTKOpen(PETSC_COMM_WORLD,"nodes_coord.vts",FILE_MODE_WRITE,&view); CHKERRQ(ierr); ierr = DMGetCoordinates(da_nodes,&v1); CHKERRQ(ierr); // borrowed reference ierr = DMCreateGlobalVector(da_nodes,&coord); CHKERRQ(ierr); ierr = VecCopy(v1,coord); CHKERRQ(ierr); ierr = VecView(coord,view);CHKERRQ(ierr); *There is something wrong with PetscViewerVTKOpen or it's just the approach that I used?* Kind regards. 
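For reference, a minimal sketch of the pattern suggested in the reply below: viewing a field created on the DMDA itself (so that the coordinates are attached automatically) rather than viewing the coordinate vector directly. The grid sizes, domain extent, field name, and output file name here are arbitrary placeholders, not anything prescribed by PETSc.

#include <petscdmda.h>

int main(int argc,char **argv)
{
  DM             da;
  Vec            u;
  PetscViewer    viewer;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
  /* 3D structured grid; M = N = P = 5 is arbitrary and only sets the resolution */
  ierr = DMDACreate3d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,
                      DMDA_STENCIL_BOX,5,5,5,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,
                      1,1,NULL,NULL,NULL,&da);CHKERRQ(ierr);
  ierr = DMSetFromOptions(da);CHKERRQ(ierr);
  ierr = DMSetUp(da);CHKERRQ(ierr);
  /* the physical extent stays fixed when M,N,P change; only the spacing changes */
  ierr = DMDASetUniformCoordinates(da,0.0,1.0,0.0,1.0,0.0,1.0);CHKERRQ(ierr);
  /* view a field defined on the DMDA; the VTK viewer picks up the DM coordinates itself */
  ierr = DMCreateGlobalVector(da,&u);CHKERRQ(ierr);
  ierr = PetscObjectSetName((PetscObject)u,"u");CHKERRQ(ierr);
  ierr = VecSet(u,1.0);CHKERRQ(ierr);
  ierr = PetscViewerVTKOpen(PETSC_COMM_WORLD,"field.vts",FILE_MODE_WRITE,&viewer);CHKERRQ(ierr);
  ierr = VecView(u,viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  ierr = VecDestroy(&u);CHKERRQ(ierr);
  ierr = DMDestroy(&da);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}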
-------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Feb 24 17:48:06 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 24 Feb 2020 16:48:06 -0700 Subject: [petsc-users] Behavior of PetscViewerVTKOpen In-Reply-To: References: Message-ID: <87pne3o8cp.fsf@jedbrown.org> Emmanuel Ayala writes: > Hi everyone, > > I think VTK is not the best option to save files, but I just want a quick > way to visualize a structured grid generated with DMDACreate3d. I visualize > the vts file with ParaView. > > The situation is, If I change the number global dimension in each direction > of the array (*M,N,P *parameters for DMDACreate3d) it is supposed that the > size of the mesh won't changes, ONLY the number of division in the affected > direction, but it does NOT happen ( I used DMDASetUniformCoordinates to > have an uniform grid ). When I save the file, and then check it with > ParaView, the size changes. (I checked the coordinates with > DMGetCoordinates everything is OK, size and division), the problem is in > the visualization using VTK. > > I use the next set of functions to save the file. Normally you view a field defined on your DM (which will be correctly mapped by your coordinates), not the coordinates (which reside on a coordinate DM, which does not itself have coordinates). Try something like this: DMCreateGlobalVector(dm, &U); VecView(U, viewer); > ierr = > PetscViewerVTKOpen(PETSC_COMM_WORLD,"nodes_coord.vts",FILE_MODE_WRITE,&view); > CHKERRQ(ierr); > ierr = DMGetCoordinates(da_nodes,&v1); CHKERRQ(ierr); // borrowed reference > ierr = VecView(coord,view);CHKERRQ(ierr); > > Then, I realize that if I create a vector with DMCreateGlobalVector and > then copy in it the coordinate from DMGetCoordinates, the size remains > unchanged (in the visualization) and just the number of elements along the > direction change (when I modify M,N,P). > > ierr = > PetscViewerVTKOpen(PETSC_COMM_WORLD,"nodes_coord.vts",FILE_MODE_WRITE,&view); > CHKERRQ(ierr); > ierr = DMGetCoordinates(da_nodes,&v1); CHKERRQ(ierr); // borrowed reference > ierr = DMCreateGlobalVector(da_nodes,&coord); CHKERRQ(ierr); > ierr = VecCopy(v1,coord); CHKERRQ(ierr); > ierr = VecView(coord,view);CHKERRQ(ierr); > > *There is something wrong with PetscViewerVTKOpen or it's just the approach > that I used?* > > Kind regards. From juaneah at gmail.com Mon Feb 24 18:11:42 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Mon, 24 Feb 2020 18:11:42 -0600 Subject: [petsc-users] Behavior of PetscViewerVTKOpen In-Reply-To: <87pne3o8cp.fsf@jedbrown.org> References: <87pne3o8cp.fsf@jedbrown.org> Message-ID: Ok, that's right, now i understand. Thanks! Kind regards. El lun., 24 de feb. de 2020 a la(s) 17:48, Jed Brown (jed at jedbrown.org) escribi?: > Emmanuel Ayala writes: > > > Hi everyone, > > > > I think VTK is not the best option to save files, but I just want a quick > > way to visualize a structured grid generated with DMDACreate3d. I > visualize > > the vts file with ParaView. > > > > The situation is, If I change the number global dimension in each > direction > > of the array (*M,N,P *parameters for DMDACreate3d) it is supposed that > the > > size of the mesh won't changes, ONLY the number of division in the > affected > > direction, but it does NOT happen ( I used DMDASetUniformCoordinates to > > have an uniform grid ). When I save the file, and then check it with > > ParaView, the size changes. 
(I checked the coordinates with > > DMGetCoordinates everything is OK, size and division), the problem is in > > the visualization using VTK. > > > > I use the next set of functions to save the file. > > Normally you view a field defined on your DM (which will be correctly > mapped by your coordinates), not the coordinates (which reside on a > coordinate DM, which does not itself have coordinates). Try something > like this: > > DMCreateGlobalVector(dm, &U); > VecView(U, viewer); > > > ierr = > > > PetscViewerVTKOpen(PETSC_COMM_WORLD,"nodes_coord.vts",FILE_MODE_WRITE,&view); > > CHKERRQ(ierr); > > ierr = DMGetCoordinates(da_nodes,&v1); CHKERRQ(ierr); // borrowed > reference > > ierr = VecView(coord,view);CHKERRQ(ierr); > > > > Then, I realize that if I create a vector with DMCreateGlobalVector and > > then copy in it the coordinate from DMGetCoordinates, the size remains > > unchanged (in the visualization) and just the number of elements along > the > > direction change (when I modify M,N,P). > > > > ierr = > > > PetscViewerVTKOpen(PETSC_COMM_WORLD,"nodes_coord.vts",FILE_MODE_WRITE,&view); > > CHKERRQ(ierr); > > ierr = DMGetCoordinates(da_nodes,&v1); CHKERRQ(ierr); // borrowed > reference > > ierr = DMCreateGlobalVector(da_nodes,&coord); CHKERRQ(ierr); > > ierr = VecCopy(v1,coord); CHKERRQ(ierr); > > ierr = VecView(coord,view);CHKERRQ(ierr); > > > > *There is something wrong with PetscViewerVTKOpen or it's just the > approach > > that I used?* > > > > Kind regards. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From claudio.tomasi at usi.ch Tue Feb 25 10:29:14 2020 From: claudio.tomasi at usi.ch (Claudio Tomasi) Date: Tue, 25 Feb 2020 16:29:14 +0000 Subject: [petsc-users] DMDAVecGetArrayDOF for read-only vectors Message-ID: Dear all, I would like to ask whether there's an analogous of the routine DMDAVecGetArrayDOF for read-only vectors. I'm using the SNES solver and in computing the SNESfunction I need to read and use the values of the solution vector x, which is created with DMCreateGlobalVector where the associated DM has two dofs per node. Best regards, Claudio Tomasi -------------- next part -------------- An HTML attachment was scrubbed... URL: From sajidsyed2021 at u.northwestern.edu Tue Feb 25 10:37:09 2020 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Tue, 25 Feb 2020 10:37:09 -0600 Subject: [petsc-users] Questions about TSAdjoint for time dependent parameters Message-ID: Hi PETSc-developers, Could the code used for section 5.1 of the recent paper "PETSc TSAdjoint: a discrete adjoint ODE solver for first-order and second-order sensitivity analysis" be shared ? Are there more examples that deal with time dependent parameters in the git repository ? Another question I have is regarding the equations used to introduce adjoints in section 7.1 of the manual where for the state of the solution vector is denoted by y and the parameters by p. [1] I'm unsure about what the partial derivative of y0 with respect to p means since I understand y0 to be the initial conditions used to solve the TS which would not depend on the parameters (since the parameters are related to the equations TS tries to solve for which should not dependent on the initialization used). Could someone clarify what this means ? [2] The manual described that a user has to set the correct initialization for the adjoint variables when calling TSSetCostGradients. The initialization for mu vector is whereby given to be d?i/dp at t=tF. 
If p is time dependent, does one evaluate this derivative with respect to p(t) at t=tF ? Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellen.price at cfa.harvard.edu Tue Feb 25 10:54:23 2020 From: ellen.price at cfa.harvard.edu (Ellen M. Price) Date: Tue, 25 Feb 2020 11:54:23 -0500 Subject: [petsc-users] Guidance on using Tao for unconstrained minimization Message-ID: <9480b1d3-1ed3-8108-15d8-a5914312c5f2@cfa.harvard.edu> Hello PETSc users! I am using Tao for an unconstrained minimization problem. I have found that CG works better than the other types for this application. After about 85 iterations, I get an error about line search failure. I'm not clear on what this means, or how I could mitigate the problem, and neither the manual nor FAQ give any guidance. Can anyone suggest things I could try to help the method converge? I have function and gradient info, but no Hessian. Thanks, Ellen Price From hongzhang at anl.gov Tue Feb 25 10:59:43 2020 From: hongzhang at anl.gov (Zhang, Hong) Date: Tue, 25 Feb 2020 16:59:43 +0000 Subject: [petsc-users] Questions about TSAdjoint for time dependent parameters In-Reply-To: References: Message-ID: On Feb 25, 2020, at 10:37 AM, Sajid Ali > wrote: Hi PETSc-developers, Could the code used for section 5.1 of the recent paper "PETSc TSAdjoint: a discrete adjoint ODE solver for first-order and second-order sensitivity analysis" be shared ? Are there more examples that deal with time dependent parameters in the git repository ? The code is in the master branch. See ts/examples/tutorials/optimal_control/ex1.c. This is the only example that deals with time-varying parameters. Another question I have is regarding the equations used to introduce adjoints in section 7.1 of the manual where for the state of the solution vector is denoted by y and the parameters by p. [1] I'm unsure about what the partial derivative of y0 with respect to p means since I understand y0 to be the initial conditions used to solve the TS which would not depend on the parameters (since the parameters are related to the equations TS tries to solve for which should not dependent on the initialization used). Could someone clarify what this means ? There exist applications that initial condition depends on the design parameters. [2] The manual described that a user has to set the correct initialization for the adjoint variables when calling TSSetCostGradients. The initialization for mu vector is whereby given to be d?i/dp at t=tF. If p is time dependent, does one evaluate this derivative with respect to p(t) at t=tF ? Yes The adjoint solvers are designed to handle as many cases as possible. In your case, you may have simpler dependencies than those supported. If the initial condition and the objective function do not depend on the parameters directly, their partial derivatives wrt to p will be zero and you can simply ignore them. Hong (Mr.) Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From sajidsyed2021 at u.northwestern.edu Tue Feb 25 11:21:42 2020 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Tue, 25 Feb 2020 11:21:42 -0600 Subject: [petsc-users] Questions about TSAdjoint for time dependent parameters In-Reply-To: References: Message-ID: Hi Hong, Thanks for the explanation! 
If I have a cost function consisting of an L2 norm of the difference of a TS-solution and some reference along with some constraints (say bounds, L1-sparsity, total variation etc), would I provide a routine for gradient evaluation of only the L2 norm (where TAO would take care of the constraints) or do I also have to take the constraints into account (since I'd also have to differentiate the regularizers) ? Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Feb 25 11:27:24 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 25 Feb 2020 12:27:24 -0500 Subject: [petsc-users] Questions about TSAdjoint for time dependent parameters In-Reply-To: References: Message-ID: On Tue, Feb 25, 2020 at 12:23 PM Sajid Ali wrote: > Hi Hong, > > Thanks for the explanation! > > If I have a cost function consisting of an L2 norm of the difference of a > TS-solution and some reference along with some constraints (say bounds, > L1-sparsity, total variation etc), would I provide a routine for gradient > evaluation of only the L2 norm (where TAO would take care of the > constraints) or do I also have to take the constraints into account (since > I'd also have to differentiate the regularizers) ? > We want to have a framework for this separable case. The ADMM implementation that was recently merged is a step in this direction. See Alp's talk from SIAM PP 2020. Thanks, Matt > Thank You, > Sajid Ali | PhD Candidate > Applied Physics > Northwestern University > s-sajid-ali.github.io > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Tue Feb 25 11:49:35 2020 From: hongzhang at anl.gov (Zhang, Hong) Date: Tue, 25 Feb 2020 17:49:35 +0000 Subject: [petsc-users] Questions about TSAdjoint for time dependent parameters In-Reply-To: References: Message-ID: <918151C4-5C6E-4AF2-A540-854AB6B1AC32@anl.gov> On Feb 25, 2020, at 11:21 AM, Sajid Ali > wrote: Hi Hong, Thanks for the explanation! If I have a cost function consisting of an L2 norm of the difference of a TS-solution and some reference along with some constraints (say bounds, L1-sparsity, total variation etc), would I provide a routine for gradient evaluation of only the L2 norm (where TAO would take care of the constraints) or do I also have to take the constraints into account (since I'd also have to differentiate the regularizers) ? This depends on how you would like to formulate and solve your optimization problem. If you wan to use the built-in regularizers in TAO, then you just need provide gradient evaluation of the L2 norm. But TAO provides interfaces for users to provide customized regularizers and the gradient of them, in this case, again, adjoint can be used for the gradient calculation in the same way you handle objective functions/gradients. Of course, it is also possible to include regularizers in your objective function. Hong Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jed at jedbrown.org Tue Feb 25 12:49:58 2020 From: jed at jedbrown.org (Jed Brown) Date: Tue, 25 Feb 2020 11:49:58 -0700 Subject: [petsc-users] DMDAVecGetArrayDOF for read-only vectors In-Reply-To: References: Message-ID: <87tv3emrhl.fsf@jedbrown.org> Claudio Tomasi writes: > Dear all, > I would like to ask whether there's an analogous of the routine DMDAVecGetArrayDOF for read-only vectors. https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMDA/DMDAVecGetArrayDOFRead.html I see it's missing links to/from related pages, so I've added those in this MR https://gitlab.com/petsc/petsc/-/merge_requests/2560 > I'm using the SNES solver and in computing the SNESfunction I need to read and use the values of the solution vector x, which is created with DMCreateGlobalVector where the associated DM has two dofs per node. > Best regards, > Claudio Tomasi From mfadams at lbl.gov Tue Feb 25 21:35:56 2020 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 25 Feb 2020 22:35:56 -0500 Subject: [petsc-users] [WARNING: UNSCANNABLE EXTRACTION FAILED]/usr/bin/ld: cannot find -lhwloc Message-ID: We are running on a KNL and getting /usr/bin/ld: cannot find -lhwloc This is v3.7.7. I see -lhwloc in the PETSc stuff. We are also missing libX11.a and I am configuring now with --with-x=0 to try to fix that. I've attached to full output and the logs. Any ideas? Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: output Type: application/octet-stream Size: 59425 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log.gz Type: application/x-gzip Size: 10750 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log.gz Type: application/x-gzip Size: 547697 bytes Desc: not available URL: From balay at mcs.anl.gov Tue Feb 25 21:36:57 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 25 Feb 2020 21:36:57 -0600 Subject: [petsc-users] /usr/bin/ld: cannot find -lhwloc In-Reply-To: References: Message-ID: Try --with-hwloc=0 Satish On Tue, 25 Feb 2020, Mark Adams wrote: > We are running on a KNL and getting /usr/bin/ld: cannot find -lhwloc > > This is v3.7.7. > > I see -lhwloc in the PETSc stuff. We are also missing libX11.a and I am > configuring now with > --with-x=0 to try to fix that. > > I've attached to full output and the logs. > > Any ideas? > Thanks, > Mark > From barrydog505 at gmail.com Wed Feb 26 02:15:57 2020 From: barrydog505 at gmail.com (Tsung-Hsing Chen) Date: Wed, 26 Feb 2020 16:15:57 +0800 Subject: [petsc-users] The question of the output from ksp/ex2.c Message-ID: Hi, I tried to run the example in ksp/examples/tutorials/ex2. I run the code with : mpiexec -n 2 ./ex2 -ksp_monitor_short -m 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always the output is : 0 KSP Residual norm 3.21109 1 KSP Residual norm 0.93268 2 KSP Residual norm 0.103515 3 KSP Residual norm 0.00787798 4 KSP Residual norm 0.000387275 Norm of error 0.000392701 iterations 4 0 KSP Residual norm 3.21109 1 KSP Residual norm 0.93268 2 KSP Residual norm 0.103515 3 KSP Residual norm 0.00787798 4 KSP Residual norm 0.000387275 Norm of error 0.000392701 iterations 4 My output(above) is twice as the ksp/examples/tutorials/output/ex2_4.out. Is this the right answer that should come out? 
Thanks in advance, Tsung-Hsing Chen -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Wed Feb 26 02:25:58 2020 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Wed, 26 Feb 2020 11:25:58 +0300 Subject: [petsc-users] The question of the output from ksp/ex2.c In-Reply-To: References: Message-ID: This is what I get [szampini at localhost tutorials]$ mpiexec -n 2 ./ex2 -ksp_monitor_short -m 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always 0 KSP Residual norm 2.73499 1 KSP Residual norm 0.795482 2 KSP Residual norm 0.261984 3 KSP Residual norm 0.0752998 4 KSP Residual norm 0.0230031 5 KSP Residual norm 0.00521255 6 KSP Residual norm 0.00145783 7 KSP Residual norm 0.000277319 Norm of error 0.000292349 iterations 7 When I sequentially, I get (same output as yours) [szampini at localhost tutorials]$ mpiexec -n 1 ./ex2 -ksp_monitor_short -m 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always 0 KSP Residual norm 3.21109 1 KSP Residual norm 0.93268 2 KSP Residual norm 0.103515 3 KSP Residual norm 0.00787798 4 KSP Residual norm 0.000387275 Norm of error 0.000392701 iterations 4 This means you are using the wrong mpiexec Il giorno mer 26 feb 2020 alle ore 11:17 Tsung-Hsing Chen < barrydog505 at gmail.com> ha scritto: > Hi, > > I tried to run the example in ksp/examples/tutorials/ex2. > I run the code with : mpiexec -n 2 ./ex2 -ksp_monitor_short -m 5 -n 5 > -ksp_gmres_cgs_refinement_type refine_always > > the output is : > 0 KSP Residual norm 3.21109 > 1 KSP Residual norm 0.93268 > 2 KSP Residual norm 0.103515 > 3 KSP Residual norm 0.00787798 > 4 KSP Residual norm 0.000387275 > Norm of error 0.000392701 iterations 4 > 0 KSP Residual norm 3.21109 > 1 KSP Residual norm 0.93268 > 2 KSP Residual norm 0.103515 > 3 KSP Residual norm 0.00787798 > 4 KSP Residual norm 0.000387275 > Norm of error 0.000392701 iterations 4 > > My output(above) is twice as the ksp/examples/tutorials/output/ex2_4.out. > Is this the right answer that should come out? > > Thanks in advance, > > Tsung-Hsing Chen > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From barrydog505 at gmail.com Wed Feb 26 02:37:56 2020 From: barrydog505 at gmail.com (Tsung-Hsing Chen) Date: Wed, 26 Feb 2020 16:37:56 +0800 Subject: [petsc-users] The question of the output from ksp/ex2.c In-Reply-To: References: Message-ID: So, What should I do to use the correct mpiexec? Am I configure petsc with the wrong way or something should be done? Stefano Zampini ? 2020?2?26? ?? ??4:26??? 
> This is what I get > > [szampini at localhost tutorials]$ mpiexec -n 2 ./ex2 -ksp_monitor_short -m > 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always > 0 KSP Residual norm 2.73499 > 1 KSP Residual norm 0.795482 > 2 KSP Residual norm 0.261984 > 3 KSP Residual norm 0.0752998 > 4 KSP Residual norm 0.0230031 > 5 KSP Residual norm 0.00521255 > 6 KSP Residual norm 0.00145783 > 7 KSP Residual norm 0.000277319 > Norm of error 0.000292349 iterations 7 > > When I sequentially, I get (same output as yours) > > [szampini at localhost tutorials]$ mpiexec -n 1 ./ex2 -ksp_monitor_short -m > 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always > 0 KSP Residual norm 3.21109 > 1 KSP Residual norm 0.93268 > 2 KSP Residual norm 0.103515 > 3 KSP Residual norm 0.00787798 > 4 KSP Residual norm 0.000387275 > Norm of error 0.000392701 iterations 4 > > This means you are using the wrong mpiexec > > Il giorno mer 26 feb 2020 alle ore 11:17 Tsung-Hsing Chen < > barrydog505 at gmail.com> ha scritto: > >> Hi, >> >> I tried to run the example in ksp/examples/tutorials/ex2. >> I run the code with : mpiexec -n 2 ./ex2 -ksp_monitor_short -m 5 -n 5 >> -ksp_gmres_cgs_refinement_type refine_always >> >> the output is : >> 0 KSP Residual norm 3.21109 >> 1 KSP Residual norm 0.93268 >> 2 KSP Residual norm 0.103515 >> 3 KSP Residual norm 0.00787798 >> 4 KSP Residual norm 0.000387275 >> Norm of error 0.000392701 iterations 4 >> 0 KSP Residual norm 3.21109 >> 1 KSP Residual norm 0.93268 >> 2 KSP Residual norm 0.103515 >> 3 KSP Residual norm 0.00787798 >> 4 KSP Residual norm 0.000387275 >> Norm of error 0.000392701 iterations 4 >> >> My output(above) is twice as the ksp/examples/tutorials/output/ex2_4.out. >> Is this the right answer that should come out? >> >> Thanks in advance, >> >> Tsung-Hsing Chen >> > > > -- > Stefano > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Wed Feb 26 02:50:32 2020 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Wed, 26 Feb 2020 11:50:32 +0300 Subject: [petsc-users] The question of the output from ksp/ex2.c In-Reply-To: References: Message-ID: First, make sure you compiled with support for MPI by running make check [szampini at localhost petsc]$ make check Running test examples to verify correct installation Using PETSC_DIR=/home/szampini/Devel/jointinversion/pkgs/petsc and PETSC_ARCH=arch-joint C/C++ example src/snes/examples/tutorials/ex19 run successfully with 1 MPI process C/C++ example src/snes/examples/tutorials/ex19 run successfully with 2 MPI processes C/C++ example src/snes/examples/tutorials/ex19 run successfully with hypre C/C++ example src/snes/examples/tutorials/ex19 run successfully with mumps Completed test examples if you have the "2 MPI processes" output, then type [szampini at localhost petsc]$ make -f gmakefile print VAR=MPIEXEC mpiexec For me, mpiexec is system-wide. Il giorno mer 26 feb 2020 alle ore 11:38 Tsung-Hsing Chen < barrydog505 at gmail.com> ha scritto: > So, What should I do to use the correct mpiexec? > Am I configure petsc with the wrong way or something should be done? > > Stefano Zampini ? 2020?2?26? ?? ??4:26??? 
> >> This is what I get >> >> [szampini at localhost tutorials]$ mpiexec -n 2 ./ex2 -ksp_monitor_short -m >> 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always >> 0 KSP Residual norm 2.73499 >> 1 KSP Residual norm 0.795482 >> 2 KSP Residual norm 0.261984 >> 3 KSP Residual norm 0.0752998 >> 4 KSP Residual norm 0.0230031 >> 5 KSP Residual norm 0.00521255 >> 6 KSP Residual norm 0.00145783 >> 7 KSP Residual norm 0.000277319 >> Norm of error 0.000292349 iterations 7 >> >> When I sequentially, I get (same output as yours) >> >> [szampini at localhost tutorials]$ mpiexec -n 1 ./ex2 -ksp_monitor_short -m >> 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always >> 0 KSP Residual norm 3.21109 >> 1 KSP Residual norm 0.93268 >> 2 KSP Residual norm 0.103515 >> 3 KSP Residual norm 0.00787798 >> 4 KSP Residual norm 0.000387275 >> Norm of error 0.000392701 iterations 4 >> >> This means you are using the wrong mpiexec >> >> Il giorno mer 26 feb 2020 alle ore 11:17 Tsung-Hsing Chen < >> barrydog505 at gmail.com> ha scritto: >> >>> Hi, >>> >>> I tried to run the example in ksp/examples/tutorials/ex2. >>> I run the code with : mpiexec -n 2 ./ex2 -ksp_monitor_short -m 5 -n 5 >>> -ksp_gmres_cgs_refinement_type refine_always >>> >>> the output is : >>> 0 KSP Residual norm 3.21109 >>> 1 KSP Residual norm 0.93268 >>> 2 KSP Residual norm 0.103515 >>> 3 KSP Residual norm 0.00787798 >>> 4 KSP Residual norm 0.000387275 >>> Norm of error 0.000392701 iterations 4 >>> 0 KSP Residual norm 3.21109 >>> 1 KSP Residual norm 0.93268 >>> 2 KSP Residual norm 0.103515 >>> 3 KSP Residual norm 0.00787798 >>> 4 KSP Residual norm 0.000387275 >>> Norm of error 0.000392701 iterations 4 >>> >>> My output(above) is twice as the ksp/examples/tutorials/output/ex2_4.out. >>> Is this the right answer that should come out? >>> >>> Thanks in advance, >>> >>> Tsung-Hsing Chen >>> >> >> >> -- >> Stefano >> > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From barrydog505 at gmail.com Wed Feb 26 04:21:24 2020 From: barrydog505 at gmail.com (Tsung-Hsing Chen) Date: Wed, 26 Feb 2020 18:21:24 +0800 Subject: [petsc-users] The question of the output from ksp/ex2.c In-Reply-To: References: Message-ID: Unfortunately, it still no work for me. what I do is first : ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack then make ......, and make check. the output has shown that "C/C++ example src/snes/examples/tutorials/ex19 run successfully with 2 MPI processes". last, I type "make -f gmakefile print VAR=MPIEXEC". And I went running ex2, the problem still exists. Is there needed to do anything else before I run ex2? By the way, should I move to petsc-maint at mcs.anl.gov for the upcoming question? Stefano Zampini ? 2020?2?26? ?? ??4:50??? 
> First, make sure you compiled with support for MPI by running make check > > [szampini at localhost petsc]$ make check > Running test examples to verify correct installation > Using PETSC_DIR=/home/szampini/Devel/jointinversion/pkgs/petsc and > PETSC_ARCH=arch-joint > C/C++ example src/snes/examples/tutorials/ex19 run successfully with 1 MPI > process > C/C++ example src/snes/examples/tutorials/ex19 run successfully with 2 MPI > processes > C/C++ example src/snes/examples/tutorials/ex19 run successfully with hypre > C/C++ example src/snes/examples/tutorials/ex19 run successfully with mumps > Completed test examples > > if you have the "2 MPI processes" output, then type > > [szampini at localhost petsc]$ make -f gmakefile print VAR=MPIEXEC > mpiexec > > For me, mpiexec is system-wide. > > Il giorno mer 26 feb 2020 alle ore 11:38 Tsung-Hsing Chen < > barrydog505 at gmail.com> ha scritto: > >> So, What should I do to use the correct mpiexec? >> Am I configure petsc with the wrong way or something should be done? >> >> Stefano Zampini ? 2020?2?26? ?? ??4:26??? >> >>> This is what I get >>> >>> [szampini at localhost tutorials]$ mpiexec -n 2 ./ex2 -ksp_monitor_short >>> -m 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always >>> 0 KSP Residual norm 2.73499 >>> 1 KSP Residual norm 0.795482 >>> 2 KSP Residual norm 0.261984 >>> 3 KSP Residual norm 0.0752998 >>> 4 KSP Residual norm 0.0230031 >>> 5 KSP Residual norm 0.00521255 >>> 6 KSP Residual norm 0.00145783 >>> 7 KSP Residual norm 0.000277319 >>> Norm of error 0.000292349 iterations 7 >>> >>> When I sequentially, I get (same output as yours) >>> >>> [szampini at localhost tutorials]$ mpiexec -n 1 ./ex2 -ksp_monitor_short >>> -m 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always >>> 0 KSP Residual norm 3.21109 >>> 1 KSP Residual norm 0.93268 >>> 2 KSP Residual norm 0.103515 >>> 3 KSP Residual norm 0.00787798 >>> 4 KSP Residual norm 0.000387275 >>> Norm of error 0.000392701 iterations 4 >>> >>> This means you are using the wrong mpiexec >>> >>> Il giorno mer 26 feb 2020 alle ore 11:17 Tsung-Hsing Chen < >>> barrydog505 at gmail.com> ha scritto: >>> >>>> Hi, >>>> >>>> I tried to run the example in ksp/examples/tutorials/ex2. >>>> I run the code with : mpiexec -n 2 ./ex2 -ksp_monitor_short -m 5 -n 5 >>>> -ksp_gmres_cgs_refinement_type refine_always >>>> >>>> the output is : >>>> 0 KSP Residual norm 3.21109 >>>> 1 KSP Residual norm 0.93268 >>>> 2 KSP Residual norm 0.103515 >>>> 3 KSP Residual norm 0.00787798 >>>> 4 KSP Residual norm 0.000387275 >>>> Norm of error 0.000392701 iterations 4 >>>> 0 KSP Residual norm 3.21109 >>>> 1 KSP Residual norm 0.93268 >>>> 2 KSP Residual norm 0.103515 >>>> 3 KSP Residual norm 0.00787798 >>>> 4 KSP Residual norm 0.000387275 >>>> Norm of error 0.000392701 iterations 4 >>>> >>>> My output(above) is twice as >>>> the ksp/examples/tutorials/output/ex2_4.out. >>>> Is this the right answer that should come out? >>>> >>>> Thanks in advance, >>>> >>>> Tsung-Hsing Chen >>>> >>> >>> >>> -- >>> Stefano >>> >> > > -- > Stefano > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barrydog505 at gmail.com Wed Feb 26 04:59:31 2020 From: barrydog505 at gmail.com (Tsung-Hsing Chen) Date: Wed, 26 Feb 2020 18:59:31 +0800 Subject: [petsc-users] The question of the output from ksp/ex2.c In-Reply-To: References: Message-ID: I think I just found out what happened. There is another mpi "openmpi" that already exists on my computer. After I remove it then all back to normal. 
Thanks for your assistance, Tsung-Hsing Chen Tsung-Hsing Chen ? 2020?2?26? ?? ??6:21??? > Unfortunately, it still no work for me. > what I do is > first : ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran > --download-mpich --download-fblaslapack > then make ......, and make check. > the output has shown that "C/C++ example src/snes/examples/tutorials/ex19 > run successfully with 2 MPI processes". > last, I type "make -f gmakefile print VAR=MPIEXEC". > > And I went running ex2, the problem still exists. > Is there needed to do anything else before I run ex2? > By the way, should I move to petsc-maint at mcs.anl.gov for the upcoming > question? > > > Stefano Zampini ? 2020?2?26? ?? ??4:50??? > >> First, make sure you compiled with support for MPI by running make check >> >> [szampini at localhost petsc]$ make check >> Running test examples to verify correct installation >> Using PETSC_DIR=/home/szampini/Devel/jointinversion/pkgs/petsc and >> PETSC_ARCH=arch-joint >> C/C++ example src/snes/examples/tutorials/ex19 run successfully with 1 >> MPI process >> C/C++ example src/snes/examples/tutorials/ex19 run successfully with 2 >> MPI processes >> C/C++ example src/snes/examples/tutorials/ex19 run successfully with hypre >> C/C++ example src/snes/examples/tutorials/ex19 run successfully with mumps >> Completed test examples >> >> if you have the "2 MPI processes" output, then type >> >> [szampini at localhost petsc]$ make -f gmakefile print VAR=MPIEXEC >> mpiexec >> >> For me, mpiexec is system-wide. >> >> Il giorno mer 26 feb 2020 alle ore 11:38 Tsung-Hsing Chen < >> barrydog505 at gmail.com> ha scritto: >> >>> So, What should I do to use the correct mpiexec? >>> Am I configure petsc with the wrong way or something should be done? >>> >>> Stefano Zampini ? 2020?2?26? ?? ??4:26??? >>> >>>> This is what I get >>>> >>>> [szampini at localhost tutorials]$ mpiexec -n 2 ./ex2 -ksp_monitor_short >>>> -m 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always >>>> 0 KSP Residual norm 2.73499 >>>> 1 KSP Residual norm 0.795482 >>>> 2 KSP Residual norm 0.261984 >>>> 3 KSP Residual norm 0.0752998 >>>> 4 KSP Residual norm 0.0230031 >>>> 5 KSP Residual norm 0.00521255 >>>> 6 KSP Residual norm 0.00145783 >>>> 7 KSP Residual norm 0.000277319 >>>> Norm of error 0.000292349 iterations 7 >>>> >>>> When I sequentially, I get (same output as yours) >>>> >>>> [szampini at localhost tutorials]$ mpiexec -n 1 ./ex2 -ksp_monitor_short >>>> -m 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always >>>> 0 KSP Residual norm 3.21109 >>>> 1 KSP Residual norm 0.93268 >>>> 2 KSP Residual norm 0.103515 >>>> 3 KSP Residual norm 0.00787798 >>>> 4 KSP Residual norm 0.000387275 >>>> Norm of error 0.000392701 iterations 4 >>>> >>>> This means you are using the wrong mpiexec >>>> >>>> Il giorno mer 26 feb 2020 alle ore 11:17 Tsung-Hsing Chen < >>>> barrydog505 at gmail.com> ha scritto: >>>> >>>>> Hi, >>>>> >>>>> I tried to run the example in ksp/examples/tutorials/ex2. 
>>>>> I run the code with : mpiexec -n 2 ./ex2 -ksp_monitor_short -m 5 -n 5 >>>>> -ksp_gmres_cgs_refinement_type refine_always >>>>> >>>>> the output is : >>>>> 0 KSP Residual norm 3.21109 >>>>> 1 KSP Residual norm 0.93268 >>>>> 2 KSP Residual norm 0.103515 >>>>> 3 KSP Residual norm 0.00787798 >>>>> 4 KSP Residual norm 0.000387275 >>>>> Norm of error 0.000392701 iterations 4 >>>>> 0 KSP Residual norm 3.21109 >>>>> 1 KSP Residual norm 0.93268 >>>>> 2 KSP Residual norm 0.103515 >>>>> 3 KSP Residual norm 0.00787798 >>>>> 4 KSP Residual norm 0.000387275 >>>>> Norm of error 0.000392701 iterations 4 >>>>> >>>>> My output(above) is twice as >>>>> the ksp/examples/tutorials/output/ex2_4.out. >>>>> Is this the right answer that should come out? >>>>> >>>>> Thanks in advance, >>>>> >>>>> Tsung-Hsing Chen >>>>> >>>> >>>> >>>> -- >>>> Stefano >>>> >>> >> >> -- >> Stefano >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Feb 26 11:44:03 2020 From: jed at jedbrown.org (Jed Brown) Date: Wed, 26 Feb 2020 10:44:03 -0700 Subject: [petsc-users] Guidance on using Tao for unconstrained minimization In-Reply-To: <9480b1d3-1ed3-8108-15d8-a5914312c5f2@cfa.harvard.edu> References: <9480b1d3-1ed3-8108-15d8-a5914312c5f2@cfa.harvard.edu> Message-ID: <87ftexjlb0.fsf@jedbrown.org> Could you share output for your current configuration with -tao_monitor -tao_ls_monitor -tao_view? "Ellen M. Price" writes: > Hello PETSc users! > > I am using Tao for an unconstrained minimization problem. I have found > that CG works better than the other types for this application. After > about 85 iterations, I get an error about line search failure. I'm not > clear on what this means, or how I could mitigate the problem, and > neither the manual nor FAQ give any guidance. Can anyone suggest things > I could try to help the method converge? I have function and gradient > info, but no Hessian. > > Thanks, > Ellen Price From adener at anl.gov Wed Feb 26 12:02:47 2020 From: adener at anl.gov (Dener, Alp) Date: Wed, 26 Feb 2020 18:02:47 +0000 Subject: [petsc-users] Guidance on using Tao for unconstrained minimization In-Reply-To: <87ftexjlb0.fsf@jedbrown.org> References: <9480b1d3-1ed3-8108-15d8-a5914312c5f2@cfa.harvard.edu> <87ftexjlb0.fsf@jedbrown.org> Message-ID: Hi Ellen, Per Jed?s suggestion, seeing the monitor and ls_monitor outputs would certainly be helpful. The line search for CG (and other Tao algorithms) have safeguard steps for failures. When the line search fails to determine a valid step length for the computed CG direction, the search direction falls back to gradient descent for a second line search. If the gradient descent step succeeds here, then the CG updates restart again from that point (discarding previously updated information completely). LS failure is reported to the user only if this safeguard also fails to produce a viable step length, which then suggests that the computed gradient at that point may be incorrect or have significant numerical errors. If you can afford a slow run for debugging, you can use ?-tao_test_gradient? to check your gradient against the FD approximation at every iteration throughout the run. If you?re confident that the gradient is accurate, I would recommend testing with ?-tao_bncg_type gd? for a pure gradient descent run, and also trying out ?-tao_type bqnls? for the quasi-Newton method (only requires the gradient, no Hessian). ? 
Alp Dener Postdoctoral Researcher Argonne National Laboratory https://www.anl.gov/profile/alp-dener On February 26, 2020 at 11:44:15 AM, Jed Brown (jed at jedbrown.org) wrote: Could you share output for your current configuration with -tao_monitor -tao_ls_monitor -tao_view? "Ellen M. Price" writes: > Hello PETSc users! > > I am using Tao for an unconstrained minimization problem. I have found > that CG works better than the other types for this application. After > about 85 iterations, I get an error about line search failure. I'm not > clear on what this means, or how I could mitigate the problem, and > neither the manual nor FAQ give any guidance. Can anyone suggest things > I could try to help the method converge? I have function and gradient > info, but no Hessian. > > Thanks, > Ellen Price -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellen.price at cfa.harvard.edu Wed Feb 26 17:59:27 2020 From: ellen.price at cfa.harvard.edu (Ellen Price) Date: Wed, 26 Feb 2020 18:59:27 -0500 Subject: [petsc-users] Guidance on using Tao for unconstrained minimization In-Reply-To: <87ftexjlb0.fsf@jedbrown.org> References: <9480b1d3-1ed3-8108-15d8-a5914312c5f2@cfa.harvard.edu> <87ftexjlb0.fsf@jedbrown.org> Message-ID: Hi Jed, Thanks for getting back to me! Here's the output for my CG config. Sorry it's kind of a lot. Ellen On Wed, Feb 26, 2020 at 12:43 PM Jed Brown wrote: > Could you share output for your current configuration with -tao_monitor > -tao_ls_monitor -tao_view? > > "Ellen M. Price" writes: > > > Hello PETSc users! > > > > I am using Tao for an unconstrained minimization problem. I have found > > that CG works better than the other types for this application. After > > about 85 iterations, I get an error about line search failure. I'm not > > clear on what this means, or how I could mitigate the problem, and > > neither the manual nor FAQ give any guidance. Can anyone suggest things > > I could try to help the method converge? I have function and gradient > > info, but no Hessian. > > > > Thanks, > > Ellen Price > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: tao_cg.out Type: application/octet-stream Size: 187171 bytes Desc: not available URL: From mfadams at lbl.gov Wed Feb 26 18:11:53 2020 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 26 Feb 2020 19:11:53 -0500 Subject: [petsc-users] /usr/bin/ld: cannot find -lhwloc In-Reply-To: References: Message-ID: Thanks, we are now getting /usr/bin/ld: cannot find -lgcc_s. Any ideas? Let me know if you want to logs. (it takes a bit of mucking around). Thanks again, On Tue, Feb 25, 2020 at 10:37 PM Satish Balay wrote: > Try > > --with-hwloc=0 > > Satish > > On Tue, 25 Feb 2020, Mark Adams wrote: > > > We are running on a KNL and getting /usr/bin/ld: cannot find -lhwloc > > > > This is v3.7.7. > > > > I see -lhwloc in the PETSc stuff. We are also missing libX11.a and I am > > configuring now with > > --with-x=0 to try to fix that. > > > > I've attached to full output and the logs. > > > > Any ideas? > > Thanks, > > Mark > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: output Type: application/octet-stream Size: 61875 bytes Desc: not available URL: From balay at mcs.anl.gov Wed Feb 26 18:34:55 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 26 Feb 2020 18:34:55 -0600 Subject: [petsc-users] /usr/bin/ld: cannot find -lhwloc In-Reply-To: References: Message-ID: Such problems usually occur when 'compiler enviornment' is different during petsc build - and this application build. So things are worked during petsc configure [and build] - but they are are not working during application build. So the issue is to figure out why the env is different in these 2 builds. And in the petsc build - I see: >>> Using C/C++ linker: cc Using C/C++ flags: -g -O3 Using Fortran linker: ftn Using Fortran flags: -g -O3 <<<< On the application build - I see: ftn -O2 -no-ipo -traceback Don't know if these different options are causing these issues. Its good to test with default petsc makefiles [which use the same targets as petsc sources] - if possible. Satish On Wed, 26 Feb 2020, Mark Adams wrote: > Thanks, > we are now getting > /usr/bin/ld: cannot find -lgcc_s. > Any ideas? > Let me know if you want to logs. (it takes a bit of mucking around). > Thanks again, > > > On Tue, Feb 25, 2020 at 10:37 PM Satish Balay wrote: > > > Try > > > > --with-hwloc=0 > > > > Satish > > > > On Tue, 25 Feb 2020, Mark Adams wrote: > > > > > We are running on a KNL and getting /usr/bin/ld: cannot find -lhwloc > > > > > > This is v3.7.7. > > > > > > I see -lhwloc in the PETSc stuff. We are also missing libX11.a and I am > > > configuring now with > > > --with-x=0 to try to fix that. > > > > > > I've attached to full output and the logs. > > > > > > Any ideas? > > > Thanks, > > > Mark > > > > > > > > From juaneah at gmail.com Wed Feb 26 23:38:20 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Wed, 26 Feb 2020 23:38:20 -0600 Subject: [petsc-users] MatMatMult Message-ID: Hi everyone, Recently I installed the PETSc version 3.12.4 in optimized mode: ./configure --with-debugging=0 COPTFLAGS='-O2 -march=native -mtune=native' CXXOPTFLAGS='-O2 -march=native -mtune=native' FOPTFLAGS='-O2 -march=native -mtune=native' --download-mpich=1 --download-fblaslapack=1 --with-cxx-dialect=C++11 When I perform MatMatMult(A,B,MAT_REUSE_MATRIX,PETSC_DEFAULT,&C), for sparse A and dense B matrices, then: In the special case where matrix B (and hence C) are dense you can create the correctly sized matrix C yourself and then call this routine with MAT_REUSE_MATRIX , rather than first having MatMatMult () create it for you. You can NEVER do this if the matrix C is sparse. So, for these new release 3.12.4, if you create C as a dense matrix it is necessary to apply a MatAssemblyBegin() and MatAssemblyEnd() after the matrix multiplication, other wise it's not possible to perform further operations. It does not happen in the 3.11.3 version, where MatAssembly after multiplication it's not necessary. Kind regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu Feb 27 09:33:29 2020 From: hzhang at mcs.anl.gov (hzhang at mcs.anl.gov) Date: Thu, 27 Feb 2020 09:33:29 -0600 Subject: [petsc-users] MatMatMult In-Reply-To: References: Message-ID: Emmanuel: You can create a dense C with the required parallel layout without calling MatAssemblyBegin() and MatAssemblyEnd(). Did you get error without calling these routines? We only updated the help manu, not internal implementation. 
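For concreteness, a minimal sketch of that reuse pattern, assuming A (AIJ) and B (dense) are already assembled with compatible layouts; the variable names are only illustrative:

  PetscErrorCode ierr;
  PetscInt       m,M,n,N;
  Mat            C;

  /* C must inherit the row layout of A and the column layout of B */
  ierr = MatGetLocalSize(A,&m,NULL);CHKERRQ(ierr);
  ierr = MatGetSize(A,&M,NULL);CHKERRQ(ierr);
  ierr = MatGetLocalSize(B,NULL,&n);CHKERRQ(ierr);
  ierr = MatGetSize(B,NULL,&N);CHKERRQ(ierr);
  ierr = MatCreateDense(PetscObjectComm((PetscObject)A),m,n,M,N,NULL,&C);CHKERRQ(ierr);
  /* reuse the user-created dense C; no explicit assembly calls on C should be required here */
  ierr = MatMatMult(A,B,MAT_REUSE_MATRIX,PETSC_DEFAULT,&C);CHKERRQ(ierr);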
In the next release, we'll introduce new set of API to consolidate the API of mat-mat-operations. Hong Hi everyone, > > Recently I installed the PETSc version 3.12.4 in optimized mode: > > ./configure --with-debugging=0 COPTFLAGS='-O2 -march=native -mtune=native' > CXXOPTFLAGS='-O2 -march=native -mtune=native' FOPTFLAGS='-O2 -march=native > -mtune=native' --download-mpich=1 --download-fblaslapack=1 > --with-cxx-dialect=C++11 > > When I perform MatMatMult(A,B,MAT_REUSE_MATRIX,PETSC_DEFAULT,&C), for > sparse A and dense B matrices, then: > > In the special case where matrix B (and hence C) are dense you can create > the correctly sized matrix C yourself and then call this routine with > MAT_REUSE_MATRIX > , > rather than first having MatMatMult > () > create it for you. You can NEVER do this if the matrix C is sparse. > > So, for these new release 3.12.4, if you create C as a dense matrix it is > necessary to apply a MatAssemblyBegin() and MatAssemblyEnd() after the > matrix multiplication, other wise it's not possible to perform further > operations. > > It does not happen in the 3.11.3 version, where MatAssembly after > multiplication it's not necessary. > > Kind regards. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From adener at anl.gov Thu Feb 27 09:55:18 2020 From: adener at anl.gov (Dener, Alp) Date: Thu, 27 Feb 2020 15:55:18 +0000 Subject: [petsc-users] Guidance on using Tao for unconstrained minimization In-Reply-To: References: <9480b1d3-1ed3-8108-15d8-a5914312c5f2@cfa.harvard.edu> <87ftexjlb0.fsf@jedbrown.org> Message-ID: Hi Ellen, It looks like you?re using the old unconstrained CG code. This will be deprecated in the near future in favor of the newer bound-constrained CG algorithm (TAOBNCG) that can also solve unconstrained problems when the user does not specify any bounds on the problem. The newer TAOBNCG algorithm implements a preconditioner that significantly improves the scaling of the search direction and helps the line search accept the unit step length most of the time. I would recommend making sure that you?re on PETSc version 3.11 or newer, and then switching to this with ?-tao_type bncg?. You will not need to change any of your code to do this. If you still fail to converge, please send a new log with the new algorithm and we can evaluate the next steps. ? Alp Dener Postdoctoral Researcher Argonne National Laboratory https://www.anl.gov/profile/alp-dener On February 26, 2020 at 6:01:34 PM, Ellen Price (ellen.price at cfa.harvard.edu) wrote: Hi Jed, Thanks for getting back to me! Here's the output for my CG config. Sorry it's kind of a lot. Ellen On Wed, Feb 26, 2020 at 12:43 PM Jed Brown > wrote: Could you share output for your current configuration with -tao_monitor -tao_ls_monitor -tao_view? "Ellen M. Price" > writes: > Hello PETSc users! > > I am using Tao for an unconstrained minimization problem. I have found > that CG works better than the other types for this application. After > about 85 iterations, I get an error about line search failure. I'm not > clear on what this means, or how I could mitigate the problem, and > neither the manual nor FAQ give any guidance. Can anyone suggest things > I could try to help the method converge? I have function and gradient > info, but no Hessian. > > Thanks, > Ellen Price -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From adenchfi at hawk.iit.edu Thu Feb 27 10:40:16 2020 From: adenchfi at hawk.iit.edu (Adam Denchfield) Date: Thu, 27 Feb 2020 10:40:16 -0600 Subject: [petsc-users] Guidance on using Tao for unconstrained minimization In-Reply-To: References: <9480b1d3-1ed3-8108-15d8-a5914312c5f2@cfa.harvard.edu> <87ftexjlb0.fsf@jedbrown.org> Message-ID: Hi Ellen, It is as Alp said. To emphasize what he said, you don't need to worry about using a bounded CG method - the bounded CG methods can be used for unconstrained problems, and are much better than the old unconstrained CG code. On Thu, Feb 27, 2020, 9:55 AM Dener, Alp via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi Ellen, > > It looks like you?re using the old unconstrained CG code. This will be > deprecated in the near future in favor of the newer bound-constrained CG > algorithm (TAOBNCG) that can also solve unconstrained problems when the > user does not specify any bounds on the problem. > > The newer TAOBNCG algorithm implements a preconditioner that significantly > improves the scaling of the search direction and helps the line search > accept the unit step length most of the time. I would recommend making sure > that you?re on PETSc version 3.11 or newer, and then switching to this with > ?-tao_type bncg?. You will not need to change any of your code to do this. > If you still fail to converge, please send a new log with the new algorithm > and we can evaluate the next steps. > > ? > Alp Dener > Postdoctoral Researcher > Argonne National Laboratory > https://www.anl.gov/profile/alp-dener > > On February 26, 2020 at 6:01:34 PM, Ellen Price ( > ellen.price at cfa.harvard.edu) wrote: > > Hi Jed, > > Thanks for getting back to me! Here's the output for my CG config. Sorry > it's kind of a lot. > > Ellen > > On Wed, Feb 26, 2020 at 12:43 PM Jed Brown wrote: > >> Could you share output for your current configuration with -tao_monitor >> -tao_ls_monitor -tao_view? >> >> "Ellen M. Price" writes: >> >> > Hello PETSc users! >> > >> > I am using Tao for an unconstrained minimization problem. I have found >> > that CG works better than the other types for this application. After >> > about 85 iterations, I get an error about line search failure. I'm not >> > clear on what this means, or how I could mitigate the problem, and >> > neither the manual nor FAQ give any guidance. Can anyone suggest things >> > I could try to help the method converge? I have function and gradient >> > info, but no Hessian. >> > >> > Thanks, >> > Ellen Price >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellen.price at cfa.harvard.edu Thu Feb 27 13:47:13 2020 From: ellen.price at cfa.harvard.edu (Ellen Price) Date: Thu, 27 Feb 2020 14:47:13 -0500 Subject: [petsc-users] Guidance on using Tao for unconstrained minimization In-Reply-To: References: <9480b1d3-1ed3-8108-15d8-a5914312c5f2@cfa.harvard.edu> <87ftexjlb0.fsf@jedbrown.org> Message-ID: I tried what you suggested and used the bounded CG method. It gets a lot farther than the unconstrained CG method and finds a lower residual, but it still experiences a line search failure after a while. Any thoughts? I'm attaching the log output. In case it's helpful, I also spent a few more hours working on the code and now can compute the Hessian times an arbitrary vector (matrix-free using a MATSHELL); even matrix-free, however, the Hessian is much slower to compute than the gradient and objective. 
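(For reference, wiring a matrix-free Hessian like the one described here into Tao might look roughly like the sketch below. None of this is code from the thread: MyHessianMult, MyHessianUpdate, the AppCtx contents and the sizes n and N are placeholders for whatever the application actually needs.)

typedef struct { Vec x; /* current point, stashed so the mult routine can use it */ } AppCtx;

PetscErrorCode MyHessianMult(Mat H, Vec s, Vec Hs)
{
  AppCtx *user;
  PetscErrorCode ierr;
  ierr = MatShellGetContext(H, &user);CHKERRQ(ierr);
  /* apply Hessian(user->x) to s, e.g. via automatic differentiation, storing the result in Hs */
  return 0;
}

PetscErrorCode MyHessianUpdate(Tao tao, Vec x, Mat H, Mat Hpre, void *ctx)
{
  AppCtx *user = (AppCtx*)ctx;
  user->x = x;  /* record the point at which subsequent mat-vecs are taken */
  return 0;
}

/* setup; n and N are the local and global sizes of the optimization variable */
AppCtx user;
Mat H;
ierr = MatCreateShell(PETSC_COMM_WORLD, n, n, N, N, &user, &H);CHKERRQ(ierr);
ierr = MatShellSetOperation(H, MATOP_MULT, (void (*)(void))MyHessianMult);CHKERRQ(ierr);
ierr = TaoSetHessianRoutine(tao, H, H, MyHessianUpdate, &user);CHKERRQ(ierr);

With that in place, the Newton-type solvers mentioned further down (-tao_type bnls or -tao_type bntr) can use the shell matrix through their Krylov solves.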
To answer a previous question, I am as sure as I can be that the gradient is correct, since I'm using automatic differentiation and not relying on a hand-coded function. Thanks for your help, Ellen On Thu, Feb 27, 2020 at 11:40 AM Adam Denchfield wrote: > Hi Ellen, > > It is as Alp said. To emphasize what he said, you don't need to worry > about using a bounded CG method - the bounded CG methods can be used for > unconstrained problems, and are much better than the old unconstrained CG > code. > > > On Thu, Feb 27, 2020, 9:55 AM Dener, Alp via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Hi Ellen, >> >> It looks like you?re using the old unconstrained CG code. This will be >> deprecated in the near future in favor of the newer bound-constrained CG >> algorithm (TAOBNCG) that can also solve unconstrained problems when the >> user does not specify any bounds on the problem. >> >> The newer TAOBNCG algorithm implements a preconditioner that >> significantly improves the scaling of the search direction and helps the >> line search accept the unit step length most of the time. I would recommend >> making sure that you?re on PETSc version 3.11 or newer, and then switching >> to this with ?-tao_type bncg?. You will not need to change any of your code >> to do this. If you still fail to converge, please send a new log with the >> new algorithm and we can evaluate the next steps. >> >> ? >> Alp Dener >> Postdoctoral Researcher >> Argonne National Laboratory >> https://www.anl.gov/profile/alp-dener >> >> On February 26, 2020 at 6:01:34 PM, Ellen Price ( >> ellen.price at cfa.harvard.edu) wrote: >> >> Hi Jed, >> >> Thanks for getting back to me! Here's the output for my CG config. Sorry >> it's kind of a lot. >> >> Ellen >> >> On Wed, Feb 26, 2020 at 12:43 PM Jed Brown wrote: >> >>> Could you share output for your current configuration with -tao_monitor >>> -tao_ls_monitor -tao_view? >>> >>> "Ellen M. Price" writes: >>> >>> > Hello PETSc users! >>> > >>> > I am using Tao for an unconstrained minimization problem. I have found >>> > that CG works better than the other types for this application. After >>> > about 85 iterations, I get an error about line search failure. I'm not >>> > clear on what this means, or how I could mitigate the problem, and >>> > neither the manual nor FAQ give any guidance. Can anyone suggest things >>> > I could try to help the method converge? I have function and gradient >>> > info, but no Hessian. >>> > >>> > Thanks, >>> > Ellen Price >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: tao_bncg.out Type: application/octet-stream Size: 51783 bytes Desc: not available URL: From adener at anl.gov Thu Feb 27 15:58:44 2020 From: adener at anl.gov (Dener, Alp) Date: Thu, 27 Feb 2020 21:58:44 +0000 Subject: [petsc-users] Guidance on using Tao for unconstrained minimization In-Reply-To: References: <9480b1d3-1ed3-8108-15d8-a5914312c5f2@cfa.harvard.edu> <87ftexjlb0.fsf@jedbrown.org> Message-ID: The log shows that the LS keeps backtracking on the step length and still fails to find any point that reduces the function value even though the gradient passes the directional derivative test (i.e. it is a valid descent direction). If you?re confident that the gradient is accurate, then this behavior suggests that you?re stuck at a discontinuity in the function. Are you certain that this objective is at least C1 continuous? 
I also see that the LS is taking a step length of 5.0 in the 109th iteration right before it fails on the 110th. This might be too aggressive. You could restrict this with ?-tao_ls_stepmax 1.0? or switch to a backtracking LS with ?-tao_ls_type armijo? see if it changes anything at all. Note that the backtracking line search may behave a bit differently than restricting max step to 1.0, because backtracking completely ignores curvature information. Trying both would be helpful. You could additionally try to disable the line search entirely with ?-tao_ls_type unit? and accept 1.0 step lengths no matter what. This might cause an issue at the very beginning where I do see the LS doing some work. However, if it helps you reach the same point of failure as before, then it will keep going further without LS failures, and what happens with the function value and gradient norm here can help diagnose the problem. If there is indeed a discontinuous point like I suspect, then BNCG without line search might bounce back and forth around that point until it hits the iteration limit. TAO does have support for matrix-free Hessians in the Newton-type algorithms. You can switch to them with ?-tao_type bnls? for Newton Line Search or ?-tao_type bntr? for Newton Trust Region. They both use a Krylov method to iteratively invert the Hessian, and in matrix-free cases, use a BFGS approximation as the preconditioner. They're going to take more time per iteration, but should at least reach the point of failure in fewer iterations than BNCG. Whether or not that trade-off improves the overall time is problem dependent. The same LS modifications I mentioned above are also applicable to these. Ultimately though, if there really is a discontinuity, they're likely to get stuck in the same spot unless they manages to find a different local minimum that BNCG is not finding. ? Alp Dener Postdoctoral Researcher Argonne National Laboratory https://www.anl.gov/profile/alp-dener On February 27, 2020 at 1:48:44 PM, Ellen Price (ellen.price at cfa.harvard.edu) wrote: I tried what you suggested and used the bounded CG method. It gets a lot farther than the unconstrained CG method and finds a lower residual, but it still experiences a line search failure after a while. Any thoughts? I'm attaching the log output. In case it's helpful, I also spent a few more hours working on the code and now can compute the Hessian times an arbitrary vector (matrix-free using a MATSHELL); even matrix-free, however, the Hessian is much slower to compute than the gradient and objective. To answer a previous question, I am as sure as I can be that the gradient is correct, since I'm using automatic differentiation and not relying on a hand-coded function. Thanks for your help, Ellen On Thu, Feb 27, 2020 at 11:40 AM Adam Denchfield > wrote: Hi Ellen, It is as Alp said. To emphasize what he said, you don't need to worry about using a bounded CG method - the bounded CG methods can be used for unconstrained problems, and are much better than the old unconstrained CG code. On Thu, Feb 27, 2020, 9:55 AM Dener, Alp via petsc-users > wrote: Hi Ellen, It looks like you?re using the old unconstrained CG code. This will be deprecated in the near future in favor of the newer bound-constrained CG algorithm (TAOBNCG) that can also solve unconstrained problems when the user does not specify any bounds on the problem. 
The newer TAOBNCG algorithm implements a preconditioner that significantly improves the scaling of the search direction and helps the line search accept the unit step length most of the time. I would recommend making sure that you?re on PETSc version 3.11 or newer, and then switching to this with ?-tao_type bncg?. You will not need to change any of your code to do this. If you still fail to converge, please send a new log with the new algorithm and we can evaluate the next steps. ? Alp Dener Postdoctoral Researcher Argonne National Laboratory https://www.anl.gov/profile/alp-dener On February 26, 2020 at 6:01:34 PM, Ellen Price (ellen.price at cfa.harvard.edu) wrote: Hi Jed, Thanks for getting back to me! Here's the output for my CG config. Sorry it's kind of a lot. Ellen On Wed, Feb 26, 2020 at 12:43 PM Jed Brown > wrote: Could you share output for your current configuration with -tao_monitor -tao_ls_monitor -tao_view? "Ellen M. Price" > writes: > Hello PETSc users! > > I am using Tao for an unconstrained minimization problem. I have found > that CG works better than the other types for this application. After > about 85 iterations, I get an error about line search failure. I'm not > clear on what this means, or how I could mitigate the problem, and > neither the manual nor FAQ give any guidance. Can anyone suggest things > I could try to help the method converge? I have function and gradient > info, but no Hessian. > > Thanks, > Ellen Price -------------- next part -------------- An HTML attachment was scrubbed... URL: From juaneah at gmail.com Thu Feb 27 16:55:05 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Thu, 27 Feb 2020 16:55:05 -0600 Subject: [petsc-users] MatMatMult In-Reply-To: References: Message-ID: Thanks for the answer. Emmanuel: > You can create a dense C with the required parallel layout without > calling MatAssemblyBegin() and MatAssemblyEnd(). > Did you get error without calling these routines? > Yes, the output is (after create the C dense matrix and do not assembly it, run1 - see attached file -): [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Not for unassembled matrix [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.12.4, Feb, 04, 2020 [0]PETSC ERROR: ./comp on a arch-linux-c-opt-O2 named ayala by ayala Thu Feb 27 16:47:15 2020 [0]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich=1 --download-fblaslapack=1 --with-cxx-dialect=C++11 [0]PETSC ERROR: #1 MatNorm() line 5123 in /home/ayala/Documents/PETSc/petsc-3.12.4/src/mat/interface/matrix.c We only updated the help manu, not internal implementation. In the next > release, we'll introduce new set of API to consolidate the API of > mat-mat-operations. > Hong > I attach my test file, or maybe I'm doing something wrong. I tested this file on my laptop ubuntu 18 Kind regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: main.cc Type: text/x-c++src Size: 3157 bytes Desc: not available URL: From alexlindsay239 at gmail.com Thu Feb 27 19:59:49 2020 From: alexlindsay239 at gmail.com (Alexander Lindsay) Date: Thu, 27 Feb 2020 17:59:49 -0800 Subject: [petsc-users] Scraping MPI information from PETSc conf Message-ID: What's the cleanest way to determine the MPI install used to build PETSc? We are configuring a an MPI-based C++ library with autotools that will eventually be used by libMesh, and we'd like to make sure that this library (as well as libMesh) uses the same MPI that PETSc used or at worst detect our own and then error/warn the user if its an MPI that differs from the one used to build PETc. It seems like the only path info that shows up is in MPICXX_SHOW, PETSC_EXTERNAL_LIB_BASIC, and PETSC_WITH_EXTERNAL_LIB (I'm looking in petscvariables). I'm willing to learn the m4/portable shell built-ins necessary to parse those variables and come out with an mpi-dir, but before doing that figured I'd ask here and see if I'm missing something easier. Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Feb 27 20:08:40 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 27 Feb 2020 19:08:40 -0700 Subject: [petsc-users] Scraping MPI information from PETSc conf In-Reply-To: References: Message-ID: <87y2snsbtj.fsf@jedbrown.org> If determining mpicc is sufficient, this will work pkg-config --var=ccompiler PETSc We also define $ grep NUMVERSION mpich-optg/include/petscconf.h #define PETSC_HAVE_MPICH_NUMVERSION 30302300 or $ grep OMPI_ ompi-optg/include/petscconf.h #define PETSC_HAVE_OMPI_MAJOR_VERSION 4 #define PETSC_HAVE_OMPI_MINOR_VERSION 0 #define PETSC_HAVE_OMPI_RELEASE_VERSION 2 which PETSc uses to raise a compile-time error if it believes you're compiling PETSc code using an incompatible MPI. Note that some of this is hidden in the environment on Cray systems, for example, where CC=cc regardless of what compiler you're actually using. Alexander Lindsay writes: > What's the cleanest way to determine the MPI install used to build PETSc? > We are configuring a an MPI-based C++ library with autotools that will > eventually be used by libMesh, and we'd like to make sure that this library > (as well as libMesh) uses the same MPI that PETSc used or at worst detect > our own and then error/warn the user if its an MPI that differs from the > one used to build PETc. It seems like the only path info that shows up is > in MPICXX_SHOW, PETSC_EXTERNAL_LIB_BASIC, and PETSC_WITH_EXTERNAL_LIB (I'm > looking in petscvariables). I'm willing to learn the m4/portable shell > built-ins necessary to parse those variables and come out with an mpi-dir, > but before doing that figured I'd ask here and see if I'm missing something > easier. > > Alex From balay at mcs.anl.gov Thu Feb 27 21:15:19 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 27 Feb 2020 21:15:19 -0600 Subject: [petsc-users] Scraping MPI information from PETSc conf In-Reply-To: <87y2snsbtj.fsf@jedbrown.org> References: <87y2snsbtj.fsf@jedbrown.org> Message-ID: Not really useful for autotools - but we print the mpi.h used during build in make.log Using mpi.h: # 1 "/home/petsc/soft/mpich-3.3b1/include/mpi.h" 1 I guess the same code [using a petsc makefile] - can be scripted and parsed to get the PATH to compare in autotools. However the current version check [below] is likely the best way. 
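(As an illustration of the petscconf.h route Jed describes, a small configure-time test program can compare the version macros from the mpi.h being used against the ones PETSc recorded at build time. This is only a sketch: MPICH_NUMVERSION and the OMPI_*_VERSION macros come from MPICH's and Open MPI's own mpi.h, and how strictly to compare them is a policy choice, which is exactly the point raised next.)

#include <petscconf.h>
#include <mpi.h>
#if defined(PETSC_HAVE_MPICH_NUMVERSION)
#  if !defined(MPICH_NUMVERSION) || MPICH_NUMVERSION != PETSC_HAVE_MPICH_NUMVERSION
#    error "PETSc was configured with a different MPICH than this build is using"
#  endif
#elif defined(PETSC_HAVE_OMPI_MAJOR_VERSION)
#  if !defined(OMPI_MAJOR_VERSION) || OMPI_MAJOR_VERSION != PETSC_HAVE_OMPI_MAJOR_VERSION || OMPI_MINOR_VERSION != PETSC_HAVE_OMPI_MINOR_VERSION
#    error "PETSc was configured with a different Open MPI than this build is using"
#  endif
#endif
int main(void) { return 0; }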
Our prior check was deemed too strict - for ex: when linux distros updated MPI packages with a bug fixed version [without API change] - our prior check flagged this as incompatible - so we had to change it. Satish On Thu, 27 Feb 2020, Jed Brown wrote: > If determining mpicc is sufficient, this will work > > pkg-config --var=ccompiler PETSc > > We also define > > $ grep NUMVERSION mpich-optg/include/petscconf.h > #define PETSC_HAVE_MPICH_NUMVERSION 30302300 > > or > > $ grep OMPI_ ompi-optg/include/petscconf.h > #define PETSC_HAVE_OMPI_MAJOR_VERSION 4 > #define PETSC_HAVE_OMPI_MINOR_VERSION 0 > #define PETSC_HAVE_OMPI_RELEASE_VERSION 2 > > which PETSc uses to raise a compile-time error if it believes you're > compiling PETSc code using an incompatible MPI. > > Note that some of this is hidden in the environment on Cray systems, for > example, where CC=cc regardless of what compiler you're actually using. > > Alexander Lindsay writes: > > > What's the cleanest way to determine the MPI install used to build PETSc? > > We are configuring a an MPI-based C++ library with autotools that will > > eventually be used by libMesh, and we'd like to make sure that this library > > (as well as libMesh) uses the same MPI that PETSc used or at worst detect > > our own and then error/warn the user if its an MPI that differs from the > > one used to build PETc. It seems like the only path info that shows up is > > in MPICXX_SHOW, PETSC_EXTERNAL_LIB_BASIC, and PETSC_WITH_EXTERNAL_LIB (I'm > > looking in petscvariables). I'm willing to learn the m4/portable shell > > built-ins necessary to parse those variables and come out with an mpi-dir, > > but before doing that figured I'd ask here and see if I'm missing something > > easier. > > > > Alex > From jordic at cttc.upc.edu Fri Feb 28 04:29:44 2020 From: jordic at cttc.upc.edu (jordic) Date: Fri, 28 Feb 2020 11:29:44 +0100 Subject: [petsc-users] Memory leak at GPU when updating matrix of type mpiaijcusparse (CUDA) Message-ID: Dear all, the following simple program: ////////////////////////////////////////////////////////////////////////////////////// #include PetscInt ierr=0; int main(int argc,char **argv) { MPI_Comm comm; PetscMPIInt rank,size; PetscInitialize(&argc,&argv,NULL,help);if (ierr) return ierr; comm = PETSC_COMM_WORLD; MPI_Comm_rank(comm,&rank); MPI_Comm_size(comm,&size); Mat A; MatCreate(comm, &A); MatSetSizes(A, 1, 1, PETSC_DETERMINE, PETSC_DETERMINE); MatSetFromOptions(A); PetscInt dnz=1, onz=0; MatMPIAIJSetPreallocation(A, 0, &dnz, 0, &onz); MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE); MatSetOption(A, MAT_IGNORE_ZERO_ENTRIES, PETSC_TRUE); PetscInt igid=rank, jgid=rank; PetscScalar value=rank+1.0; // for(int i=0; i<10; ++i) for(;;) //infinite loop { MatSetValue(A, igid, jgid, value, INSERT_VALUES); MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); } MatDestroy(&A); PetscFinalize(); return ierr; } ////////////////////////////////////////////////////////////////////////////////////// creates a simple diagonal matrix with one value per mpi-core. If the type of the matrix is "mpiaij" (-mat_type mpiaij) there is no problem but with "mpiaijcusparse" (-mat_type mpiaijcusparse) the memory usage at the GPU grows with every iteration of the infinite loop. The only solution that I found is to destroy and create the matrix every time that it needs to be updated. Is there a better way to avoid this problem? 
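(For reference, the destroy-and-recreate workaround described above amounts to something like the following inside the loop, reusing the names from the program; it keeps the GPU footprint flat but pays the full creation, preallocation and assembly cost on every update.)

for(;;)
{
  MatDestroy(&A);
  MatCreate(comm, &A);
  MatSetSizes(A, 1, 1, PETSC_DETERMINE, PETSC_DETERMINE);
  MatSetFromOptions(A);  /* picks up -mat_type mpiaijcusparse */
  MatMPIAIJSetPreallocation(A, 0, &dnz, 0, &onz);
  /* re-apply the MatSetOption calls from the original program here if desired */
  MatSetValue(A, igid, jgid, value, INSERT_VALUES);
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
}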
I am using Petsc Release Version 3.12.2 with this configure options: Configure options --package-prefix-hash=/home_nobck/user/petsc-hash-pkgs --with-debugging=0 --with-fc=0 CC=gcc CXX=g++ --COPTFLAGS="-g -O3" --CXXOPTFLAGS="-g -O3" --CUDAOPTFLAGS="-D_FORCE_INLINES -g -O3" --with-mpi-include=/usr/lib/openmpi/include --with-mpi-lib="-L/usr/lib/openmpi/lib -lmpi_cxx -lmpi" --with-cuda=1 --with-precision=double --with-cuda-include=/usr/include --with-cuda-lib="-L/usr/lib/x86_64-linux-gnu -lcuda -lcudart -lcublas -lcufft -lcusparse -lcusolver" PETSC_ARCH=arch-ci-linux-opt-cxx-cuda-double Thanks for your help, Jorge -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Fri Feb 28 10:04:28 2020 From: hzhang at mcs.anl.gov (hzhang at mcs.anl.gov) Date: Fri, 28 Feb 2020 10:04:28 -0600 Subject: [petsc-users] MatMatMult In-Reply-To: References: Message-ID: Emmanuel: This is a bug in petsc. I've pushed a fix https://gitlab.com/petsc/petsc/-/commit/fd2a003f2c07165526de5c2fa5ca4f3c85618da7 You can edit it in your petsc library, or add MatAssemblyBegin/End in your application code until petsc-release is patched. Thanks for reporting it and sending us the test! Hong Thanks for the answer. > > Emmanuel: >> You can create a dense C with the required parallel layout without >> calling MatAssemblyBegin() and MatAssemblyEnd(). >> Did you get error without calling these routines? >> > > Yes, the output is (after create the C dense matrix and do not assembly > it, run1 - see attached file -): > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Not for unassembled matrix > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.12.4, Feb, 04, 2020 > [0]PETSC ERROR: ./comp on a arch-linux-c-opt-O2 named ayala by ayala Thu > Feb 27 16:47:15 2020 > [0]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 > -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" > FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich=1 > --download-fblaslapack=1 --with-cxx-dialect=C++11 > [0]PETSC ERROR: #1 MatNorm() line 5123 in > /home/ayala/Documents/PETSc/petsc-3.12.4/src/mat/interface/matrix.c > > We only updated the help manu, not internal implementation. In the next >> release, we'll introduce new set of API to consolidate the API of >> mat-mat-operations. >> Hong >> > > I attach my test file, or maybe I'm doing something wrong. I tested this > file on my laptop ubuntu 18 > > Kind regards. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From juaneah at gmail.com Fri Feb 28 10:18:47 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Fri, 28 Feb 2020 10:18:47 -0600 Subject: [petsc-users] MatMatMult In-Reply-To: References: Message-ID: Hi, Thanks. Kind regards. El vie., 28 de feb. de 2020 a la(s) 10:04, hzhang at mcs.anl.gov ( hzhang at mcs.anl.gov) escribi?: > Emmanuel: > This is a bug in petsc. I've pushed a fix > https://gitlab.com/petsc/petsc/-/commit/fd2a003f2c07165526de5c2fa5ca4f3c85618da7 > > You can edit it in your petsc library, or add MatAssemblyBegin/End in > your application code until petsc-release is patched. > Thanks for reporting it and sending us the test! > Hong > > Thanks for the answer. 
>> >> Emmanuel: >>> You can create a dense C with the required parallel layout without >>> calling MatAssemblyBegin() and MatAssemblyEnd(). >>> Did you get error without calling these routines? >>> >> >> Yes, the output is (after create the C dense matrix and do not assembly >> it, run1 - see attached file -): >> >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: Object is in wrong state >> [0]PETSC ERROR: Not for unassembled matrix >> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html >> for trouble shooting. >> [0]PETSC ERROR: Petsc Release Version 3.12.4, Feb, 04, 2020 >> [0]PETSC ERROR: ./comp on a arch-linux-c-opt-O2 named ayala by ayala Thu >> Feb 27 16:47:15 2020 >> [0]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 >> -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" >> FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich=1 >> --download-fblaslapack=1 --with-cxx-dialect=C++11 >> [0]PETSC ERROR: #1 MatNorm() line 5123 in >> /home/ayala/Documents/PETSc/petsc-3.12.4/src/mat/interface/matrix.c >> >> We only updated the help manu, not internal implementation. In the next >>> release, we'll introduce new set of API to consolidate the API of >>> mat-mat-operations. >>> Hong >>> >> >> I attach my test file, or maybe I'm doing something wrong. I tested this >> file on my laptop ubuntu 18 >> >> Kind regards. >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From D.J.P.Lahaye at tudelft.nl Fri Feb 28 10:32:50 2020 From: D.J.P.Lahaye at tudelft.nl (Domenico Lahaye - EWI) Date: Fri, 28 Feb 2020 16:32:50 +0000 Subject: [petsc-users] Master student exploring DMPLEX and TS/ex11.c In-Reply-To: <5b91e057fc0a4dc280b188525d921718@tudelft.nl> References: <5b91e057fc0a4dc280b188525d921718@tudelft.nl> Message-ID: <0ae7d639d923420e974792593ac6d90f@tudelft.nl> Dear all, I have a master student exploring DMPLEX and TS/ex11.c. He has build his own examples aiming at easing the learning curve. His examples are here GITHUB LINK -> https://github.com/mukkund1996/DMPLEX-advection Possibly this material is valuable to you. Any feedback is most welcome. Domenico Lahaye. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jczhang at mcs.anl.gov Fri Feb 28 12:13:12 2020 From: jczhang at mcs.anl.gov (Junchao Zhang) Date: Fri, 28 Feb 2020 12:13:12 -0600 Subject: [petsc-users] Memory leak at GPU when updating matrix of type mpiaijcusparse (CUDA) In-Reply-To: References: Message-ID: I will take a look at it and get back to you. Thanks. 
On Fri, Feb 28, 2020, 7:29 AM jordic wrote: > Dear all, > > the following simple program: > > > ////////////////////////////////////////////////////////////////////////////////////// > > #include > > PetscInt ierr=0; > int main(int argc,char **argv) > { > MPI_Comm comm; > PetscMPIInt rank,size; > > PetscInitialize(&argc,&argv,NULL,help);if (ierr) return ierr; > comm = PETSC_COMM_WORLD; > MPI_Comm_rank(comm,&rank); > MPI_Comm_size(comm,&size); > > Mat A; > MatCreate(comm, &A); > MatSetSizes(A, 1, 1, PETSC_DETERMINE, PETSC_DETERMINE); > MatSetFromOptions(A); > PetscInt dnz=1, onz=0; > MatMPIAIJSetPreallocation(A, 0, &dnz, 0, &onz); > MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE); > MatSetOption(A, MAT_IGNORE_ZERO_ENTRIES, PETSC_TRUE); > PetscInt igid=rank, jgid=rank; > PetscScalar value=rank+1.0; > > // for(int i=0; i<10; ++i) > for(;;) //infinite loop > { > MatSetValue(A, igid, jgid, value, INSERT_VALUES); > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > } > MatDestroy(&A); > PetscFinalize(); > return ierr; > } > > > ////////////////////////////////////////////////////////////////////////////////////// > > creates a simple diagonal matrix with one value per mpi-core. If the type > of the matrix is "mpiaij" (-mat_type mpiaij) there is no problem but with > "mpiaijcusparse" (-mat_type mpiaijcusparse) the memory usage at the GPU > grows with every iteration of the infinite loop. The only solution that I > found is to destroy and create the matrix every time that it needs to be > updated. Is there a better way to avoid this problem? > > I am using Petsc Release Version 3.12.2 with this configure options: > > Configure options --package-prefix-hash=/home_nobck/user/petsc-hash-pkgs > --with-debugging=0 --with-fc=0 CC=gcc CXX=g++ --COPTFLAGS="-g -O3" > --CXXOPTFLAGS="-g -O3" --CUDAOPTFLAGS="-D_FORCE_INLINES -g -O3" > --with-mpi-include=/usr/lib/openmpi/include > --with-mpi-lib="-L/usr/lib/openmpi/lib -lmpi_cxx -lmpi" --with-cuda=1 > --with-precision=double --with-cuda-include=/usr/include > --with-cuda-lib="-L/usr/lib/x86_64-linux-gnu -lcuda -lcudart -lcublas > -lcufft -lcusparse -lcusolver" PETSC_ARCH=arch-ci-linux-opt-cxx-cuda-double > > Thanks for your help, > > Jorge > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Zane.Jakobs at colorado.edu Fri Feb 28 13:10:41 2020 From: Zane.Jakobs at colorado.edu (Zane Charles Jakobs) Date: Fri, 28 Feb 2020 12:10:41 -0700 Subject: [petsc-users] Correct use of VecGetArray with std::vector Message-ID: Hi PETSc devs, I'm writing some C++ code that calls PETSc, and I'd like to be able to place the result of VecGetArray into an std::vector and then later call VecRestoreArray on that data, or get the same effects. It seems like the correct way to do this would be something like: Vec x; std::vector vals, idx; int num_vals, global_offset; PetscErrorCode ierr; ... /* do some stuff to x and compute num_vals and global_offset*/ ... vals.resize(num_vals); idx.resize(num_vals); std::iota(idx.begin(), idx.end(), global_offset); ierr = VecGetValues(x, num_vals, idx.data(), vals.data());CHKERRQ(ierr); /* do stuff to vals */ ... ierr = VecSetValues(x, num_vals, idx.data(), vals.data(), [whatever insert mode]);CHKERRQ(ierr); idx.clear(); vals.clear(); Is that correct (in the sense that it does what you'd expect if you replaced the vectors with pointers to indices/data and used VecGet/RestoreArray() instead of VecGet/SetValues, and it doesn't violate any of std::vector's invariants, e.g. 
by reallocating its memory)? If not, is there a "normal" way to do this? Thanks! -Zane Jakobs -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Feb 28 13:21:59 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 28 Feb 2020 14:21:59 -0500 Subject: [petsc-users] Correct use of VecGetArray with std::vector In-Reply-To: References: Message-ID: On Fri, Feb 28, 2020 at 2:12 PM Zane Charles Jakobs < Zane.Jakobs at colorado.edu> wrote: > Hi PETSc devs, > > I'm writing some C++ code that calls PETSc, and I'd like to be able to > place the result of VecGetArray into an std::vector and then later call > VecRestoreArray on that data, or get the same effects. It seems like the > correct way to do this would be something like: > Why are you calling Get/SetValues() instead Get/SetArray()? Shouldn't you just get the pointer using GetArray() and stick it into your std::vector? Thanks, Matt > Vec x; > std::vector vals, idx; > int num_vals, global_offset; > PetscErrorCode ierr; > ... > /* do some stuff to x and compute num_vals and global_offset*/ > ... > vals.resize(num_vals); > idx.resize(num_vals); > std::iota(idx.begin(), idx.end(), global_offset); > ierr = VecGetValues(x, num_vals, idx.data(), vals.data());CHKERRQ(ierr); > /* do stuff to vals */ > ... > ierr = VecSetValues(x, num_vals, idx.data(), vals.data(), [whatever insert > mode]);CHKERRQ(ierr); > idx.clear(); > vals.clear(); > > Is that correct (in the sense that it does what you'd expect if you > replaced the vectors with pointers to indices/data and used > VecGet/RestoreArray() instead of VecGet/SetValues, and it doesn't violate > any of std::vector's invariants, e.g. by reallocating its memory)? If not, > is there a "normal" way to do this? > > Thanks! > > -Zane Jakobs > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jczhang at mcs.anl.gov Fri Feb 28 14:00:54 2020 From: jczhang at mcs.anl.gov (Junchao Zhang) Date: Fri, 28 Feb 2020 14:00:54 -0600 Subject: [petsc-users] Correct use of VecGetArray with std::vector In-Reply-To: References: Message-ID: You can create the C++ vector vals and resize it to a proper size, get its data pointer, then pass it to PETSc, int n; Vec x; std::vector vals; vals.resize(n); /* You need to calculate n by other means */ ierr = VecCreateMPIWithArray(PETSC_COMM_WORLD,bs,n,PETSC_DECIDE,vals.data(),&v);CHKERRQ(ierr); // Code using v ierr = VecDestroy(&v);CHKERRQ(ierr); // Code using vals; --Junchao Zhang On Fri, Feb 28, 2020 at 1:12 PM Zane Charles Jakobs < Zane.Jakobs at colorado.edu> wrote: > Hi PETSc devs, > > I'm writing some C++ code that calls PETSc, and I'd like to be able to > place the result of VecGetArray into an std::vector and then later call > VecRestoreArray on that data, or get the same effects. It seems like the > correct way to do this would be something like: > > Vec x; > std::vector vals, idx; > int num_vals, global_offset; > PetscErrorCode ierr; > ... > /* do some stuff to x and compute num_vals and global_offset*/ > ... > vals.resize(num_vals); > idx.resize(num_vals); > std::iota(idx.begin(), idx.end(), global_offset); > ierr = VecGetValues(x, num_vals, idx.data(), vals.data());CHKERRQ(ierr); > /* do stuff to vals */ > ... 
> ierr = VecSetValues(x, num_vals, idx.data(), vals.data(), [whatever insert > mode]);CHKERRQ(ierr); > idx.clear(); > vals.clear(); > > Is that correct (in the sense that it does what you'd expect if you > replaced the vectors with pointers to indices/data and used > VecGet/RestoreArray() instead of VecGet/SetValues, and it doesn't violate > any of std::vector's invariants, e.g. by reallocating its memory)? If not, > is there a "normal" way to do this? > > Thanks! > > -Zane Jakobs > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jed.Brown at Colorado.EDU Fri Feb 28 14:14:03 2020 From: Jed.Brown at Colorado.EDU (Jed Brown) Date: Fri, 28 Feb 2020 13:14:03 -0700 Subject: [petsc-users] Correct use of VecGetArray with std::vector In-Reply-To: References: Message-ID: <87lfomqxkk.fsf@jedbrown.org> Matthew Knepley writes: > On Fri, Feb 28, 2020 at 2:12 PM Zane Charles Jakobs < > Zane.Jakobs at colorado.edu> wrote: > >> Hi PETSc devs, >> >> I'm writing some C++ code that calls PETSc, and I'd like to be able to >> place the result of VecGetArray into an std::vector and then later call >> VecRestoreArray on that data, or get the same effects. It seems like the >> correct way to do this would be something like: >> > > Why are you calling Get/SetValues() instead Get/SetArray()? Shouldn't you > just get the pointer using GetArray() and stick it into > your std::vector? Can't do this because std::vector can't be wrapped around existing memory. I would recommend not using std::vector. Dynamic resizing isn't a feature you want in this context, and since you'd like to use existing memory, you need to use a container that can accept existing memory. From eijkhout at tacc.utexas.edu Fri Feb 28 15:26:58 2020 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Fri, 28 Feb 2020 21:26:58 +0000 Subject: [petsc-users] Correct use of VecGetArray with std::vector In-Reply-To: <87lfomqxkk.fsf@jedbrown.org> References: <87lfomqxkk.fsf@jedbrown.org> Message-ID: <6E47824B-7BE9-43E2-BC98-C0BC39B28987@tacc.utexas.edu> On 2020Feb28, at 14:14, Jed Brown > wrote: Can't do this because std::vector can't be wrapped around existing memory. That's why I use gsl::span which "will be in c++20" Victor. -------------- next part -------------- An HTML attachment was scrubbed... URL:
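(To close the thread out with a concrete sketch of what Matt and Jed suggest: the copy-free variant simply works on the pointer returned by VecGetArray, and a non-owning view such as gsl::span, or std::span once C++20 is available, can be layered on top of that same pointer if an STL-style interface is wanted. Assumes x is a valid Vec; the loop body is a stand-in for whatever "do stuff to vals" means above.)

PetscScalar *a;
PetscInt n;
PetscErrorCode ierr;
ierr = VecGetLocalSize(x, &n);CHKERRQ(ierr);
ierr = VecGetArray(x, &a);CHKERRQ(ierr);
/* gsl::span<PetscScalar> vals(a, n);  // optional non-owning view over the same memory */
for (PetscInt i = 0; i < n; ++i) {
  a[i] *= 2.0;
}
ierr = VecRestoreArray(x, &a);CHKERRQ(ierr);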