From tom.alex.mondragon at gmail.com Sat Feb 1 01:09:18 2020 From: tom.alex.mondragon at gmail.com (Tomas Mondragon) Date: Sat, 1 Feb 2020 01:09:18 -0600 Subject: [petsc-users] Running moose/scripts/update_and_rebuild_petsc.sh on HPC In-Reply-To: References: <277eb13a-0590-4b1a-a089-09a7c35efd83@googlegroups.com> <641b2c64-0e88-47e7-b33a-e2528287095a@googlegroups.com> <2bf174ba-a994-45f8-a661-454458a6ffa3@googlegroups.com> <0a8315b7-185e-44c9-b1d3-d3b8f52939d4@googlegroups.com> <2c9e5abd-bd4f-4b95-b2ea-8aa6a993d5fb@googlegroups.com> <0b4c29ac-2261-404a-84f6-5e8e28e1c51f@googlegroups.com> <095881e4-592d-427a-ad84-6cbe5fb8fe2e@googlegroups.com> <1a976f38-4944-425f-af72-f5ce7ce3ac85@googlegroups.com> Message-ID: Thanks, that does sound useful! On Fri, Jan 31, 2020, 6:23 PM Smith, Barry F. wrote: > > You might find this option useful. > > --with-packages-download-dir= > Skip network download of package tarballs and locate them in > specified dir. If not found in dir, print package URL - so it can be > obtained manually. > > > This generates a list of URLs to download so you don't need to look > through the xxx.py files for that information. Conceivably a script could > gather this information from the run of configure and get the tarballs for > you. > > Barry > > > > > On Jan 31, 2020, at 11:58 AM, Tomas Mondragon < > tom.alex.mondragon at gmail.com> wrote: > > > > Hypre problem resolved. PETSc commit 05f86fb made in August 05, 2019 > added the line 'self.installwithbatch = 0' to the __init__ method of the > Configure class in the file > petsc/config/BuildSystem/config/packages/hypre.py to fix a bug with hypre > installation on Cray KNL systems. Since the machine I was installing os was > an SGI system, I decided to try switching to 'self.installwithbatch = 1' > and it worked! The configure script was finally able to run to completion. > > > > Perhaps there can be a Cray flag for configure that can control this, > since it is only Cray's that have this problem with Hypre? > > > > For my benefit when I have to do this again - > > To get moose/petsc/scripts/update_and_rebuild_petsc.sh to run on an SGI > system as a batch job, I had to: > > > > Make sure the git (gnu version) module was loaded > > git clone moose > > cd to the petsc directory and git clone the petsc submodule, but make > sure to pull the latest commit. The commit that the moose repo refers to is > outdated. > > cd back to the moose directory, git add petsc and git commit so that the > newest petsc commit gets used by the update script. otherwise the old > commit will be used. > > download the tarballs for fblaspack, hypre, metis, mumps, parmetis, > scalapack, (PT)scotch, slepc, and superLU_dist. The URLS are in the > __init__ methods of the relevant files > inmost/petsc/config/BuildSystem/config/packages/ > > alter moose/scripts/update_and_rebuild_petsc.sh script so that it is a > working PBS batch job. 
Be sure to module swap to the gcc compiler and > module load git (gnu version) and alter the ./configure command arguments > > adding > > --with-cudac=0 > > --with-batch=1 > > changing > > --download-=/path/to/thirdparty/package/tarball > > If the supercomputer is not a Cray KNL system, change line 26 of > moose/petsc/config/BuildSystem/config/packages/hypre.py from > 'self.installwithbath = 0' to 'self.installwithbatch = 1', otherwise, > install hypre on its own and use --with-hypre-dir=/path/to/hypre in the > ./configure command > > > > On Fri, Jan 31, 2020 at 10:06 AM Tomas Mondragon < > tom.alex.mondragon at gmail.com> wrote: > > Thanks for the change to base.py. Pulling the commit, confirm was able > to skip over lgrind and c2html. I did have a problem with Parmetis, but > that was because I was using an old ParMetis commit accidentally. Fixed by > downloading the right commit of ParMetis. > > > > My current problem is with Hypre. Apparently --download-hypre cannot be > used with --with-batch=1 even if the download URL is on the local machine. > The configuration.log that resulted is attached for anyone who may be > interested. > > > > -- > > You received this message because you are subscribed to a topic in the > Google Groups "moose-users" group. > > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/moose-users/2xZsBpG-DtY/unsubscribe. > > To unsubscribe from this group and all its topics, send an email to > moose-users+unsubscribe at googlegroups.com. > > To view this discussion on the web visit > https://groups.google.com/d/msgid/moose-users/a34fa09e-a4f5-4225-8933-34eb36759260%40googlegroups.com > . > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dmitry.melnichuk at geosteertech.com Mon Feb 3 09:38:57 2020 From: dmitry.melnichuk at geosteertech.com (=?utf-8?B?0JTQvNC40YLRgNC40Lkg0JzQtdC70YzQvdC40YfRg9C6?=) Date: Mon, 03 Feb 2020 18:38:57 +0300 Subject: [petsc-users] Triple increasing of allocated memory during KSPSolve calling(GMRES preconditioned by ASM) Message-ID: <9444561580744337@myt2-b8bf7a4d4ebc.qloud-c.yandex.net> An HTML attachment was scrubbed... URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 3 10:34:47 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 3 Feb 2020 11:34:47 -0500 Subject: [petsc-users] Triple increasing of allocated memory during KSPSolve calling(GMRES preconditioned by ASM) In-Reply-To: <9444561580744337@myt2-b8bf7a4d4ebc.qloud-c.yandex.net> References: <9444561580744337@myt2-b8bf7a4d4ebc.qloud-c.yandex.net> Message-ID: On Mon, Feb 3, 2020 at 10:38 AM ??????? ????????? < dmitry.melnichuk at geosteertech.com> wrote: > Hello all! > > Now I am faced with a problem associated with the memory allocation when > calling of KSPSolve . > > GMRES preconditioned by ASM for solving linear algebra system (obtained by > the finite element spatial discretisation of Biot poroelasticity model) was > chosen. > According to the output value of PetscMallocGetCurrentUsage subroutine 176 > MB for matrix and RHS vector storage is required (before KSPSolve calling). > But during solving linear algebra system 543 MB of RAM is required (during > KSPSolve calling). > Thus, the amount of allocated memory after preconditioning stage increased > three times. This kind of behaviour is critically for 3D models with > several millions of cells. 
> 1) In order to know anything, we have to see the output of -ksp_view, although I see you used an overlap of 2 2) The overlap increases the size of submatrices beyond that of the original matrix. Suppose that you used LU for the sub-preconditioner. You would need at least 2x memory (with ILU(0)) since the matrix dominates memory usage. Moreover, you have overlap and you might have fill-in depending on the solver. 3) The massif tool from valgrind is a good fine-grained way to look at memory allocation Thanks, Matt Is there a way to decrease amout of allocated memory? > Is that an expected behaviour for GMRES-ASM combination? > > As I remember, using previous version of PETSc didn't demonstrate so > significante memory increasing. > > ... > Vec :: Vec_F, Vec_U > Mat :: Mat_K > ... > ... > call MatAssemblyBegin(Mat_M,Mat_Final_Assembly,ierr) > call MatAssemblyEnd(Mat_M,Mat_Final_Assembly,ierr) > .... > call VecAssemblyBegin(Vec_F_mod,ierr) > call VecAssemblyEnd(Vec_F_mod,ierr) > ... > ... > call PetscMallocGetCurrentUsage(mem, ierr) > print *,"Memory used: ",mem > ... > ... > call KSPSetType(Krylov,KSPGMRES,ierr) > call KSPGetPC(Krylov,PreCon,ierr) > call PCSetType(PreCon,PCASM,ierr) > call KSPSetFromOptions(Krylov,ierr) > ... > call KSPSolve(Krylov,Vec_F,Vec_U,ierr) > ... > ... > options = "-pc_asm_overlap 2 -pc_asm_type basic -ksp_monitor > -ksp_converged_reason" > > > Kind regards, > Dmitry Melnichuk > Matrix.dat (265288024) > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashour.msc at gmail.com Sun Feb 2 11:24:42 2020 From: ashour.msc at gmail.com (Mohammed Ashour) Date: Sun, 2 Feb 2020 18:24:42 +0100 Subject: [petsc-users] Flagging the solver to restart Message-ID: Dear All, I'm solving a constraint phase-field problem using PetIGA. This question i'm having is more relevant to PETSc, so I'm posting here. I have an algorithm involving iterating on the solution vector until certain criteria are met before moving forward for the next time step. The sequence inside the TSSolve is to call TSMonitor first, to print a user-defined set of values and the move to solve at TSStep and then call TSPostEvaluate. So I'm using the TSMonitor to update some variables at time n , those variables are used the in the residual and jacobian calculations at time n+1, and then solving and then check if those criteria are met or not in a function assigned to TS via TSSetPostEvaluate, if the criteria are met, it'll move forward, if not, it'll engaged the routine TSRollBack(), which based on my understanding is the proper way yo flag the solver to recalculate n+1. My question is, is this the proper way to do it? what is the difference between TSRollBack and TSRestart? Thanks a lot -- -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Feb 3 10:55:36 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Mon, 3 Feb 2020 16:55:36 +0000 Subject: [petsc-users] Triple increasing of allocated memory during KSPSolve calling(GMRES preconditioned by ASM) In-Reply-To: References: <9444561580744337@myt2-b8bf7a4d4ebc.qloud-c.yandex.net> Message-ID: <86DE2602-DCBD-47C4-819C-223F9CF9A503@mcs.anl.gov> GMRES also can by default require about 35 work vectors if it reaches the full restart. 
You can set a smaller restart with -ksp_gmres_restart 15 for example but this can also hurt the convergence of GMRES dramatically. People sometimes use the KSPBCGS algorithm since it does not require all the restart vectors but it can also converge more slowly. Depending on how much memory the sparse matrices use relative to the vectors the vector memory may matter or not. If you are using a recent version of PETSc you can run with -log_view -log_view_memory and it will show on the right side of the columns how much memory is being allocated for each of the operations in various ways. Barry > On Feb 3, 2020, at 10:34 AM, Matthew Knepley wrote: > > On Mon, Feb 3, 2020 at 10:38 AM ??????? ????????? wrote: > Hello all! > > Now I am faced with a problem associated with the memory allocation when calling of KSPSolve . > > GMRES preconditioned by ASM for solving linear algebra system (obtained by the finite element spatial discretisation of Biot poroelasticity model) was chosen. > According to the output value of PetscMallocGetCurrentUsage subroutine 176 MB for matrix and RHS vector storage is required (before KSPSolve calling). > But during solving linear algebra system 543 MB of RAM is required (during KSPSolve calling). > Thus, the amount of allocated memory after preconditioning stage increased three times. This kind of behaviour is critically for 3D models with several millions of cells. > > 1) In order to know anything, we have to see the output of -ksp_view, although I see you used an overlap of 2 > > 2) The overlap increases the size of submatrices beyond that of the original matrix. Suppose that you used LU for the sub-preconditioner. > You would need at least 2x memory (with ILU(0)) since the matrix dominates memory usage. Moreover, you have overlap > and you might have fill-in depending on the solver. > > 3) The massif tool from valgrind is a good fine-grained way to look at memory allocation > > Thanks, > > Matt > > Is there a way to decrease amout of allocated memory? > Is that an expected behaviour for GMRES-ASM combination? > > As I remember, using previous version of PETSc didn't demonstrate so significante memory increasing. > > ... > Vec :: Vec_F, Vec_U > Mat :: Mat_K > ... > ... > call MatAssemblyBegin(Mat_M,Mat_Final_Assembly,ierr) > call MatAssemblyEnd(Mat_M,Mat_Final_Assembly,ierr) > .... > call VecAssemblyBegin(Vec_F_mod,ierr) > call VecAssemblyEnd(Vec_F_mod,ierr) > ... > ... > call PetscMallocGetCurrentUsage(mem, ierr) > print *,"Memory used: ",mem > ... > ... > call KSPSetType(Krylov,KSPGMRES,ierr) > call KSPGetPC(Krylov,PreCon,ierr) > call PCSetType(PreCon,PCASM,ierr) > call KSPSetFromOptions(Krylov,ierr) > ... > call KSPSolve(Krylov,Vec_F,Vec_U,ierr) > ... > ... > options = "-pc_asm_overlap 2 -pc_asm_type basic -ksp_monitor -ksp_converged_reason" > > > Kind regards, > Dmitry Melnichuk > Matrix.dat (265288024) > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From dmitry.melnichuk at geosteertech.com Tue Feb 4 06:04:58 2020 From: dmitry.melnichuk at geosteertech.com (=?utf-8?B?0JTQvNC40YLRgNC40Lkg0JzQtdC70YzQvdC40YfRg9C6?=) Date: Tue, 04 Feb 2020 15:04:58 +0300 Subject: [petsc-users] Triple increasing of allocated memory during KSPSolve calling(GMRES preconditioned by ASM) In-Reply-To: <86DE2602-DCBD-47C4-819C-223F9CF9A503@mcs.anl.gov> References: <9444561580744337@myt2-b8bf7a4d4ebc.qloud-c.yandex.net> <86DE2602-DCBD-47C4-819C-223F9CF9A503@mcs.anl.gov> Message-ID: <14855941580817898@vla4-d1c3bcedfacb.qloud-c.yandex.net> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Logs_26K_GMRES-ASM-log_view-log_view_memory-malloc_dump_32bit Type: application/octet-stream Size: 68273 bytes Desc: not available URL: From bsmith at mcs.anl.gov Tue Feb 4 10:04:45 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Tue, 4 Feb 2020 16:04:45 +0000 Subject: [petsc-users] Triple increasing of allocated memory during KSPSolve calling(GMRES preconditioned by ASM) In-Reply-To: <14855941580817898@vla4-d1c3bcedfacb.qloud-c.yandex.net> References: <9444561580744337@myt2-b8bf7a4d4ebc.qloud-c.yandex.net> <86DE2602-DCBD-47C4-819C-223F9CF9A503@mcs.anl.gov> <14855941580817898@vla4-d1c3bcedfacb.qloud-c.yandex.net> Message-ID: <72B6812D-4131-41A6-9A30-878D2F9058D8@mcs.anl.gov> Please run with the option -ksp_view so we know the exact solver options you are using. From the lines MatCreateSubMats 1 1.0 1.9397e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.1066e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatIncreaseOvrlp 1 1.0 3.0324e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 and the fact you have three matrices I would guess you are using the additive Schwarz preconditioner (ASM) with ILU(0) on the blocks. (which converges the same as ILU on one process but does use much more memory). Note: your code is still built with 32 bit integers. I would guess the basic matrix formed plus the vectors in this example could take ~200 MB. It is the two matrices in the additive Schwarz that is taking the additional memory. What kind of PDEs are you solving and what kind of formulation? ASM plus ILU is the "work mans" type preconditioner, relatively robust but not particularly fast for convergence. Depending on your problem you might be able to do much better convergence wise by using a PCFIELDSPLIT and a PCGAMG on one of the splits. In your own run you see the ILU is chugging along rather slowly to the solution. With your current solvers you can use the option -sub_pc_factor_in_place which will shave off one of the matrices memories. Please try that. Avoiding the ASM you can avoid both extra matrices but at the cost of even slower convergence. Use, for example -pc_type sor The petroleum industry also has a variety of "custom" preconditioners/solvers for particular models and formulations that can beat the convergence of general purpose solvers; and require less memory. Some of these can be implemented or simulated with PETSc. Some of these are implemented in the commercial petroleum simulation codes and it can be difficult to get a handle on exactly what they do because of proprietary issues. I think I have an old text on these approaches in my office, there may be modern books that discuss these. Barry > On Feb 4, 2020, at 6:04 AM, ??????? ????????? 
wrote: > > Hello again! > Thank you very much for your replies! > Log is attached. > > 1. The main problem now is following. To solve the matrix that is attached to my previous e-mail PETSc consumes ~550 MB. > I know definitely that there are commercial softwares in petroleum industry (e.g., Schlumberger Petrel) that solve the same initial problem consuming only ~200 MB. > Moreover, I am sure that when I used 32-bit PETSc (GMRES + ASM) a year ago, it also consumed ~200 MB for this matrix. > > So, my question is: do you have any advice on how to decrease the amount of RAM consumed for such matrix from 550 MB to 200 MB? Maybe some specific preconditioner or other ways? > > I will be very grateful for any thoughts! > > 2. The second problem is more particular. > According to resource manager in Windows 10, Fortran solver based on PETSc consumes 548 MB RAM while solving the system of linear equations. > As I understand it form logs, it is required 459 MB and 52 MB for matrix and vector storage respectively. After summing of all objects for which memory is allocated we get only 517 MB. > > Thank you again for your time! Have a nice day. > > Kind regards, > Dmitry > > > 03.02.2020, 19:55, "Smith, Barry F." : > > GMRES also can by default require about 35 work vectors if it reaches the full restart. You can set a smaller restart with -ksp_gmres_restart 15 for example but this can also hurt the convergence of GMRES dramatically. People sometimes use the KSPBCGS algorithm since it does not require all the restart vectors but it can also converge more slowly. > > Depending on how much memory the sparse matrices use relative to the vectors the vector memory may matter or not. > > If you are using a recent version of PETSc you can run with -log_view -log_view_memory and it will show on the right side of the columns how much memory is being allocated for each of the operations in various ways. > > Barry > > > > On Feb 3, 2020, at 10:34 AM, Matthew Knepley wrote: > > On Mon, Feb 3, 2020 at 10:38 AM ??????? ????????? wrote: > Hello all! > > Now I am faced with a problem associated with the memory allocation when calling of KSPSolve . > > GMRES preconditioned by ASM for solving linear algebra system (obtained by the finite element spatial discretisation of Biot poroelasticity model) was chosen. > According to the output value of PetscMallocGetCurrentUsage subroutine 176 MB for matrix and RHS vector storage is required (before KSPSolve calling). > But during solving linear algebra system 543 MB of RAM is required (during KSPSolve calling). > Thus, the amount of allocated memory after preconditioning stage increased three times. This kind of behaviour is critically for 3D models with several millions of cells. > > 1) In order to know anything, we have to see the output of -ksp_view, although I see you used an overlap of 2 > > 2) The overlap increases the size of submatrices beyond that of the original matrix. Suppose that you used LU for the sub-preconditioner. > You would need at least 2x memory (with ILU(0)) since the matrix dominates memory usage. Moreover, you have overlap > and you might have fill-in depending on the solver. > > 3) The massif tool from valgrind is a good fine-grained way to look at memory allocation > > Thanks, > > Matt > > Is there a way to decrease amout of allocated memory? > Is that an expected behaviour for GMRES-ASM combination? > > As I remember, using previous version of PETSc didn't demonstrate so significante memory increasing. > > ... 
> Vec :: Vec_F, Vec_U > Mat :: Mat_K > ... > ... > call MatAssemblyBegin(Mat_M,Mat_Final_Assembly,ierr) > call MatAssemblyEnd(Mat_M,Mat_Final_Assembly,ierr) > .... > call VecAssemblyBegin(Vec_F_mod,ierr) > call VecAssemblyEnd(Vec_F_mod,ierr) > ... > ... > call PetscMallocGetCurrentUsage(mem, ierr) > print *,"Memory used: ",mem > ... > ... > call KSPSetType(Krylov,KSPGMRES,ierr) > call KSPGetPC(Krylov,PreCon,ierr) > call PCSetType(PreCon,PCASM,ierr) > call KSPSetFromOptions(Krylov,ierr) > ... > call KSPSolve(Krylov,Vec_F,Vec_U,ierr) > ... > ... > options = "-pc_asm_overlap 2 -pc_asm_type basic -ksp_monitor -ksp_converged_reason" > > > Kind regards, > Dmitry Melnichuk > Matrix.dat (265288024) > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > From hongzhang at anl.gov Tue Feb 4 10:32:45 2020 From: hongzhang at anl.gov (Zhang, Hong) Date: Tue, 4 Feb 2020 16:32:45 +0000 Subject: [petsc-users] Flagging the solver to restart In-Reply-To: References: Message-ID: <4246EF6F-6B7A-4202-806B-6D334E5B9427@anl.gov> > On Feb 2, 2020, at 11:24 AM, Mohammed Ashour wrote: > > Dear All, > I'm solving a constraint phase-field problem using PetIGA. This question i'm having is more relevant to PETSc, so I'm posting here. > > I have an algorithm involving iterating on the solution vector until certain criteria are met before moving forward for the next time step. The sequence inside the TSSolve is to call TSMonitor first, to print a user-defined set of values and the move to solve at TSStep and then call TSPostEvaluate. > > So I'm using the TSMonitor to update some variables at time n , those variables are used the in the residual and jacobian calculations at time n+1, and then solving and then check if those criteria are met or not in a function assigned to TS via TSSetPostEvaluate, if the criteria are met, it'll move forward, if not, it'll engaged the routine TSRollBack(), which based on my understanding is the proper way yo flag the solver to recalculate n+1. My question is, is this the proper way to do it? what is the difference between TSRollBack and TSRestart? You are right that TSRollBack() recalculates the current time step. But I would not suggest to use TSPostEvaluate in your case. Presumably you are not using the PETSc adaptor (e.g. via -ts_adapt_type none) and want to control the stepsize yourself. You can check the criteria in TSPostStep, call TSRollBack() if the criteria are not met and update the variables accordingly. The variables can also be updated in TSPreStep(), but TSMonitor should not be used since it is designed for read-only operations. TSRestart may be needed when you are using non-self-starting integration methods such as multiple step methods and FSAL RK methods (-ts_rk_type <3bs,5dp,5bs,6vr,7vr,8vr>). These methods rely on solutions or residuals from previous time steps, thus need a flag to hard restart the time integration whenever discontinuity is introduced (e.g. a parameter in the RHS function is changed). So TSRestart sets the flag to tell the integrator to treat the next time step like the first time step in a time integration. Hong (Mr.) 
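For reference, a minimal C sketch of the pattern described above: check a user-defined criterion in a TSPostStep callback and roll the step back if it fails. Only TSSetPostStep, TSGetApplicationContext, TSGetSolution and TSRollBack are actual PETSc calls; the routine CriteriaMet and the AppCtx fields are hypothetical placeholders for the user's own convergence test.

#include <petscts.h>

typedef struct {
  PetscReal tol;   /* tolerance used by the user's convergence criterion (placeholder) */
} AppCtx;

/* hypothetical user routine: evaluates the convergence criteria on the current solution */
extern PetscErrorCode CriteriaMet(Vec u,AppCtx *ctx,PetscBool *ok);

static PetscErrorCode PostStep(TS ts)
{
  AppCtx         *ctx;
  Vec            u;
  PetscBool      ok;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = TSGetApplicationContext(ts,&ctx);CHKERRQ(ierr);
  ierr = TSGetSolution(ts,&u);CHKERRQ(ierr);
  ierr = CriteriaMet(u,ctx,&ok);CHKERRQ(ierr);
  if (!ok) {
    /* discard the step just taken; the next TSStep() recomputes it from the
       restored state, after the lagged variables have been updated (e.g. in a TSPreStep callback) */
    ierr = TSRollBack(ts);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}

/* registration, after TSCreate():
     ierr = TSSetApplicationContext(ts,&ctx);CHKERRQ(ierr);
     ierr = TSSetPostStep(ts,PostStep);CHKERRQ(ierr);        */
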
> Thanks a lot > > -- From balay at mcs.anl.gov Tue Feb 4 12:10:41 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 4 Feb 2020 12:10:41 -0600 Subject: [petsc-users] petsc-3.12.4.tar.gz now available Message-ID: Dear PETSc users, The patch release petsc-3.12.4 is now available for download, with change list at 'PETSc-3.12 Changelog' http://www.mcs.anl.gov/petsc/download/index.html Satish From dong-hao at outlook.com Tue Feb 4 12:41:43 2020 From: dong-hao at outlook.com (Hao DONG) Date: Tue, 4 Feb 2020 18:41:43 +0000 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? Message-ID: Dear all, I have a few questions about the implementation of diagonal ILU PC in PETSc. I want to solve a very simple system with KSP (in parallel), the nature of the system (finite difference time-harmonic Maxwell) is probably not important to the question itself. Long story short, I just need to solve a set of equations of Ax = b with a block diagonal system matrix, like (not sure if the mono font works): |X | A =| Y | | Z| Note that A is not really block-diagonal, it?s just a multi-diagonal matrix determined by the FD mesh, where most elements are close to diagonal. So instead of a full ILU decomposition, a D-ILU is a good approximation as a preconditioner, and the number of blocks should not matter too much: |Lx | |Ux | L = | Ly | and U = | Uy | | Lz| | Uz| Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in my sequential Fortran/Matlab code. So like most users, I am looking for a parallel implement with this problem in PETSc. After looking through the manual, it seems that the most straightforward way to do it is through PCBJACOBI. Not sure I understand it right, I just setup a 3-block PCJACOBI and give each of the block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little block of fortran code would look like: ... call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & & isubs,ierr) call PCBJacobiSetLocalBlocks(pc_local, Nsub, & & isubs(istart:iend),ierr) ! set up the block jacobi structure call KSPSetup(ksp_local,ierr) ! allocate sub ksps allocate(ksp_sub(Nsub)) call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & & ksp_sub,ierr) do i=1,Nsub call KSPGetPC(ksp_sub(i),pc_sub,ierr) !ILU preconditioner call PCSetType(pc_sub,ptype,ierr) call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] end do call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) ? I understand that the parallel performance may not be comparable, so I first set up a one-process test (with MPIAij, but all the rows are local since there is only one process). The system is solved without any problem (identical results within error). But the performance is actually a lot worse (code built without debugging flags in performance tests) than my own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), which is hard to believe. I suspect the difference is from the PC as the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to the same relative tol. This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks instead of 3). 
While my Fortran/Matlab codes see minimal performance difference (<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a performance decrease of more than 50% in sequential test. So my implementation IS somewhat different in PETSc. Do I miss something in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how PCBJACOBI works in PETSc? If this is not the right way to implement a block diagonal ILU as (parallel) PC, please kindly point me to the right direction. I searched through the mail list to find some answers, only to find a couple of similar questions... An example would be nice. On the other hand, does PETSc support a simple way to use explicit L/U matrix as a preconditioner? I can import the D-ILU matrix (I already converted my A matrix into Mat) constructed in my Fortran code to make a better comparison. Or do I have to construct my own PC using PCshell? If so, is there a good tutorial/example to learn about how to use PCSHELL (in a more advanced sense, like how to setup pc side and transpose)? Thanks in advance, Hao -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Tue Feb 4 13:16:04 2020 From: hzhang at mcs.anl.gov (hzhang at mcs.anl.gov) Date: Tue, 4 Feb 2020 13:16:04 -0600 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? In-Reply-To: References: Message-ID: Hao: I would suggest to use a parallel sparse direct solver, e.g., superlu_dist or mumps. These solvers can take advantage of your sparse data structure. Once it works, then you may play with other preconditioners, such as bjacobi + lu/ilu. See https://www.mcs.anl.gov/petsc/miscellaneous/external.html Hong Dear all, > > > I have a few questions about the implementation of diagonal ILU PC in > PETSc. I want to solve a very simple system with KSP (in parallel), the > nature of the system (finite difference time-harmonic Maxwell) is probably > not important to the question itself. Long story short, I just need to > solve a set of equations of Ax = b with a block diagonal system matrix, > like (not sure if the mono font works): > > |X | > A =| Y | > | Z| > > Note that A is not really block-diagonal, it?s just a multi-diagonal > matrix determined by the FD mesh, where most elements are close to > diagonal. So instead of a full ILU decomposition, a D-ILU is a good > approximation as a preconditioner, and the number of blocks should not > matter too much: > > |Lx | |Ux | > L = | Ly | and U = | Uy | > | Lz| | Uz| > > Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N > blocks) is quite efficient with Krylov subspace methods like BiCGstab or > QMR in my sequential Fortran/Matlab code. > > So like most users, I am looking for a parallel implement with this > problem in PETSc. After looking through the manual, it seems that the > most straightforward way to do it is through PCBJACOBI. Not sure I > understand it right, I just setup a 3-block PCJACOBI and give each of the > block a KSP with PCILU. Is this supposed to be equivalent to my > D-ILU preconditioner? My little block of fortran code would look like: > ... > * call* PCBJacobiSetTotalBlocks(pc_local,Ntotal, & > & isubs,ierr) > *call* PCBJacobiSetLocalBlocks(pc_local, Nsub, & > & isubs(istart:iend),ierr) > ! set up the block jacobi structure > *call* KSPSetup(ksp_local,ierr) > ! 
allocate sub ksps > allocate(ksp_sub(Nsub)) > *call* PCBJacobiGetSubKSP(pc_local,Nsub,istart, & > & ksp_sub,ierr) > do i=1,Nsub > *call* KSPGetPC(ksp_sub(i),pc_sub,ierr) > !ILU preconditioner > *call* PCSetType(pc_sub,ptype,ierr) > *call* PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here > *call* KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] > end do > *call* KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & > & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) > ? > > I understand that the parallel performance may not be comparable, so I > first set up a one-process test (with MPIAij, but all the rows are local > since there is only one process). The system is solved without any > problem (identical results within error). But the performance is actually a > lot worse (code built without debugging flags in performance tests) than my > own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse > matrix format), which is hard to believe. I suspect the difference is from > the PC as the PETSc version took much more BiCGstab iterations (60-ish vs > 100-ish) to converge to the same relative tol. > > This is further confirmed when I change the setup of D-ILU (using 6 or 9 > blocks instead of 3). While my Fortran/Matlab codes see minimal performance > difference (<5%) when I play with the D-ILU setup, increasing the number of > D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a > performance decrease of more than 50% in sequential test. So my > implementation IS somewhat different in PETSc. Do I miss something in the > PCBJACOBI setup? Or do I have some fundamental misunderstanding of how > PCBJACOBI works in PETSc? > > If this is not the right way to implement a block diagonal ILU as > (parallel) PC, please kindly point me to the right direction. I searched > through the mail list to find some answers, only to find a couple of > similar questions... An example would be nice. > > On the other hand, does PETSc support a simple way to use explicit L/U > matrix as a preconditioner? I can import the D-ILU matrix (I already > converted my A matrix into Mat) constructed in my Fortran code to make a > better comparison. Or do I have to construct my own PC using PCshell? If > so, is there a good tutorial/example to learn about how to use PCSHELL (in > a more advanced sense, like how to setup pc side and transpose)? > > Thanks in advance, > > Hao > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Feb 4 14:27:09 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Tue, 4 Feb 2020 20:27:09 +0000 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? In-Reply-To: References: Message-ID: <264F91C4-8558-4850-9B4B-ABE4123C2A2C@anl.gov> > On Feb 4, 2020, at 12:41 PM, Hao DONG wrote: > > Dear all, > > > I have a few questions about the implementation of diagonal ILU PC in PETSc. I want to solve a very simple system with KSP (in parallel), the nature of the system (finite difference time-harmonic Maxwell) is probably not important to the question itself. Long story short, I just need to solve a set of equations of Ax = b with a block diagonal system matrix, like (not sure if the mono font works): > > |X | > A =| Y | > | Z| > > Note that A is not really block-diagonal, it?s just a multi-diagonal matrix determined by the FD mesh, where most elements are close to diagonal. 
So instead of a full ILU decomposition, a D-ILU is a good approximation as a preconditioner, and the number of blocks should not matter too much: > > |Lx | |Ux | > L = | Ly | and U = | Uy | > | Lz| | Uz| > > Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in my sequential Fortran/Matlab code. > > So like most users, I am looking for a parallel implement with this problem in PETSc. After looking through the manual, it seems that the most straightforward way to do it is through PCBJACOBI. Not sure I understand it right, I just setup a 3-block PCJACOBI and give each of the block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little block of fortran code would look like: > ... > call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & > & isubs,ierr) > call PCBJacobiSetLocalBlocks(pc_local, Nsub, & > & isubs(istart:iend),ierr) > ! set up the block jacobi structure > call KSPSetup(ksp_local,ierr) > ! allocate sub ksps > allocate(ksp_sub(Nsub)) > call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & > & ksp_sub,ierr) > do i=1,Nsub > call KSPGetPC(ksp_sub(i),pc_sub,ierr) > !ILU preconditioner > call PCSetType(pc_sub,ptype,ierr) > call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here > call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] > end do > call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & > & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) > ? This code looks essentially right. You should call with -ksp_view to see exactly what PETSc is using for a solver. > > I understand that the parallel performance may not be comparable, so I first set up a one-process test (with MPIAij, but all the rows are local since there is only one process). The system is solved without any problem (identical results within error). But the performance is actually a lot worse (code built without debugging flags in performance tests) than my own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), which is hard to believe. I suspect the difference is from the PC as the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to the same relative tol. PETSc uses GMRES by default with a restart of 30 and left preconditioning. It could be different exact choices in the solver (which is why -ksp_view is so useful) can explain the differences in the runs between your code and PETSc's > > This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks instead of 3). While my Fortran/Matlab codes see minimal performance difference (<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a performance decrease of more than 50% in sequential test. This is odd, the more blocks the smaller each block so the quicker the ILU set up should be. You can run various cases with -log_view and send us the output to see what is happening at each part of the computation in time. > So my implementation IS somewhat different in PETSc. Do I miss something in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how PCBJACOBI works in PETSc? Probably not. > > If this is not the right way to implement a block diagonal ILU as (parallel) PC, please kindly point me to the right direction. I searched through the mail list to find some answers, only to find a couple of similar questions... An example would be nice. 
You approach seems fundamentally right but I cannot be sure of possible bugs. > > On the other hand, does PETSc support a simple way to use explicit L/U matrix as a preconditioner? I can import the D-ILU matrix (I already converted my A matrix into Mat) constructed in my Fortran code to make a better comparison. Or do I have to construct my own PC using PCshell? If so, is there a good tutorial/example to learn about how to use PCSHELL (in a more advanced sense, like how to setup pc side and transpose)? Not sure what you mean by explicit L/U matrix as a preconditioner. As Hong said, yes you can use a parallel LU from MUMPS or SuperLU_DIST or Pastix as the solver. You do not need any shell code. You simply need to set the PCType to lu You can also set all this options from the command line and don't need to write the code specifically. So call KSPSetFromOptions() and then for example -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu (this last one is applied to each block so you could use -pc_type lu and it would use lu on each block.) -ksp_type_none -pc_type lu -pc_factor_mat_solver_type mumps (do parallel LU with mumps) By not hardwiring in the code and just using options you can test out different cases much quicker Use -ksp_view to make sure that is using the solver the way you expect. Barry Barry > > Thanks in advance, > > Hao From aan2 at princeton.edu Tue Feb 4 17:07:36 2020 From: aan2 at princeton.edu (Olek Niewiarowski) Date: Tue, 4 Feb 2020 23:07:36 +0000 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? Message-ID: Hello, I am a FEniCS user but new to petsc4py. I am trying to modify the KSP solver through the SNES object to implement the Sherman-Morrison formula (e.g. http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html ). I am solving a nonlinear system of the form [K(u)?kaaT]?u=?F(u). Here the jacobian matrix K is modified by the term kaaT, where k is a scalar. Notably, K is sparse, while the term kaaT results in a full matrix. 
This problem can be solved efficiently using the Sherman-Morrison formula : [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) I have managed to successfully implement this at the linear solve level (by modifying the KSP solver) inside a custom Newton solver in python by following an incomplete tutorial at https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner : * while (norm(delU) > alpha): # while not converged * * self.update_F() # call to method to update r.h.s form * self.update_K() # call to update the jacobian form * K = assemble(self.K) # assemble the jacobian matrix * F = assemble(self.F) # assemble the r.h.s vector * a = assemble(self.a_form) # assemble the a_form (see Sherman Morrison formula) * * for bc in self.mem.bc: # apply boundary conditions * bc.apply(K, F) * bc.apply(K, a) * * B = PETSc.Mat().create() * * # Assemble the bilinear form that defines A and get the concrete * # PETSc matrix * A = as_backend_type(K).mat() # get the PETSc objects for K and a * u = as_backend_type(a).vec() * * # Build the matrix "context" # see firedrake docs * Bctx = MatrixFreeB(A, u, u, self.k) * * # Set up B * # B is the same size as A * B.setSizes(*A.getSizes()) * B.setType(B.Type.PYTHON) * B.setPythonContext(Bctx) * B.setUp() * * * ksp = PETSc.KSP().create() # create the KSP linear solver object * ksp.setOperators(B) * ksp.setUp() * pc = ksp.pc * pc.setType(pc.Type.PYTHON) * pc.setPythonContext(MatrixFreePC()) * ksp.setFromOptions() * * solution = delU # the incremental displacement at this iteration * * b = as_backend_type(-F).vec() * delu = solution.vector().vec() * * ksp.solve(b, delu) * * self.mem.u.vector().axpy(0.25, self.delU.vector()) # poor man's linesearch * counter += 1 Here is the corresponding petsc4py code adapted from the firedrake docs: 1. class MatrixFreePC(object): 2. 3. def setUp(self, pc): 4. B, P = pc.getOperators() 5. # extract the MatrixFreeB object from B 6. ctx = B.getPythonContext() 7. self.A = ctx.A 8. self.u = ctx.u 9. self.v = ctx.v 10. self.k = ctx.k 11. # Here we build the PC object that uses the concrete, 12. # assembled matrix A. We will use this to apply the action 13. # of A^{-1} 14. self.pc = PETSc.PC().create() 15. self.pc.setOptionsPrefix("mf_") 16. self.pc.setOperators(self.A) 17. self.pc.setFromOptions() 18. # Since u and v do not change, we can build the denominator 19. # and the action of A^{-1} on u only once, in the setup 20. # phase. 21. tmp = self.A.createVecLeft() 22. self.pc.apply(self.u, tmp) 23. self._Ainvu = tmp 24. self._denom = 1 + self.k*self.v.dot(self._Ainvu) 25. 26. def apply(self, pc, x, y): 27. # y <- A^{-1}x 28. self.pc.apply(x, y) 29. # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u) 30. alpha = (self.k*self.v.dot(y)) / self._denom 31. # y <- y - alpha * A^{-1}u 32. y.axpy(-alpha, self._Ainvu) 33. 34. 35. class MatrixFreeB(object): 36. 37. def __init__(self, A, u, v, k): 38. self.A = A 39. self.u = u 40. self.v = v 41. self.k = k 42. 43. def mult(self, mat, x, y): 44. # y <- A x 45. self.A.mult(x, y) 46. 47. # alpha <- v^T x 48. alpha = self.v.dot(x) 49. 50. # y <- y + alpha*u 51. y.axpy(alpha, self.u) However, this approach is not efficient as it requires many iterations due to the Newton step being fixed, so I would like to implement it using SNES and use line search. Unfortunately, I have not been able to find any documentation/tutorial on how to do so. 
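For orientation, the SNES-level version of the KSP wiring above looks roughly like the following C-API sketch; petsc4py exposes the same SNES/KSP/PC objects. This is only a sketch under assumptions: FormFunction/FormJacobian and the two PCShell routines are hypothetical user callbacks standing in for the FEniCS assembly and the Sherman-Morrison apply shown earlier, B is assumed to be a shell matrix whose MatMult applies (K - k a a^T) x, and r, u, B, K and the application context "user" are assumed to be created/assembled elsewhere. Newton with a backtracking line search (SNESNEWTONLS) is the PETSc default, so no extra call is needed for the line search itself.

#include <petscsnes.h>

/* hypothetical user callbacks, standing in for the assembly code above */
extern PetscErrorCode FormFunction(SNES,Vec,Vec,void*);      /* assembles the residual */
extern PetscErrorCode FormJacobian(SNES,Vec,Mat,Mat,void*);  /* re-assembles K and a, refreshes B's context */
extern PetscErrorCode SMWSetUp(PC);                          /* precomputes K^{-1}a and the denominator */
extern PetscErrorCode SMWApply(PC,Vec,Vec);                  /* Sherman-Morrison apply, as in MatrixFreePC */

  SNES           snes;
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;
  /* assumed already set up: Vec r,u; Mat B (shell operator), K (assembled sparse matrix); user context */

  ierr = SNESCreate(PETSC_COMM_WORLD,&snes);CHKERRQ(ierr);
  ierr = SNESSetFunction(snes,r,FormFunction,&user);CHKERRQ(ierr);
  ierr = SNESSetJacobian(snes,B,K,FormJacobian,&user);CHKERRQ(ierr); /* B used by the Krylov method, K by the preconditioner */
  ierr = SNESGetKSP(snes,&ksp);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCSHELL);CHKERRQ(ierr);
  ierr = PCShellSetSetUp(pc,SMWSetUp);CHKERRQ(ierr);
  ierr = PCShellSetApply(pc,SMWApply);CHKERRQ(ierr);
  ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);   /* e.g. -snes_monitor -snes_linesearch_type bt */
  ierr = SNESSolve(snes,NULL,u);CHKERRQ(ierr);
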
Provided I have the FEniCS forms for F, K, and a, I'd like to do something along the lines of: solver = PETScSNESSolver() # the FEniCS SNES wrapper snes = solver.snes() # the petsc4py SNES object ## ?? ksp = snes.getKSP() # set ksp option similar to above solver.solve() I would be very grateful if anyone could could help or point me to a reference or demo that does something similar (or maybe a completely different way of solving the problem!). Many thanks in advance! Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From sajidsyed2021 at u.northwestern.edu Tue Feb 4 18:30:46 2020 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Tue, 4 Feb 2020 18:30:46 -0600 Subject: [petsc-users] Required structure and attrs for MatLoad from hdf5 Message-ID: Hi PETSc-developers, The example src/mat/examples/tutorials/ex10.c shows how one would read a matrix from a hdf5 file. Since MatView isn?t implemented for hdf5_mat format, how is the hdf5 file (to be used to run ex10) generated ? I tried reading from an hdf5 file but I saw an error stating object 'jc' doesn't exist and thus would like to know how I should store a sparse matrix in an hdf5 file so that MatLoad works. PS: I?m guessing that MATLAB stores the matrix in the format that PETSc expects (group/dset/attrs) but I?m creating this from Python. If the recommended approach is to transfer numpy arrays to PETSc matrices via petsc4py, I?d switch to that instead of directly creating hdf5 files. Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Feb 4 19:56:34 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 5 Feb 2020 01:56:34 +0000 Subject: [petsc-users] Required structure and attrs for MatLoad from hdf5 In-Reply-To: References: Message-ID: <0B9AE74A-7239-4CCF-8A64-A4CFB46B1370@anl.gov> I think this is a Python-Matlab question, not specifically related to PETSc in any way. Googling python matrix hdf5 matlab there are mentions of h5py library that can be used to write out sparse matrices in Matlab HDF5 format. Which could presumably be read by PETSc. PETSc can also read in the non-transposed version which presumably can also be written out with h5py https://www.loc.gov/preservation/digital/formats/fdd/fdd000440.shtml gives some indication that an indication of whether the transpose is stored in the file might exist, if so and it is used by Matlab we wouldn't need the special matlab format flag. Barry > On Feb 4, 2020, at 6:30 PM, Sajid Ali wrote: > > Hi PETSc-developers, > > The example src/mat/examples/tutorials/ex10.c shows how one would read a matrix from a hdf5 file. Since MatView isn?t implemented for hdf5_mat format, how is the hdf5 file (to be used to run ex10) generated ? > > I tried reading from an hdf5 file but I saw an error stating object 'jc' doesn't exist and thus would like to know how I should store a sparse matrix in an hdf5 file so that MatLoad works. > > PS: I?m guessing that MATLAB stores the matrix in the format that PETSc expects (group/dset/attrs) but I?m creating this from Python. If the recommended approach is to transfer numpy arrays to PETSc matrices via petsc4py, I?d switch to that instead of directly creating hdf5 files. 
> > Thank You, > Sajid Ali | PhD Candidate > Applied Physics > Northwestern University > s-sajid-ali.github.io > From bsmith at mcs.anl.gov Wed Feb 5 00:35:58 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 5 Feb 2020 06:35:58 +0000 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? In-Reply-To: References: Message-ID: I am not sure of everything in your email but it sounds like you want to use a "Picard" iteration to solve [K(u)?kaaT]?u=?F(u). That is solve A(u^{n}) (u^{n+1} - u^{n}) = F(u^{n}) - A(u^{n})u^{n} where A(u) = K(u) - kaaT PETSc provides code to this with SNESSetPicard() (see the manual pages) I don't know if Petsc4py has bindings for this. Adding missing python bindings is not terribly difficult and you may be able to do it yourself if this is the approach you want. Barry > On Feb 4, 2020, at 5:07 PM, Olek Niewiarowski wrote: > > Hello, > I am a FEniCS user but new to petsc4py. I am trying to modify the KSP solver through the SNES object to implement the Sherman-Morrison formula(e.g. http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html ). I am solving a nonlinear system of the form [K(u)?kaaT]?u=?F(u). Here the jacobian matrix K is modified by the term kaaT, where k is a scalar. Notably, K is sparse, while the term kaaT results in a full matrix. This problem can be solved efficiently using the Sherman-Morrison formula : > > [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) > I have managed to successfully implement this at the linear solve level (by modifying the KSP solver) inside a custom Newton solver in python by following an incomplete tutorial at https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner : > ? while (norm(delU) > alpha): # while not converged > ? > ? self.update_F() # call to method to update r.h.s form > ? self.update_K() # call to update the jacobian form > ? K = assemble(self.K) # assemble the jacobian matrix > ? F = assemble(self.F) # assemble the r.h.s vector > ? a = assemble(self.a_form) # assemble the a_form (see Sherman Morrison formula) > ? > ? for bc in self.mem.bc: # apply boundary conditions > ? bc.apply(K, F) > ? bc.apply(K, a) > ? > ? B = PETSc.Mat().create() > ? > ? # Assemble the bilinear form that defines A and get the concrete > ? # PETSc matrix > ? A = as_backend_type(K).mat() # get the PETSc objects for K and a > ? u = as_backend_type(a).vec() > ? > ? # Build the matrix "context" # see firedrake docs > ? Bctx = MatrixFreeB(A, u, u, self.k) > ? > ? # Set up B > ? # B is the same size as A > ? B.setSizes(*A.getSizes()) > ? B.setType(B.Type.PYTHON) > ? B.setPythonContext(Bctx) > ? B.setUp() > ? > ? > ? ksp = PETSc.KSP().create() # create the KSP linear solver object > ? ksp.setOperators(B) > ? ksp.setUp() > ? pc = ksp.pc > ? pc.setType(pc.Type.PYTHON) > ? pc.setPythonContext(MatrixFreePC()) > ? ksp.setFromOptions() > ? > ? solution = delU # the incremental displacement at this iteration > ? > ? b = as_backend_type(-F).vec() > ? delu = solution.vector().vec() > ? > ? ksp.solve(b, delu) > > ? self.mem.u.vector().axpy(0.25, self.delU.vector()) # poor man's linesearch > ? counter += 1 > Here is the corresponding petsc4py code adapted from the firedrake docs: > > ? class MatrixFreePC(object): > ? > ? def setUp(self, pc): > ? B, P = pc.getOperators() > ? # extract the MatrixFreeB object from B > ? ctx = B.getPythonContext() > ? self.A = ctx.A > ? self.u = ctx.u > ? self.v = ctx.v > ? self.k = ctx.k > ? 
# Here we build the PC object that uses the concrete, > ? # assembled matrix A. We will use this to apply the action > ? # of A^{-1} > ? self.pc = PETSc.PC().create() > ? self.pc.setOptionsPrefix("mf_") > ? self.pc.setOperators(self.A) > ? self.pc.setFromOptions() > ? # Since u and v do not change, we can build the denominator > ? # and the action of A^{-1} on u only once, in the setup > ? # phase. > ? tmp = self.A.createVecLeft() > ? self.pc.apply(self.u, tmp) > ? self._Ainvu = tmp > ? self._denom = 1 + self.k*self.v.dot(self._Ainvu) > ? > ? def apply(self, pc, x, y): > ? # y <- A^{-1}x > ? self.pc.apply(x, y) > ? # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u) > ? alpha = (self.k*self.v.dot(y)) / self._denom > ? # y <- y - alpha * A^{-1}u > ? y.axpy(-alpha, self._Ainvu) > ? > ? > ? class MatrixFreeB(object): > ? > ? def __init__(self, A, u, v, k): > ? self.A = A > ? self.u = u > ? self.v = v > ? self.k = k > ? > ? def mult(self, mat, x, y): > ? # y <- A x > ? self.A.mult(x, y) > ? > ? # alpha <- v^T x > ? alpha = self.v.dot(x) > ? > ? # y <- y + alpha*u > ? y.axpy(alpha, self.u) > However, this approach is not efficient as it requires many iterations due to the Newton step being fixed, so I would like to implement it using SNES and use line search. Unfortunately, I have not been able to find any documentation/tutorial on how to do so. Provided I have the FEniCS forms for F, K, and a, I'd like to do something along the lines of: > solver = PETScSNESSolver() # the FEniCS SNES wrapper > snes = solver.snes() # the petsc4py SNES object > ## ?? > ksp = snes.getKSP() > # set ksp option similar to above > solver.solve() > > I would be very grateful if anyone could could help or point me to a reference or demo that does something similar (or maybe a completely different way of solving the problem!). > Many thanks in advance! > Alex From dong-hao at outlook.com Wed Feb 5 04:36:26 2020 From: dong-hao at outlook.com (Hao DONG) Date: Wed, 5 Feb 2020 10:36:26 +0000 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? In-Reply-To: <264F91C4-8558-4850-9B4B-ABE4123C2A2C@anl.gov> References: , <264F91C4-8558-4850-9B4B-ABE4123C2A2C@anl.gov> Message-ID: Thanks a lot for your suggestions, Hong and Barry - As you suggested, I first tried the LU direct solvers (built-in and MUMPs) out this morning, which work perfectly, albeit slow. As my problem itself is a part of a PDE based optimization, the exact solution in the searching procedure is not necessary (I often set a relative tolerance of 1E-7/1E-8, etc. for Krylov subspace methods). Also tried BJACOBI with exact LU, the KSP just converges in one or two iterations, as expected. I added -kspview option for the D-ILU test (still with Block Jacobi as preconditioner and bcgs as KSP solver). The KSPview output from one of the examples in a toy model looks like: KSP Object: 1 MPI processes type: bcgs maximum iterations=120, nonzero initial guess tolerances: relative=1e-07, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: bjacobi number of blocks = 3 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using NONE norm type for convergence test PC Object: (sub_) 1 MPI processes type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1., needed 1. Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=11294, cols=11294 package used to perform factorization: petsc total: nonzeros=76008, allocated nonzeros=76008 total number of mallocs used during MatSetValues calls=0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=11294, cols=11294 total: nonzeros=76008, allocated nonzeros=76008 total number of mallocs used during MatSetValues calls=0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: mpiaij rows=33880, cols=33880 total: nonzeros=436968, allocated nonzeros=436968 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines do you see something wrong with my setup? I also tried a quick performance test with a small 278906 by 278906 matrix (3850990 nnz) with the following parameters: -ksp_type bcgs -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu -ksp_view Reducing the relative residual to 1E-7 Took 4.08s with 41 bcgs iterations. Merely changing the -pc_bjacobi_local_blocks to 6 Took 7.02s with 73 bcgs iterations. 9 blocks would further take 9.45s with 101 bcgs iterations. As a reference, my home-brew Fortran code solves the same problem (3-block D-ILU0) in 1.84s with 24 bcgs iterations (the bcgs code is also a home-brew one)? Well, by saying ?use explicit L/U matrix as preconditioner?, I wonder if a user is allowed to provide his own (separate) L and U Mat for preconditioning (like how it works in Matlab solvers), e.g. x = qmr(A,b,Tol,MaxIter,L,U,x) As I already explicitly constructed the L and U matrices in Fortran, it is not hard to convert them to Mat in Petsc to test Petsc and my Fortran code head-to-head. In that case, the A, b, x, and L/U are all identical, it would be easier to see where the problem came from. BTW, there is another thing I wondered - is there a way to output residual in unpreconditioned norm? I tried to call KSPSetNormType(ksp_local, KSP_NORM_UNPRECONDITIONED, ierr) But always get an error that current ksp does not support unpreconditioned in LEFT/RIGHT (either way). Is it possible to do that (output unpreconditioned residual) in PETSc at all? Cheers, Hao ________________________________ From: Smith, Barry F. Sent: Tuesday, February 4, 2020 8:27 PM To: Hao DONG Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? > On Feb 4, 2020, at 12:41 PM, Hao DONG wrote: > > Dear all, > > > I have a few questions about the implementation of diagonal ILU PC in PETSc. I want to solve a very simple system with KSP (in parallel), the nature of the system (finite difference time-harmonic Maxwell) is probably not important to the question itself. Long story short, I just need to solve a set of equations of Ax = b with a block diagonal system matrix, like (not sure if the mono font works): > > |X | > A =| Y | > | Z| > > Note that A is not really block-diagonal, it?s just a multi-diagonal matrix determined by the FD mesh, where most elements are close to diagonal. 
So instead of a full ILU decomposition, a D-ILU is a good approximation as a preconditioner, and the number of blocks should not matter too much: > > |Lx | |Ux | > L = | Ly | and U = | Uy | > | Lz| | Uz| > > Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in my sequential Fortran/Matlab code. > > So like most users, I am looking for a parallel implement with this problem in PETSc. After looking through the manual, it seems that the most straightforward way to do it is through PCBJACOBI. Not sure I understand it right, I just setup a 3-block PCJACOBI and give each of the block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little block of fortran code would look like: > ... > call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & > & isubs,ierr) > call PCBJacobiSetLocalBlocks(pc_local, Nsub, & > & isubs(istart:iend),ierr) > ! set up the block jacobi structure > call KSPSetup(ksp_local,ierr) > ! allocate sub ksps > allocate(ksp_sub(Nsub)) > call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & > & ksp_sub,ierr) > do i=1,Nsub > call KSPGetPC(ksp_sub(i),pc_sub,ierr) > !ILU preconditioner > call PCSetType(pc_sub,ptype,ierr) > call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here > call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] > end do > call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & > & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) > ? This code looks essentially right. You should call with -ksp_view to see exactly what PETSc is using for a solver. > > I understand that the parallel performance may not be comparable, so I first set up a one-process test (with MPIAij, but all the rows are local since there is only one process). The system is solved without any problem (identical results within error). But the performance is actually a lot worse (code built without debugging flags in performance tests) than my own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), which is hard to believe. I suspect the difference is from the PC as the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to the same relative tol. PETSc uses GMRES by default with a restart of 30 and left preconditioning. It could be different exact choices in the solver (which is why -ksp_view is so useful) can explain the differences in the runs between your code and PETSc's > > This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks instead of 3). While my Fortran/Matlab codes see minimal performance difference (<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a performance decrease of more than 50% in sequential test. This is odd, the more blocks the smaller each block so the quicker the ILU set up should be. You can run various cases with -log_view and send us the output to see what is happening at each part of the computation in time. > So my implementation IS somewhat different in PETSc. Do I miss something in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how PCBJACOBI works in PETSc? Probably not. > > If this is not the right way to implement a block diagonal ILU as (parallel) PC, please kindly point me to the right direction. I searched through the mail list to find some answers, only to find a couple of similar questions... An example would be nice. 
You approach seems fundamentally right but I cannot be sure of possible bugs. > > On the other hand, does PETSc support a simple way to use explicit L/U matrix as a preconditioner? I can import the D-ILU matrix (I already converted my A matrix into Mat) constructed in my Fortran code to make a better comparison. Or do I have to construct my own PC using PCshell? If so, is there a good tutorial/example to learn about how to use PCSHELL (in a more advanced sense, like how to setup pc side and transpose)? Not sure what you mean by explicit L/U matrix as a preconditioner. As Hong said, yes you can use a parallel LU from MUMPS or SuperLU_DIST or Pastix as the solver. You do not need any shell code. You simply need to set the PCType to lu You can also set all this options from the command line and don't need to write the code specifically. So call KSPSetFromOptions() and then for example -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu (this last one is applied to each block so you could use -pc_type lu and it would use lu on each block.) -ksp_type_none -pc_type lu -pc_factor_mat_solver_type mumps (do parallel LU with mumps) By not hardwiring in the code and just using options you can test out different cases much quicker Use -ksp_view to make sure that is using the solver the way you expect. Barry Barry > > Thanks in advance, > > Hao -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Feb 5 07:35:27 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 5 Feb 2020 08:35:27 -0500 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? In-Reply-To: References: Message-ID: Perhaps Barry is right that you want Picard, but suppose you really want Newton. "This problem can be solved efficiently using the Sherman-Morrison formula" Well, maybe. The main assumption here is that inverting K is cheap. I see two things you can do in a straightforward way: 1) Use MatCreateLRC() https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateLRC.html to create the Jacobian and solve using an iterative method. If you pass just K was the preconditioning matrix, you can use common PCs. 2) We only implemented MatMult() for LRC, but you could stick your SMW code in for MatSolve_LRC if you really want to factor K. We would of course help you do this. Thanks, Matt On Wed, Feb 5, 2020 at 1:36 AM Smith, Barry F. via petsc-users < petsc-users at mcs.anl.gov> wrote: > > I am not sure of everything in your email but it sounds like you want > to use a "Picard" iteration to solve [K(u)?kaaT]?u=?F(u). That is solve > > A(u^{n}) (u^{n+1} - u^{n}) = F(u^{n}) - A(u^{n})u^{n} where A(u) = K(u) > - kaaT > > PETSc provides code to this with SNESSetPicard() (see the manual pages) I > don't know if Petsc4py has bindings for this. > > Adding missing python bindings is not terribly difficult and you may be > able to do it yourself if this is the approach you want. > > Barry > > > > > On Feb 4, 2020, at 5:07 PM, Olek Niewiarowski > wrote: > > > > Hello, > > I am a FEniCS user but new to petsc4py. I am trying to modify the KSP > solver through the SNES object to implement the Sherman-Morrison > formula(e.g. http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html > ). I am solving a nonlinear system of the form [K(u)?kaaT]?u=?F(u). Here > the jacobian matrix K is modified by the term kaaT, where k is a scalar. > Notably, K is sparse, while the term kaaT results in a full matrix. 
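A rough petsc4py sketch of suggestion (1) above: build the Jacobian J = K - k a a^T as a low-rank correction with MatCreateLRC and hand the sparse K alone to the preconditioner. This assumes the installed petsc4py exposes Mat.createLRC (newer releases do); K, a, k and the right-hand side below are small serial stand-ins, not the FEniCS-assembled objects from this thread.

from petsc4py import PETSc

n, k = 100, 1.0e-6
K = PETSc.Mat().createAIJ([n, n], nnz=3)      # sparse stand-in for the stiffness matrix
for i in range(*K.getOwnershipRange()):
    K[i, i] = 2.0
    if i > 0:
        K[i, i - 1] = -1.0
    if i < n - 1:
        K[i, i + 1] = -1.0
K.assemble()

a = K.createVecRight(); a.set(1.0)            # stand-in for the rank-one vector a

U = PETSc.Mat().createDense([n, 1])           # low-rank factor: one column holding a
U.setUp()
U.setValues(list(range(n)), [0], a.getArray())
U.assemble()
c = PETSc.Vec().createSeq(1)                  # diagonal of C in J = K + U*diag(c)*U^T
c[0] = -k                                     # so J = K - k a a^T
c.assemble()

J = PETSc.Mat().createLRC(K, U, c, U)         # matrix-free low-rank-corrected Jacobian

ksp = PETSc.KSP().create()
ksp.setOperators(J, K)                        # J drives MatMult, K builds the PC
ksp.setType(PETSc.KSP.Type.GMRES)
ksp.getPC().setType(PETSc.PC.Type.ILU)        # any ordinary PC of K can go here
ksp.setFromOptions()

b = K.createVecRight(); b.set(1.0)
x = K.createVecRight()
ksp.solve(b, x)

With setOperators(J, K) the Krylov method applies the corrected operator J matrix-free while the preconditioner is built from the sparse K only, which is the point of suggestion (1).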
This > problem can be solved efficiently using the Sherman-Morrison formula : > > > > [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) > > I have managed to successfully implement this at the linear solve level > (by modifying the KSP solver) inside a custom Newton solver in python by > following an incomplete tutorial at > https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner > : > > ? while (norm(delU) > alpha): # while not converged > > ? > > ? self.update_F() # call to method to update r.h.s form > > ? self.update_K() # call to update the jacobian form > > ? K = assemble(self.K) # assemble the jacobian matrix > > ? F = assemble(self.F) # assemble the r.h.s vector > > ? a = assemble(self.a_form) # assemble the a_form (see > Sherman Morrison formula) > > ? > > ? for bc in self.mem.bc: # apply boundary conditions > > ? bc.apply(K, F) > > ? bc.apply(K, a) > > ? > > ? B = PETSc.Mat().create() > > ? > > ? # Assemble the bilinear form that defines A and get > the concrete > > ? # PETSc matrix > > ? A = as_backend_type(K).mat() # get the PETSc objects > for K and a > > ? u = as_backend_type(a).vec() > > ? > > ? # Build the matrix "context" # see firedrake docs > > ? Bctx = MatrixFreeB(A, u, u, self.k) > > ? > > ? # Set up B > > ? # B is the same size as A > > ? B.setSizes(*A.getSizes()) > > ? B.setType(B.Type.PYTHON) > > ? B.setPythonContext(Bctx) > > ? B.setUp() > > ? > > ? > > ? ksp = PETSc.KSP().create() # create the KSP linear > solver object > > ? ksp.setOperators(B) > > ? ksp.setUp() > > ? pc = ksp.pc > > ? pc.setType(pc.Type.PYTHON) > > ? pc.setPythonContext(MatrixFreePC()) > > ? ksp.setFromOptions() > > ? > > ? solution = delU # the incremental displacement at > this iteration > > ? > > ? b = as_backend_type(-F).vec() > > ? delu = solution.vector().vec() > > ? > > ? ksp.solve(b, delu) > > > > ? self.mem.u.vector().axpy(0.25, self.delU.vector()) # > poor man's linesearch > > ? counter += 1 > > Here is the corresponding petsc4py code adapted from the firedrake docs: > > > > ? class MatrixFreePC(object): > > ? > > ? def setUp(self, pc): > > ? B, P = pc.getOperators() > > ? # extract the MatrixFreeB object from B > > ? ctx = B.getPythonContext() > > ? self.A = ctx.A > > ? self.u = ctx.u > > ? self.v = ctx.v > > ? self.k = ctx.k > > ? # Here we build the PC object that uses the concrete, > > ? # assembled matrix A. We will use this to apply the > action > > ? # of A^{-1} > > ? self.pc = PETSc.PC().create() > > ? self.pc.setOptionsPrefix("mf_") > > ? self.pc.setOperators(self.A) > > ? self.pc.setFromOptions() > > ? # Since u and v do not change, we can build the > denominator > > ? # and the action of A^{-1} on u only once, in the setup > > ? # phase. > > ? tmp = self.A.createVecLeft() > > ? self.pc.apply(self.u, tmp) > > ? self._Ainvu = tmp > > ? self._denom = 1 + self.k*self.v.dot(self._Ainvu) > > ? > > ? def apply(self, pc, x, y): > > ? # y <- A^{-1}x > > ? self.pc.apply(x, y) > > ? # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u) > > ? alpha = (self.k*self.v.dot(y)) / self._denom > > ? # y <- y - alpha * A^{-1}u > > ? y.axpy(-alpha, self._Ainvu) > > ? > > ? > > ? class MatrixFreeB(object): > > ? > > ? def __init__(self, A, u, v, k): > > ? self.A = A > > ? self.u = u > > ? self.v = v > > ? self.k = k > > ? > > ? def mult(self, mat, x, y): > > ? # y <- A x > > ? self.A.mult(x, y) > > ? > > ? # alpha <- v^T x > > ? alpha = self.v.dot(x) > > ? > > ? # y <- y + alpha*u > > ? 
y.axpy(alpha, self.u) > > However, this approach is not efficient as it requires many iterations > due to the Newton step being fixed, so I would like to implement it using > SNES and use line search. Unfortunately, I have not been able to find any > documentation/tutorial on how to do so. Provided I have the FEniCS forms > for F, K, and a, I'd like to do something along the lines of: > > solver = PETScSNESSolver() # the FEniCS SNES wrapper > > snes = solver.snes() # the petsc4py SNES object > > ## ?? > > ksp = snes.getKSP() > > # set ksp option similar to above > > solver.solve() > > > > I would be very grateful if anyone could could help or point me to a > reference or demo that does something similar (or maybe a completely > different way of solving the problem!). > > Many thanks in advance! > > Alex > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dmitry.melnichuk at geosteertech.com Wed Feb 5 09:03:35 2020 From: dmitry.melnichuk at geosteertech.com (=?utf-8?B?0JTQvNC40YLRgNC40Lkg0JzQtdC70YzQvdC40YfRg9C6?=) Date: Wed, 05 Feb 2020 18:03:35 +0300 Subject: [petsc-users] Triple increasing of allocated memory during KSPSolve calling(GMRES preconditioned by ASM) In-Reply-To: <72B6812D-4131-41A6-9A30-878D2F9058D8@mcs.anl.gov> References: <9444561580744337@myt2-b8bf7a4d4ebc.qloud-c.yandex.net> <86DE2602-DCBD-47C4-819C-223F9CF9A503@mcs.anl.gov> <14855941580817898@vla4-d1c3bcedfacb.qloud-c.yandex.net> <72B6812D-4131-41A6-9A30-878D2F9058D8@mcs.anl.gov> Message-ID: <18751580915015@vla4-4046ec513d04.qloud-c.yandex.net> An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: ksp_view.txt URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: PDE.JPG Type: image/jpeg Size: 108816 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Zheng_poroelasticity.pdf Type: application/pdf Size: 3466224 bytes Desc: not available URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: full_log_ASM_factor_in_place.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: Error_gmres_sor.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: Error_ilu_pc_factor.txt URL: From knepley at gmail.com Wed Feb 5 09:46:29 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 5 Feb 2020 10:46:29 -0500 Subject: [petsc-users] Triple increasing of allocated memory during KSPSolve calling(GMRES preconditioned by ASM) In-Reply-To: <18751580915015@vla4-4046ec513d04.qloud-c.yandex.net> References: <9444561580744337@myt2-b8bf7a4d4ebc.qloud-c.yandex.net> <86DE2602-DCBD-47C4-819C-223F9CF9A503@mcs.anl.gov> <14855941580817898@vla4-d1c3bcedfacb.qloud-c.yandex.net> <72B6812D-4131-41A6-9A30-878D2F9058D8@mcs.anl.gov> <18751580915015@vla4-4046ec513d04.qloud-c.yandex.net> Message-ID: On Wed, Feb 5, 2020 at 10:04 AM ??????? ????????? < dmitry.melnichuk at geosteertech.com> wrote: > Barry, appreciate your response, as always. > > - You are saying that I am using ASM + ILU(0). 
However, I use PETSc only > with "ASM" as the input parameter for preconditioner. Does it mean that > ILU(0) is default sub-preconditioner for ASM? > Yes. > Can I change it using the option "-sub_pc_type"? > Yes Does it make sense to you within the scope of my general goal, which is > memory consumption decrease? Can it be useful to vary the "-sub_ksp_type" > option? > Yes. For example, try measuring the memory usage with -sub_pc_type jacobi > - I have run the computation for the same initial matrix with the > "-sub_pc_factor_in_place" option, PC = ASM. Now the process consumed ~400 > MB comparing to 550 MB without this option. > I used "-ksp_view" for this computation, two logs for this computation are > attached: > "*ksp_view.txt" - *ksp_view option only > *"full_log_ASM_factor_in_place.txt"* - full log without ksp_view option > > - Then I changed primary preconditioner from ASM to ILU(0) and ran the > computation: memory consumption was again about ~400 MB, no matter if I use > the "-sub_pc_factor_in_place" option. > > - Then I tried to run the computation with ILU(0) and > "-pc_factor_in_place", just in case: the computation did not start, I got > an error message, the log is attached:* "Error_ilu_pc_factor.txt"* > > - Then I ran the computation with SOR as a preconditioner. PETSc gave me > an error message, the log is attached: *"Error_gmres_sor.txt"* > > - As for the kind of PDEs: I am solving the standard poroelasticity > problem, the formulation can be found in the attached paper > (Zheng_poroelasticity.pdf), pages 2-3. > The file PDE.jpg is a snapshot of PDEs from this paper. > Proelasticity is elliptic (the kind that I am familiar with), so I would at least try Algebraic Multigrid, either GAMG, or ML, or Hypre (probably try all of them). Thanks, Matt > > So, if you may give me any further advice on how to decrease the consumed > amount of memory to approximately the matrix size (~200 MB in this case), > it will be great. Do I need to focus on searching a proper preconditioner? > BTW, the single ILU(0) did not give me any memory advantage comparing to > ASM with "-sub_pc_factor_in_place". > > Have a pleasant day! > > Kind regards, > Dmitry > > > > 04.02.2020, 19:04, "Smith, Barry F." : > > > Please run with the option -ksp_view so we know the exact solver > options you are using. > > From the lines > > MatCreateSubMats 1 1.0 1.9397e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 > 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 1.1066e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 > 0 0 0 0 0 0 0 0 0 0 > MatIncreaseOvrlp 1 1.0 3.0324e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 > 0 0 0 0 0 0 0 0 0 0 0 > > and the fact you have three matrices I would guess you are using the > additive Schwarz preconditioner (ASM) with ILU(0) on the blocks. (which > converges the same as ILU on one process but does use much more memory). > > Note: your code is still built with 32 bit integers. > > I would guess the basic matrix formed plus the vectors in this example > could take ~200 MB. It is the two matrices in the additive Schwarz that is > taking the additional memory. > > What kind of PDEs are you solving and what kind of formulation? > > ASM plus ILU is the "work mans" type preconditioner, relatively robust > but not particularly fast for convergence. Depending on your problem you > might be able to do much better convergence wise by using a PCFIELDSPLIT > and a PCGAMG on one of the splits. In your own run you see the ILU is > chugging along rather slowly to the solution. 
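A hedged sketch of the fieldsplit/AMG idea mentioned above, in petsc4py form: the two decoupled 1-D Laplacians below only stand in for the displacement and pressure blocks of the real poroelastic matrix, and the index sets defining the "u" and "p" splits would normally come from the application's degree-of-freedom numbering (all sizes and names here are illustrative).

from petsc4py import PETSc

n = 200                                    # unknowns per field (illustrative only)
N = 2 * n
A = PETSc.Mat().createAIJ([N, N], nnz=3)   # two decoupled 1-D Laplacians as a stand-in
for i in range(*A.getOwnershipRange()):
    j = i % n                              # position inside its own field
    A[i, i] = 2.0
    if j > 0:
        A[i, i - 1] = -1.0
    if j < n - 1:
        A[i, i + 1] = -1.0
A.assemble()
b = A.createVecRight(); b.set(1.0)
x = A.createVecRight()

is_u = PETSc.IS().createStride(n, first=0, step=1)   # "displacement" rows 0..n-1
is_p = PETSc.IS().createStride(n, first=n, step=1)   # "pressure" rows n..2n-1

opts = PETSc.Options()
opts["pc_fieldsplit_type"] = "multiplicative"
opts["fieldsplit_u_pc_type"] = "gamg"      # AMG on the elliptic displacement block
opts["fieldsplit_p_pc_type"] = "ilu"       # cheap and exact for this 1-D stand-in

ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.setType(PETSc.KSP.Type.GMRES)
pc = ksp.getPC()
pc.setType(PETSc.PC.Type.FIELDSPLIT)
pc.setFieldSplitIS(("u", is_u), ("p", is_p))
ksp.setFromOptions()
ksp.solve(b, x)
PETSc.Sys.Print("iterations:", ksp.getIterationNumber())

The same option names (-pc_type fieldsplit, -fieldsplit_u_pc_type gamg, and so on) can be passed to the Fortran code in this thread through its existing options string and KSPSetFromOptions call, so trying GAMG, ML or Hypre needs no code changes, provided PETSc was configured with those packages.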
> > With your current solvers you can use the option > -sub_pc_factor_in_place which will shave off one of the matrices memories. > Please try that. > > Avoiding the ASM you can avoid both extra matrices but at the cost of > even slower convergence. Use, for example -pc_type sor > > > The petroleum industry also has a variety of "custom" > preconditioners/solvers for particular models and formulations that can > beat the convergence of general purpose solvers; and require less memory. > Some of these can be implemented or simulated with PETSc. Some of these are > implemented in the commercial petroleum simulation codes and it can be > difficult to get a handle on exactly what they do because of proprietary > issues. I think I have an old text on these approaches in my office, there > may be modern books that discuss these. > > > Barry > > > > > On Feb 4, 2020, at 6:04 AM, ??????? ????????? < > dmitry.melnichuk at geosteertech.com> wrote: > > Hello again! > Thank you very much for your replies! > Log is attached. > > 1. The main problem now is following. To solve the matrix that is > attached to my previous e-mail PETSc consumes ~550 MB. > I know definitely that there are commercial softwares in petroleum > industry (e.g., Schlumberger Petrel) that solve the same initial problem > consuming only ~200 MB. > Moreover, I am sure that when I used 32-bit PETSc (GMRES + ASM) a year > ago, it also consumed ~200 MB for this matrix. > > So, my question is: do you have any advice on how to decrease the amount > of RAM consumed for such matrix from 550 MB to 200 MB? Maybe some specific > preconditioner or other ways? > > I will be very grateful for any thoughts! > > 2. The second problem is more particular. > According to resource manager in Windows 10, Fortran solver based on > PETSc consumes 548 MB RAM while solving the system of linear equations. > As I understand it form logs, it is required 459 MB and 52 MB for matrix > and vector storage respectively. After summing of all objects for which > memory is allocated we get only 517 MB. > > Thank you again for your time! Have a nice day. > > Kind regards, > Dmitry > > > 03.02.2020, 19:55, "Smith, Barry F." : > > GMRES also can by default require about 35 work vectors if it reaches > the full restart. You can set a smaller restart with -ksp_gmres_restart 15 > for example but this can also hurt the convergence of GMRES dramatically. > People sometimes use the KSPBCGS algorithm since it does not require all > the restart vectors but it can also converge more slowly. > > Depending on how much memory the sparse matrices use relative to the > vectors the vector memory may matter or not. > > If you are using a recent version of PETSc you can run with -log_view > -log_view_memory and it will show on the right side of the columns how much > memory is being allocated for each of the operations in various ways. > > Barry > > > > On Feb 3, 2020, at 10:34 AM, Matthew Knepley wrote: > > On Mon, Feb 3, 2020 at 10:38 AM ??????? ????????? < > dmitry.melnichuk at geosteertech.com> wrote: > Hello all! > > Now I am faced with a problem associated with the memory allocation when > calling of KSPSolve . > > GMRES preconditioned by ASM for solving linear algebra system (obtained > by the finite element spatial discretisation of Biot poroelasticity model) > was chosen. > According to the output value of PetscMallocGetCurrentUsage subroutine > 176 MB for matrix and RHS vector storage is required (before KSPSolve > calling). 
> But during solving linear algebra system 543 MB of RAM is required > (during KSPSolve calling). > Thus, the amount of allocated memory after preconditioning stage > increased three times. This kind of behaviour is critically for 3D models > with several millions of cells. > > 1) In order to know anything, we have to see the output of -ksp_view, > although I see you used an overlap of 2 > > 2) The overlap increases the size of submatrices beyond that of the > original matrix. Suppose that you used LU for the sub-preconditioner. > You would need at least 2x memory (with ILU(0)) since the matrix > dominates memory usage. Moreover, you have overlap > and you might have fill-in depending on the solver. > > 3) The massif tool from valgrind is a good fine-grained way to look at > memory allocation > > Thanks, > > Matt > > Is there a way to decrease amout of allocated memory? > Is that an expected behaviour for GMRES-ASM combination? > > As I remember, using previous version of PETSc didn't demonstrate so > significante memory increasing. > > ... > Vec :: Vec_F, Vec_U > Mat :: Mat_K > ... > ... > call MatAssemblyBegin(Mat_M,Mat_Final_Assembly,ierr) > call MatAssemblyEnd(Mat_M,Mat_Final_Assembly,ierr) > .... > call VecAssemblyBegin(Vec_F_mod,ierr) > call VecAssemblyEnd(Vec_F_mod,ierr) > ... > ... > call PetscMallocGetCurrentUsage(mem, ierr) > print *,"Memory used: ",mem > ... > ... > call KSPSetType(Krylov,KSPGMRES,ierr) > call KSPGetPC(Krylov,PreCon,ierr) > call PCSetType(PreCon,PCASM,ierr) > call KSPSetFromOptions(Krylov,ierr) > ... > call KSPSolve(Krylov,Vec_F,Vec_U,ierr) > ... > ... > options = "-pc_asm_overlap 2 -pc_asm_type basic -ksp_monitor > -ksp_converged_reason" > > > Kind regards, > Dmitry Melnichuk > Matrix.dat (265288024) > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Feb 5 14:16:32 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 5 Feb 2020 20:16:32 +0000 Subject: [petsc-users] Triple increasing of allocated memory during KSPSolve calling(GMRES preconditioned by ASM) In-Reply-To: <18751580915015@vla4-4046ec513d04.qloud-c.yandex.net> References: <9444561580744337@myt2-b8bf7a4d4ebc.qloud-c.yandex.net> <86DE2602-DCBD-47C4-819C-223F9CF9A503@mcs.anl.gov> <14855941580817898@vla4-d1c3bcedfacb.qloud-c.yandex.net> <72B6812D-4131-41A6-9A30-878D2F9058D8@mcs.anl.gov> <18751580915015@vla4-4046ec513d04.qloud-c.yandex.net> Message-ID: <6E2C0944-0849-461D-AA7C-A8120A2FCCD3@mcs.anl.gov> > On Feb 5, 2020, at 9:03 AM, ??????? ????????? wrote: > > Barry, appreciate your response, as always. > > - You are saying that I am using ASM + ILU(0). However, I use PETSc only with "ASM" as the input parameter for preconditioner. Does it mean that ILU(0) is default sub-preconditioner for ASM? Yes > Can I change it using the option "-sub_pc_type"? Yes -sub_pc_type for then it will use SOR on each block instead of ILU saves a matrix. > Does it make sense to you within the scope of my general goal, which is memory consumption decrease? 
Can it be useful to vary the "-sub_ksp_type" option? Probably not. > > - I have run the computation for the same initial matrix with the "-sub_pc_factor_in_place" option, PC = ASM. Now the process consumed ~400 MB comparing to 550 MB without this option. This is what I expected, good. > I used "-ksp_view" for this computation, two logs for this computation are attached: > "ksp_view.txt" - ksp_view option only > "full_log_ASM_factor_in_place.txt" - full log without ksp_view option > > - Then I changed primary preconditioner from ASM to ILU(0) and ran the computation: memory consumption was again about ~400 MB, no matter if I use the "-sub_pc_factor_in_place" option. > > - Then I tried to run the computation with ILU(0) and "-pc_factor_in_place", just in case: the computation did not start, I got an error message, the log is attached: "Error_ilu_pc_factor.txt" Since that matrix is used for the MatMuilt you cannot do the factorization in place since it replaces the original matrix entries with the factorization matrix entries > > - Then I ran the computation with SOR as a preconditioner. PETSc gave me an error message, the log is attached: "Error_gmres_sor.txt" This is because our SOR cannot handle zeros on the diagonal. > > - As for the kind of PDEs: I am solving the standard poroelasticity problem, the formulation can be found in the attached paper (Zheng_poroelasticity.pdf), pages 2-3. > The file PDE.jpg is a snapshot of PDEs from this paper. > > > So, if you may give me any further advice on how to decrease the consumed amount of memory to approximately the matrix size (~200 MB in this case), it will be great. Do I need to focus on searching a proper preconditioner? BTW, the single ILU(0) did not give me any memory advantage comparing to ASM with "-sub_pc_factor_in_place". Yes, because in both cases you need two copies of the matrix, for the multiple and for the ILU. But you want a preconditioner that doesn't require any new matrices in the preconditioner. This is difficult. You want an efficient preconditioner that requires essentially no additional memory? -ksp_type gmres or bcgs -pc_type jacobi (the sor won't work because the zero diagonals) It will not be good preconditioner. Are you sure you don't have additional memory for the preconditioner? A good preconditioner might require up to 5 to 6 the memory of the original matrix. > > Have a pleasant day! > > Kind regards, > Dmitry > > > > 04.02.2020, 19:04, "Smith, Barry F." : > > Please run with the option -ksp_view so we know the exact solver options you are using. > > From the lines > > MatCreateSubMats 1 1.0 1.9397e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 1.1066e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatIncreaseOvrlp 1 1.0 3.0324e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > and the fact you have three matrices I would guess you are using the additive Schwarz preconditioner (ASM) with ILU(0) on the blocks. (which converges the same as ILU on one process but does use much more memory). > > Note: your code is still built with 32 bit integers. > > I would guess the basic matrix formed plus the vectors in this example could take ~200 MB. It is the two matrices in the additive Schwarz that is taking the additional memory. > > What kind of PDEs are you solving and what kind of formulation? > > ASM plus ILU is the "work mans" type preconditioner, relatively robust but not particularly fast for convergence. 
Depending on your problem you might be able to do much better convergence wise by using a PCFIELDSPLIT and a PCGAMG on one of the splits. In your own run you see the ILU is chugging along rather slowly to the solution. > > With your current solvers you can use the option -sub_pc_factor_in_place which will shave off one of the matrices memories. Please try that. > > Avoiding the ASM you can avoid both extra matrices but at the cost of even slower convergence. Use, for example -pc_type sor > > > The petroleum industry also has a variety of "custom" preconditioners/solvers for particular models and formulations that can beat the convergence of general purpose solvers; and require less memory. Some of these can be implemented or simulated with PETSc. Some of these are implemented in the commercial petroleum simulation codes and it can be difficult to get a handle on exactly what they do because of proprietary issues. I think I have an old text on these approaches in my office, there may be modern books that discuss these. > > > Barry > > > > > On Feb 4, 2020, at 6:04 AM, ??????? ????????? wrote: > > Hello again! > Thank you very much for your replies! > Log is attached. > > 1. The main problem now is following. To solve the matrix that is attached to my previous e-mail PETSc consumes ~550 MB. > I know definitely that there are commercial softwares in petroleum industry (e.g., Schlumberger Petrel) that solve the same initial problem consuming only ~200 MB. > Moreover, I am sure that when I used 32-bit PETSc (GMRES + ASM) a year ago, it also consumed ~200 MB for this matrix. > > So, my question is: do you have any advice on how to decrease the amount of RAM consumed for such matrix from 550 MB to 200 MB? Maybe some specific preconditioner or other ways? > > I will be very grateful for any thoughts! > > 2. The second problem is more particular. > According to resource manager in Windows 10, Fortran solver based on PETSc consumes 548 MB RAM while solving the system of linear equations. > As I understand it form logs, it is required 459 MB and 52 MB for matrix and vector storage respectively. After summing of all objects for which memory is allocated we get only 517 MB. > > Thank you again for your time! Have a nice day. > > Kind regards, > Dmitry > > > 03.02.2020, 19:55, "Smith, Barry F." : > > GMRES also can by default require about 35 work vectors if it reaches the full restart. You can set a smaller restart with -ksp_gmres_restart 15 for example but this can also hurt the convergence of GMRES dramatically. People sometimes use the KSPBCGS algorithm since it does not require all the restart vectors but it can also converge more slowly. > > Depending on how much memory the sparse matrices use relative to the vectors the vector memory may matter or not. > > If you are using a recent version of PETSc you can run with -log_view -log_view_memory and it will show on the right side of the columns how much memory is being allocated for each of the operations in various ways. > > Barry > > > > On Feb 3, 2020, at 10:34 AM, Matthew Knepley wrote: > > On Mon, Feb 3, 2020 at 10:38 AM ??????? ????????? wrote: > Hello all! > > Now I am faced with a problem associated with the memory allocation when calling of KSPSolve . > > GMRES preconditioned by ASM for solving linear algebra system (obtained by the finite element spatial discretisation of Biot poroelasticity model) was chosen. 
> According to the output value of PetscMallocGetCurrentUsage subroutine 176 MB for matrix and RHS vector storage is required (before KSPSolve calling). > But during solving linear algebra system 543 MB of RAM is required (during KSPSolve calling). > Thus, the amount of allocated memory after preconditioning stage increased three times. This kind of behaviour is critically for 3D models with several millions of cells. > > 1) In order to know anything, we have to see the output of -ksp_view, although I see you used an overlap of 2 > > 2) The overlap increases the size of submatrices beyond that of the original matrix. Suppose that you used LU for the sub-preconditioner. > You would need at least 2x memory (with ILU(0)) since the matrix dominates memory usage. Moreover, you have overlap > and you might have fill-in depending on the solver. > > 3) The massif tool from valgrind is a good fine-grained way to look at memory allocation > > Thanks, > > Matt > > Is there a way to decrease amout of allocated memory? > Is that an expected behaviour for GMRES-ASM combination? > > As I remember, using previous version of PETSc didn't demonstrate so significante memory increasing. > > ... > Vec :: Vec_F, Vec_U > Mat :: Mat_K > ... > ... > call MatAssemblyBegin(Mat_M,Mat_Final_Assembly,ierr) > call MatAssemblyEnd(Mat_M,Mat_Final_Assembly,ierr) > .... > call VecAssemblyBegin(Vec_F_mod,ierr) > call VecAssemblyEnd(Vec_F_mod,ierr) > ... > ... > call PetscMallocGetCurrentUsage(mem, ierr) > print *,"Memory used: ",mem > ... > ... > call KSPSetType(Krylov,KSPGMRES,ierr) > call KSPGetPC(Krylov,PreCon,ierr) > call PCSetType(PreCon,PCASM,ierr) > call KSPSetFromOptions(Krylov,ierr) > ... > call KSPSolve(Krylov,Vec_F,Vec_U,ierr) > ... > ... > options = "-pc_asm_overlap 2 -pc_asm_type basic -ksp_monitor -ksp_converged_reason" > > > Kind regards, > Dmitry Melnichuk > Matrix.dat (265288024) > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > > From bsmith at mcs.anl.gov Wed Feb 5 14:46:43 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 5 Feb 2020 20:46:43 +0000 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? In-Reply-To: References: Message-ID: <20E8B18C-F71E-4B10-958B-6CD24DA869A3@mcs.anl.gov> https://gitlab.com/petsc/petsc/issues/557 > On Feb 5, 2020, at 7:35 AM, Matthew Knepley wrote: > > Perhaps Barry is right that you want Picard, but suppose you really want Newton. > > "This problem can be solved efficiently using the Sherman-Morrison formula" Well, maybe. The main assumption here is that inverting K is cheap. I see two things you can do in a straightforward way: > > 1) Use MatCreateLRC() https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateLRC.html to create the Jacobian > and solve using an iterative method. If you pass just K was the preconditioning matrix, you can use common PCs. > > 2) We only implemented MatMult() for LRC, but you could stick your SMW code in for MatSolve_LRC if you really want to factor K. We would > of course help you do this. > > Thanks, > > Matt > > On Wed, Feb 5, 2020 at 1:36 AM Smith, Barry F. via petsc-users wrote: > > I am not sure of everything in your email but it sounds like you want to use a "Picard" iteration to solve [K(u)?kaaT]?u=?F(u). 
That is solve > > A(u^{n}) (u^{n+1} - u^{n}) = F(u^{n}) - A(u^{n})u^{n} where A(u) = K(u) - kaaT > > PETSc provides code to this with SNESSetPicard() (see the manual pages) I don't know if Petsc4py has bindings for this. > > Adding missing python bindings is not terribly difficult and you may be able to do it yourself if this is the approach you want. > > Barry > > > > > On Feb 4, 2020, at 5:07 PM, Olek Niewiarowski wrote: > > > > Hello, > > I am a FEniCS user but new to petsc4py. I am trying to modify the KSP solver through the SNES object to implement the Sherman-Morrison formula(e.g. http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html ). I am solving a nonlinear system of the form [K(u)?kaaT]?u=?F(u). Here the jacobian matrix K is modified by the term kaaT, where k is a scalar. Notably, K is sparse, while the term kaaT results in a full matrix. This problem can be solved efficiently using the Sherman-Morrison formula : > > > > [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) > > I have managed to successfully implement this at the linear solve level (by modifying the KSP solver) inside a custom Newton solver in python by following an incomplete tutorial at https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner : > > ? while (norm(delU) > alpha): # while not converged > > ? > > ? self.update_F() # call to method to update r.h.s form > > ? self.update_K() # call to update the jacobian form > > ? K = assemble(self.K) # assemble the jacobian matrix > > ? F = assemble(self.F) # assemble the r.h.s vector > > ? a = assemble(self.a_form) # assemble the a_form (see Sherman Morrison formula) > > ? > > ? for bc in self.mem.bc: # apply boundary conditions > > ? bc.apply(K, F) > > ? bc.apply(K, a) > > ? > > ? B = PETSc.Mat().create() > > ? > > ? # Assemble the bilinear form that defines A and get the concrete > > ? # PETSc matrix > > ? A = as_backend_type(K).mat() # get the PETSc objects for K and a > > ? u = as_backend_type(a).vec() > > ? > > ? # Build the matrix "context" # see firedrake docs > > ? Bctx = MatrixFreeB(A, u, u, self.k) > > ? > > ? # Set up B > > ? # B is the same size as A > > ? B.setSizes(*A.getSizes()) > > ? B.setType(B.Type.PYTHON) > > ? B.setPythonContext(Bctx) > > ? B.setUp() > > ? > > ? > > ? ksp = PETSc.KSP().create() # create the KSP linear solver object > > ? ksp.setOperators(B) > > ? ksp.setUp() > > ? pc = ksp.pc > > ? pc.setType(pc.Type.PYTHON) > > ? pc.setPythonContext(MatrixFreePC()) > > ? ksp.setFromOptions() > > ? > > ? solution = delU # the incremental displacement at this iteration > > ? > > ? b = as_backend_type(-F).vec() > > ? delu = solution.vector().vec() > > ? > > ? ksp.solve(b, delu) > > > > ? self.mem.u.vector().axpy(0.25, self.delU.vector()) # poor man's linesearch > > ? counter += 1 > > Here is the corresponding petsc4py code adapted from the firedrake docs: > > > > ? class MatrixFreePC(object): > > ? > > ? def setUp(self, pc): > > ? B, P = pc.getOperators() > > ? # extract the MatrixFreeB object from B > > ? ctx = B.getPythonContext() > > ? self.A = ctx.A > > ? self.u = ctx.u > > ? self.v = ctx.v > > ? self.k = ctx.k > > ? # Here we build the PC object that uses the concrete, > > ? # assembled matrix A. We will use this to apply the action > > ? # of A^{-1} > > ? self.pc = PETSc.PC().create() > > ? self.pc.setOptionsPrefix("mf_") > > ? self.pc.setOperators(self.A) > > ? self.pc.setFromOptions() > > ? # Since u and v do not change, we can build the denominator > > ? # and the action of A^{-1} on u only once, in the setup > > ? 
# phase. > > ? tmp = self.A.createVecLeft() > > ? self.pc.apply(self.u, tmp) > > ? self._Ainvu = tmp > > ? self._denom = 1 + self.k*self.v.dot(self._Ainvu) > > ? > > ? def apply(self, pc, x, y): > > ? # y <- A^{-1}x > > ? self.pc.apply(x, y) > > ? # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u) > > ? alpha = (self.k*self.v.dot(y)) / self._denom > > ? # y <- y - alpha * A^{-1}u > > ? y.axpy(-alpha, self._Ainvu) > > ? > > ? > > ? class MatrixFreeB(object): > > ? > > ? def __init__(self, A, u, v, k): > > ? self.A = A > > ? self.u = u > > ? self.v = v > > ? self.k = k > > ? > > ? def mult(self, mat, x, y): > > ? # y <- A x > > ? self.A.mult(x, y) > > ? > > ? # alpha <- v^T x > > ? alpha = self.v.dot(x) > > ? > > ? # y <- y + alpha*u > > ? y.axpy(alpha, self.u) > > However, this approach is not efficient as it requires many iterations due to the Newton step being fixed, so I would like to implement it using SNES and use line search. Unfortunately, I have not been able to find any documentation/tutorial on how to do so. Provided I have the FEniCS forms for F, K, and a, I'd like to do something along the lines of: > > solver = PETScSNESSolver() # the FEniCS SNES wrapper > > snes = solver.snes() # the petsc4py SNES object > > ## ?? > > ksp = snes.getKSP() > > # set ksp option similar to above > > solver.solve() > > > > I would be very grateful if anyone could could help or point me to a reference or demo that does something similar (or maybe a completely different way of solving the problem!). > > Many thanks in advance! > > Alex > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From hong at aspiritech.org Wed Feb 5 15:26:10 2020 From: hong at aspiritech.org (hong at aspiritech.org) Date: Wed, 5 Feb 2020 15:26:10 -0600 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? In-Reply-To: References: <264F91C4-8558-4850-9B4B-ABE4123C2A2C@anl.gov> Message-ID: Hao: Try '-pc_sub_type lu -ksp_type gmres -ksp_monitor' Hong Thanks a lot for your suggestions, Hong and Barry - > > As you suggested, I first tried the LU direct solvers (built-in and MUMPs) > out this morning, which work perfectly, albeit slow. As my problem itself > is a part of a PDE based optimization, the exact solution in the > searching procedure is not necessary (I often set a relative tolerance of > 1E-7/1E-8, etc. for Krylov subspace methods). Also tried BJACOBI with exact > LU, the KSP just converges in one or two iterations, as expected. > > I added -kspview option for the D-ILU test (still with Block Jacobi as > preconditioner and bcgs as KSP solver). The KSPview output from one of the > examples in a toy model looks like: > > KSP Object: 1 MPI processes > type: bcgs > maximum iterations=120, nonzero initial guess > tolerances: relative=1e-07, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: bjacobi > number of blocks = 3 > Local solve is same for all blocks, in the following KSP and PC > objects: > KSP Object: (sub_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> left preconditioning > using NONE norm type for convergence test > PC Object: (sub_) 1 MPI processes > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1., needed 1. > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=11294, cols=11294 > package used to perform factorization: petsc > total: nonzeros=76008, allocated nonzeros=76008 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=11294, cols=11294 > total: nonzeros=76008, allocated nonzeros=76008 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: mpiaij > rows=33880, cols=33880 > total: nonzeros=436968, allocated nonzeros=436968 > total number of mallocs used during MatSetValues calls=0 > not using I-node (on process 0) routines > > do you see something wrong with my setup? > > I also tried a quick performance test with a small 278906 by 278906 > matrix (3850990 nnz) with the following parameters: > > -ksp_type bcgs -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type > ilu -ksp_view > > Reducing the relative residual to 1E-7 > > Took 4.08s with 41 bcgs iterations. > > Merely changing the -pc_bjacobi_local_blocks to 6 > > Took 7.02s with 73 bcgs iterations. 9 blocks would further take 9.45s with > 101 bcgs iterations. > > As a reference, my home-brew Fortran code solves the same problem (3-block > D-ILU0) in > > 1.84s with 24 bcgs iterations (the bcgs code is also a home-brew one)? > > > > Well, by saying ?use explicit L/U matrix as preconditioner?, I wonder if a > user is allowed to provide his own (separate) L and U Mat for > preconditioning (like how it works in Matlab solvers), e.g. > > x = qmr(A,b,Tol,MaxIter,L,U,x) > > As I already explicitly constructed the L and U matrices in Fortran, it is > not hard to convert them to Mat in Petsc to test Petsc and my Fortran code > head-to-head. In that case, the A, b, x, and L/U are all identical, it > would be easier to see where the problem came from. > > > > BTW, there is another thing I wondered - is there a way to output residual > in unpreconditioned norm? I tried to > > *call* KSPSetNormType(ksp_local, KSP_NORM_UNPRECONDITIONED, ierr) > > But always get an error that current ksp does not support unpreconditioned > in LEFT/RIGHT (either way). Is it possible to do that (output > unpreconditioned residual) in PETSc at all? > > Cheers, > Hao > > > ------------------------------ > *From:* Smith, Barry F. > *Sent:* Tuesday, February 4, 2020 8:27 PM > *To:* Hao DONG > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] What is the right way to implement a (block) > Diagonal ILU as PC? > > > > > On Feb 4, 2020, at 12:41 PM, Hao DONG wrote: > > > > Dear all, > > > > > > I have a few questions about the implementation of diagonal ILU PC in > PETSc. I want to solve a very simple system with KSP (in parallel), the > nature of the system (finite difference time-harmonic Maxwell) is probably > not important to the question itself. 
Long story short, I just need to > solve a set of equations of Ax = b with a block diagonal system matrix, > like (not sure if the mono font works): > > > > |X | > > A =| Y | > > | Z| > > > > Note that A is not really block-diagonal, it?s just a multi-diagonal > matrix determined by the FD mesh, where most elements are close to > diagonal. So instead of a full ILU decomposition, a D-ILU is a good > approximation as a preconditioner, and the number of blocks should not > matter too much: > > > > |Lx | |Ux | > > L = | Ly | and U = | Uy | > > | Lz| | Uz| > > > > Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N > blocks) is quite efficient with Krylov subspace methods like BiCGstab or > QMR in my sequential Fortran/Matlab code. > > > > So like most users, I am looking for a parallel implement with this > problem in PETSc. After looking through the manual, it seems that the most > straightforward way to do it is through PCBJACOBI. Not sure I understand it > right, I just setup a 3-block PCJACOBI and give each of the block a KSP > with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? > My little block of fortran code would look like: > > ... > > call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & > > & isubs,ierr) > > call PCBJacobiSetLocalBlocks(pc_local, Nsub, & > > & isubs(istart:iend),ierr) > > ! set up the block jacobi structure > > call KSPSetup(ksp_local,ierr) > > ! allocate sub ksps > > allocate(ksp_sub(Nsub)) > > call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & > > & ksp_sub,ierr) > > do i=1,Nsub > > call KSPGetPC(ksp_sub(i),pc_sub,ierr) > > !ILU preconditioner > > call PCSetType(pc_sub,ptype,ierr) > > call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here > > call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] > > end do > > call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & > > & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) > > ? > > This code looks essentially right. You should call with -ksp_view to > see exactly what PETSc is using for a solver. > > > > > I understand that the parallel performance may not be comparable, so I > first set up a one-process test (with MPIAij, but all the rows are local > since there is only one process). The system is solved without any problem > (identical results within error). But the performance is actually a lot > worse (code built without debugging flags in performance tests) than my own > home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse > matrix format), which is hard to believe. I suspect the difference is from > the PC as the PETSc version took much more BiCGstab iterations (60-ish vs > 100-ish) to converge to the same relative tol. > > PETSc uses GMRES by default with a restart of 30 and left > preconditioning. It could be different exact choices in the solver (which > is why -ksp_view is so useful) can explain the differences in the runs > between your code and PETSc's > > > > This is further confirmed when I change the setup of D-ILU (using 6 or 9 > blocks instead of 3). While my Fortran/Matlab codes see minimal performance > difference (<5%) when I play with the D-ILU setup, increasing the number of > D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a > performance decrease of more than 50% in sequential test. > > This is odd, the more blocks the smaller each block so the quicker the > ILU set up should be. You can run various cases with -log_view and send us > the output to see what is happening at each part of the computation in time. 
> > > So my implementation IS somewhat different in PETSc. Do I miss something > in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of > how PCBJACOBI works in PETSc? > > Probably not. > > > > If this is not the right way to implement a block diagonal ILU as > (parallel) PC, please kindly point me to the right direction. I searched > through the mail list to find some answers, only to find a couple of > similar questions... An example would be nice. > > You approach seems fundamentally right but I cannot be sure of possible > bugs. > > > > On the other hand, does PETSc support a simple way to use explicit L/U > matrix as a preconditioner? I can import the D-ILU matrix (I already > converted my A matrix into Mat) constructed in my Fortran code to make a > better comparison. Or do I have to construct my own PC using PCshell? If > so, is there a good tutorial/example to learn about how to use PCSHELL (in > a more advanced sense, like how to setup pc side and transpose)? > > Not sure what you mean by explicit L/U matrix as a preconditioner. As > Hong said, yes you can use a parallel LU from MUMPS or SuperLU_DIST or > Pastix as the solver. You do not need any shell code. You simply need to > set the PCType to lu > > You can also set all this options from the command line and don't need > to write the code specifically. So call KSPSetFromOptions() and then for > example > > -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu (this > last one is applied to each block so you could use -pc_type lu and it would > use lu on each block.) > > -ksp_type_none -pc_type lu -pc_factor_mat_solver_type mumps (do > parallel LU with mumps) > > By not hardwiring in the code and just using options you can test out > different cases much quicker > > Use -ksp_view to make sure that is using the solver the way you expect. > > Barry > > > > Barry > > > > > Thanks in advance, > > > > Hao > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Feb 5 15:42:04 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 5 Feb 2020 21:42:04 +0000 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? In-Reply-To: References: <264F91C4-8558-4850-9B4B-ABE4123C2A2C@anl.gov> Message-ID: <4A373D93-4018-45E0-B805-3ECC528472DD@mcs.anl.gov> > On Feb 5, 2020, at 4:36 AM, Hao DONG wrote: > > Thanks a lot for your suggestions, Hong and Barry - > > As you suggested, I first tried the LU direct solvers (built-in and MUMPs) out this morning, which work perfectly, albeit slow. As my problem itself is a part of a PDE based optimization, the exact solution in the searching procedure is not necessary (I often set a relative tolerance of 1E-7/1E-8, etc. for Krylov subspace methods). Also tried BJACOBI with exact LU, the KSP just converges in one or two iterations, as expected. > > I added -kspview option for the D-ILU test (still with Block Jacobi as preconditioner and bcgs as KSP solver). The KSPview output from one of the examples in a toy model looks like: > > KSP Object: 1 MPI processes > type: bcgs > maximum iterations=120, nonzero initial guess > tolerances: relative=1e-07, absolute=1e-50, divergence=10000. 
> left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: bjacobi > number of blocks = 3 > Local solve is same for all blocks, in the following KSP and PC objects: > KSP Object: (sub_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (sub_) 1 MPI processes > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1., needed 1. > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=11294, cols=11294 > package used to perform factorization: petsc > total: nonzeros=76008, allocated nonzeros=76008 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=11294, cols=11294 > total: nonzeros=76008, allocated nonzeros=76008 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: mpiaij > rows=33880, cols=33880 > total: nonzeros=436968, allocated nonzeros=436968 > total number of mallocs used during MatSetValues calls=0 > not using I-node (on process 0) routines > > do you see something wrong with my setup? > > I also tried a quick performance test with a small 278906 by 278906 matrix (3850990 nnz) with the following parameters: > > -ksp_type bcgs -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu -ksp_view > > Reducing the relative residual to 1E-7 > > Took 4.08s with 41 bcgs iterations. > > Merely changing the -pc_bjacobi_local_blocks to 6 > > Took 7.02s with 73 bcgs iterations. 9 blocks would further take 9.45s with 101 bcgs iterations. This is normal. more blocks slower convergence > > As a reference, my home-brew Fortran code solves the same problem (3-block D-ILU0) in > > 1.84s with 24 bcgs iterations (the bcgs code is also a home-brew one)? > Run the PETSc code with optimization ./configure --with-debugging=0 an run the code with -log_view this will show where the PETSc code is spending the time (send it to use) > > > Well, by saying ?use explicit L/U matrix as preconditioner?, I wonder if a user is allowed to provide his own (separate) L and U Mat for preconditioning (like how it works in Matlab solvers), e.g. > > x = qmr(A,b,Tol,MaxIter,L,U,x) > > As I already explicitly constructed the L and U matrices in Fortran, it is not hard to convert them to Mat in Petsc to test Petsc and my Fortran code head-to-head. In that case, the A, b, x, and L/U are all identical, it would be easier to see where the problem came from. > > No, we don't provide this kind of support > > BTW, there is another thing I wondered - is there a way to output residual in unpreconditioned norm? I tried to > > call KSPSetNormType(ksp_local, KSP_NORM_UNPRECONDITIONED, ierr) > > But always get an error that current ksp does not support unpreconditioned in LEFT/RIGHT (either way). Is it possible to do that (output unpreconditioned residual) in PETSc at all? 
-ksp_monitor_true_residual You can also run GMRES (and some other methods) with right preconditioning, -ksp_pc_side right then the residual computed is by the algorithm the unpreconditioned residual KSPSetNormType sets the norm used in the algorithm, it generally always has to left or right, only a couple algorithms support unpreconditioned directly. Barry > > Cheers, > Hao > > > From: Smith, Barry F. > Sent: Tuesday, February 4, 2020 8:27 PM > To: Hao DONG > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? > > > > > On Feb 4, 2020, at 12:41 PM, Hao DONG wrote: > > > > Dear all, > > > > > > I have a few questions about the implementation of diagonal ILU PC in PETSc. I want to solve a very simple system with KSP (in parallel), the nature of the system (finite difference time-harmonic Maxwell) is probably not important to the question itself. Long story short, I just need to solve a set of equations of Ax = b with a block diagonal system matrix, like (not sure if the mono font works): > > > > |X | > > A =| Y | > > | Z| > > > > Note that A is not really block-diagonal, it?s just a multi-diagonal matrix determined by the FD mesh, where most elements are close to diagonal. So instead of a full ILU decomposition, a D-ILU is a good approximation as a preconditioner, and the number of blocks should not matter too much: > > > > |Lx | |Ux | > > L = | Ly | and U = | Uy | > > | Lz| | Uz| > > > > Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in my sequential Fortran/Matlab code. > > > > So like most users, I am looking for a parallel implement with this problem in PETSc. After looking through the manual, it seems that the most straightforward way to do it is through PCBJACOBI. Not sure I understand it right, I just setup a 3-block PCJACOBI and give each of the block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little block of fortran code would look like: > > ... > > call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & > > & isubs,ierr) > > call PCBJacobiSetLocalBlocks(pc_local, Nsub, & > > & isubs(istart:iend),ierr) > > ! set up the block jacobi structure > > call KSPSetup(ksp_local,ierr) > > ! allocate sub ksps > > allocate(ksp_sub(Nsub)) > > call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & > > & ksp_sub,ierr) > > do i=1,Nsub > > call KSPGetPC(ksp_sub(i),pc_sub,ierr) > > !ILU preconditioner > > call PCSetType(pc_sub,ptype,ierr) > > call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here > > call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] > > end do > > call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & > > & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) > > ? > > This code looks essentially right. You should call with -ksp_view to see exactly what PETSc is using for a solver. > > > > > I understand that the parallel performance may not be comparable, so I first set up a one-process test (with MPIAij, but all the rows are local since there is only one process). The system is solved without any problem (identical results within error). But the performance is actually a lot worse (code built without debugging flags in performance tests) than my own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), which is hard to believe. 
I suspect the difference is from the PC as the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to the same relative tol. > > PETSc uses GMRES by default with a restart of 30 and left preconditioning. It could be different exact choices in the solver (which is why -ksp_view is so useful) can explain the differences in the runs between your code and PETSc's > > > > This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks instead of 3). While my Fortran/Matlab codes see minimal performance difference (<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a performance decrease of more than 50% in sequential test. > > This is odd, the more blocks the smaller each block so the quicker the ILU set up should be. You can run various cases with -log_view and send us the output to see what is happening at each part of the computation in time. > > > So my implementation IS somewhat different in PETSc. Do I miss something in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how PCBJACOBI works in PETSc? > > Probably not. > > > > If this is not the right way to implement a block diagonal ILU as (parallel) PC, please kindly point me to the right direction. I searched through the mail list to find some answers, only to find a couple of similar questions... An example would be nice. > > You approach seems fundamentally right but I cannot be sure of possible bugs. > > > > On the other hand, does PETSc support a simple way to use explicit L/U matrix as a preconditioner? I can import the D-ILU matrix (I already converted my A matrix into Mat) constructed in my Fortran code to make a better comparison. Or do I have to construct my own PC using PCshell? If so, is there a good tutorial/example to learn about how to use PCSHELL (in a more advanced sense, like how to setup pc side and transpose)? > > Not sure what you mean by explicit L/U matrix as a preconditioner. As Hong said, yes you can use a parallel LU from MUMPS or SuperLU_DIST or Pastix as the solver. You do not need any shell code. You simply need to set the PCType to lu > > You can also set all this options from the command line and don't need to write the code specifically. So call KSPSetFromOptions() and then for example > > -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu (this last one is applied to each block so you could use -pc_type lu and it would use lu on each block.) > > -ksp_type_none -pc_type lu -pc_factor_mat_solver_type mumps (do parallel LU with mumps) > > By not hardwiring in the code and just using options you can test out different cases much quicker > > Use -ksp_view to make sure that is using the solver the way you expect. > > Barry > > > > Barry > > > > > Thanks in advance, > > > > Hao From aan2 at princeton.edu Wed Feb 5 19:53:51 2020 From: aan2 at princeton.edu (Olek Niewiarowski) Date: Thu, 6 Feb 2020 01:53:51 +0000 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? In-Reply-To: <20E8B18C-F71E-4B10-958B-6CD24DA869A3@mcs.anl.gov> References: , <20E8B18C-F71E-4B10-958B-6CD24DA869A3@mcs.anl.gov> Message-ID: Hi Barry and Matt, Thank you for your input and for creating a new issue in the repo. My initial question was more basic (how to configure the SNES's KSP solver as in my first message with a and k), but now I see there's more to the implementation. 
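For the narrower question of reaching and configuring the KSP inside a petsc4py SNES, a minimal skeleton is below; nothing in it is FEniCS-specific, MatrixFreePC refers to the class defined in the first message of this thread, and the ILU line is only a placeholder.

from petsc4py import PETSc

snes = PETSc.SNES().create()
# snes.setFunction(...) and snes.setJacobian(...) go here as usual
ksp = snes.getKSP()                 # the linear solver used inside each Newton step
ksp.setType(PETSc.KSP.Type.GMRES)
pc = ksp.getPC()
pc.setType(PETSc.PC.Type.ILU)       # placeholder; any PC can be configured here
# To reuse the user-defined preconditioner from the first message instead:
# pc.setType(PETSc.PC.Type.PYTHON)
# pc.setPythonContext(MatrixFreePC())
ksp.setTolerances(rtol=1e-8)
snes.setFromOptions()

Line search then comes for free from the SNES (for example -snes_linesearch_type bt), instead of the fixed 0.25 step used in the hand-written Newton loop quoted earlier.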
To reiterate, for my problem's structure, a good solution algorithm (on the algebraic level) is the following "double back-substitution": For each nonlinear iteration: 1. define intermediate vectors u_1 and u_2 2. solve Ku_1 = -F --> u_1 = -K^{-1}F (this solve is cheap, don't actually need K^-1) 3. solve Ku_2 = -a --> u_2 = -K^{-1}a (ditto) 4. define \beta = 1/(1 + k a^Tu_2) 5. \Delta u = u_1 + \beta k u_2^T F u_2 6. u = u + \Delta u I don't need the Jacobian inverse, [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) just the solution ?u = [K?kaaT]-1F = K-1F - (kK-1 aFK-1a)/(1 + kaTK-1a) = u_1 + beta k u_2^T F u_2 (so I never need to invert K either). (To Matt's point on gitlab, K is a symmetric sparse matrix arising from a bilinear form. ) Also, eventually, I want to have more than one low-rank updates to K, but again, Sherman Morrisson Woodbury should still work. Being new to PETSc, I don't know if this algorithm directly translates into an efficient numerical solver. (I'm also not sure if Picard iteration would be useful here.) What would it take to set up the KSP solver in SNES like I did below? Is it possible "out of the box"? I looked at MatCreateLRC() - what would I pass this to? (A pointer to demo/tutorial would be very appreciated.) If there's a better way to go about all of this, I'm open to any and all ideas. My only limitation is that I use petsc4py exclusively since I/future users of my code will not be comfortable with C. Thanks again for your help! Alexander (Olek) Niewiarowski PhD Candidate, Civil & Environmental Engineering Princeton University, 2020 Cell: +1 (610) 393-2978 ________________________________ From: Smith, Barry F. Sent: Wednesday, February 5, 2020 15:46 To: Matthew Knepley Cc: Olek Niewiarowski ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? https://gitlab.com/petsc/petsc/issues/557 > On Feb 5, 2020, at 7:35 AM, Matthew Knepley wrote: > > Perhaps Barry is right that you want Picard, but suppose you really want Newton. > > "This problem can be solved efficiently using the Sherman-Morrison formula" Well, maybe. The main assumption here is that inverting K is cheap. I see two things you can do in a straightforward way: > > 1) Use MatCreateLRC() https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateLRC.html to create the Jacobian > and solve using an iterative method. If you pass just K was the preconditioning matrix, you can use common PCs. > > 2) We only implemented MatMult() for LRC, but you could stick your SMW code in for MatSolve_LRC if you really want to factor K. We would > of course help you do this. > > Thanks, > > Matt > > On Wed, Feb 5, 2020 at 1:36 AM Smith, Barry F. via petsc-users wrote: > > I am not sure of everything in your email but it sounds like you want to use a "Picard" iteration to solve [K(u)?kaaT]?u=?F(u). That is solve > > A(u^{n}) (u^{n+1} - u^{n}) = F(u^{n}) - A(u^{n})u^{n} where A(u) = K(u) - kaaT > > PETSc provides code to this with SNESSetPicard() (see the manual pages) I don't know if Petsc4py has bindings for this. > > Adding missing python bindings is not terribly difficult and you may be able to do it yourself if this is the approach you want. > > Barry > > > > > On Feb 4, 2020, at 5:07 PM, Olek Niewiarowski wrote: > > > > Hello, > > I am a FEniCS user but new to petsc4py. I am trying to modify the KSP solver through the SNES object to implement the Sherman-Morrison formula(e.g. 
http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html ). I am solving a nonlinear system of the form [K(u)?kaaT]?u=?F(u). Here the jacobian matrix K is modified by the term kaaT, where k is a scalar. Notably, K is sparse, while the term kaaT results in a full matrix. This problem can be solved efficiently using the Sherman-Morrison formula : > > > > [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) > > I have managed to successfully implement this at the linear solve level (by modifying the KSP solver) inside a custom Newton solver in python by following an incomplete tutorial at https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner : > > ? while (norm(delU) > alpha): # while not converged > > ? > > ? self.update_F() # call to method to update r.h.s form > > ? self.update_K() # call to update the jacobian form > > ? K = assemble(self.K) # assemble the jacobian matrix > > ? F = assemble(self.F) # assemble the r.h.s vector > > ? a = assemble(self.a_form) # assemble the a_form (see Sherman Morrison formula) > > ? > > ? for bc in self.mem.bc: # apply boundary conditions > > ? bc.apply(K, F) > > ? bc.apply(K, a) > > ? > > ? B = PETSc.Mat().create() > > ? > > ? # Assemble the bilinear form that defines A and get the concrete > > ? # PETSc matrix > > ? A = as_backend_type(K).mat() # get the PETSc objects for K and a > > ? u = as_backend_type(a).vec() > > ? > > ? # Build the matrix "context" # see firedrake docs > > ? Bctx = MatrixFreeB(A, u, u, self.k) > > ? > > ? # Set up B > > ? # B is the same size as A > > ? B.setSizes(*A.getSizes()) > > ? B.setType(B.Type.PYTHON) > > ? B.setPythonContext(Bctx) > > ? B.setUp() > > ? > > ? > > ? ksp = PETSc.KSP().create() # create the KSP linear solver object > > ? ksp.setOperators(B) > > ? ksp.setUp() > > ? pc = ksp.pc > > ? pc.setType(pc.Type.PYTHON) > > ? pc.setPythonContext(MatrixFreePC()) > > ? ksp.setFromOptions() > > ? > > ? solution = delU # the incremental displacement at this iteration > > ? > > ? b = as_backend_type(-F).vec() > > ? delu = solution.vector().vec() > > ? > > ? ksp.solve(b, delu) > > > > ? self.mem.u.vector().axpy(0.25, self.delU.vector()) # poor man's linesearch > > ? counter += 1 > > Here is the corresponding petsc4py code adapted from the firedrake docs: > > > > ? class MatrixFreePC(object): > > ? > > ? def setUp(self, pc): > > ? B, P = pc.getOperators() > > ? # extract the MatrixFreeB object from B > > ? ctx = B.getPythonContext() > > ? self.A = ctx.A > > ? self.u = ctx.u > > ? self.v = ctx.v > > ? self.k = ctx.k > > ? # Here we build the PC object that uses the concrete, > > ? # assembled matrix A. We will use this to apply the action > > ? # of A^{-1} > > ? self.pc = PETSc.PC().create() > > ? self.pc.setOptionsPrefix("mf_") > > ? self.pc.setOperators(self.A) > > ? self.pc.setFromOptions() > > ? # Since u and v do not change, we can build the denominator > > ? # and the action of A^{-1} on u only once, in the setup > > ? # phase. > > ? tmp = self.A.createVecLeft() > > ? self.pc.apply(self.u, tmp) > > ? self._Ainvu = tmp > > ? self._denom = 1 + self.k*self.v.dot(self._Ainvu) > > ? > > ? def apply(self, pc, x, y): > > ? # y <- A^{-1}x > > ? self.pc.apply(x, y) > > ? # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u) > > ? alpha = (self.k*self.v.dot(y)) / self._denom > > ? # y <- y - alpha * A^{-1}u > > ? y.axpy(-alpha, self._Ainvu) > > ? > > ? > > ? class MatrixFreeB(object): > > ? > > ? def __init__(self, A, u, v, k): > > ? self.A = A > > ? self.u = u > > ? self.v = v > > ? self.k = k > > ? > > ? 
def mult(self, mat, x, y): > > ? # y <- A x > > ? self.A.mult(x, y) > > ? > > ? # alpha <- v^T x > > ? alpha = self.v.dot(x) > > ? > > ? # y <- y + alpha*u > > ? y.axpy(alpha, self.u) > > However, this approach is not efficient as it requires many iterations due to the Newton step being fixed, so I would like to implement it using SNES and use line search. Unfortunately, I have not been able to find any documentation/tutorial on how to do so. Provided I have the FEniCS forms for F, K, and a, I'd like to do something along the lines of: > > solver = PETScSNESSolver() # the FEniCS SNES wrapper > > snes = solver.snes() # the petsc4py SNES object > > ## ?? > > ksp = snes.getKSP() > > # set ksp option similar to above > > solver.solve() > > > > I would be very grateful if anyone could could help or point me to a reference or demo that does something similar (or maybe a completely different way of solving the problem!). > > Many thanks in advance! > > Alex > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dong-hao at outlook.com Thu Feb 6 03:43:05 2020 From: dong-hao at outlook.com (Hao DONG) Date: Thu, 6 Feb 2020 09:43:05 +0000 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? In-Reply-To: <4A373D93-4018-45E0-B805-3ECC528472DD@mcs.anl.gov> References: <264F91C4-8558-4850-9B4B-ABE4123C2A2C@anl.gov> , <4A373D93-4018-45E0-B805-3ECC528472DD@mcs.anl.gov> Message-ID: Dear Hong and Barry, Thanks for the suggestions. So there could be some problems in my PETSc configuration? - but my PETSc lib was indeed compiled without the debug flags (--with-debugging=0). I use GCC/GFortran (Home-brew GCC 9.2.0) for the compiling and building of PETSc and my own fortran code. My Fortran compiling flags are simply like: -O3 -ffree-line-length-none -fastsse Which is also used for -FOPTFLAGS in PETSc (I added -openmp for PETSc, but not my fortran code, as I don?t have any OMP optimizations in my program). Note the performance test results I listed yesterday (e.g. 4.08s with 41 bcgs iterations.) are without any CSR-array->PETSc translation overhead (only include the set and solve part). I have two questions about the performance difference - 1. Is ilu only factorized once for each iteration, or ilu is performed at every outer ksp iteration steps? Sounds unlikely - but if so, this could cause some extra overheads. 2. Some KSP solvers like BCGS or TFQMR has two ?half-iterations? for each iteration step. Not sure how it works in PETSc, but is that possible that both the two ?half" relative residuals are counted in the output array, doubling the number of iterations (but that cannot explain the extra time consumed)? 
Anyway, the output with -log_view from the same 278906 by 278906 matrix with 3-block D-ILU in PETSc is as follows: ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- MEMsolv.lu on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Thu Feb 6 09:07:35 2020 Using Petsc Release Version 3.12.3, unknown Max Max/Min Avg Total Time (sec): 4.443e+00 1.000 4.443e+00 Objects: 1.155e+03 1.000 1.155e+03 Flop: 4.311e+09 1.000 4.311e+09 4.311e+09 Flop/sec: 9.703e+08 1.000 9.703e+08 9.703e+08 MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 4.4435e+00 100.0% 4.3113e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ???????????????????????????????????????????????????????????? See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 83 1.0 1.7815e+00 1.0 2.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40 48 0 0 0 40 48 0 0 0 1168 MatSolve 252 1.0 1.2708e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 939 MatLUFactorNum 3 1.0 7.9725e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 MatILUFactorSym 3 1.0 2.6998e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatAssemblyBegin 5 1.0 3.6000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 5 1.0 3.1619e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCreateSubMats 1 1.0 3.9659e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetOrdering 3 1.0 4.3070e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 3 1.0 1.3600e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecDot 82 1.0 1.8948e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 966 
VecDotNorm2 41 1.0 1.6812e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 1088 VecNorm 43 1.0 9.5099e-02 1.0 9.59e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1009 VecCopy 2 1.0 1.0120e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 271 1.0 3.8922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 1 1.0 7.7200e-04 1.0 2.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2890 VecAXPBYCZ 82 1.0 2.4370e-01 1.0 3.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 8 0 0 0 5 8 0 0 0 1502 VecWAXPY 82 1.0 1.4148e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 4 0 0 0 3 4 0 0 0 1293 VecAssemblyBegin 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 84 1.0 5.9300e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 4 1.0 1.4167e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 4.0250e+00 1.0 4.31e+09 1.0 0.0e+00 0.0e+00 0.0e+00 91100 0 0 0 91100 0 0 0 1071 PCSetUp 4 1.0 1.5207e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 156 PCSetUpOnBlocks 1 1.0 1.1116e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 214 PCApply 84 1.0 1.2912e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 PCApplyOnBlocks 252 1.0 1.2909e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 ------------------------------------------------------------------------------------------------------------------------ # I skipped the memory part - the options (and compiler options) are as follows: #PETSc Option Table entries: -ksp_type bcgs -ksp_view -log_view -pc_bjacobi_local_blocks 3 -pc_factor_levels 0 -pc_sub_type ilu -pc_type bjacobi #End of PETSc Option Table entries Compiled with FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 Configure options: --with-scalar-type=complex --download-mumps --download-scalapack --with-fortran-kernels=1 -- FOPTFLAGS=?-O3 -fastsse -mp -openmp? --COPTFLAGS=?-O3 -fastsse -mp -openmp? 
--CXXOPTFLAGS="-O3 -fastsse -mp -openmp" -- with-debugging=0 ----------------------------------------- Libraries compiled on 2020-02-03 10:44:31 on Haos-MBP Machine characteristics: Darwin-19.2.0-x86_64-i386-64bit Using PETSc directory: /Users/donghao/src/git/PETSc-current Using PETSc arch: arch-darwin-c-opt ----------------------------------------- Using C compiler: mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fno-stack- check -Qunused-arguments -fvisibility=hidden Using Fortran compiler: mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument Using include paths: -I/Users/donghao/src/git/PETSc-current/include -I/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/donghao/src/git/PETSc- current/arch-darwin-c-opt/lib -lpetsc -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/ donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -Wl,-rpath,/usr/local/opt/libevent/lib -L/usr/local/opt/libevent/ lib -Wl,-rpath,/usr/local/Cellar/open-mpi/4.0.2/lib -L/usr/local/Cellar/open-mpi/4.0.2/lib -Wl,-rpath,/usr/local/Cellar/ gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin19/9.2.0 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple- darwin19/9.2.0 -Wl,-rpath,/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -lcmumps - ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lc++ -ldl -lmpi_usempif08 - lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lquadmath -lm -lc++ -ldl On the other hand, running PETSc with -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type lu -ksp_type gmres -ksp_monitor -ksp_view -log_view For the same problem takes 5.37s and 72 GMRES iterations. Our previous testings show that BiCGstab (bcgs in PETSc) is almost always the most effective KSP solver for our non-symmetrical complex system. Strangely, the system is still using ilu instead of lu for sub blocks. 
The output is like: 0 KSP Residual norm 2.480412407430e+02 1 KSP Residual norm 8.848059967835e+01 2 KSP Residual norm 3.415272863261e+01 3 KSP Residual norm 1.563045190939e+01 4 KSP Residual norm 6.241296940043e+00 5 KSP Residual norm 2.739710899854e+00 6 KSP Residual norm 1.391304148888e+00 7 KSP Residual norm 7.959262020849e-01 8 KSP Residual norm 4.828323055231e-01 9 KSP Residual norm 2.918529739200e-01 10 KSP Residual norm 1.905508589557e-01 11 KSP Residual norm 1.291541892702e-01 12 KSP Residual norm 8.827145774707e-02 13 KSP Residual norm 6.521331095889e-02 14 KSP Residual norm 5.095787952595e-02 15 KSP Residual norm 4.043060387395e-02 16 KSP Residual norm 3.232590200012e-02 17 KSP Residual norm 2.593944982216e-02 18 KSP Residual norm 2.064639483533e-02 19 KSP Residual norm 1.653916663492e-02 20 KSP Residual norm 1.334946415452e-02 21 KSP Residual norm 1.092886880597e-02 22 KSP Residual norm 8.988004105542e-03 23 KSP Residual norm 7.466501315240e-03 24 KSP Residual norm 6.284389135436e-03 25 KSP Residual norm 5.425231669964e-03 26 KSP Residual norm 4.766338253084e-03 27 KSP Residual norm 4.241238878242e-03 28 KSP Residual norm 3.808113525685e-03 29 KSP Residual norm 3.449383788116e-03 30 KSP Residual norm 3.126025526388e-03 31 KSP Residual norm 2.958328054299e-03 32 KSP Residual norm 2.802344900403e-03 33 KSP Residual norm 2.621993580492e-03 34 KSP Residual norm 2.430066269304e-03 35 KSP Residual norm 2.259043079597e-03 36 KSP Residual norm 2.104287972986e-03 37 KSP Residual norm 1.952916080045e-03 38 KSP Residual norm 1.804988937999e-03 39 KSP Residual norm 1.643302117377e-03 40 KSP Residual norm 1.471661332036e-03 41 KSP Residual norm 1.286445911163e-03 42 KSP Residual norm 1.127543025848e-03 43 KSP Residual norm 9.777148275484e-04 44 KSP Residual norm 8.293314450006e-04 45 KSP Residual norm 6.989331136622e-04 46 KSP Residual norm 5.852307780220e-04 47 KSP Residual norm 4.926715539762e-04 48 KSP Residual norm 4.215941372075e-04 49 KSP Residual norm 3.699489548162e-04 50 KSP Residual norm 3.293897163533e-04 51 KSP Residual norm 2.959954542998e-04 52 KSP Residual norm 2.700193032414e-04 53 KSP Residual norm 2.461789791204e-04 54 KSP Residual norm 2.218839085563e-04 55 KSP Residual norm 1.945154309976e-04 56 KSP Residual norm 1.661128781744e-04 57 KSP Residual norm 1.413198766258e-04 58 KSP Residual norm 1.213984003195e-04 59 KSP Residual norm 1.044317450754e-04 60 KSP Residual norm 8.919957502977e-05 61 KSP Residual norm 8.042584301275e-05 62 KSP Residual norm 7.292784493581e-05 63 KSP Residual norm 6.481935501872e-05 64 KSP Residual norm 5.718564652679e-05 65 KSP Residual norm 5.072589750116e-05 66 KSP Residual norm 4.487930741285e-05 67 KSP Residual norm 3.941040674119e-05 68 KSP Residual norm 3.492873281291e-05 69 KSP Residual norm 3.103798339845e-05 70 KSP Residual norm 2.822943237409e-05 71 KSP Residual norm 2.610615023776e-05 72 KSP Residual norm 2.441692671173e-05 KSP Object: 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=150, nonzero initial guess tolerances: relative=1e-07, absolute=1e-50, divergence=10000. 
left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: bjacobi number of blocks = 3 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (sub_) 1 MPI processes type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1., needed 1. Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=92969, cols=92969 package used to perform factorization: petsc total: nonzeros=638417, allocated nonzeros=638417 total number of mallocs used during MatSetValues calls=0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=92969, cols=92969 total: nonzeros=638417, allocated nonzeros=638417 total number of mallocs used during MatSetValues calls=0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: mpiaij rows=278906, cols=278906 total: nonzeros=3274027, allocated nonzeros=3274027 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines ... ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 75 1.0 1.5812e+00 1.0 1.88e+09 1.0 0.0e+00 0.0e+00 0.0e+00 28 24 0 0 0 28 24 0 0 0 1189 MatSolve 228 1.0 1.1442e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 944 MatLUFactorNum 3 1.0 8.1930e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 290 MatILUFactorSym 3 1.0 2.7102e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 5 1.0 3.7000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 5 1.0 3.1895e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCreateSubMats 1 1.0 4.0904e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetOrdering 3 1.0 4.2640e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 3 1.0 1.4400e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecMDot 72 1.0 1.1984e+00 1.0 2.25e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 28 0 0 0 21 28 0 0 0 1877 VecNorm 76 1.0 1.6841e-01 1.0 1.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 1007 VecScale 75 1.0 1.8241e-02 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 4587 VecCopy 3 1.0 1.4970e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 276 1.0 9.1970e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecAXPY 6 1.0 3.7450e-03 1.0 1.34e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3575 VecMAXPY 75 1.0 1.0022e+00 1.0 2.41e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 30 0 0 0 18 30 0 0 0 2405 VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 76 1.0 5.5100e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 75 1.0 1.8462e-01 1.0 2.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 1360 KSPSetUp 4 1.0 1.1341e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 5.3123e+00 1.0 7.91e+09 1.0 0.0e+00 0.0e+00 0.0e+00 93100 0 0 0 93100 0 0 0 1489 KSPGMRESOrthog 72 1.0 2.1316e+00 1.0 4.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 57 0 0 0 37 57 0 0 0 2110 PCSetUp 4 1.0 1.5531e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 153 PCSetUpOnBlocks 1 1.0 1.1343e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 209 PCApply 76 1.0 1.1671e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 PCApplyOnBlocks 228 1.0 1.1668e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 ???????????????????????????????????????????????????????????? ... #PETSc Option Table entries: -ksp_monitor -ksp_type gmres -ksp_view -log_view -pc_bjacobi_local_blocks 3 -pc_sub_type lu -pc_type bjacobi #End of PETSc Option Table entries ... Does any of the setup/output ring a bell? BTW, out of curiosity - what is a ?I-node? routine? Cheers, Hao ________________________________ From: Smith, Barry F. Sent: Wednesday, February 5, 2020 9:42 PM To: Hao DONG Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? > On Feb 5, 2020, at 4:36 AM, Hao DONG wrote: > > Thanks a lot for your suggestions, Hong and Barry - > > As you suggested, I first tried the LU direct solvers (built-in and MUMPs) out this morning, which work perfectly, albeit slow. As my problem itself is a part of a PDE based optimization, the exact solution in the searching procedure is not necessary (I often set a relative tolerance of 1E-7/1E-8, etc. for Krylov subspace methods). Also tried BJACOBI with exact LU, the KSP just converges in one or two iterations, as expected. > > I added -kspview option for the D-ILU test (still with Block Jacobi as preconditioner and bcgs as KSP solver). The KSPview output from one of the examples in a toy model looks like: > > KSP Object: 1 MPI processes > type: bcgs > maximum iterations=120, nonzero initial guess > tolerances: relative=1e-07, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: bjacobi > number of blocks = 3 > Local solve is same for all blocks, in the following KSP and PC objects: > KSP Object: (sub_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (sub_) 1 MPI processes > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1., needed 1. 
> Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=11294, cols=11294 > package used to perform factorization: petsc > total: nonzeros=76008, allocated nonzeros=76008 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=11294, cols=11294 > total: nonzeros=76008, allocated nonzeros=76008 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: mpiaij > rows=33880, cols=33880 > total: nonzeros=436968, allocated nonzeros=436968 > total number of mallocs used during MatSetValues calls=0 > not using I-node (on process 0) routines > > do you see something wrong with my setup? > > I also tried a quick performance test with a small 278906 by 278906 matrix (3850990 nnz) with the following parameters: > > -ksp_type bcgs -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu -ksp_view > > Reducing the relative residual to 1E-7 > > Took 4.08s with 41 bcgs iterations. > > Merely changing the -pc_bjacobi_local_blocks to 6 > > Took 7.02s with 73 bcgs iterations. 9 blocks would further take 9.45s with 101 bcgs iterations. This is normal. more blocks slower convergence > > As a reference, my home-brew Fortran code solves the same problem (3-block D-ILU0) in > > 1.84s with 24 bcgs iterations (the bcgs code is also a home-brew one)? > Run the PETSc code with optimization ./configure --with-debugging=0 an run the code with -log_view this will show where the PETSc code is spending the time (send it to use) > > > Well, by saying ?use explicit L/U matrix as preconditioner?, I wonder if a user is allowed to provide his own (separate) L and U Mat for preconditioning (like how it works in Matlab solvers), e.g. > > x = qmr(A,b,Tol,MaxIter,L,U,x) > > As I already explicitly constructed the L and U matrices in Fortran, it is not hard to convert them to Mat in Petsc to test Petsc and my Fortran code head-to-head. In that case, the A, b, x, and L/U are all identical, it would be easier to see where the problem came from. > > No, we don't provide this kind of support > > BTW, there is another thing I wondered - is there a way to output residual in unpreconditioned norm? I tried to > > call KSPSetNormType(ksp_local, KSP_NORM_UNPRECONDITIONED, ierr) > > But always get an error that current ksp does not support unpreconditioned in LEFT/RIGHT (either way). Is it possible to do that (output unpreconditioned residual) in PETSc at all? -ksp_monitor_true_residual You can also run GMRES (and some other methods) with right preconditioning, -ksp_pc_side right then the residual computed is by the algorithm the unpreconditioned residual KSPSetNormType sets the norm used in the algorithm, it generally always has to left or right, only a couple algorithms support unpreconditioned directly. Barry > > Cheers, > Hao > > > From: Smith, Barry F. > Sent: Tuesday, February 4, 2020 8:27 PM > To: Hao DONG > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? > > > > > On Feb 4, 2020, at 12:41 PM, Hao DONG wrote: > > > > Dear all, > > > > > > I have a few questions about the implementation of diagonal ILU PC in PETSc. I want to solve a very simple system with KSP (in parallel), the nature of the system (finite difference time-harmonic Maxwell) is probably not important to the question itself. 
Long story short, I just need to solve a set of equations of Ax = b with a block diagonal system matrix, like (not sure if the mono font works): > > > > |X | > > A =| Y | > > | Z| > > > > Note that A is not really block-diagonal, it?s just a multi-diagonal matrix determined by the FD mesh, where most elements are close to diagonal. So instead of a full ILU decomposition, a D-ILU is a good approximation as a preconditioner, and the number of blocks should not matter too much: > > > > |Lx | |Ux | > > L = | Ly | and U = | Uy | > > | Lz| | Uz| > > > > Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in my sequential Fortran/Matlab code. > > > > So like most users, I am looking for a parallel implement with this problem in PETSc. After looking through the manual, it seems that the most straightforward way to do it is through PCBJACOBI. Not sure I understand it right, I just setup a 3-block PCJACOBI and give each of the block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little block of fortran code would look like: > > ... > > call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & > > & isubs,ierr) > > call PCBJacobiSetLocalBlocks(pc_local, Nsub, & > > & isubs(istart:iend),ierr) > > ! set up the block jacobi structure > > call KSPSetup(ksp_local,ierr) > > ! allocate sub ksps > > allocate(ksp_sub(Nsub)) > > call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & > > & ksp_sub,ierr) > > do i=1,Nsub > > call KSPGetPC(ksp_sub(i),pc_sub,ierr) > > !ILU preconditioner > > call PCSetType(pc_sub,ptype,ierr) > > call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here > > call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] > > end do > > call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & > > & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) > > ? > > This code looks essentially right. You should call with -ksp_view to see exactly what PETSc is using for a solver. > > > > > I understand that the parallel performance may not be comparable, so I first set up a one-process test (with MPIAij, but all the rows are local since there is only one process). The system is solved without any problem (identical results within error). But the performance is actually a lot worse (code built without debugging flags in performance tests) than my own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), which is hard to believe. I suspect the difference is from the PC as the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to the same relative tol. > > PETSc uses GMRES by default with a restart of 30 and left preconditioning. It could be different exact choices in the solver (which is why -ksp_view is so useful) can explain the differences in the runs between your code and PETSc's > > > > This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks instead of 3). While my Fortran/Matlab codes see minimal performance difference (<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a performance decrease of more than 50% in sequential test. > > This is odd, the more blocks the smaller each block so the quicker the ILU set up should be. You can run various cases with -log_view and send us the output to see what is happening at each part of the computation in time. > > > So my implementation IS somewhat different in PETSc. 
Do I miss something in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how PCBJACOBI works in PETSc? > > Probably not. > > > > If this is not the right way to implement a block diagonal ILU as (parallel) PC, please kindly point me to the right direction. I searched through the mail list to find some answers, only to find a couple of similar questions... An example would be nice. > > You approach seems fundamentally right but I cannot be sure of possible bugs. > > > > On the other hand, does PETSc support a simple way to use explicit L/U matrix as a preconditioner? I can import the D-ILU matrix (I already converted my A matrix into Mat) constructed in my Fortran code to make a better comparison. Or do I have to construct my own PC using PCshell? If so, is there a good tutorial/example to learn about how to use PCSHELL (in a more advanced sense, like how to setup pc side and transpose)? > > Not sure what you mean by explicit L/U matrix as a preconditioner. As Hong said, yes you can use a parallel LU from MUMPS or SuperLU_DIST or Pastix as the solver. You do not need any shell code. You simply need to set the PCType to lu > > You can also set all this options from the command line and don't need to write the code specifically. So call KSPSetFromOptions() and then for example > > -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu (this last one is applied to each block so you could use -pc_type lu and it would use lu on each block.) > > -ksp_type_none -pc_type lu -pc_factor_mat_solver_type mumps (do parallel LU with mumps) > > By not hardwiring in the code and just using options you can test out different cases much quicker > > Use -ksp_view to make sure that is using the solver the way you expect. > > Barry > > > > Barry > > > > > Thanks in advance, > > > > Hao -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Feb 6 04:33:44 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 6 Feb 2020 05:33:44 -0500 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? In-Reply-To: References: <20E8B18C-F71E-4B10-958B-6CD24DA869A3@mcs.anl.gov> Message-ID: On Wed, Feb 5, 2020 at 8:53 PM Olek Niewiarowski wrote: > Hi Barry and Matt, > > Thank you for your input and for creating a new issue in the repo. > My initial question was more basic (how to configure the SNES's KSP solver > as in my first message with *a* and *k*), but now I see there's more to > the implementation. To reiterate, for my problem's structure, a good > solution algorithm (on the algebraic level) is the following "double > back-substitution": > For each nonlinear iteration: > > 1. define intermediate vectors u_1 and u_2 > 2. solve Ku_1 = -F --> u_1 = -K^{-1}F (this solve is cheap, don't > actually need K^-1) > 3. solve Ku_2 = -a --> u_2 = -K^{-1}a (ditto) > 4. define \beta = 1/(1 + k a^Tu_2) > 5. \Delta u = u_1 + \beta k u_2^T F u_2 > 6. u = u + \Delta u > > This is very easy to setup: 1) Create a KSP object KSPCreate(comm, &ksp) 2) Call KSPSetOperators(ksp, K, K,) 3) Solve the first system KSPSolve(ksp, -F, u_1) 4) Solve the second system KSPSolve(ksp, a, u_2) 5) Calculate beta VecDot(a, u_2, &gamma); beta = 1./(1. 
+ k*gamma); 6) Update the guess VecDot(u_2, F, &delta); VecAXPBYPCZ(u, 1.0, beta*delta, 1.0, u_1, u_2) Thanks, Matt I don't need the Jacobian inverse, [*K*?k*aa**T*]-1 = *K*-1 - (k*K*-1 *a* > *a**T**K*-1)/(1+k*a**T**K*-1*a*) just the solution ?*u =* [*K*?k*aa**T*]-1 > *F *= *K*-1*F* - (k*K*-1 *a**F**K*-1*a*)/(1 + k*a**T**K*-1*a*) > = *u*_1 + beta k *u*_2^T *F u*_2 (so I never need to invert *K *either). (To > Matt's point on gitlab, K is a symmetric sparse matrix arising from a > bilinear form. ) Also, eventually, I want to have more than one low-rank > updates to K, but again, Sherman Morrisson Woodbury should still work. > > Being new to PETSc, I don't know if this algorithm directly translates > into an efficient numerical solver. (I'm also not sure if Picard iteration > would be useful here.) What would it take to set up the KSP solver in SNES > like I did below? Is it possible "out of the box"? I looked at > MatCreateLRC() - what would I pass this to? (A pointer to demo/tutorial > would be very appreciated.) If there's a better way to go about all of > this, I'm open to any and all ideas. My only limitation is that I use > petsc4py exclusively since I/future users of my code will not be > comfortable with C. > > Thanks again for your help! > > > *Alexander (Olek) Niewiarowski* > PhD Candidate, Civil & Environmental Engineering > Princeton University, 2020 > Cell: +1 (610) 393-2978 > ------------------------------ > *From:* Smith, Barry F. > *Sent:* Wednesday, February 5, 2020 15:46 > *To:* Matthew Knepley > *Cc:* Olek Niewiarowski ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] Implementing the Sherman Morisson formula > (low rank update) in petsc4py and FEniCS? > > > https://gitlab.com/petsc/petsc/issues/557 > > > > On Feb 5, 2020, at 7:35 AM, Matthew Knepley wrote: > > > > Perhaps Barry is right that you want Picard, but suppose you really want > Newton. > > > > "This problem can be solved efficiently using the Sherman-Morrison > formula" Well, maybe. The main assumption here is that inverting K is > cheap. I see two things you can do in a straightforward way: > > > > 1) Use MatCreateLRC() > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateLRC.html > to create the Jacobian > > and solve using an iterative method. If you pass just K was the > preconditioning matrix, you can use common PCs. > > > > 2) We only implemented MatMult() for LRC, but you could stick your SMW > code in for MatSolve_LRC if you really want to factor K. We would > > of course help you do this. > > > > Thanks, > > > > Matt > > > > On Wed, Feb 5, 2020 at 1:36 AM Smith, Barry F. via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > I am not sure of everything in your email but it sounds like you want > to use a "Picard" iteration to solve [K(u)?kaaT]?u=?F(u). That is solve > > > > A(u^{n}) (u^{n+1} - u^{n}) = F(u^{n}) - A(u^{n})u^{n} where A(u) = > K(u) - kaaT > > > > PETSc provides code to this with SNESSetPicard() (see the manual pages) > I don't know if Petsc4py has bindings for this. > > > > Adding missing python bindings is not terribly difficult and you may > be able to do it yourself if this is the approach you want. > > > > Barry > > > > > > > > > On Feb 4, 2020, at 5:07 PM, Olek Niewiarowski > wrote: > > > > > > Hello, > > > I am a FEniCS user but new to petsc4py. I am trying to modify the KSP > solver through the SNES object to implement the Sherman-Morrison > formula(e.g. 
http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html > ). I am solving a nonlinear system of the form [K(u)?kaaT]?u=?F(u). Here > the jacobian matrix K is modified by the term kaaT, where k is a scalar. > Notably, K is sparse, while the term kaaT results in a full matrix. This > problem can be solved efficiently using the Sherman-Morrison formula : > > > > > > [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) > > > I have managed to successfully implement this at the linear solve > level (by modifying the KSP solver) inside a custom Newton solver in python > by following an incomplete tutorial at > https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner > : > > > ? while (norm(delU) > alpha): # while not converged > > > ? > > > ? self.update_F() # call to method to update r.h.s > form > > > ? self.update_K() # call to update the jacobian form > > > ? K = assemble(self.K) # assemble the jacobian matrix > > > ? F = assemble(self.F) # assemble the r.h.s vector > > > ? a = assemble(self.a_form) # assemble the a_form > (see Sherman Morrison formula) > > > ? > > > ? for bc in self.mem.bc: # apply boundary conditions > > > ? bc.apply(K, F) > > > ? bc.apply(K, a) > > > ? > > > ? B = PETSc.Mat().create() > > > ? > > > ? # Assemble the bilinear form that defines A and get > the concrete > > > ? # PETSc matrix > > > ? A = as_backend_type(K).mat() # get the PETSc > objects for K and a > > > ? u = as_backend_type(a).vec() > > > ? > > > ? # Build the matrix "context" # see firedrake docs > > > ? Bctx = MatrixFreeB(A, u, u, self.k) > > > ? > > > ? # Set up B > > > ? # B is the same size as A > > > ? B.setSizes(*A.getSizes()) > > > ? B.setType(B.Type.PYTHON) > > > ? B.setPythonContext(Bctx) > > > ? B.setUp() > > > ? > > > ? > > > ? ksp = PETSc.KSP().create() # create the KSP linear > solver object > > > ? ksp.setOperators(B) > > > ? ksp.setUp() > > > ? pc = ksp.pc > > > ? pc.setType(pc.Type.PYTHON) > > > ? pc.setPythonContext(MatrixFreePC()) > > > ? ksp.setFromOptions() > > > ? > > > ? solution = delU # the incremental displacement at > this iteration > > > ? > > > ? b = as_backend_type(-F).vec() > > > ? delu = solution.vector().vec() > > > ? > > > ? ksp.solve(b, delu) > > > > > > ? self.mem.u.vector().axpy(0.25, self.delU.vector()) > # poor man's linesearch > > > ? counter += 1 > > > Here is the corresponding petsc4py code adapted from the firedrake > docs: > > > > > > ? class MatrixFreePC(object): > > > ? > > > ? def setUp(self, pc): > > > ? B, P = pc.getOperators() > > > ? # extract the MatrixFreeB object from B > > > ? ctx = B.getPythonContext() > > > ? self.A = ctx.A > > > ? self.u = ctx.u > > > ? self.v = ctx.v > > > ? self.k = ctx.k > > > ? # Here we build the PC object that uses the concrete, > > > ? # assembled matrix A. We will use this to apply the > action > > > ? # of A^{-1} > > > ? self.pc = PETSc.PC().create() > > > ? self.pc.setOptionsPrefix("mf_") > > > ? self.pc.setOperators(self.A) > > > ? self.pc.setFromOptions() > > > ? # Since u and v do not change, we can build the > denominator > > > ? # and the action of A^{-1} on u only once, in the > setup > > > ? # phase. > > > ? tmp = self.A.createVecLeft() > > > ? self.pc.apply(self.u, tmp) > > > ? self._Ainvu = tmp > > > ? self._denom = 1 + self.k*self.v.dot(self._Ainvu) > > > ? > > > ? def apply(self, pc, x, y): > > > ? # y <- A^{-1}x > > > ? self.pc.apply(x, y) > > > ? # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u) > > > ? alpha = (self.k*self.v.dot(y)) / self._denom > > > ? # y <- y - alpha * A^{-1}u > > > ? 
y.axpy(-alpha, self._Ainvu) > > > ? > > > ? > > > ? class MatrixFreeB(object): > > > ? > > > ? def __init__(self, A, u, v, k): > > > ? self.A = A > > > ? self.u = u > > > ? self.v = v > > > ? self.k = k > > > ? > > > ? def mult(self, mat, x, y): > > > ? # y <- A x > > > ? self.A.mult(x, y) > > > ? > > > ? # alpha <- v^T x > > > ? alpha = self.v.dot(x) > > > ? > > > ? # y <- y + alpha*u > > > ? y.axpy(alpha, self.u) > > > However, this approach is not efficient as it requires many iterations > due to the Newton step being fixed, so I would like to implement it using > SNES and use line search. Unfortunately, I have not been able to find any > documentation/tutorial on how to do so. Provided I have the FEniCS forms > for F, K, and a, I'd like to do something along the lines of: > > > solver = PETScSNESSolver() # the FEniCS SNES wrapper > > > snes = solver.snes() # the petsc4py SNES object > > > ## ?? > > > ksp = snes.getKSP() > > > # set ksp option similar to above > > > solver.solve() > > > > > > I would be very grateful if anyone could could help or point me to a > reference or demo that does something similar (or maybe a completely > different way of solving the problem!). > > > Many thanks in advance! > > > Alex > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From aan2 at princeton.edu Thu Feb 6 09:02:46 2020 From: aan2 at princeton.edu (Olek Niewiarowski) Date: Thu, 6 Feb 2020 15:02:46 +0000 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? In-Reply-To: References: <20E8B18C-F71E-4B10-958B-6CD24DA869A3@mcs.anl.gov> , Message-ID: Hi Matt, What you suggested in your last email was exactly what I did on my very first attempt at the problem, and while it "worked," convergence was not satisfactory due to the Newton step being fixed in step 6. This is the reason I would like to use the linesearch in SNES instead. Indeed in your manual you "recommend most PETSc users work directly with SNES, rather than using PETSc for the linear problem within a nonlinear solver." Ideally I'd like to create a SNES solver, pass in the functions to evaluate K, F, a, and k, and set up the underlying KSP object as in my first message. Is this possible? Thanks, Alexander (Olek) Niewiarowski PhD Candidate, Civil & Environmental Engineering Princeton University, 2020 Cell: +1 (610) 393-2978 ________________________________ From: Matthew Knepley Sent: Thursday, February 6, 2020 5:33 To: Olek Niewiarowski Cc: Smith, Barry F. ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? On Wed, Feb 5, 2020 at 8:53 PM Olek Niewiarowski > wrote: Hi Barry and Matt, Thank you for your input and for creating a new issue in the repo. My initial question was more basic (how to configure the SNES's KSP solver as in my first message with a and k), but now I see there's more to the implementation. 
To reiterate, for my problem's structure, a good solution algorithm (on the algebraic level) is the following "double back-substitution": For each nonlinear iteration: 1. define intermediate vectors u_1 and u_2 2. solve Ku_1 = -F --> u_1 = -K^{-1}F (this solve is cheap, don't actually need K^-1) 3. solve Ku_2 = -a --> u_2 = -K^{-1}a (ditto) 4. define \beta = 1/(1 + k a^Tu_2) 5. \Delta u = u_1 + \beta k u_2^T F u_2 6. u = u + \Delta u This is very easy to setup: 1) Create a KSP object KSPCreate(comm, &ksp) 2) Call KSPSetOperators(ksp, K, K,) 3) Solve the first system KSPSolve(ksp, -F, u_1) 4) Solve the second system KSPSolve(ksp, a, u_2) 5) Calculate beta VecDot(a, u_2, &gamma); beta = 1./(1. + k*gamma); 6) Update the guess VecDot(u_2, F, &delta); VecAXPBYPCZ(u, 1.0, beta*delta, 1.0, u_1, u_2) Thanks, Matt I don't need the Jacobian inverse, [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) just the solution ?u = [K?kaaT]-1F = K-1F - (kK-1 aFK-1a)/(1 + kaTK-1a) = u_1 + beta k u_2^T F u_2 (so I never need to invert K either). (To Matt's point on gitlab, K is a symmetric sparse matrix arising from a bilinear form. ) Also, eventually, I want to have more than one low-rank updates to K, but again, Sherman Morrisson Woodbury should still work. Being new to PETSc, I don't know if this algorithm directly translates into an efficient numerical solver. (I'm also not sure if Picard iteration would be useful here.) What would it take to set up the KSP solver in SNES like I did below? Is it possible "out of the box"? I looked at MatCreateLRC() - what would I pass this to? (A pointer to demo/tutorial would be very appreciated.) If there's a better way to go about all of this, I'm open to any and all ideas. My only limitation is that I use petsc4py exclusively since I/future users of my code will not be comfortable with C. Thanks again for your help! Alexander (Olek) Niewiarowski PhD Candidate, Civil & Environmental Engineering Princeton University, 2020 Cell: +1 (610) 393-2978 ________________________________ From: Smith, Barry F. > Sent: Wednesday, February 5, 2020 15:46 To: Matthew Knepley > Cc: Olek Niewiarowski >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? https://gitlab.com/petsc/petsc/issues/557 > On Feb 5, 2020, at 7:35 AM, Matthew Knepley > wrote: > > Perhaps Barry is right that you want Picard, but suppose you really want Newton. > > "This problem can be solved efficiently using the Sherman-Morrison formula" Well, maybe. The main assumption here is that inverting K is cheap. I see two things you can do in a straightforward way: > > 1) Use MatCreateLRC() https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateLRC.html to create the Jacobian > and solve using an iterative method. If you pass just K was the preconditioning matrix, you can use common PCs. > > 2) We only implemented MatMult() for LRC, but you could stick your SMW code in for MatSolve_LRC if you really want to factor K. We would > of course help you do this. > > Thanks, > > Matt > > On Wed, Feb 5, 2020 at 1:36 AM Smith, Barry F. via petsc-users > wrote: > > I am not sure of everything in your email but it sounds like you want to use a "Picard" iteration to solve [K(u)?kaaT]?u=?F(u). That is solve > > A(u^{n}) (u^{n+1} - u^{n}) = F(u^{n}) - A(u^{n})u^{n} where A(u) = K(u) - kaaT > > PETSc provides code to this with SNESSetPicard() (see the manual pages) I don't know if Petsc4py has bindings for this. 
> > Adding missing python bindings is not terribly difficult and you may be able to do it yourself if this is the approach you want. > > Barry > > > > > On Feb 4, 2020, at 5:07 PM, Olek Niewiarowski > wrote: > > > > Hello, > > I am a FEniCS user but new to petsc4py. I am trying to modify the KSP solver through the SNES object to implement the Sherman-Morrison formula(e.g. http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html ). I am solving a nonlinear system of the form [K(u)?kaaT]?u=?F(u). Here the jacobian matrix K is modified by the term kaaT, where k is a scalar. Notably, K is sparse, while the term kaaT results in a full matrix. This problem can be solved efficiently using the Sherman-Morrison formula : > > > > [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) > > I have managed to successfully implement this at the linear solve level (by modifying the KSP solver) inside a custom Newton solver in python by following an incomplete tutorial at https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner : > > ? while (norm(delU) > alpha): # while not converged > > ? > > ? self.update_F() # call to method to update r.h.s form > > ? self.update_K() # call to update the jacobian form > > ? K = assemble(self.K) # assemble the jacobian matrix > > ? F = assemble(self.F) # assemble the r.h.s vector > > ? a = assemble(self.a_form) # assemble the a_form (see Sherman Morrison formula) > > ? > > ? for bc in self.mem.bc: # apply boundary conditions > > ? bc.apply(K, F) > > ? bc.apply(K, a) > > ? > > ? B = PETSc.Mat().create() > > ? > > ? # Assemble the bilinear form that defines A and get the concrete > > ? # PETSc matrix > > ? A = as_backend_type(K).mat() # get the PETSc objects for K and a > > ? u = as_backend_type(a).vec() > > ? > > ? # Build the matrix "context" # see firedrake docs > > ? Bctx = MatrixFreeB(A, u, u, self.k) > > ? > > ? # Set up B > > ? # B is the same size as A > > ? B.setSizes(*A.getSizes()) > > ? B.setType(B.Type.PYTHON) > > ? B.setPythonContext(Bctx) > > ? B.setUp() > > ? > > ? > > ? ksp = PETSc.KSP().create() # create the KSP linear solver object > > ? ksp.setOperators(B) > > ? ksp.setUp() > > ? pc = ksp.pc > > ? pc.setType(pc.Type.PYTHON) > > ? pc.setPythonContext(MatrixFreePC()) > > ? ksp.setFromOptions() > > ? > > ? solution = delU # the incremental displacement at this iteration > > ? > > ? b = as_backend_type(-F).vec() > > ? delu = solution.vector().vec() > > ? > > ? ksp.solve(b, delu) > > > > ? self.mem.u.vector().axpy(0.25, self.delU.vector()) # poor man's linesearch > > ? counter += 1 > > Here is the corresponding petsc4py code adapted from the firedrake docs: > > > > ? class MatrixFreePC(object): > > ? > > ? def setUp(self, pc): > > ? B, P = pc.getOperators() > > ? # extract the MatrixFreeB object from B > > ? ctx = B.getPythonContext() > > ? self.A = ctx.A > > ? self.u = ctx.u > > ? self.v = ctx.v > > ? self.k = ctx.k > > ? # Here we build the PC object that uses the concrete, > > ? # assembled matrix A. We will use this to apply the action > > ? # of A^{-1} > > ? self.pc = PETSc.PC().create() > > ? self.pc.setOptionsPrefix("mf_") > > ? self.pc.setOperators(self.A) > > ? self.pc.setFromOptions() > > ? # Since u and v do not change, we can build the denominator > > ? # and the action of A^{-1} on u only once, in the setup > > ? # phase. > > ? tmp = self.A.createVecLeft() > > ? self.pc.apply(self.u, tmp) > > ? self._Ainvu = tmp > > ? self._denom = 1 + self.k*self.v.dot(self._Ainvu) > > ? > > ? def apply(self, pc, x, y): > > ? 
# y <- A^{-1}x > > ? self.pc.apply(x, y) > > ? # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u) > > ? alpha = (self.k*self.v.dot(y)) / self._denom > > ? # y <- y - alpha * A^{-1}u > > ? y.axpy(-alpha, self._Ainvu) > > ? > > ? > > ? class MatrixFreeB(object): > > ? > > ? def __init__(self, A, u, v, k): > > ? self.A = A > > ? self.u = u > > ? self.v = v > > ? self.k = k > > ? > > ? def mult(self, mat, x, y): > > ? # y <- A x > > ? self.A.mult(x, y) > > ? > > ? # alpha <- v^T x > > ? alpha = self.v.dot(x) > > ? > > ? # y <- y + alpha*u > > ? y.axpy(alpha, self.u) > > However, this approach is not efficient as it requires many iterations due to the Newton step being fixed, so I would like to implement it using SNES and use line search. Unfortunately, I have not been able to find any documentation/tutorial on how to do so. Provided I have the FEniCS forms for F, K, and a, I'd like to do something along the lines of: > > solver = PETScSNESSolver() # the FEniCS SNES wrapper > > snes = solver.snes() # the petsc4py SNES object > > ## ?? > > ksp = snes.getKSP() > > # set ksp option similar to above > > solver.solve() > > > > I would be very grateful if anyone could could help or point me to a reference or demo that does something similar (or maybe a completely different way of solving the problem!). > > Many thanks in advance! > > Alex > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From amin.sadeghi at live.com Thu Feb 6 11:36:37 2020 From: amin.sadeghi at live.com (Amin Sadeghi) Date: Thu, 6 Feb 2020 17:36:37 +0000 Subject: [petsc-users] PETSc scaling for solving system of equations Message-ID: Hi, Recently, I've been playing around with petsc4py to solve a battery simulation, which takes too long to solve using scipy solvers. I also have access to an HPC cluster with a few nodes, each with a dozen CPU cores. However, I can't seem to get any further speedup past 4 processors. Very likely, I'm doing something wrong. I'd really appreciate it if someone could shed some light on this. For the record, I'm using PETSc's cg solver. Best, Amin -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Feb 6 12:25:03 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 6 Feb 2020 18:25:03 +0000 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? In-Reply-To: References: <20E8B18C-F71E-4B10-958B-6CD24DA869A3@mcs.anl.gov> Message-ID: <35929586-4D5D-4B31-A34E-8D8D266FEA0A@mcs.anl.gov> If I remember your problem is K(u) + kaa' = F(u) You should start by creating a SNES and calling SNESSetPicard. Read its manual page. Your matrix should be a MatCreateLRC() for the Mat argument to SNESSetPicard and the Peat should be just your K matrix. If you run with -ksp_fd_operator -pc_type lu will be using K to precondition K + kaa' + d F(U)/dU . Newton's method should converge at quadratic order. You can use -ksp_fd_operator -pc_type anything else to use an iterative linear solver as the preconditioner of K. 
If you really want to use the Sherman-Morrison formula then you would create a PC shell and do

typedef struct{ KSP innerksp; Vec u_1,u_2; } YourStruct;

YourStruct yourstruct;
SNESGetKSP(snes,&ksp);
KSPGetPC(ksp,&pc);
PCSetType(pc,PCSHELL);
PCShellSetApply(pc,YourPCApply);
PetscMemzero(&yourstruct,sizeof(YourStruct));
PCShellSetContext(pc,&yourstruct);

Then

YourPCApply(PC pc,Vec in,Vec out)
{
  YourStruct *yourstruct;
  PCShellGetContext(pc,(void**)&yourstruct);
  if (!yourstruct->innerksp) {
    Mat A,B;
    KSPCreate(comm,&yourstruct->innerksp);
    KSPSetOptionsPrefix(yourstruct->innerksp,"yourpc_");
    PCGetOperators(pc,&A,&B);
    KSPSetOperators(yourstruct->innerksp,A,B);
    /* create work vectors */
  }
  /* Apply the solve as you do for the linear case with the Sherman-Morrison formula */
}

This you can run with for example -yourpc_pc_type cholesky

Barry

Looks complicated, conceptually simple.

> 2) Call KSPSetOperators(ksp, K, K) > > 3) Solve the first system KSPSolve(ksp, -F, u_1) > > 4) Solve the second system KSPSolve(ksp, a, u_2) > > 5) Calculate beta VecDot(a, u_2, &gamma); beta = 1./(1. + k*gamma); > > 6) Update the guess VecDot(u_2, F, &delta); VecAXPBYPCZ(u, 1.0, beta*delta, 1.0, u_1, u_2)

No

> On Feb 6, 2020, at 9:02 AM, Olek Niewiarowski wrote: > > Hi Matt, > > What you suggested in your last email was exactly what I did on my very first attempt at the problem, and while it "worked," convergence was not satisfactory due to the Newton step being fixed in step 6. This is the reason I would like to use the line search in SNES instead. Indeed in your manual you "recommend most PETSc users work directly with SNES, rather than using PETSc for the linear problem within a nonlinear solver." Ideally I'd like to create a SNES solver, pass in the functions to evaluate K, F, a, and k, and set up the underlying KSP object as in my first message. Is this possible? > > Thanks, > > Alexander (Olek) Niewiarowski > PhD Candidate, Civil & Environmental Engineering > Princeton University, 2020 > Cell: +1 (610) 393-2978 > From: Matthew Knepley > Sent: Thursday, February 6, 2020 5:33 > To: Olek Niewiarowski > Cc: Smith, Barry F. ; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? > > On Wed, Feb 5, 2020 at 8:53 PM Olek Niewiarowski wrote: > Hi Barry and Matt, > > Thank you for your input and for creating a new issue in the repo. > My initial question was more basic (how to configure the SNES's KSP solver as in my first message with a and k), but now I see there's more to the implementation. To reiterate, for my problem's structure, a good solution algorithm (on the algebraic level) is the following "double back-substitution": > For each nonlinear iteration: > - define intermediate vectors u_1 and u_2 > - solve Ku_1 = -F --> u_1 = -K^{-1}F (this solve is cheap, don't actually need K^-1) > - solve Ku_2 = -a --> u_2 = -K^{-1}a (ditto) > - define \beta = 1/(1 + k a^Tu_2) > - \Delta u = u_1 + \beta k u_2^T F u_2 > - u = u + \Delta u > This is very easy to set up: > > 1) Create a KSP object KSPCreate(comm, &ksp) > > 2) Call KSPSetOperators(ksp, K, K) > > 3) Solve the first system KSPSolve(ksp, -F, u_1) > > 4) Solve the second system KSPSolve(ksp, a, u_2) > > 5) Calculate beta VecDot(a, u_2, &gamma); beta = 1./(1.
+ k*gamma); > > 6) Update the guess VecDot(u_2, F, &delta); VecAXPBYPCZ(u, 1.0, beta*delta, 1.0, u_1, u_2) > > Thanks, > > Matt > > I don't need the Jacobian inverse, [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) just the solution ?u = [K?kaaT]-1F = K-1F - (kK-1 aFK-1a)/(1 + kaTK-1a) > = u_1 + beta k u_2^T F u_2 (so I never need to invert K either). (To Matt's point on gitlab, K is a symmetric sparse matrix arising from a bilinear form. ) Also, eventually, I want to have more than one low-rank updates to K, but again, Sherman Morrisson Woodbury should still work. > > Being new to PETSc, I don't know if this algorithm directly translates into an efficient numerical solver. (I'm also not sure if Picard iteration would be useful here.) What would it take to set up the KSP solver in SNES like I did below? Is it possible "out of the box"? I looked at MatCreateLRC() - what would I pass this to? (A pointer to demo/tutorial would be very appreciated.) If there's a better way to go about all of this, I'm open to any and all ideas. My only limitation is that I use petsc4py exclusively since I/future users of my code will not be comfortable with C. > > Thanks again for your help! > > > Alexander (Olek) Niewiarowski > PhD Candidate, Civil & Environmental Engineering > Princeton University, 2020 > Cell: +1 (610) 393-2978 > From: Smith, Barry F. > Sent: Wednesday, February 5, 2020 15:46 > To: Matthew Knepley > Cc: Olek Niewiarowski ; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? > > > https://gitlab.com/petsc/petsc/issues/557 > > > > On Feb 5, 2020, at 7:35 AM, Matthew Knepley wrote: > > > > Perhaps Barry is right that you want Picard, but suppose you really want Newton. > > > > "This problem can be solved efficiently using the Sherman-Morrison formula" Well, maybe. The main assumption here is that inverting K is cheap. I see two things you can do in a straightforward way: > > > > 1) Use MatCreateLRC() https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateLRC.html to create the Jacobian > > and solve using an iterative method. If you pass just K was the preconditioning matrix, you can use common PCs. > > > > 2) We only implemented MatMult() for LRC, but you could stick your SMW code in for MatSolve_LRC if you really want to factor K. We would > > of course help you do this. > > > > Thanks, > > > > Matt > > > > On Wed, Feb 5, 2020 at 1:36 AM Smith, Barry F. via petsc-users wrote: > > > > I am not sure of everything in your email but it sounds like you want to use a "Picard" iteration to solve [K(u)?kaaT]?u=?F(u). That is solve > > > > A(u^{n}) (u^{n+1} - u^{n}) = F(u^{n}) - A(u^{n})u^{n} where A(u) = K(u) - kaaT > > > > PETSc provides code to this with SNESSetPicard() (see the manual pages) I don't know if Petsc4py has bindings for this. > > > > Adding missing python bindings is not terribly difficult and you may be able to do it yourself if this is the approach you want. > > > > Barry > > > > > > > > > On Feb 4, 2020, at 5:07 PM, Olek Niewiarowski wrote: > > > > > > Hello, > > > I am a FEniCS user but new to petsc4py. I am trying to modify the KSP solver through the SNES object to implement the Sherman-Morrison formula(e.g. http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html ). I am solving a nonlinear system of the form [K(u)?kaaT]?u=?F(u). Here the jacobian matrix K is modified by the term kaaT, where k is a scalar. Notably, K is sparse, while the term kaaT results in a full matrix. 
This problem can be solved efficiently using the Sherman-Morrison formula : > > > > > > [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) > > > I have managed to successfully implement this at the linear solve level (by modifying the KSP solver) inside a custom Newton solver in python by following an incomplete tutorial at https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner : > > > ? while (norm(delU) > alpha): # while not converged > > > ? > > > ? self.update_F() # call to method to update r.h.s form > > > ? self.update_K() # call to update the jacobian form > > > ? K = assemble(self.K) # assemble the jacobian matrix > > > ? F = assemble(self.F) # assemble the r.h.s vector > > > ? a = assemble(self.a_form) # assemble the a_form (see Sherman Morrison formula) > > > ? > > > ? for bc in self.mem.bc: # apply boundary conditions > > > ? bc.apply(K, F) > > > ? bc.apply(K, a) > > > ? > > > ? B = PETSc.Mat().create() > > > ? > > > ? # Assemble the bilinear form that defines A and get the concrete > > > ? # PETSc matrix > > > ? A = as_backend_type(K).mat() # get the PETSc objects for K and a > > > ? u = as_backend_type(a).vec() > > > ? > > > ? # Build the matrix "context" # see firedrake docs > > > ? Bctx = MatrixFreeB(A, u, u, self.k) > > > ? > > > ? # Set up B > > > ? # B is the same size as A > > > ? B.setSizes(*A.getSizes()) > > > ? B.setType(B.Type.PYTHON) > > > ? B.setPythonContext(Bctx) > > > ? B.setUp() > > > ? > > > ? > > > ? ksp = PETSc.KSP().create() # create the KSP linear solver object > > > ? ksp.setOperators(B) > > > ? ksp.setUp() > > > ? pc = ksp.pc > > > ? pc.setType(pc.Type.PYTHON) > > > ? pc.setPythonContext(MatrixFreePC()) > > > ? ksp.setFromOptions() > > > ? > > > ? solution = delU # the incremental displacement at this iteration > > > ? > > > ? b = as_backend_type(-F).vec() > > > ? delu = solution.vector().vec() > > > ? > > > ? ksp.solve(b, delu) > > > > > > ? self.mem.u.vector().axpy(0.25, self.delU.vector()) # poor man's linesearch > > > ? counter += 1 > > > Here is the corresponding petsc4py code adapted from the firedrake docs: > > > > > > ? class MatrixFreePC(object): > > > ? > > > ? def setUp(self, pc): > > > ? B, P = pc.getOperators() > > > ? # extract the MatrixFreeB object from B > > > ? ctx = B.getPythonContext() > > > ? self.A = ctx.A > > > ? self.u = ctx.u > > > ? self.v = ctx.v > > > ? self.k = ctx.k > > > ? # Here we build the PC object that uses the concrete, > > > ? # assembled matrix A. We will use this to apply the action > > > ? # of A^{-1} > > > ? self.pc = PETSc.PC().create() > > > ? self.pc.setOptionsPrefix("mf_") > > > ? self.pc.setOperators(self.A) > > > ? self.pc.setFromOptions() > > > ? # Since u and v do not change, we can build the denominator > > > ? # and the action of A^{-1} on u only once, in the setup > > > ? # phase. > > > ? tmp = self.A.createVecLeft() > > > ? self.pc.apply(self.u, tmp) > > > ? self._Ainvu = tmp > > > ? self._denom = 1 + self.k*self.v.dot(self._Ainvu) > > > ? > > > ? def apply(self, pc, x, y): > > > ? # y <- A^{-1}x > > > ? self.pc.apply(x, y) > > > ? # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u) > > > ? alpha = (self.k*self.v.dot(y)) / self._denom > > > ? # y <- y - alpha * A^{-1}u > > > ? y.axpy(-alpha, self._Ainvu) > > > ? > > > ? > > > ? class MatrixFreeB(object): > > > ? > > > ? def __init__(self, A, u, v, k): > > > ? self.A = A > > > ? self.u = u > > > ? self.v = v > > > ? self.k = k > > > ? > > > ? def mult(self, mat, x, y): > > > ? # y <- A x > > > ? self.A.mult(x, y) > > > ? > > > ? 
# alpha <- v^T x > > > ? alpha = self.v.dot(x) > > > ? > > > ? # y <- y + alpha*u > > > ? y.axpy(alpha, self.u) > > > However, this approach is not efficient as it requires many iterations due to the Newton step being fixed, so I would like to implement it using SNES and use line search. Unfortunately, I have not been able to find any documentation/tutorial on how to do so. Provided I have the FEniCS forms for F, K, and a, I'd like to do something along the lines of: > > > solver = PETScSNESSolver() # the FEniCS SNES wrapper > > > snes = solver.snes() # the petsc4py SNES object > > > ## ?? > > > ksp = snes.getKSP() > > > # set ksp option similar to above > > > solver.solve() > > > > > > I would be very grateful if anyone could could help or point me to a reference or demo that does something similar (or maybe a completely different way of solving the problem!). > > > Many thanks in advance! > > > Alex > > > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From jed at jedbrown.org Thu Feb 6 12:25:56 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 06 Feb 2020 11:25:56 -0700 Subject: [petsc-users] PETSc scaling for solving system of equations In-Reply-To: References: Message-ID: <87pnerd13f.fsf@jedbrown.org> As a first step, please run with -log_view and send results. What is the size of your problem and what preconditioner are you using? Amin Sadeghi writes: > Hi, > > Recently, I've been playing around with petsc4py to solve a battery simulation, which takes too long to solve using scipy solvers. I also have access to an HPC cluster with a few nodes, each with a dozen CPU cores. However, I can't seem to get any further speedup past 4 processors. Very likely, I'm doing something wrong. I'd really appreciate it if someone could shed some light on this. > > For the record, I'm using PETSc's cg solver. > > Best, > Amin From bsmith at mcs.anl.gov Thu Feb 6 13:03:50 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 6 Feb 2020 19:03:50 +0000 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? In-Reply-To: References: <264F91C4-8558-4850-9B4B-ABE4123C2A2C@anl.gov> <4A373D93-4018-45E0-B805-3ECC528472DD@mcs.anl.gov> Message-ID: Read my comments ALL the way down, they go a long way. > On Feb 6, 2020, at 3:43 AM, Hao DONG wrote: > > Dear Hong and Barry, > > Thanks for the suggestions. So there could be some problems in my PETSc configuration? - but my PETSc lib was indeed compiled without the debug flags (--with-debugging=0). I use GCC/GFortran (Home-brew GCC 9.2.0) for the compiling and building of PETSc and my own fortran code. My Fortran compiling flags are simply like: > > -O3 -ffree-line-length-none -fastsse > > Which is also used for -FOPTFLAGS in PETSc (I added -openmp for PETSc, but not my fortran code, as I don?t have any OMP optimizations in my program). Note the performance test results I listed yesterday (e.g. 4.08s with 41 bcgs iterations.) are without any CSR-array->PETSc translation overhead (only include the set and solve part). PETSc doesn't use -openmp in any way for its solvers. 
Do not use this option, it may be slowing the code down. Please send configure.log > > I have two questions about the performance difference - > > 1. Is ilu only factorized once for each iteration, or ilu is performed at every outer ksp iteration steps? Sounds unlikely - but if so, this could cause some extra overheads. ILU is ONLY done if the matrix has changed (which seems wrong). > > 2. Some KSP solvers like BCGS or TFQMR has two ?half-iterations? for each iteration step. Not sure how it works in PETSc, but is that possible that both the two ?half" relative residuals are counted in the output array, doubling the number of iterations (but that cannot explain the extra time consumed)? Yes, PETSc might report them as two, you need to check the exact code. > > Anyway, the output with -log_view from the same 278906 by 278906 matrix with 3-block D-ILU in PETSc is as follows: > > > ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- > > MEMsolv.lu on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Thu Feb 6 09:07:35 2020 > Using Petsc Release Version 3.12.3, unknown > > Max Max/Min Avg Total > Time (sec): 4.443e+00 1.000 4.443e+00 > Objects: 1.155e+03 1.000 1.155e+03 > Flop: 4.311e+09 1.000 4.311e+09 4.311e+09 > Flop/sec: 9.703e+08 1.000 9.703e+08 9.703e+08 > MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.000 > > Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N --> 2N flop > and VecAXPY() for complex vectors of length N --> 8N flop > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total Count %Total Avg %Total Count %Total > 0: Main Stage: 4.4435e+00 100.0% 4.3113e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > ???????????????????????????????????????????????????????????? > See the 'Profiling' chapter of the users' manual for details on interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flop: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > AvgLen: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flop in this phase > %M - percent messages in this phase %L - percent message lengths in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop --- Global --- --- Stage ---- Total > Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatMult 83 1.0 1.7815e+00 1.0 2.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40 48 0 0 0 40 48 0 0 0 1168 > MatSolve 252 1.0 1.2708e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 939 > MatLUFactorNum 3 1.0 7.9725e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 > MatILUFactorSym 3 1.0 2.6998e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatAssemblyBegin 5 1.0 3.6000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 5 1.0 3.1619e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatCreateSubMats 1 1.0 3.9659e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetOrdering 3 1.0 4.3070e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatView 3 1.0 1.3600e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecDot 82 1.0 1.8948e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 966 > VecDotNorm2 41 1.0 1.6812e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 1088 > VecNorm 43 1.0 9.5099e-02 1.0 9.59e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1009 > VecCopy 2 1.0 1.0120e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 271 1.0 3.8922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecAXPY 1 1.0 7.7200e-04 1.0 2.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2890 > VecAXPBYCZ 82 1.0 2.4370e-01 1.0 3.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 8 0 0 0 5 8 0 0 0 1502 > VecWAXPY 82 1.0 1.4148e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 4 0 0 0 3 4 0 0 0 1293 > VecAssemblyBegin 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 84 1.0 5.9300e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSetUp 4 1.0 1.4167e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 4.0250e+00 1.0 4.31e+09 1.0 0.0e+00 0.0e+00 0.0e+00 91100 0 0 0 91100 0 0 0 1071 > PCSetUp 4 1.0 1.5207e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 156 > PCSetUpOnBlocks 1 1.0 1.1116e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 214 > PCApply 84 1.0 1.2912e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 > PCApplyOnBlocks 252 1.0 1.2909e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 > ------------------------------------------------------------------------------------------------------------------------ > > # I skipped the 
memory part - the options (and compiler options) are as follows: > > #PETSc Option Table entries: > -ksp_type bcgs > -ksp_view > -log_view > -pc_bjacobi_local_blocks 3 > -pc_factor_levels 0 > -pc_sub_type ilu > -pc_type bjacobi > #End of PETSc Option Table entries > Compiled with FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 > Configure options: --with-scalar-type=complex --download-mumps --download-scalapack --with-fortran-kernels=1 -- FOPTFLAGS=?-O3 -fastsse -mp -openmp? --COPTFLAGS=?-O3 -fastsse -mp -openmp? --CXXOPTFLAGS="-O3 -fastsse -mp -openmp" -- with-debugging=0 > ----------------------------------------- > Libraries compiled on 2020-02-03 10:44:31 on Haos-MBP > Machine characteristics: Darwin-19.2.0-x86_64-i386-64bit > Using PETSc directory: /Users/donghao/src/git/PETSc-current > Using PETSc arch: arch-darwin-c-opt > ----------------------------------------- > > Using C compiler: mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fno-stack- check -Qunused-arguments -fvisibility=hidden > Using Fortran compiler: mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument > > Using include paths: -I/Users/donghao/src/git/PETSc-current/include -I/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/include > ----------------------------------------- > > Using C linker: mpicc > Using Fortran linker: mpif90 > Using libraries: -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/donghao/src/git/PETSc- current/arch-darwin-c-opt/lib -lpetsc -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/ donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -Wl,-rpath,/usr/local/opt/libevent/lib -L/usr/local/opt/libevent/ lib -Wl,-rpath,/usr/local/Cellar/open-mpi/4.0.2/lib -L/usr/local/Cellar/open-mpi/4.0.2/lib -Wl,-rpath,/usr/local/Cellar/ gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin19/9.2.0 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple- darwin19/9.2.0 -Wl,-rpath,/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -lcmumps - ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lc++ -ldl -lmpi_usempif08 - lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lquadmath -lm -lc++ -ldl > > > On the other hand, running PETSc with > > -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type lu -ksp_type gmres -ksp_monitor -ksp_view -log_view > > For the same problem takes 5.37s and 72 GMRES iterations. Our previous testings show that BiCGstab (bcgs in PETSc) is almost always the most effective KSP solver for our non-symmetrical complex system. Strangely, the system is still using ilu instead of lu for sub blocks. 
The output is like: -sub_pc_type lu > > 0 KSP Residual norm 2.480412407430e+02 > 1 KSP Residual norm 8.848059967835e+01 > 2 KSP Residual norm 3.415272863261e+01 > 3 KSP Residual norm 1.563045190939e+01 > 4 KSP Residual norm 6.241296940043e+00 > 5 KSP Residual norm 2.739710899854e+00 > 6 KSP Residual norm 1.391304148888e+00 > 7 KSP Residual norm 7.959262020849e-01 > 8 KSP Residual norm 4.828323055231e-01 > 9 KSP Residual norm 2.918529739200e-01 > 10 KSP Residual norm 1.905508589557e-01 > 11 KSP Residual norm 1.291541892702e-01 > 12 KSP Residual norm 8.827145774707e-02 > 13 KSP Residual norm 6.521331095889e-02 > 14 KSP Residual norm 5.095787952595e-02 > 15 KSP Residual norm 4.043060387395e-02 > 16 KSP Residual norm 3.232590200012e-02 > 17 KSP Residual norm 2.593944982216e-02 > 18 KSP Residual norm 2.064639483533e-02 > 19 KSP Residual norm 1.653916663492e-02 > 20 KSP Residual norm 1.334946415452e-02 > 21 KSP Residual norm 1.092886880597e-02 > 22 KSP Residual norm 8.988004105542e-03 > 23 KSP Residual norm 7.466501315240e-03 > 24 KSP Residual norm 6.284389135436e-03 > 25 KSP Residual norm 5.425231669964e-03 > 26 KSP Residual norm 4.766338253084e-03 > 27 KSP Residual norm 4.241238878242e-03 > 28 KSP Residual norm 3.808113525685e-03 > 29 KSP Residual norm 3.449383788116e-03 > 30 KSP Residual norm 3.126025526388e-03 > 31 KSP Residual norm 2.958328054299e-03 > 32 KSP Residual norm 2.802344900403e-03 > 33 KSP Residual norm 2.621993580492e-03 > 34 KSP Residual norm 2.430066269304e-03 > 35 KSP Residual norm 2.259043079597e-03 > 36 KSP Residual norm 2.104287972986e-03 > 37 KSP Residual norm 1.952916080045e-03 > 38 KSP Residual norm 1.804988937999e-03 > 39 KSP Residual norm 1.643302117377e-03 > 40 KSP Residual norm 1.471661332036e-03 > 41 KSP Residual norm 1.286445911163e-03 > 42 KSP Residual norm 1.127543025848e-03 > 43 KSP Residual norm 9.777148275484e-04 > 44 KSP Residual norm 8.293314450006e-04 > 45 KSP Residual norm 6.989331136622e-04 > 46 KSP Residual norm 5.852307780220e-04 > 47 KSP Residual norm 4.926715539762e-04 > 48 KSP Residual norm 4.215941372075e-04 > 49 KSP Residual norm 3.699489548162e-04 > 50 KSP Residual norm 3.293897163533e-04 > 51 KSP Residual norm 2.959954542998e-04 > 52 KSP Residual norm 2.700193032414e-04 > 53 KSP Residual norm 2.461789791204e-04 > 54 KSP Residual norm 2.218839085563e-04 > 55 KSP Residual norm 1.945154309976e-04 > 56 KSP Residual norm 1.661128781744e-04 > 57 KSP Residual norm 1.413198766258e-04 > 58 KSP Residual norm 1.213984003195e-04 > 59 KSP Residual norm 1.044317450754e-04 > 60 KSP Residual norm 8.919957502977e-05 > 61 KSP Residual norm 8.042584301275e-05 > 62 KSP Residual norm 7.292784493581e-05 > 63 KSP Residual norm 6.481935501872e-05 > 64 KSP Residual norm 5.718564652679e-05 > 65 KSP Residual norm 5.072589750116e-05 > 66 KSP Residual norm 4.487930741285e-05 > 67 KSP Residual norm 3.941040674119e-05 > 68 KSP Residual norm 3.492873281291e-05 > 69 KSP Residual norm 3.103798339845e-05 > 70 KSP Residual norm 2.822943237409e-05 > 71 KSP Residual norm 2.610615023776e-05 > 72 KSP Residual norm 2.441692671173e-05 > KSP Object: 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=150, nonzero initial guess > tolerances: relative=1e-07, absolute=1e-50, divergence=10000. 
> left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: bjacobi > number of blocks = 3 > Local solve is same for all blocks, in the following KSP and PC objects: > KSP Object: (sub_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (sub_) 1 MPI processes > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1., needed 1. > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=92969, cols=92969 > package used to perform factorization: petsc > total: nonzeros=638417, allocated nonzeros=638417 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=92969, cols=92969 > total: nonzeros=638417, allocated nonzeros=638417 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: mpiaij > rows=278906, cols=278906 > total: nonzeros=3274027, allocated nonzeros=3274027 > total number of mallocs used during MatSetValues calls=0 > not using I-node (on process 0) routines > ... > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop --- Global --- --- Stage ---- Total > Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatMult 75 1.0 1.5812e+00 1.0 1.88e+09 1.0 0.0e+00 0.0e+00 0.0e+00 28 24 0 0 0 28 24 0 0 0 1189 > MatSolve 228 1.0 1.1442e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 944 These flop rates are ok, but not great. Perhaps an older machine. > MatLUFactorNum 3 1.0 8.1930e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 290 > MatILUFactorSym 3 1.0 2.7102e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyBegin 5 1.0 3.7000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 5 1.0 3.1895e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatCreateSubMats 1 1.0 4.0904e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetOrdering 3 1.0 4.2640e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatView 3 1.0 1.4400e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecMDot 72 1.0 1.1984e+00 1.0 2.25e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 28 0 0 0 21 28 0 0 0 1877 21 percent of the time in VecMDOT this is huge for s sequential fun. I think maybe you are using a terrible OpenMP BLAS? 
Send configure.log > VecNorm 76 1.0 1.6841e-01 1.0 1.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 1007 > VecScale 75 1.0 1.8241e-02 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 4587 > VecCopy 3 1.0 1.4970e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 276 1.0 9.1970e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > VecAXPY 6 1.0 3.7450e-03 1.0 1.34e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3575 > VecMAXPY 75 1.0 1.0022e+00 1.0 2.41e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 30 0 0 0 18 30 0 0 0 2405 > VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 76 1.0 5.5100e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecNormalize 75 1.0 1.8462e-01 1.0 2.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 1360 > KSPSetUp 4 1.0 1.1341e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 5.3123e+00 1.0 7.91e+09 1.0 0.0e+00 0.0e+00 0.0e+00 93100 0 0 0 93100 0 0 0 1489 > KSPGMRESOrthog 72 1.0 2.1316e+00 1.0 4.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 57 0 0 0 37 57 0 0 0 2110 > PCSetUp 4 1.0 1.5531e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 153 > PCSetUpOnBlocks 1 1.0 1.1343e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 209 > PCApply 76 1.0 1.1671e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 > PCApplyOnBlocks 228 1.0 1.1668e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 > ???????????????????????????????????????????????????????????? > ... > #PETSc Option Table entries: > -ksp_monitor > -ksp_type gmres > -ksp_view > -log_view > -pc_bjacobi_local_blocks 3 > -pc_sub_type lu > -pc_type bjacobi > #End of PETSc Option Table entries > ... > > Does any of the setup/output ring a bell? > > BTW, out of curiosity - what is a ?I-node? routine? > > > Cheers, > Hao > > > From: Smith, Barry F. > Sent: Wednesday, February 5, 2020 9:42 PM > To: Hao DONG > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? > > > > > On Feb 5, 2020, at 4:36 AM, Hao DONG wrote: > > > > Thanks a lot for your suggestions, Hong and Barry - > > > > As you suggested, I first tried the LU direct solvers (built-in and MUMPs) out this morning, which work perfectly, albeit slow. As my problem itself is a part of a PDE based optimization, the exact solution in the searching procedure is not necessary (I often set a relative tolerance of 1E-7/1E-8, etc. for Krylov subspace methods). Also tried BJACOBI with exact LU, the KSP just converges in one or two iterations, as expected. > > > > I added -kspview option for the D-ILU test (still with Block Jacobi as preconditioner and bcgs as KSP solver). The KSPview output from one of the examples in a toy model looks like: > > > > KSP Object: 1 MPI processes > > type: bcgs > > maximum iterations=120, nonzero initial guess > > tolerances: relative=1e-07, absolute=1e-50, divergence=10000. 
> > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: 1 MPI processes > > type: bjacobi > > number of blocks = 3 > > Local solve is same for all blocks, in the following KSP and PC objects: > > KSP Object: (sub_) 1 MPI processes > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using NONE norm type for convergence test > > PC Object: (sub_) 1 MPI processes > > type: ilu > > out-of-place factorization > > 0 levels of fill > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: natural > > factor fill ratio given 1., needed 1. > > Factored matrix follows: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=11294, cols=11294 > > package used to perform factorization: petsc > > total: nonzeros=76008, allocated nonzeros=76008 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=11294, cols=11294 > > total: nonzeros=76008, allocated nonzeros=76008 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: mpiaij > > rows=33880, cols=33880 > > total: nonzeros=436968, allocated nonzeros=436968 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node (on process 0) routines > > > > do you see something wrong with my setup? > > > > I also tried a quick performance test with a small 278906 by 278906 matrix (3850990 nnz) with the following parameters: > > > > -ksp_type bcgs -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu -ksp_view > > > > Reducing the relative residual to 1E-7 > > > > Took 4.08s with 41 bcgs iterations. > > > > Merely changing the -pc_bjacobi_local_blocks to 6 > > > > Took 7.02s with 73 bcgs iterations. 9 blocks would further take 9.45s with 101 bcgs iterations. > > This is normal. more blocks slower convergence > > > > As a reference, my home-brew Fortran code solves the same problem (3-block D-ILU0) in > > > > 1.84s with 24 bcgs iterations (the bcgs code is also a home-brew one)? > > > Run the PETSc code with optimization ./configure --with-debugging=0 an run the code with -log_view this will show where the PETSc code is spending the time (send it to use) > > > > > > > > Well, by saying ?use explicit L/U matrix as preconditioner?, I wonder if a user is allowed to provide his own (separate) L and U Mat for preconditioning (like how it works in Matlab solvers), e.g. > > > > x = qmr(A,b,Tol,MaxIter,L,U,x) > > > > As I already explicitly constructed the L and U matrices in Fortran, it is not hard to convert them to Mat in Petsc to test Petsc and my Fortran code head-to-head. In that case, the A, b, x, and L/U are all identical, it would be easier to see where the problem came from. > > > > > No, we don't provide this kind of support > > > > > > BTW, there is another thing I wondered - is there a way to output residual in unpreconditioned norm? I tried to > > > > call KSPSetNormType(ksp_local, KSP_NORM_UNPRECONDITIONED, ierr) > > > > But always get an error that current ksp does not support unpreconditioned in LEFT/RIGHT (either way). Is it possible to do that (output unpreconditioned residual) in PETSc at all? 
> > -ksp_monitor_true_residual You can also run GMRES (and some other methods) with right preconditioning, -ksp_pc_side right then the residual computed is by the algorithm the unpreconditioned residual > > KSPSetNormType sets the norm used in the algorithm, it generally always has to left or right, only a couple algorithms support unpreconditioned directly. > > Barry > > > > > > Cheers, > > Hao > > > > > > From: Smith, Barry F. > > Sent: Tuesday, February 4, 2020 8:27 PM > > To: Hao DONG > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? > > > > > > > > > On Feb 4, 2020, at 12:41 PM, Hao DONG wrote: > > > > > > Dear all, > > > > > > > > > I have a few questions about the implementation of diagonal ILU PC in PETSc. I want to solve a very simple system with KSP (in parallel), the nature of the system (finite difference time-harmonic Maxwell) is probably not important to the question itself. Long story short, I just need to solve a set of equations of Ax = b with a block diagonal system matrix, like (not sure if the mono font works): > > > > > > |X | > > > A =| Y | > > > | Z| > > > > > > Note that A is not really block-diagonal, it?s just a multi-diagonal matrix determined by the FD mesh, where most elements are close to diagonal. So instead of a full ILU decomposition, a D-ILU is a good approximation as a preconditioner, and the number of blocks should not matter too much: > > > > > > |Lx | |Ux | > > > L = | Ly | and U = | Uy | > > > | Lz| | Uz| > > > > > > Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in my sequential Fortran/Matlab code. > > > > > > So like most users, I am looking for a parallel implement with this problem in PETSc. After looking through the manual, it seems that the most straightforward way to do it is through PCBJACOBI. Not sure I understand it right, I just setup a 3-block PCJACOBI and give each of the block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little block of fortran code would look like: > > > ... > > > call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & > > > & isubs,ierr) > > > call PCBJacobiSetLocalBlocks(pc_local, Nsub, & > > > & isubs(istart:iend),ierr) > > > ! set up the block jacobi structure > > > call KSPSetup(ksp_local,ierr) > > > ! allocate sub ksps > > > allocate(ksp_sub(Nsub)) > > > call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & > > > & ksp_sub,ierr) > > > do i=1,Nsub > > > call KSPGetPC(ksp_sub(i),pc_sub,ierr) > > > !ILU preconditioner > > > call PCSetType(pc_sub,ptype,ierr) > > > call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here > > > call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] > > > end do > > > call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & > > > & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) > > > ? > > > > This code looks essentially right. You should call with -ksp_view to see exactly what PETSc is using for a solver. > > > > > > > > I understand that the parallel performance may not be comparable, so I first set up a one-process test (with MPIAij, but all the rows are local since there is only one process). The system is solved without any problem (identical results within error). But the performance is actually a lot worse (code built without debugging flags in performance tests) than my own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), which is hard to believe. 
I suspect the difference is from the PC as the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to the same relative tol. > > > > PETSc uses GMRES by default with a restart of 30 and left preconditioning. It could be different exact choices in the solver (which is why -ksp_view is so useful) can explain the differences in the runs between your code and PETSc's > > > > > > This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks instead of 3). While my Fortran/Matlab codes see minimal performance difference (<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a performance decrease of more than 50% in sequential test. > > > > This is odd, the more blocks the smaller each block so the quicker the ILU set up should be. You can run various cases with -log_view and send us the output to see what is happening at each part of the computation in time. > > > > > So my implementation IS somewhat different in PETSc. Do I miss something in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how PCBJACOBI works in PETSc? > > > > Probably not. > > > > > > If this is not the right way to implement a block diagonal ILU as (parallel) PC, please kindly point me to the right direction. I searched through the mail list to find some answers, only to find a couple of similar questions... An example would be nice. > > > > You approach seems fundamentally right but I cannot be sure of possible bugs. > > > > > > On the other hand, does PETSc support a simple way to use explicit L/U matrix as a preconditioner? I can import the D-ILU matrix (I already converted my A matrix into Mat) constructed in my Fortran code to make a better comparison. Or do I have to construct my own PC using PCshell? If so, is there a good tutorial/example to learn about how to use PCSHELL (in a more advanced sense, like how to setup pc side and transpose)? > > > > Not sure what you mean by explicit L/U matrix as a preconditioner. As Hong said, yes you can use a parallel LU from MUMPS or SuperLU_DIST or Pastix as the solver. You do not need any shell code. You simply need to set the PCType to lu > > > > You can also set all this options from the command line and don't need to write the code specifically. So call KSPSetFromOptions() and then for example > > > > -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu (this last one is applied to each block so you could use -pc_type lu and it would use lu on each block.) > > > > -ksp_type_none -pc_type lu -pc_factor_mat_solver_type mumps (do parallel LU with mumps) > > > > By not hardwiring in the code and just using options you can test out different cases much quicker > > > > Use -ksp_view to make sure that is using the solver the way you expect. > > > > Barry > > > > > > > > Barry > > > > > > > > Thanks in advance, > > > > > > Hao From juaneah at gmail.com Thu Feb 6 18:00:01 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Thu, 6 Feb 2020 18:00:01 -0600 Subject: [petsc-users] SLEPc: The inner product is not well defined Message-ID: Hi everyone, I'm solving the eigenvalue problem of three bodies in the same program, it generates a three sets of matrices. 
I installed PETSc as optimized version: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-superlu_dist --download-metis --download-parmetis --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 Then I installed SLEPc in the standard form and referring to PETSc optimized directory. I did NOT install SLEPc with --with-debugging=0, because I'm still testing my code. My matrices comes from DMCreateMatrix, the stiffness matrix and mass matrix. I use the function MatIsSymmetric to check if my matrices are symmetric or not, and always the matrices are symmetric (even when the program crash). For that reason I use: ierr = EPSSetProblemType(eps,EPS_GHEP); CHKERRQ(ierr); ierr = EPSSetType(eps,EPSLOBPCG); CHKERRQ(ierr); // because I need the smallest The problem is: sometimes the code works well for the three sets of matrices and I get the expected results, but sometimes it does not happen, it only works for the first sets of matrices, and then it crashes, and when that occurs the error message is: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: The inner product is not well defined: indefinite matrix [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 [0]PETSC ERROR: ./comp on a linux-opt-02 named lnx by ayala Thu Feb 6 17:20:16 2020 [0]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-superlu_dist --download-metis --download-parmetis --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 [0]PETSC ERROR: #1 BV_SafeSqrt() line 130 in /home/ayala/Documents/SLEPc/slepc-3.12.2/include/slepc/private/bvimpl.h [0]PETSC ERROR: #2 BVNorm_Private() line 473 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvglobal.c [0]PETSC ERROR: #3 BVNormColumn() line 718 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvglobal.c [0]PETSC ERROR: #4 BV_NormVecOrColumn() line 26 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvorthog.c [0]PETSC ERROR: #5 BVOrthogonalizeCGS1() line 136 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvorthog.c [0]PETSC ERROR: #6 BVOrthogonalizeGS() line 188 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvorthog.c [0]PETSC ERROR: #7 BVOrthonormalizeColumn() line 416 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvorthog.c [0]PETSC ERROR: #8 EPSSolve_LOBPCG() line 144 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/impls/cg/lobpcg/lobpcg.c [0]PETSC ERROR: #9 EPSSolve() line 149 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/interface/epssolve.c Number of iterations of the method: 0 Number of linear iterations of the method: 0 Solution method: lobpcg Number of requested eigenvalues: 6 Stopping condition: tol=1e-06, maxit=10000 Number of converged eigenpairs: 0 The problem appears even when I run the same compilation. Anyone have any suggestions to solve the problem? Kind regards. -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From fdkong.jd at gmail.com Thu Feb 6 18:35:56 2020 From: fdkong.jd at gmail.com (Fande Kong) Date: Thu, 6 Feb 2020 17:35:56 -0700 Subject: [petsc-users] Condition Number and GMRES iteration Message-ID:

Hi All,

MOOSE team, Alex and I are working on some variable scaling techniques to improve the condition number of the matrix of linear systems. The goal of variable scaling is to make the diagonal of the matrix as close to unity as possible. After scaling (for a certain example), the condition number of the linear system is indeed reduced, but the GMRES iteration count does not decrease at all.

From my understanding, the condition number gives the worst-case estimate for GMRES convergence. That is, the GMRES iteration count should not increase when the condition number decreases. This could explain what we saw: the improved condition number does not necessarily lead to a decrease in GMRES iterations. We are trying to understand this a bit more, and we guess that the number of eigenvalue clusters of the matrix of the linear system might be related to the convergence rate of GMRES. We plotted the eigenvalues of the scaled and unscaled systems, and the clusters look different from each other, but the GMRES iteration counts are the same.

Does anyone know the right relationship between the condition number and GMRES iterations? How does the number of eigenvalue clusters affect GMRES iterations? How do we count eigenvalue clusters? For example, how many eigenvalue clusters do we have in the attached images?

If you need more details, please let us know. Alex and I are happy to provide any details you are interested in.

Thanks,

Fande Kong,

[Attachments: imgo-1.jpg, imgo.jpg - eigenvalue plots]

From alexlindsay239 at gmail.com Thu Feb 6 19:05:48 2020 From: alexlindsay239 at gmail.com (Alexander Lindsay) Date: Thu, 6 Feb 2020 17:05:48 -0800 Subject: Re: [petsc-users] Condition Number and GMRES iteration In-Reply-To: References: Message-ID:

It looks like Fande has attached the eigenvalue plots with the real axis having a logarithmic scale. The same plots with a linear scale are attached here. The system has 306 degrees of freedom. 12 eigenvalues are unity for both scaled and unscaled cases; this number corresponds to the number of mesh nodes with Dirichlet boundary conditions (just a 1 on the diagonal for the corresponding rows). The rest of the eigenvalues are orders of magnitude smaller for the unscaled case; using scaling these eigenvalues are brought much closer to 1. This particular problem is linear but we solve it with SNES, so constant Jacobian. We run with options `-pc_type none -ksp_gmres_restart 1000 -snes_rtol 1e-8 -ksp_rtol 1e-5` so for this linear problem it takes two non-linear iterations to solve.

Unscaled: first nonlinear iteration takes 2 linear iterations; second nonlinear iteration takes 99 linear iterations
Scaled: first nonlinear iteration takes 94 linear iterations; second nonlinear iteration takes 100 linear iterations

Running with `-pc_type svd` the condition number for the unscaled simulation is 4e9 while it is 2e3 for the scaled simulation.
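For reference, the standard convergence bound for (unrestarted) GMRES applied to a diagonalizable matrix A = V \Lambda V^{-1} makes precise why a smaller condition number of A need not reduce the iteration count, while the location and clustering of the eigenvalues, together with the conditioning of the eigenvector basis, do enter (see, e.g., Saad, Iterative Methods for Sparse Linear Systems):

  \[
    \frac{\lVert r_k \rVert_2}{\lVert r_0 \rVert_2} \;\le\;
    \kappa_2(V)\,
    \min_{\substack{p \in \mathcal{P}_k \\ p(0)=1}}\;
    \max_{\lambda \in \sigma(A)} \lvert p(\lambda) \rvert
  \]

The iteration count is therefore governed by how small a degree-k polynomial with p(0) = 1 can be made on the spectrum (a few tight clusters well separated from the origin are favorable), weighted by kappa_2(V), not by kappa(A) itself; for strongly non-normal matrices even the spectrum alone does not determine the convergence curve, so counting clusters in an eigenvalue plot gives at best a rough indication.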
On Thu, Feb 6, 2020 at 4:36 PM Fande Kong wrote: > Hi All, > > MOOSE team, Alex and I are working on some variable scaling techniques to > improve the condition number of the matrix of linear systems. The goal of > variable scaling is to make the diagonal of matrix as close to unity as > possible. After scaling (for certain example), the condition number of the > linear system is actually reduced, but the GMRES iteration does not > decrease at all. > > From my understanding, the condition number is the worst estimation for > GMRES convergence. That is, the GMRES iteration should not increases when > the condition number decreases. This actually could example what we saw: > the improved condition number does not necessary lead to a decrease in > GMRES iteration. We try to understand this a bit more, and we guess that > the number of eigenvalue clusters of the matrix of the linear system > may/might be related to the convergence rate of GMRES. We plot eigenvalues > of scaled system and unscaled system, and the clusters look different from > each other, but the GMRRES iterations are the same. > > Anyone know what is the right relationship between the condition number > and GMRES iteration? How does the number of eigenvalue clusters affect > GMRES iteration? How to count eigenvalue clusters? For example, how many > eigenvalue clusters we have in the attach image respectively? > > If you need more details, please let us know. Alex and I are happy to > provide any details you are interested in. > > > Thanks, > > Fande Kong, > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: no-scaling-linear.png Type: image/png Size: 11539 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: with-scaling-linear.png Type: image/png Size: 12940 bytes Desc: not available URL: From jroman at dsic.upv.es Fri Feb 7 03:06:38 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Fri, 7 Feb 2020 10:06:38 +0100 Subject: [petsc-users] SLEPc: The inner product is not well defined In-Reply-To: References: Message-ID: This error appears when computing the B-norm of a vector x, as sqrt(x'*B*x). Probably your B matrix is semi-definite, and due to floating-point error the value x'*B*x becomes negative for a certain vector x. The code uses a tolerance of 10*PETSC_MACHINE_EPSILON, but it seems the rounding errors are larger in your case. Or maybe your B-matrix is indefinite, in which case you should solve the problem as non-symmetric (or as symmetric-indefinite GHIEP). Do you get the same problem with the Krylov-Schur solver? A workaround is to edit the source code and remove the check or increase the tolerance, but this may be catastrophic if your B is indefinite. A better solution is to reformulate the problem, solving the matrix pair (A,C) where C=alpha*A+beta*B is positive definite (note that then the eigenvalues become lambda/(beta+alpha*lambda)). Jose > El 7 feb 2020, a las 1:00, Emmanuel Ayala escribi?: > > Hi everyone, > > I'm solving the eigenvalue problem of three bodies in the same program, it generates a three sets of matrices. 
> > I installed PETSc as optimized version: > > Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-superlu_dist --download-metis --download-parmetis --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 > > Then I installed SLEPc in the standard form and referring to PETSc optimized directory. I did NOT install SLEPc with --with-debugging=0, because I'm still testing my code. > > My matrices comes from DMCreateMatrix, the stiffness matrix and mass matrix. I use the function MatIsSymmetric to check if my matrices are symmetric or not, and always the matrices are symmetric (even when the program crash). For that reason I use: > > ierr = EPSSetProblemType(eps,EPS_GHEP); CHKERRQ(ierr); > ierr = EPSSetType(eps,EPSLOBPCG); CHKERRQ(ierr); // because I need the smallest > > The problem is: sometimes the code works well for the three sets of matrices and I get the expected results, but sometimes it does not happen, it only works for the first sets of matrices, and then it crashes, and when that occurs the error message is: > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: The inner product is not well defined: indefinite matrix > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 > [0]PETSC ERROR: ./comp on a linux-opt-02 named lnx by ayala Thu Feb 6 17:20:16 2020 > [0]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-superlu_dist --download-metis --download-parmetis --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 > [0]PETSC ERROR: #1 BV_SafeSqrt() line 130 in /home/ayala/Documents/SLEPc/slepc-3.12.2/include/slepc/private/bvimpl.h > [0]PETSC ERROR: #2 BVNorm_Private() line 473 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvglobal.c > [0]PETSC ERROR: #3 BVNormColumn() line 718 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvglobal.c > [0]PETSC ERROR: #4 BV_NormVecOrColumn() line 26 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvorthog.c > [0]PETSC ERROR: #5 BVOrthogonalizeCGS1() line 136 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvorthog.c > [0]PETSC ERROR: #6 BVOrthogonalizeGS() line 188 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvorthog.c > [0]PETSC ERROR: #7 BVOrthonormalizeColumn() line 416 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/bv/interface/bvorthog.c > [0]PETSC ERROR: #8 EPSSolve_LOBPCG() line 144 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/impls/cg/lobpcg/lobpcg.c > [0]PETSC ERROR: #9 EPSSolve() line 149 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/interface/epssolve.c > > Number of iterations of the method: 0 > Number of linear iterations of the method: 0 > Solution method: lobpcg > > Number of requested eigenvalues: 6 > Stopping condition: tol=1e-06, maxit=10000 > Number of converged eigenpairs: 0 > > The problem appears even when I run the same compilation. > > Anyone have any suggestions to solve the problem? > > Kind regards. 
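A minimal C sketch of that reformulation follows (error checking omitted). Here alpha and beta are user-chosen placeholders that must make C = alpha*A + beta*B positive definite, and each computed eigenvalue theta of the pair (A, C) is mapped back through lambda = beta*theta/(1 - alpha*theta), the inverse of theta = lambda/(beta + alpha*lambda):

  #include <slepceps.h>

  /* Sketch only: A, B are the assembled stiffness/mass matrices. */
  PetscErrorCode SolveShiftedGHEP(Mat A, Mat B, PetscScalar alpha, PetscScalar beta)
  {
    Mat         C;
    EPS         eps;
    PetscScalar theta, thetai, lambda;
    PetscInt    i, nconv;

    MatDuplicate(B, MAT_COPY_VALUES, &C);
    MatScale(C, beta);                               /* C = beta*B           */
    MatAXPY(C, alpha, A, DIFFERENT_NONZERO_PATTERN); /* C = alpha*A + beta*B */

    EPSCreate(PetscObjectComm((PetscObject)A), &eps);
    EPSSetOperators(eps, A, C);
    EPSSetProblemType(eps, EPS_GHEP);   /* C now plays the role of the definite B matrix */
    EPSSetFromOptions(eps);
    EPSSolve(eps);

    EPSGetConverged(eps, &nconv);
    for (i = 0; i < nconv; i++) {
      EPSGetEigenvalue(eps, i, &theta, &thetai);
      lambda = beta*theta/(1.0 - alpha*theta);  /* undo theta = lambda/(beta + alpha*lambda) */
      /* ... use lambda ... */
    }

    EPSDestroy(&eps);
    MatDestroy(&C);
    return 0;
  }

With alpha = 0 and beta = 1 this reduces to the original pair; note also that a selection such as "smallest eigenvalues" has to be reinterpreted through the same mapping.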
-------------- next part -------------- An HTML attachment was scrubbed... URL: From dong-hao at outlook.com Fri Feb 7 07:44:59 2020 From: dong-hao at outlook.com (Hao DONG) Date: Fri, 7 Feb 2020 13:44:59 +0000 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? In-Reply-To: References: <264F91C4-8558-4850-9B4B-ABE4123C2A2C@anl.gov> <4A373D93-4018-45E0-B805-3ECC528472DD@mcs.anl.gov> , Message-ID: Thanks, Barry, I really appreciate your help - I removed the OpenMP flags and rebuilt PETSc. So the problem is from the BLAS lib I linked? Not sure which version my BLAS is, though? But I also included the -download-Scalapack option. Shouldn?t that enable linking with PBLAS automatically? After looking at the bcgs code in PETSc, I suppose the iteration residual recorded is indeed recorded twice per one "actual iteration?. So that can explain the difference of iteration numbers. My laptop is indeed an old machine (MBP15 mid-2014). I just cannot work with vi without a physical ESC key... I have attached the configure.log -didn?t know that it is so large! Anyway, it seems that the removal of -openmp changes quite a lot of things, the performance is indeed getting much better - the flop/sec increases by a factor of 3. Still, I am getting 20 percent of VecMDot, but no VecMDot in BCGS all (see output below), is that a feature of gmres method? here is the output of the same problem with: -pc_type bjacobi -pc_bjacobi_local_blocks 3 -sub_pc_type ilu -ksp_type gmres -ksp_monitor -ksp_view ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- Mod3DMT.test on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Fri Feb 7 10:26:19 2020 Using Petsc Release Version 3.12.3, unknown Max Max/Min Avg Total Time (sec): 2.520e+00 1.000 2.520e+00 Objects: 1.756e+03 1.000 1.756e+03 Flop: 7.910e+09 1.000 7.910e+09 7.910e+09 Flop/sec: 3.138e+09 1.000 3.138e+09 3.138e+09 MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 2.5204e+00 100.0% 7.9096e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ ? 
------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSidedF 1 1.0 3.4000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 75 1.0 6.2884e-01 1.0 1.88e+09 1.0 0.0e+00 0.0e+00 0.0e+00 25 24 0 0 0 25 24 0 0 0 2991 MatSolve 228 1.0 4.4164e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2445 MatLUFactorNum 3 1.0 4.1317e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 574 MatILUFactorSym 3 1.0 2.3858e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatAssemblyBegin 5 1.0 4.4000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 5 1.0 1.5067e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetRowIJ 3 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCreateSubMats 1 1.0 2.4558e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetOrdering 3 1.0 1.3290e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 3 1.0 1.2800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecMDot 72 1.0 4.9875e-01 1.0 2.25e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 28 0 0 0 20 28 0 0 0 4509 VecNorm 76 1.0 6.6666e-02 1.0 1.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 2544 VecScale 75 1.0 1.7982e-02 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 4653 VecCopy 3 1.0 1.5080e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 276 1.0 9.6784e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 VecAXPY 6 1.0 3.6860e-03 1.0 1.34e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3632 VecMAXPY 75 1.0 4.0490e-01 1.0 2.41e+09 1.0 0.0e+00 0.0e+00 0.0e+00 16 30 0 0 0 16 30 0 0 0 5951 VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 76 1.0 5.3800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 75 1.0 8.3690e-02 1.0 2.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 2999 KSPSetUp 4 1.0 1.1663e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 2.2119e+00 1.0 7.91e+09 1.0 0.0e+00 0.0e+00 0.0e+00 88100 0 0 0 88100 0 0 0 3576 KSPGMRESOrthog 72 1.0 8.7843e-01 1.0 4.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 35 57 0 0 0 35 57 0 0 0 5121 PCSetUp 4 1.0 9.2448e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 257 PCSetUpOnBlocks 1 1.0 6.6597e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 356 PCApply 76 1.0 4.6281e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2333 PCApplyOnBlocks 228 1.0 4.6262e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2334 ------------------------------------------------------------------------------------------------------------------------ Average time to get PetscTime(): 1e-07 #PETSc Option Table entries: -I LBFGS -ksp_type gmres -ksp_view -log_view -pc_bjacobi_local_blocks 3 -pc_type bjacobi -sub_pc_type ilu #End of PETSc Option Table entries Compiled with FORTRAN kernels Compiled with 
full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 Configure options: --with-scalar-type=complex --download-mumps --download-scalapack --with-fortran-kernels=1 -- FOPTFLAGS="-O3 -ffree-line-length-0 -msse2" --COPTFLAGS="-O3 -msse2" --CXXOPTFLAGS="-O3 -msse2" --with-debugging=0 ----------------------------------------- Libraries compiled on 2020-02-07 10:07:42 on Haos-MBP Machine characteristics: Darwin-19.3.0-x86_64-i386-64bit Using PETSc directory: /Users/donghao/src/git/PETSc-current Using PETSc arch: arch-darwin-c-opt ----------------------------------------- Using C compiler: mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fno-stack- check -Qunused-arguments -fvisibility=hidden -O3 -msse2 Using Fortran compiler: mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -O3 -ffree-line-length-0 - msse2 ----------------------------------------- Using include paths: -I/Users/donghao/src/git/PETSc-current/include -I/Users/donghao/src/git/PETSc-current/arch-darwin- c-opt/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/donghao/src/git/PETSc- current/arch-darwin-c-opt/lib -lpetsc -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/ donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -Wl,-rpath,/usr/local/opt/libevent/lib -L/usr/local/opt/libevent/ lib -Wl,-rpath,/usr/local/Cellar/open-mpi/4.0.2/lib -L/usr/local/Cellar/open-mpi/4.0.2/lib -Wl,-rpath,/usr/local/Cellar/ gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin19/9.2.0 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple- darwin19/9.2.0 -Wl,-rpath,/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -lcmumps - ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lc++ -ldl -lmpi_usempif08 - lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lquadmath -lm -lc++ -ldl ----------------------------------------- The BCGS solver performance is now comparable to my own Fortran code (1.84s). Still, I feel that there is something wrong hidden somewhere in my setup - a professional lib should to perform better, I believe. Any other ideas that I can look into? Interestingly there is no VecMDot operation at all! 
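One more thing that may make the -log_view numbers easier to compare against the hand-written Fortran solver is to put the KSPSolve call in its own logging stage, so that setup/assembly and the solve itself are reported separately. A minimal sketch (hedged: assumes ksp, b and x are already created and set up; the stage name is arbitrary):

  PetscLogStage  solve_stage;
  PetscErrorCode ierr;

  ierr = PetscLogStageRegister("KSPSolve only",&solve_stage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(solve_stage);CHKERRQ(ierr);
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);           /* only this call is attributed to the stage */
  ierr = PetscLogStagePop();CHKERRQ(ierr);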
Here is the output with the option of: -pc_type bjacobi -pc_bjacobi_local_blocks 3 -sub_pc_type ilu -ksp_type bcgs -ksp_monitor -ksp_view ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- Mod3DMT.test on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Fri Feb 7 10:38:00 2020 Using Petsc Release Version 3.12.3, unknown Max Max/Min Avg Total Time (sec): 2.187e+00 1.000 2.187e+00 Objects: 1.155e+03 1.000 1.155e+03 Flop: 4.311e+09 1.000 4.311e+09 4.311e+09 Flop/sec: 1.971e+09 1.000 1.971e+09 1.971e+09 MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 2.1870e+00 100.0% 4.3113e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSidedF 1 1.0 2.2000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 83 1.0 7.8726e-01 1.0 2.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 36 48 0 0 0 36 48 0 0 0 2644 MatSolve 252 1.0 5.5656e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 25 28 0 0 0 25 28 0 0 0 2144 MatLUFactorNum 3 1.0 4.5115e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 526 MatILUFactorSym 3 1.0 2.5103e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatAssemblyBegin 5 1.0 3.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 5 1.0 1.5709e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetRowIJ 3 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCreateSubMats 1 1.0 2.8989e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetOrdering 3 1.0 1.1200e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 3 1.0 1.2600e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecDot 82 1.0 8.9328e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 2048 VecDotNorm2 41 1.0 9.9019e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 4 0 0 0 5 4 0 0 0 1848 VecNorm 43 1.0 3.9988e-02 1.0 9.59e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 2399 VecCopy 2 1.0 1.1150e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 271 1.0 4.2833e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecAXPY 1 1.0 5.9200e-04 1.0 2.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3769 VecAXPBYCZ 82 1.0 1.1448e-01 1.0 3.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 8 0 0 0 5 8 0 0 0 3196 VecWAXPY 82 1.0 6.7460e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 4 0 0 0 3 4 0 0 0 2712 VecAssemblyBegin 2 
1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 84 1.0 5.2800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 4 1.0 1.4765e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 KSPSolve 1 1.0 1.8514e+00 1.0 4.31e+09 1.0 0.0e+00 0.0e+00 0.0e+00 85100 0 0 0 85100 0 0 0 2329 PCSetUp 4 1.0 1.0193e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 5 1 0 0 0 5 1 0 0 0 233 PCSetUpOnBlocks 1 1.0 7.1421e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 332 PCApply 84 1.0 5.7927e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 26 28 0 0 0 26 28 0 0 0 2060 PCApplyOnBlocks 252 1.0 5.7902e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 26 28 0 0 0 26 28 0 0 0 2061 ------------------------------------------------------------------------------------------------------------------------ Cheers, Hao ________________________________ From: Smith, Barry F. Sent: Thursday, February 6, 2020 7:03 PM To: Hao DONG Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? Read my comments ALL the way down, they go a long way. > On Feb 6, 2020, at 3:43 AM, Hao DONG wrote: > > Dear Hong and Barry, > > Thanks for the suggestions. So there could be some problems in my PETSc configuration? - but my PETSc lib was indeed compiled without the debug flags (--with-debugging=0). I use GCC/GFortran (Home-brew GCC 9.2.0) for the compiling and building of PETSc and my own fortran code. My Fortran compiling flags are simply like: > > -O3 -ffree-line-length-none -fastsse > > Which is also used for -FOPTFLAGS in PETSc (I added -openmp for PETSc, but not my fortran code, as I don?t have any OMP optimizations in my program). Note the performance test results I listed yesterday (e.g. 4.08s with 41 bcgs iterations.) are without any CSR-array->PETSc translation overhead (only include the set and solve part). PETSc doesn't use -openmp in any way for its solvers. Do not use this option, it may be slowing the code down. Please send configure.log > > I have two questions about the performance difference - > > 1. Is ilu only factorized once for each iteration, or ilu is performed at every outer ksp iteration steps? Sounds unlikely - but if so, this could cause some extra overheads. ILU is ONLY done if the matrix has changed (which seems wrong). > > 2. Some KSP solvers like BCGS or TFQMR has two ?half-iterations? for each iteration step. Not sure how it works in PETSc, but is that possible that both the two ?half" relative residuals are counted in the output array, doubling the number of iterations (but that cannot explain the extra time consumed)? Yes, PETSc might report them as two, you need to check the exact code. 
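A simple way to compare iteration counts independently of what the monitor prints is to query the solver after the solve: KSPGetIterationNumber() returns the count as the KSP implementation itself defines it, and -ksp_converged_reason prints why it stopped. A minimal sketch (hedged: assumes ksp, b and x are already set up):

  PetscInt       its;
  PetscReal      rnorm;
  PetscErrorCode ierr;

  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  ierr = KSPGetIterationNumber(ksp,&its);CHKERRQ(ierr);   /* count as defined by the KSP implementation */
  ierr = KSPGetResidualNorm(ksp,&rnorm);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD,"KSP iterations %D, final residual norm %g\n",its,(double)rnorm);CHKERRQ(ierr);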
> > Anyway, the output with -log_view from the same 278906 by 278906 matrix with 3-block D-ILU in PETSc is as follows: > > > ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- > > MEMsolv.lu on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Thu Feb 6 09:07:35 2020 > Using Petsc Release Version 3.12.3, unknown > > Max Max/Min Avg Total > Time (sec): 4.443e+00 1.000 4.443e+00 > Objects: 1.155e+03 1.000 1.155e+03 > Flop: 4.311e+09 1.000 4.311e+09 4.311e+09 > Flop/sec: 9.703e+08 1.000 9.703e+08 9.703e+08 > MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.000 > > Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N --> 2N flop > and VecAXPY() for complex vectors of length N --> 8N flop > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total Count %Total Avg %Total Count %Total > 0: Main Stage: 4.4435e+00 100.0% 4.3113e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > ???????????????????????????????????????????????????????????? > See the 'Profiling' chapter of the users' manual for details on interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flop: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > AvgLen: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flop in this phase > %M - percent messages in this phase %L - percent message lengths in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop --- Global --- --- Stage ---- Total > Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatMult 83 1.0 1.7815e+00 1.0 2.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40 48 0 0 0 40 48 0 0 0 1168 > MatSolve 252 1.0 1.2708e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 939 > MatLUFactorNum 3 1.0 7.9725e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 > MatILUFactorSym 3 1.0 2.6998e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatAssemblyBegin 5 1.0 3.6000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 5 1.0 3.1619e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatCreateSubMats 1 1.0 3.9659e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetOrdering 3 1.0 4.3070e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatView 3 1.0 1.3600e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecDot 82 1.0 1.8948e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 966 > VecDotNorm2 41 1.0 1.6812e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 1088 > VecNorm 43 1.0 9.5099e-02 1.0 9.59e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1009 > VecCopy 2 1.0 1.0120e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 271 1.0 3.8922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecAXPY 1 1.0 7.7200e-04 1.0 2.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2890 > VecAXPBYCZ 82 1.0 2.4370e-01 1.0 3.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 8 0 0 0 5 8 0 0 0 1502 > VecWAXPY 82 1.0 1.4148e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 4 0 0 0 3 4 0 0 0 1293 > VecAssemblyBegin 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 84 1.0 5.9300e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSetUp 4 1.0 1.4167e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 4.0250e+00 1.0 4.31e+09 1.0 0.0e+00 0.0e+00 0.0e+00 91100 0 0 0 91100 0 0 0 1071 > PCSetUp 4 1.0 1.5207e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 156 > PCSetUpOnBlocks 1 1.0 1.1116e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 214 > PCApply 84 1.0 1.2912e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 > PCApplyOnBlocks 252 1.0 1.2909e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 > ------------------------------------------------------------------------------------------------------------------------ > > # I skipped the 
memory part - the options (and compiler options) are as follows: > > #PETSc Option Table entries: > -ksp_type bcgs > -ksp_view > -log_view > -pc_bjacobi_local_blocks 3 > -pc_factor_levels 0 > -pc_sub_type ilu > -pc_type bjacobi > #End of PETSc Option Table entries > Compiled with FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 > Configure options: --with-scalar-type=complex --download-mumps --download-scalapack --with-fortran-kernels=1 -- FOPTFLAGS=?-O3 -fastsse -mp -openmp? --COPTFLAGS=?-O3 -fastsse -mp -openmp? --CXXOPTFLAGS="-O3 -fastsse -mp -openmp" -- with-debugging=0 > ----------------------------------------- > Libraries compiled on 2020-02-03 10:44:31 on Haos-MBP > Machine characteristics: Darwin-19.2.0-x86_64-i386-64bit > Using PETSc directory: /Users/donghao/src/git/PETSc-current > Using PETSc arch: arch-darwin-c-opt > ----------------------------------------- > > Using C compiler: mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fno-stack- check -Qunused-arguments -fvisibility=hidden > Using Fortran compiler: mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument > > Using include paths: -I/Users/donghao/src/git/PETSc-current/include -I/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/include > ----------------------------------------- > > Using C linker: mpicc > Using Fortran linker: mpif90 > Using libraries: -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/donghao/src/git/PETSc- current/arch-darwin-c-opt/lib -lpetsc -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/ donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -Wl,-rpath,/usr/local/opt/libevent/lib -L/usr/local/opt/libevent/ lib -Wl,-rpath,/usr/local/Cellar/open-mpi/4.0.2/lib -L/usr/local/Cellar/open-mpi/4.0.2/lib -Wl,-rpath,/usr/local/Cellar/ gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin19/9.2.0 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple- darwin19/9.2.0 -Wl,-rpath,/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -lcmumps - ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lc++ -ldl -lmpi_usempif08 - lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lquadmath -lm -lc++ -ldl > > > On the other hand, running PETSc with > > -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type lu -ksp_type gmres -ksp_monitor -ksp_view -log_view > > For the same problem takes 5.37s and 72 GMRES iterations. Our previous testings show that BiCGstab (bcgs in PETSc) is almost always the most effective KSP solver for our non-symmetrical complex system. Strangely, the system is still using ilu instead of lu for sub blocks. 
The output is like: -sub_pc_type lu > > 0 KSP Residual norm 2.480412407430e+02 > 1 KSP Residual norm 8.848059967835e+01 > 2 KSP Residual norm 3.415272863261e+01 > 3 KSP Residual norm 1.563045190939e+01 > 4 KSP Residual norm 6.241296940043e+00 > 5 KSP Residual norm 2.739710899854e+00 > 6 KSP Residual norm 1.391304148888e+00 > 7 KSP Residual norm 7.959262020849e-01 > 8 KSP Residual norm 4.828323055231e-01 > 9 KSP Residual norm 2.918529739200e-01 > 10 KSP Residual norm 1.905508589557e-01 > 11 KSP Residual norm 1.291541892702e-01 > 12 KSP Residual norm 8.827145774707e-02 > 13 KSP Residual norm 6.521331095889e-02 > 14 KSP Residual norm 5.095787952595e-02 > 15 KSP Residual norm 4.043060387395e-02 > 16 KSP Residual norm 3.232590200012e-02 > 17 KSP Residual norm 2.593944982216e-02 > 18 KSP Residual norm 2.064639483533e-02 > 19 KSP Residual norm 1.653916663492e-02 > 20 KSP Residual norm 1.334946415452e-02 > 21 KSP Residual norm 1.092886880597e-02 > 22 KSP Residual norm 8.988004105542e-03 > 23 KSP Residual norm 7.466501315240e-03 > 24 KSP Residual norm 6.284389135436e-03 > 25 KSP Residual norm 5.425231669964e-03 > 26 KSP Residual norm 4.766338253084e-03 > 27 KSP Residual norm 4.241238878242e-03 > 28 KSP Residual norm 3.808113525685e-03 > 29 KSP Residual norm 3.449383788116e-03 > 30 KSP Residual norm 3.126025526388e-03 > 31 KSP Residual norm 2.958328054299e-03 > 32 KSP Residual norm 2.802344900403e-03 > 33 KSP Residual norm 2.621993580492e-03 > 34 KSP Residual norm 2.430066269304e-03 > 35 KSP Residual norm 2.259043079597e-03 > 36 KSP Residual norm 2.104287972986e-03 > 37 KSP Residual norm 1.952916080045e-03 > 38 KSP Residual norm 1.804988937999e-03 > 39 KSP Residual norm 1.643302117377e-03 > 40 KSP Residual norm 1.471661332036e-03 > 41 KSP Residual norm 1.286445911163e-03 > 42 KSP Residual norm 1.127543025848e-03 > 43 KSP Residual norm 9.777148275484e-04 > 44 KSP Residual norm 8.293314450006e-04 > 45 KSP Residual norm 6.989331136622e-04 > 46 KSP Residual norm 5.852307780220e-04 > 47 KSP Residual norm 4.926715539762e-04 > 48 KSP Residual norm 4.215941372075e-04 > 49 KSP Residual norm 3.699489548162e-04 > 50 KSP Residual norm 3.293897163533e-04 > 51 KSP Residual norm 2.959954542998e-04 > 52 KSP Residual norm 2.700193032414e-04 > 53 KSP Residual norm 2.461789791204e-04 > 54 KSP Residual norm 2.218839085563e-04 > 55 KSP Residual norm 1.945154309976e-04 > 56 KSP Residual norm 1.661128781744e-04 > 57 KSP Residual norm 1.413198766258e-04 > 58 KSP Residual norm 1.213984003195e-04 > 59 KSP Residual norm 1.044317450754e-04 > 60 KSP Residual norm 8.919957502977e-05 > 61 KSP Residual norm 8.042584301275e-05 > 62 KSP Residual norm 7.292784493581e-05 > 63 KSP Residual norm 6.481935501872e-05 > 64 KSP Residual norm 5.718564652679e-05 > 65 KSP Residual norm 5.072589750116e-05 > 66 KSP Residual norm 4.487930741285e-05 > 67 KSP Residual norm 3.941040674119e-05 > 68 KSP Residual norm 3.492873281291e-05 > 69 KSP Residual norm 3.103798339845e-05 > 70 KSP Residual norm 2.822943237409e-05 > 71 KSP Residual norm 2.610615023776e-05 > 72 KSP Residual norm 2.441692671173e-05 > KSP Object: 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=150, nonzero initial guess > tolerances: relative=1e-07, absolute=1e-50, divergence=10000. 
> left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: bjacobi > number of blocks = 3 > Local solve is same for all blocks, in the following KSP and PC objects: > KSP Object: (sub_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (sub_) 1 MPI processes > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1., needed 1. > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=92969, cols=92969 > package used to perform factorization: petsc > total: nonzeros=638417, allocated nonzeros=638417 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=92969, cols=92969 > total: nonzeros=638417, allocated nonzeros=638417 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: mpiaij > rows=278906, cols=278906 > total: nonzeros=3274027, allocated nonzeros=3274027 > total number of mallocs used during MatSetValues calls=0 > not using I-node (on process 0) routines > ... > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop --- Global --- --- Stage ---- Total > Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatMult 75 1.0 1.5812e+00 1.0 1.88e+09 1.0 0.0e+00 0.0e+00 0.0e+00 28 24 0 0 0 28 24 0 0 0 1189 > MatSolve 228 1.0 1.1442e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 944 These flop rates are ok, but not great. Perhaps an older machine. > MatLUFactorNum 3 1.0 8.1930e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 290 > MatILUFactorSym 3 1.0 2.7102e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyBegin 5 1.0 3.7000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 5 1.0 3.1895e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatCreateSubMats 1 1.0 4.0904e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetOrdering 3 1.0 4.2640e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatView 3 1.0 1.4400e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecMDot 72 1.0 1.1984e+00 1.0 2.25e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 28 0 0 0 21 28 0 0 0 1877 21 percent of the time in VecMDOT this is huge for s sequential fun. I think maybe you are using a terrible OpenMP BLAS? 
Send configure.log > VecNorm 76 1.0 1.6841e-01 1.0 1.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 1007 > VecScale 75 1.0 1.8241e-02 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 4587 > VecCopy 3 1.0 1.4970e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 276 1.0 9.1970e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > VecAXPY 6 1.0 3.7450e-03 1.0 1.34e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3575 > VecMAXPY 75 1.0 1.0022e+00 1.0 2.41e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 30 0 0 0 18 30 0 0 0 2405 > VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 76 1.0 5.5100e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecNormalize 75 1.0 1.8462e-01 1.0 2.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 1360 > KSPSetUp 4 1.0 1.1341e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 5.3123e+00 1.0 7.91e+09 1.0 0.0e+00 0.0e+00 0.0e+00 93100 0 0 0 93100 0 0 0 1489 > KSPGMRESOrthog 72 1.0 2.1316e+00 1.0 4.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 57 0 0 0 37 57 0 0 0 2110 > PCSetUp 4 1.0 1.5531e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 153 > PCSetUpOnBlocks 1 1.0 1.1343e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 209 > PCApply 76 1.0 1.1671e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 > PCApplyOnBlocks 228 1.0 1.1668e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 > ???????????????????????????????????????????????????????????? > ... > #PETSc Option Table entries: > -ksp_monitor > -ksp_type gmres > -ksp_view > -log_view > -pc_bjacobi_local_blocks 3 > -pc_sub_type lu > -pc_type bjacobi > #End of PETSc Option Table entries > ... > > Does any of the setup/output ring a bell? > > BTW, out of curiosity - what is a ?I-node? routine? > > > Cheers, > Hao > > > From: Smith, Barry F. > Sent: Wednesday, February 5, 2020 9:42 PM > To: Hao DONG > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? > > > > > On Feb 5, 2020, at 4:36 AM, Hao DONG wrote: > > > > Thanks a lot for your suggestions, Hong and Barry - > > > > As you suggested, I first tried the LU direct solvers (built-in and MUMPs) out this morning, which work perfectly, albeit slow. As my problem itself is a part of a PDE based optimization, the exact solution in the searching procedure is not necessary (I often set a relative tolerance of 1E-7/1E-8, etc. for Krylov subspace methods). Also tried BJACOBI with exact LU, the KSP just converges in one or two iterations, as expected. > > > > I added -kspview option for the D-ILU test (still with Block Jacobi as preconditioner and bcgs as KSP solver). The KSPview output from one of the examples in a toy model looks like: > > > > KSP Object: 1 MPI processes > > type: bcgs > > maximum iterations=120, nonzero initial guess > > tolerances: relative=1e-07, absolute=1e-50, divergence=10000. 
> > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: 1 MPI processes > > type: bjacobi > > number of blocks = 3 > > Local solve is same for all blocks, in the following KSP and PC objects: > > KSP Object: (sub_) 1 MPI processes > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using NONE norm type for convergence test > > PC Object: (sub_) 1 MPI processes > > type: ilu > > out-of-place factorization > > 0 levels of fill > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: natural > > factor fill ratio given 1., needed 1. > > Factored matrix follows: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=11294, cols=11294 > > package used to perform factorization: petsc > > total: nonzeros=76008, allocated nonzeros=76008 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=11294, cols=11294 > > total: nonzeros=76008, allocated nonzeros=76008 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: mpiaij > > rows=33880, cols=33880 > > total: nonzeros=436968, allocated nonzeros=436968 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node (on process 0) routines > > > > do you see something wrong with my setup? > > > > I also tried a quick performance test with a small 278906 by 278906 matrix (3850990 nnz) with the following parameters: > > > > -ksp_type bcgs -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu -ksp_view > > > > Reducing the relative residual to 1E-7 > > > > Took 4.08s with 41 bcgs iterations. > > > > Merely changing the -pc_bjacobi_local_blocks to 6 > > > > Took 7.02s with 73 bcgs iterations. 9 blocks would further take 9.45s with 101 bcgs iterations. > > This is normal. more blocks slower convergence > > > > As a reference, my home-brew Fortran code solves the same problem (3-block D-ILU0) in > > > > 1.84s with 24 bcgs iterations (the bcgs code is also a home-brew one)? > > > Run the PETSc code with optimization ./configure --with-debugging=0 an run the code with -log_view this will show where the PETSc code is spending the time (send it to use) > > > > > > > > Well, by saying ?use explicit L/U matrix as preconditioner?, I wonder if a user is allowed to provide his own (separate) L and U Mat for preconditioning (like how it works in Matlab solvers), e.g. > > > > x = qmr(A,b,Tol,MaxIter,L,U,x) > > > > As I already explicitly constructed the L and U matrices in Fortran, it is not hard to convert them to Mat in Petsc to test Petsc and my Fortran code head-to-head. In that case, the A, b, x, and L/U are all identical, it would be easier to see where the problem came from. > > > > > No, we don't provide this kind of support > > > > > > BTW, there is another thing I wondered - is there a way to output residual in unpreconditioned norm? I tried to > > > > call KSPSetNormType(ksp_local, KSP_NORM_UNPRECONDITIONED, ierr) > > > > But always get an error that current ksp does not support unpreconditioned in LEFT/RIGHT (either way). Is it possible to do that (output unpreconditioned residual) in PETSc at all? 
> > -ksp_monitor_true_residual You can also run GMRES (and some other methods) with right preconditioning, -ksp_pc_side right then the residual computed is by the algorithm the unpreconditioned residual > > KSPSetNormType sets the norm used in the algorithm, it generally always has to left or right, only a couple algorithms support unpreconditioned directly. > > Barry > > > > > > Cheers, > > Hao > > > > > > From: Smith, Barry F. > > Sent: Tuesday, February 4, 2020 8:27 PM > > To: Hao DONG > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? > > > > > > > > > On Feb 4, 2020, at 12:41 PM, Hao DONG wrote: > > > > > > Dear all, > > > > > > > > > I have a few questions about the implementation of diagonal ILU PC in PETSc. I want to solve a very simple system with KSP (in parallel), the nature of the system (finite difference time-harmonic Maxwell) is probably not important to the question itself. Long story short, I just need to solve a set of equations of Ax = b with a block diagonal system matrix, like (not sure if the mono font works): > > > > > > |X | > > > A =| Y | > > > | Z| > > > > > > Note that A is not really block-diagonal, it?s just a multi-diagonal matrix determined by the FD mesh, where most elements are close to diagonal. So instead of a full ILU decomposition, a D-ILU is a good approximation as a preconditioner, and the number of blocks should not matter too much: > > > > > > |Lx | |Ux | > > > L = | Ly | and U = | Uy | > > > | Lz| | Uz| > > > > > > Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in my sequential Fortran/Matlab code. > > > > > > So like most users, I am looking for a parallel implement with this problem in PETSc. After looking through the manual, it seems that the most straightforward way to do it is through PCBJACOBI. Not sure I understand it right, I just setup a 3-block PCJACOBI and give each of the block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little block of fortran code would look like: > > > ... > > > call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & > > > & isubs,ierr) > > > call PCBJacobiSetLocalBlocks(pc_local, Nsub, & > > > & isubs(istart:iend),ierr) > > > ! set up the block jacobi structure > > > call KSPSetup(ksp_local,ierr) > > > ! allocate sub ksps > > > allocate(ksp_sub(Nsub)) > > > call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & > > > & ksp_sub,ierr) > > > do i=1,Nsub > > > call KSPGetPC(ksp_sub(i),pc_sub,ierr) > > > !ILU preconditioner > > > call PCSetType(pc_sub,ptype,ierr) > > > call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here > > > call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] > > > end do > > > call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & > > > & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) > > > ? > > > > This code looks essentially right. You should call with -ksp_view to see exactly what PETSc is using for a solver. > > > > > > > > I understand that the parallel performance may not be comparable, so I first set up a one-process test (with MPIAij, but all the rows are local since there is only one process). The system is solved without any problem (identical results within error). But the performance is actually a lot worse (code built without debugging flags in performance tests) than my own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), which is hard to believe. 
I suspect the difference is from the PC as the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to the same relative tol. > > > > PETSc uses GMRES by default with a restart of 30 and left preconditioning. It could be different exact choices in the solver (which is why -ksp_view is so useful) can explain the differences in the runs between your code and PETSc's > > > > > > This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks instead of 3). While my Fortran/Matlab codes see minimal performance difference (<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a performance decrease of more than 50% in sequential test. > > > > This is odd, the more blocks the smaller each block so the quicker the ILU set up should be. You can run various cases with -log_view and send us the output to see what is happening at each part of the computation in time. > > > > > So my implementation IS somewhat different in PETSc. Do I miss something in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how PCBJACOBI works in PETSc? > > > > Probably not. > > > > > > If this is not the right way to implement a block diagonal ILU as (parallel) PC, please kindly point me to the right direction. I searched through the mail list to find some answers, only to find a couple of similar questions... An example would be nice. > > > > You approach seems fundamentally right but I cannot be sure of possible bugs. > > > > > > On the other hand, does PETSc support a simple way to use explicit L/U matrix as a preconditioner? I can import the D-ILU matrix (I already converted my A matrix into Mat) constructed in my Fortran code to make a better comparison. Or do I have to construct my own PC using PCshell? If so, is there a good tutorial/example to learn about how to use PCSHELL (in a more advanced sense, like how to setup pc side and transpose)? > > > > Not sure what you mean by explicit L/U matrix as a preconditioner. As Hong said, yes you can use a parallel LU from MUMPS or SuperLU_DIST or Pastix as the solver. You do not need any shell code. You simply need to set the PCType to lu > > > > You can also set all this options from the command line and don't need to write the code specifically. So call KSPSetFromOptions() and then for example > > > > -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu (this last one is applied to each block so you could use -pc_type lu and it would use lu on each block.) > > > > -ksp_type_none -pc_type lu -pc_factor_mat_solver_type mumps (do parallel LU with mumps) > > > > By not hardwiring in the code and just using options you can test out different cases much quicker > > > > Use -ksp_view to make sure that is using the solver the way you expect. > > > > Barry > > > > > > > > Barry > > > > > > > > Thanks in advance, > > > > > > Hao -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 622592 bytes Desc: configure.log URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: configure.log Type: application/octet-stream Size: 1682411 bytes Desc: configure.log URL: From knepley at gmail.com Fri Feb 7 07:51:07 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 7 Feb 2020 08:51:07 -0500 Subject: [petsc-users] Condition Number and GMRES iteration In-Reply-To: References: Message-ID: On Thu, Feb 6, 2020 at 7:37 PM Fande Kong wrote: > Hi All, > > MOOSE team, Alex and I are working on some variable scaling techniques to > improve the condition number of the matrix of linear systems. The goal of > variable scaling is to make the diagonal of matrix as close to unity as > possible. After scaling (for certain example), the condition number of the > linear system is actually reduced, but the GMRES iteration does not > decrease at all. > > From my understanding, the condition number is the worst estimation for > GMRES convergence. That is, the GMRES iteration should not increases when > the condition number decreases. This actually could example what we saw: > the improved condition number does not necessary lead to a decrease in > GMRES iteration. We try to understand this a bit more, and we guess that > the number of eigenvalue clusters of the matrix of the linear system > may/might be related to the convergence rate of GMRES. We plot eigenvalues > of scaled system and unscaled system, and the clusters look different from > each other, but the GMRRES iterations are the same. > > Anyone know what is the right relationship between the condition number > and GMRES iteration? How does the number of eigenvalue clusters affect > GMRES iteration? How to count eigenvalue clusters? For example, how many > eigenvalue clusters we have in the attach image respectively? > > If you need more details, please let us know. Alex and I are happy to > provide any details you are interested in. > Hi Fande, This is one of my favorite papers of all time: https://epubs.siam.org/doi/abs/10.1137/S0895479894275030 It shows that the spectrum alone tells you nothing at all about GMRES convergence. You need other things, like symmetry (almost everything is known) or normality (a little bit is known). Thanks, Matt > Thanks, > > Fande Kong, > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Feb 7 07:53:45 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 7 Feb 2020 08:53:45 -0500 Subject: [petsc-users] Condition Number and GMRES iteration In-Reply-To: References: Message-ID: On Thu, Feb 6, 2020 at 8:07 PM Alexander Lindsay wrote: > It looks like Fande has attached the eigenvalue plots with the real axis > having a logarithmic scale. The same plots with a linear scale are attached > here. > > The system has 306 degrees of freedom. 12 eigenvalues are unity for both > scaled and unscaled cases; this number corresponds to the number of mesh > nodes with Dirichlet boundary conditions (just a 1 on the diagonal for the > corresponding rows). The rest of the eigenvalues are orders of magnitude > smaller for the unscaled case; using scaling these eigenvalues are brought > much closer 1. > > This particular problem is linear but we solve it with SNES, so constant > Jacobian. 
We run with options '-pc_type none -ksp_gmres_restart 1000 > -snes_rtol 1e-8 -ksp_rtol 1e-5` so for this linear problem it takes two > non-linear iterations to solve. > Why not just make -ksp_rtol 1e-8? Thanks, Matt > Unscaled: > > first nonlinear iteration takes 2 linear iterations > second nonlinear iteration takes 99 linear iterations > > Scaled: > > first nonlinear iteration takes 94 linear iterations > second nonlinear iteration takes 100 linear iterations > > Running with `-pc_type svd` the condition number for the unscaled > simulation is 4e9 while it is 2e3 for the scaled simulation. > > > > On Thu, Feb 6, 2020 at 4:36 PM Fande Kong wrote: > >> Hi All, >> >> MOOSE team, Alex and I are working on some variable scaling techniques to >> improve the condition number of the matrix of linear systems. The goal of >> variable scaling is to make the diagonal of matrix as close to unity as >> possible. After scaling (for certain example), the condition number of the >> linear system is actually reduced, but the GMRES iteration does not >> decrease at all. >> >> From my understanding, the condition number is the worst estimation for >> GMRES convergence. That is, the GMRES iteration should not increases when >> the condition number decreases. This actually could example what we saw: >> the improved condition number does not necessary lead to a decrease in >> GMRES iteration. We try to understand this a bit more, and we guess that >> the number of eigenvalue clusters of the matrix of the linear system >> may/might be related to the convergence rate of GMRES. We plot eigenvalues >> of scaled system and unscaled system, and the clusters look different from >> each other, but the GMRRES iterations are the same. >> >> Anyone know what is the right relationship between the condition number >> and GMRES iteration? How does the number of eigenvalue clusters affect >> GMRES iteration? How to count eigenvalue clusters? For example, how many >> eigenvalue clusters we have in the attach image respectively? >> >> If you need more details, please let us know. Alex and I are happy to >> provide any details you are interested in. >> >> >> Thanks, >> >> Fande Kong, >> >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Feb 7 12:31:43 2020 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 7 Feb 2020 13:31:43 -0500 Subject: [petsc-users] Condition Number and GMRES iteration In-Reply-To: References: Message-ID: On Thu, Feb 6, 2020 at 8:07 PM Alexander Lindsay wrote: > It looks like Fande has attached the eigenvalue plots with the real axis > having a logarithmic scale. The same plots with a linear scale are attached > here. > > The system has 306 degrees of freedom. 12 eigenvalues are unity for both > scaled and unscaled cases; this number corresponds to the number of mesh > nodes with Dirichlet boundary conditions (just a 1 on the diagonal for the > corresponding rows). The rest of the eigenvalues are orders of magnitude > smaller for the unscaled case; using scaling these eigenvalues are brought > much closer 1. > So you are running un-preconditioned GMRES, I assume. So your condition number is like 10^23 because if these 1s on the diagonal. 
I suggest always scaling by the diagonal for that reason, but if you want to run un-preconditioned then you have to be careful about what amount to penalty terms. In this case just take them out of the system entirely. They are just mudding up your numerical studies. (Now Krylov is just nailing these in the first iteration, espicaily GMRES which focuses on the largest eigenvalue vectors). BTW, one of my earliest talks, in grad school before I had any real results, was called "condition number does not matter" and I showed examples problems where solvers, multigrid to be specific in some cases, work great on poorly conditioned problems (eg, scale and your problem) and fail on well conditioned problems (eg, incompressibility) > > This particular problem is linear but we solve it with SNES, so constant > Jacobian. We run with options '-pc_type none -ksp_gmres_restart 1000 > -snes_rtol 1e-8 -ksp_rtol 1e-5` so for this linear problem it takes two > non-linear iterations to solve. > > Unscaled: > > first nonlinear iteration takes 2 linear iterations > second nonlinear iteration takes 99 linear iterations > > Scaled: > > first nonlinear iteration takes 94 linear iterations > second nonlinear iteration takes 100 linear iterations > > Running with `-pc_type svd` the condition number for the unscaled > simulation is 4e9 while it is 2e3 for the scaled simulation. > > > > On Thu, Feb 6, 2020 at 4:36 PM Fande Kong wrote: > >> Hi All, >> >> MOOSE team, Alex and I are working on some variable scaling techniques to >> improve the condition number of the matrix of linear systems. The goal of >> variable scaling is to make the diagonal of matrix as close to unity as >> possible. After scaling (for certain example), the condition number of the >> linear system is actually reduced, but the GMRES iteration does not >> decrease at all. >> >> From my understanding, the condition number is the worst estimation for >> GMRES convergence. That is, the GMRES iteration should not increases when >> the condition number decreases. This actually could example what we saw: >> the improved condition number does not necessary lead to a decrease in >> GMRES iteration. We try to understand this a bit more, and we guess that >> the number of eigenvalue clusters of the matrix of the linear system >> may/might be related to the convergence rate of GMRES. We plot eigenvalues >> of scaled system and unscaled system, and the clusters look different from >> each other, but the GMRRES iterations are the same. >> >> Anyone know what is the right relationship between the condition number >> and GMRES iteration? How does the number of eigenvalue clusters affect >> GMRES iteration? How to count eigenvalue clusters? For example, how many >> eigenvalue clusters we have in the attach image respectively? >> >> If you need more details, please let us know. Alex and I are happy to >> provide any details you are interested in. >> >> >> Thanks, >> >> Fande Kong, >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From eijkhout at tacc.utexas.edu Fri Feb 7 12:43:27 2020 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Fri, 7 Feb 2020 18:43:27 +0000 Subject: [petsc-users] Condition Number and GMRES iteration In-Reply-To: References: Message-ID: <78AF6D48-807E-4014-A8D1-B31207A8C3FC@tacc.utexas.edu> On , 2020Feb7, at 12:31, Mark Adams > wrote: BTW, one of my earliest talks, in grad school before I had any real results, was called "condition number does not matter? 
After you learn that the condition number gives an _upper_bound_ on the number of iterations, you learn that if a few eigenvalues are separated from a cluster of other eigenvalues, your number of iterations is 1 for each separated one, and then a bound based on the remaining cluster. (Condition number predicts a number of iterations based on Chebychev polynomials. Since the CG polynomials are optimal, they are at least as good as Chebychev. Hence the number of iterations is at most what you got from Chebychev, which is the condition number bound.) Victor. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Feb 7 12:55:45 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 7 Feb 2020 13:55:45 -0500 Subject: [petsc-users] Condition Number and GMRES iteration In-Reply-To: <78AF6D48-807E-4014-A8D1-B31207A8C3FC@tacc.utexas.edu> References: <78AF6D48-807E-4014-A8D1-B31207A8C3FC@tacc.utexas.edu> Message-ID: On Fri, Feb 7, 2020 at 1:43 PM Victor Eijkhout wrote: > > > On , 2020Feb7, at 12:31, Mark Adams wrote: > > BTW, one of my earliest talks, in grad school before I had any real > results, was called "condition number does not matter? > > > After you learn that the condition number gives an _upper_bound_ on the > number of iterations, you learn that if a few eigenvalues are separated > from a cluster of other eigenvalues, your number of iterations is 1 for > each separated one, and then a bound based on the remaining cluster. > This is _only_ for normal matrices. Not true for general matrices. Matt > (Condition number predicts a number of iterations based on Chebychev > polynomials. Since the CG polynomials are optimal, they are at least as > good as Chebychev. Hence the number of iterations is at most what you got > from Chebychev, which is the condition number bound.) > > Victor. > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdkong.jd at gmail.com Fri Feb 7 13:14:08 2020 From: fdkong.jd at gmail.com (Fande Kong) Date: Fri, 7 Feb 2020 12:14:08 -0700 Subject: [petsc-users] Condition Number and GMRES iteration In-Reply-To: References: Message-ID: Thanks, Matt, It is a great paper. According to the paper, here is my understanding: for normal matrices, the eigenvalues of the matrix together with the initial residual completely determine the GMRES convergence rate. For non-normal matrices, eigenvalues are NOT the relevant quantities in determining the behavior of GMRES. What quantities we should look at for non-normal matrices? In other words, how do we know one matrix is easier than others to solve? Possibly they are still open problems?! Thanks, Fande, On Fri, Feb 7, 2020 at 6:51 AM Matthew Knepley wrote: > On Thu, Feb 6, 2020 at 7:37 PM Fande Kong wrote: > >> Hi All, >> >> MOOSE team, Alex and I are working on some variable scaling techniques to >> improve the condition number of the matrix of linear systems. The goal of >> variable scaling is to make the diagonal of matrix as close to unity as >> possible. After scaling (for certain example), the condition number of the >> linear system is actually reduced, but the GMRES iteration does not >> decrease at all. >> >> From my understanding, the condition number is the worst estimation for >> GMRES convergence. 
That is, the GMRES iteration should not increases when >> the condition number decreases. This actually could example what we saw: >> the improved condition number does not necessary lead to a decrease in >> GMRES iteration. We try to understand this a bit more, and we guess that >> the number of eigenvalue clusters of the matrix of the linear system >> may/might be related to the convergence rate of GMRES. We plot eigenvalues >> of scaled system and unscaled system, and the clusters look different from >> each other, but the GMRRES iterations are the same. >> >> Anyone know what is the right relationship between the condition number >> and GMRES iteration? How does the number of eigenvalue clusters affect >> GMRES iteration? How to count eigenvalue clusters? For example, how many >> eigenvalue clusters we have in the attach image respectively? >> >> If you need more details, please let us know. Alex and I are happy to >> provide any details you are interested in. >> > > Hi Fande, > > This is one of my favorite papers of all time: > > https://epubs.siam.org/doi/abs/10.1137/S0895479894275030 > > It shows that the spectrum alone tells you nothing at all about GMRES > convergence. You need other things, like symmetry (almost > everything is known) or normality (a little bit is known). > > Thanks, > > Matt > > >> Thanks, >> >> Fande Kong, >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdkong.jd at gmail.com Fri Feb 7 13:15:14 2020 From: fdkong.jd at gmail.com (Fande Kong) Date: Fri, 7 Feb 2020 12:15:14 -0700 Subject: [petsc-users] Condition Number and GMRES iteration In-Reply-To: <78AF6D48-807E-4014-A8D1-B31207A8C3FC@tacc.utexas.edu> References: <78AF6D48-807E-4014-A8D1-B31207A8C3FC@tacc.utexas.edu> Message-ID: On Fri, Feb 7, 2020 at 11:43 AM Victor Eijkhout wrote: > > > On , 2020Feb7, at 12:31, Mark Adams wrote: > > BTW, one of my earliest talks, in grad school before I had any real > results, was called "condition number does not matter? > > > After you learn that the condition number gives an _upper_bound_ on the > number of iterations, you learn that if a few eigenvalues are separated > from a cluster of other eigenvalues, your number of iterations is 1 for > each separated one, and then a bound based on the remaining cluster. > > (Condition number predicts a number of iterations based on Chebychev > polynomials. Since the CG polynomials are optimal, they are at least as > good as Chebychev. Hence the number of iterations is at most what you got > from Chebychev, which is the condition number bound.) > I like this explanation for normal matrices. Thanks so much, Victor, Fande, > > Victor. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wence at gmx.li Fri Feb 7 15:17:36 2020 From: wence at gmx.li (Lawrence Mitchell) Date: Fri, 7 Feb 2020 21:17:36 +0000 Subject: [petsc-users] Condition Number and GMRES iteration In-Reply-To: References: Message-ID: On Fri, 7 Feb 2020 at 19:15, Fande Kong wrote: > Thanks, Matt, > > It is a great paper. According to the paper, here is my understanding: for > normal matrices, the eigenvalues of the matrix together with the > initial residual completely determine the GMRES convergence rate. 
For > non-normal matrices, eigenvalues are NOT the relevant quantities in > determining the behavior of GMRES. > > What quantities we should look at for non-normal matrices? In other words, > how do we know one matrix is easier than others to solve? > You need to do a field of values analysis to provide information. This can give you bounds on convergence but it is often very weak. J?rg Liesen has a bunch of papers on this. Lawrence -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Fri Feb 7 16:27:59 2020 From: jed at jedbrown.org (Jed Brown) Date: Fri, 07 Feb 2020 15:27:59 -0700 Subject: [petsc-users] Condition Number and GMRES iteration In-Reply-To: References: Message-ID: <87pneqav80.fsf@jedbrown.org> Fande Kong writes: > Thanks, Matt, > > It is a great paper. According to the paper, here is my understanding: for > normal matrices, the eigenvalues of the matrix together with the > initial residual completely determine the GMRES convergence rate. For > non-normal matrices, eigenvalues are NOT the relevant quantities in > determining the behavior of GMRES. > > What quantities we should look at for non-normal matrices? In other words, > how do we know one matrix is easier than others to solve? Possibly they > are still open problems?! You can use the pseudospectrum, but that isn't a convenient thing to compute for large systems. With respect to condition number: an orthogonal matrix is a normal matrix of condition number 1 for which GMRES requires n iterations. From bsmith at mcs.anl.gov Fri Feb 7 18:02:24 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Sat, 8 Feb 2020 00:02:24 +0000 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? In-Reply-To: References: <264F91C4-8558-4850-9B4B-ABE4123C2A2C@anl.gov> <4A373D93-4018-45E0-B805-3ECC528472DD@mcs.anl.gov> Message-ID: > On Feb 7, 2020, at 7:44 AM, Hao DONG wrote: > > Thanks, Barry, I really appreciate your help - > > I removed the OpenMP flags and rebuilt PETSc. So the problem is from the BLAS lib I linked? Yes, the openmp causes it to run in parallel, but the problem is not big enough and the machine is not good enough for parallel BLAS to speed things up, instead it slows things down a lot. We see this often, parallel BLAS must be used with care > Not sure which version my BLAS is, though? But I also included the -download-Scalapack option. Shouldn?t that enable linking with PBLAS automatically? > > After looking at the bcgs code in PETSc, I suppose the iteration residual recorded is indeed recorded twice per one "actual iteration?. So that can explain the difference of iteration numbers. > > My laptop is indeed an old machine (MBP15 mid-2014). I just cannot work with vi without a physical ESC key... The latest has a physical ESC, I am stuff without the ESC for a couple more years. > I have attached the configure.log -didn?t know that it is so large! > > Anyway, it seems that the removal of -openmp changes quite a lot of things, the performance is indeed getting much better - the flop/sec increases by a factor of 3. Still, I am getting 20 percent of VecMDot, but no VecMDot in BCGS all (see output below), is that a feature of gmres method? Yes, GMRES orthogonalizes against the last restart directions which uses these routines while BCGS does not, this is why BCGS is cheaper per iteration. PETSc is no faster than your code because the algorithm is the same, the compilers the same, and the hardware the same. 
No way to have clever tricks for PETSc to be much faster. What PETS provides is a huge variety of tested algorithms that no single person could code on their own. Anything in PETSc you could code yourself if you had endless time and get basically the same performance. Barry > > here is the output of the same problem with: > > -pc_type bjacobi -pc_bjacobi_local_blocks 3 -sub_pc_type ilu -ksp_type gmres -ksp_monitor -ksp_view > > > ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- > > Mod3DMT.test on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Fri Feb 7 10:26:19 2020 > Using Petsc Release Version 3.12.3, unknown > > Max Max/Min Avg Total > Time (sec): 2.520e+00 1.000 2.520e+00 > Objects: 1.756e+03 1.000 1.756e+03 > Flop: 7.910e+09 1.000 7.910e+09 7.910e+09 > Flop/sec: 3.138e+09 1.000 3.138e+09 3.138e+09 > MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.000 > > Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N --> 2N flop > and VecAXPY() for complex vectors of length N --> 8N flop > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total Count %Total Avg %Total Count %Total > 0: Main Stage: 2.5204e+00 100.0% 7.9096e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > ------------------------------------------------------------------------------------------------------------------------ > ? > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop --- Global --- --- Stage ---- Total > Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > BuildTwoSidedF 1 1.0 3.4000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatMult 75 1.0 6.2884e-01 1.0 1.88e+09 1.0 0.0e+00 0.0e+00 0.0e+00 25 24 0 0 0 25 24 0 0 0 2991 > MatSolve 228 1.0 4.4164e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2445 > MatLUFactorNum 3 1.0 4.1317e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 574 > MatILUFactorSym 3 1.0 2.3858e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatAssemblyBegin 5 1.0 4.4000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 5 1.0 1.5067e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetRowIJ 3 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatCreateSubMats 1 1.0 2.4558e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetOrdering 3 1.0 1.3290e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatView 3 1.0 1.2800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecMDot 72 1.0 4.9875e-01 1.0 2.25e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 28 0 0 0 20 28 0 0 0 4509 > VecNorm 76 1.0 6.6666e-02 1.0 1.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 2544 > VecScale 75 1.0 1.7982e-02 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 4653 > VecCopy 3 1.0 1.5080e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 
> VecSet 276 1.0 9.6784e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 > VecAXPY 6 1.0 3.6860e-03 1.0 1.34e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3632 > VecMAXPY 75 1.0 4.0490e-01 1.0 2.41e+09 1.0 0.0e+00 0.0e+00 0.0e+00 16 30 0 0 0 16 30 0 0 0 5951 > VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 76 1.0 5.3800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecNormalize 75 1.0 8.3690e-02 1.0 2.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 2999 > KSPSetUp 4 1.0 1.1663e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 2.2119e+00 1.0 7.91e+09 1.0 0.0e+00 0.0e+00 0.0e+00 88100 0 0 0 88100 0 0 0 3576 > KSPGMRESOrthog 72 1.0 8.7843e-01 1.0 4.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 35 57 0 0 0 35 57 0 0 0 5121 > PCSetUp 4 1.0 9.2448e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 257 > PCSetUpOnBlocks 1 1.0 6.6597e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 356 > PCApply 76 1.0 4.6281e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2333 > PCApplyOnBlocks 228 1.0 4.6262e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2334 > ------------------------------------------------------------------------------------------------------------------------ > > Average time to get PetscTime(): 1e-07 > #PETSc Option Table entries: > -I LBFGS > -ksp_type gmres > -ksp_view > -log_view > -pc_bjacobi_local_blocks 3 > -pc_type bjacobi > -sub_pc_type ilu > #End of PETSc Option Table entries > Compiled with FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 > Configure options: --with-scalar-type=complex --download-mumps --download-scalapack --with-fortran-kernels=1 -- FOPTFLAGS="-O3 -ffree-line-length-0 -msse2" --COPTFLAGS="-O3 -msse2" --CXXOPTFLAGS="-O3 -msse2" --with-debugging=0 > ----------------------------------------- > Libraries compiled on 2020-02-07 10:07:42 on Haos-MBP > Machine characteristics: Darwin-19.3.0-x86_64-i386-64bit > Using PETSc directory: /Users/donghao/src/git/PETSc-current > Using PETSc arch: arch-darwin-c-opt > ----------------------------------------- > > Using C compiler: mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fno-stack- check -Qunused-arguments -fvisibility=hidden -O3 -msse2 > Using Fortran compiler: mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -O3 -ffree-line-length-0 - msse2 > ----------------------------------------- > > Using include paths: -I/Users/donghao/src/git/PETSc-current/include -I/Users/donghao/src/git/PETSc-current/arch-darwin- c-opt/include > ----------------------------------------- > > Using C linker: mpicc > Using Fortran linker: mpif90 > Using libraries: -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/donghao/src/git/PETSc- current/arch-darwin-c-opt/lib -lpetsc -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/ donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -Wl,-rpath,/usr/local/opt/libevent/lib -L/usr/local/opt/libevent/ lib -Wl,-rpath,/usr/local/Cellar/open-mpi/4.0.2/lib -L/usr/local/Cellar/open-mpi/4.0.2/lib -Wl,-rpath,/usr/local/Cellar/ gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin19/9.2.0 
-L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple- darwin19/9.2.0 -Wl,-rpath,/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -lcmumps - ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lc++ -ldl -lmpi_usempif08 - lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lquadmath -lm -lc++ -ldl > ----------------------------------------- > > > > The BCGS solver performance is now comparable to my own Fortran code (1.84s). Still, I feel that there is something wrong hidden somewhere in my setup - a professional lib should to perform better, I believe. Any other ideas that I can look into? Interestingly there is no VecMDot operation at all! Here is the output with the option of: > > -pc_type bjacobi -pc_bjacobi_local_blocks 3 -sub_pc_type ilu -ksp_type bcgs -ksp_monitor -ksp_view > > > > > ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- > > Mod3DMT.test on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Fri Feb 7 10:38:00 2020 > Using Petsc Release Version 3.12.3, unknown > > Max Max/Min Avg Total > Time (sec): 2.187e+00 1.000 2.187e+00 > Objects: 1.155e+03 1.000 1.155e+03 > Flop: 4.311e+09 1.000 4.311e+09 4.311e+09 > Flop/sec: 1.971e+09 1.000 1.971e+09 1.971e+09 > MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.000 > > Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N --> 2N flop > and VecAXPY() for complex vectors of length N --> 8N flop > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total Count %Total Avg %Total Count %Total > 0: Main Stage: 2.1870e+00 100.0% 4.3113e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > ------------------------------------------------------------------------------------------------------------------------ > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop --- Global --- --- Stage ---- Total > Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > BuildTwoSidedF 1 1.0 2.2000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatMult 83 1.0 7.8726e-01 1.0 2.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 36 48 0 0 0 36 48 0 0 0 2644 > MatSolve 252 1.0 5.5656e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 25 28 0 0 0 25 28 0 0 0 2144 > MatLUFactorNum 3 1.0 4.5115e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 526 > MatILUFactorSym 3 1.0 2.5103e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatAssemblyBegin 5 1.0 3.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 5 1.0 1.5709e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetRowIJ 3 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatCreateSubMats 1 1.0 2.8989e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetOrdering 3 1.0 1.1200e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatView 3 1.0 1.2600e-04 1.0 
0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecDot 82 1.0 8.9328e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 2048 > VecDotNorm2 41 1.0 9.9019e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 4 0 0 0 5 4 0 0 0 1848 > VecNorm 43 1.0 3.9988e-02 1.0 9.59e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 2399 > VecCopy 2 1.0 1.1150e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 271 1.0 4.2833e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > VecAXPY 1 1.0 5.9200e-04 1.0 2.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3769 > VecAXPBYCZ 82 1.0 1.1448e-01 1.0 3.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 8 0 0 0 5 8 0 0 0 3196 > VecWAXPY 82 1.0 6.7460e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 4 0 0 0 3 4 0 0 0 2712 > VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 84 1.0 5.2800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSetUp 4 1.0 1.4765e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > KSPSolve 1 1.0 1.8514e+00 1.0 4.31e+09 1.0 0.0e+00 0.0e+00 0.0e+00 85100 0 0 0 85100 0 0 0 2329 > PCSetUp 4 1.0 1.0193e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 5 1 0 0 0 5 1 0 0 0 233 > PCSetUpOnBlocks 1 1.0 7.1421e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 332 > PCApply 84 1.0 5.7927e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 26 28 0 0 0 26 28 0 0 0 2060 > PCApplyOnBlocks 252 1.0 5.7902e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 26 28 0 0 0 26 28 0 0 0 2061 > ------------------------------------------------------------------------------------------------------------------------ > > > Cheers, > Hao > > > > From: Smith, Barry F. > Sent: Thursday, February 6, 2020 7:03 PM > To: Hao DONG > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? > > > Read my comments ALL the way down, they go a long way. > > > On Feb 6, 2020, at 3:43 AM, Hao DONG wrote: > > > > Dear Hong and Barry, > > > > Thanks for the suggestions. So there could be some problems in my PETSc configuration? - but my PETSc lib was indeed compiled without the debug flags (--with-debugging=0). I use GCC/GFortran (Home-brew GCC 9.2.0) for the compiling and building of PETSc and my own fortran code. My Fortran compiling flags are simply like: > > > > -O3 -ffree-line-length-none -fastsse > > > > Which is also used for -FOPTFLAGS in PETSc (I added -openmp for PETSc, but not my fortran code, as I don?t have any OMP optimizations in my program). Note the performance test results I listed yesterday (e.g. 4.08s with 41 bcgs iterations.) are without any CSR-array->PETSc translation overhead (only include the set and solve part). > > PETSc doesn't use -openmp in any way for its solvers. Do not use this option, it may be slowing the code down. Please send configure.log > > > > > I have two questions about the performance difference - > > > > 1. Is ilu only factorized once for each iteration, or ilu is performed at every outer ksp iteration steps? Sounds unlikely - but if so, this could cause some extra overheads. > > ILU is ONLY done if the matrix has changed (which seems wrong). > > > > 2. Some KSP solvers like BCGS or TFQMR has two ?half-iterations? for each iteration step. 
Not sure how it works in PETSc, but is that possible that both the two ?half" relative residuals are counted in the output array, doubling the number of iterations (but that cannot explain the extra time consumed)? > > Yes, PETSc might report them as two, you need to check the exact code. > > > > > Anyway, the output with -log_view from the same 278906 by 278906 matrix with 3-block D-ILU in PETSc is as follows: > > > > > > ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- > > > > MEMsolv.lu on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Thu Feb 6 09:07:35 2020 > > Using Petsc Release Version 3.12.3, unknown > > > > Max Max/Min Avg Total > > Time (sec): 4.443e+00 1.000 4.443e+00 > > Objects: 1.155e+03 1.000 1.155e+03 > > Flop: 4.311e+09 1.000 4.311e+09 4.311e+09 > > Flop/sec: 9.703e+08 1.000 9.703e+08 9.703e+08 > > MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 > > MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 > > MPI Reductions: 0.000e+00 0.000 > > > > Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) > > e.g., VecAXPY() for real vectors of length N --> 2N flop > > and VecAXPY() for complex vectors of length N --> 8N flop > > > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- > > Avg %Total Avg %Total Count %Total Avg %Total Count %Total > > 0: Main Stage: 4.4435e+00 100.0% 4.3113e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > > > ???????????????????????????????????????????????????????????? > > See the 'Profiling' chapter of the users' manual for details on interpreting output. > > Phase summary info: > > Count: number of times phase was executed > > Time and Flop: Max - maximum over all processors > > Ratio - ratio of maximum to minimum over all processors > > Mess: number of messages sent > > AvgLen: average message length (bytes) > > Reduct: number of global reductions > > Global: entire computation > > Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
> > %T - percent time in this phase %F - percent flop in this phase > > %M - percent messages in this phase %L - percent message lengths in this phase > > %R - percent reductions in this phase > > Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) > > ------------------------------------------------------------------------------------------------------------------------ > > Event Count Time (sec) Flop --- Global --- --- Stage ---- Total > > Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > > > --- Event Stage 0: Main Stage > > > > BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatMult 83 1.0 1.7815e+00 1.0 2.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40 48 0 0 0 40 48 0 0 0 1168 > > MatSolve 252 1.0 1.2708e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 939 > > MatLUFactorNum 3 1.0 7.9725e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 > > MatILUFactorSym 3 1.0 2.6998e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > > MatAssemblyBegin 5 1.0 3.6000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyEnd 5 1.0 3.1619e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > > MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatCreateSubMats 1 1.0 3.9659e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > > MatGetOrdering 3 1.0 4.3070e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatView 3 1.0 1.3600e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecDot 82 1.0 1.8948e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 966 > > VecDotNorm2 41 1.0 1.6812e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 1088 > > VecNorm 43 1.0 9.5099e-02 1.0 9.59e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1009 > > VecCopy 2 1.0 1.0120e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecSet 271 1.0 3.8922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > > VecAXPY 1 1.0 7.7200e-04 1.0 2.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2890 > > VecAXPBYCZ 82 1.0 2.4370e-01 1.0 3.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 8 0 0 0 5 8 0 0 0 1502 > > VecWAXPY 82 1.0 1.4148e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 4 0 0 0 3 4 0 0 0 1293 > > VecAssemblyBegin 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecScatterBegin 84 1.0 5.9300e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > KSPSetUp 4 1.0 1.4167e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > KSPSolve 1 1.0 4.0250e+00 1.0 4.31e+09 1.0 0.0e+00 0.0e+00 0.0e+00 91100 0 0 0 91100 0 0 0 1071 > > PCSetUp 4 1.0 1.5207e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 156 > > PCSetUpOnBlocks 1 1.0 1.1116e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 214 > > PCApply 84 1.0 1.2912e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 > > PCApplyOnBlocks 252 1.0 1.2909e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 > > 
------------------------------------------------------------------------------------------------------------------------ > > > > # I skipped the memory part - the options (and compiler options) are as follows: > > > > #PETSc Option Table entries: > > -ksp_type bcgs > > -ksp_view > > -log_view > > -pc_bjacobi_local_blocks 3 > > -pc_factor_levels 0 > > -pc_sub_type ilu > > -pc_type bjacobi > > #End of PETSc Option Table entries > > Compiled with FORTRAN kernels > > Compiled with full precision matrices (default) > > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 > > Configure options: --with-scalar-type=complex --download-mumps --download-scalapack --with-fortran-kernels=1 -- FOPTFLAGS=?-O3 -fastsse -mp -openmp? --COPTFLAGS=?-O3 -fastsse -mp -openmp? --CXXOPTFLAGS="-O3 -fastsse -mp -openmp" -- with-debugging=0 > > ----------------------------------------- > > Libraries compiled on 2020-02-03 10:44:31 on Haos-MBP > > Machine characteristics: Darwin-19.2.0-x86_64-i386-64bit > > Using PETSc directory: /Users/donghao/src/git/PETSc-current > > Using PETSc arch: arch-darwin-c-opt > > ----------------------------------------- > > > > Using C compiler: mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fno-stack- check -Qunused-arguments -fvisibility=hidden > > Using Fortran compiler: mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument > > > > Using include paths: -I/Users/donghao/src/git/PETSc-current/include -I/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/include > > ----------------------------------------- > > > > Using C linker: mpicc > > Using Fortran linker: mpif90 > > Using libraries: -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/donghao/src/git/PETSc- current/arch-darwin-c-opt/lib -lpetsc -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/ donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -Wl,-rpath,/usr/local/opt/libevent/lib -L/usr/local/opt/libevent/ lib -Wl,-rpath,/usr/local/Cellar/open-mpi/4.0.2/lib -L/usr/local/Cellar/open-mpi/4.0.2/lib -Wl,-rpath,/usr/local/Cellar/ gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin19/9.2.0 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple- darwin19/9.2.0 -Wl,-rpath,/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -lcmumps - ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lc++ -ldl -lmpi_usempif08 - lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lquadmath -lm -lc++ -ldl > > > > > > On the other hand, running PETSc with > > > > -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type lu -ksp_type gmres -ksp_monitor -ksp_view -log_view > > > > For the same problem takes 5.37s and 72 GMRES iterations. Our previous testings show that BiCGstab (bcgs in PETSc) is almost always the most effective KSP solver for our non-symmetrical complex system. Strangely, the system is still using ilu instead of lu for sub blocks. 
The output is like: > > -sub_pc_type lu > > > > > 0 KSP Residual norm 2.480412407430e+02 > > 1 KSP Residual norm 8.848059967835e+01 > > 2 KSP Residual norm 3.415272863261e+01 > > 3 KSP Residual norm 1.563045190939e+01 > > 4 KSP Residual norm 6.241296940043e+00 > > 5 KSP Residual norm 2.739710899854e+00 > > 6 KSP Residual norm 1.391304148888e+00 > > 7 KSP Residual norm 7.959262020849e-01 > > 8 KSP Residual norm 4.828323055231e-01 > > 9 KSP Residual norm 2.918529739200e-01 > > 10 KSP Residual norm 1.905508589557e-01 > > 11 KSP Residual norm 1.291541892702e-01 > > 12 KSP Residual norm 8.827145774707e-02 > > 13 KSP Residual norm 6.521331095889e-02 > > 14 KSP Residual norm 5.095787952595e-02 > > 15 KSP Residual norm 4.043060387395e-02 > > 16 KSP Residual norm 3.232590200012e-02 > > 17 KSP Residual norm 2.593944982216e-02 > > 18 KSP Residual norm 2.064639483533e-02 > > 19 KSP Residual norm 1.653916663492e-02 > > 20 KSP Residual norm 1.334946415452e-02 > > 21 KSP Residual norm 1.092886880597e-02 > > 22 KSP Residual norm 8.988004105542e-03 > > 23 KSP Residual norm 7.466501315240e-03 > > 24 KSP Residual norm 6.284389135436e-03 > > 25 KSP Residual norm 5.425231669964e-03 > > 26 KSP Residual norm 4.766338253084e-03 > > 27 KSP Residual norm 4.241238878242e-03 > > 28 KSP Residual norm 3.808113525685e-03 > > 29 KSP Residual norm 3.449383788116e-03 > > 30 KSP Residual norm 3.126025526388e-03 > > 31 KSP Residual norm 2.958328054299e-03 > > 32 KSP Residual norm 2.802344900403e-03 > > 33 KSP Residual norm 2.621993580492e-03 > > 34 KSP Residual norm 2.430066269304e-03 > > 35 KSP Residual norm 2.259043079597e-03 > > 36 KSP Residual norm 2.104287972986e-03 > > 37 KSP Residual norm 1.952916080045e-03 > > 38 KSP Residual norm 1.804988937999e-03 > > 39 KSP Residual norm 1.643302117377e-03 > > 40 KSP Residual norm 1.471661332036e-03 > > 41 KSP Residual norm 1.286445911163e-03 > > 42 KSP Residual norm 1.127543025848e-03 > > 43 KSP Residual norm 9.777148275484e-04 > > 44 KSP Residual norm 8.293314450006e-04 > > 45 KSP Residual norm 6.989331136622e-04 > > 46 KSP Residual norm 5.852307780220e-04 > > 47 KSP Residual norm 4.926715539762e-04 > > 48 KSP Residual norm 4.215941372075e-04 > > 49 KSP Residual norm 3.699489548162e-04 > > 50 KSP Residual norm 3.293897163533e-04 > > 51 KSP Residual norm 2.959954542998e-04 > > 52 KSP Residual norm 2.700193032414e-04 > > 53 KSP Residual norm 2.461789791204e-04 > > 54 KSP Residual norm 2.218839085563e-04 > > 55 KSP Residual norm 1.945154309976e-04 > > 56 KSP Residual norm 1.661128781744e-04 > > 57 KSP Residual norm 1.413198766258e-04 > > 58 KSP Residual norm 1.213984003195e-04 > > 59 KSP Residual norm 1.044317450754e-04 > > 60 KSP Residual norm 8.919957502977e-05 > > 61 KSP Residual norm 8.042584301275e-05 > > 62 KSP Residual norm 7.292784493581e-05 > > 63 KSP Residual norm 6.481935501872e-05 > > 64 KSP Residual norm 5.718564652679e-05 > > 65 KSP Residual norm 5.072589750116e-05 > > 66 KSP Residual norm 4.487930741285e-05 > > 67 KSP Residual norm 3.941040674119e-05 > > 68 KSP Residual norm 3.492873281291e-05 > > 69 KSP Residual norm 3.103798339845e-05 > > 70 KSP Residual norm 2.822943237409e-05 > > 71 KSP Residual norm 2.610615023776e-05 > > 72 KSP Residual norm 2.441692671173e-05 > > KSP Object: 1 MPI processes > > type: gmres > > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=150, nonzero initial guess > > tolerances: relative=1e-07, absolute=1e-50, divergence=10000. 
> > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: 1 MPI processes > > type: bjacobi > > number of blocks = 3 > > Local solve is same for all blocks, in the following KSP and PC objects: > > KSP Object: (sub_) 1 MPI processes > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using NONE norm type for convergence test > > PC Object: (sub_) 1 MPI processes > > type: ilu > > out-of-place factorization > > 0 levels of fill > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: natural > > factor fill ratio given 1., needed 1. > > Factored matrix follows: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=92969, cols=92969 > > package used to perform factorization: petsc > > total: nonzeros=638417, allocated nonzeros=638417 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=92969, cols=92969 > > total: nonzeros=638417, allocated nonzeros=638417 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: mpiaij > > rows=278906, cols=278906 > > total: nonzeros=3274027, allocated nonzeros=3274027 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node (on process 0) routines > > ... > > ------------------------------------------------------------------------------------------------------------------------ > > Event Count Time (sec) Flop --- Global --- --- Stage ---- Total > > Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > > > --- Event Stage 0: Main Stage > > > > BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatMult 75 1.0 1.5812e+00 1.0 1.88e+09 1.0 0.0e+00 0.0e+00 0.0e+00 28 24 0 0 0 28 24 0 0 0 1189 > > MatSolve 228 1.0 1.1442e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 944 > > These flop rates are ok, but not great. Perhaps an older machine. > > > MatLUFactorNum 3 1.0 8.1930e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 290 > > MatILUFactorSym 3 1.0 2.7102e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyBegin 5 1.0 3.7000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyEnd 5 1.0 3.1895e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > > MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatCreateSubMats 1 1.0 4.0904e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > > MatGetOrdering 3 1.0 4.2640e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatView 3 1.0 1.4400e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecMDot 72 1.0 1.1984e+00 1.0 2.25e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 28 0 0 0 21 28 0 0 0 1877 > > 21 percent of the time in VecMDOT this is huge for s sequential fun. I think maybe you are using a terrible OpenMP BLAS? 
> > Send configure.log > > > > VecNorm 76 1.0 1.6841e-01 1.0 1.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 1007 > > VecScale 75 1.0 1.8241e-02 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 4587 > > VecCopy 3 1.0 1.4970e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecSet 276 1.0 9.1970e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > > VecAXPY 6 1.0 3.7450e-03 1.0 1.34e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3575 > > VecMAXPY 75 1.0 1.0022e+00 1.0 2.41e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 30 0 0 0 18 30 0 0 0 2405 > > VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecScatterBegin 76 1.0 5.5100e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecNormalize 75 1.0 1.8462e-01 1.0 2.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 1360 > > KSPSetUp 4 1.0 1.1341e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > KSPSolve 1 1.0 5.3123e+00 1.0 7.91e+09 1.0 0.0e+00 0.0e+00 0.0e+00 93100 0 0 0 93100 0 0 0 1489 > > KSPGMRESOrthog 72 1.0 2.1316e+00 1.0 4.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 57 0 0 0 37 57 0 0 0 2110 > > PCSetUp 4 1.0 1.5531e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 153 > > PCSetUpOnBlocks 1 1.0 1.1343e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 209 > > PCApply 76 1.0 1.1671e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 > > PCApplyOnBlocks 228 1.0 1.1668e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 > > ???????????????????????????????????????????????????????????? > > ... > > #PETSc Option Table entries: > > -ksp_monitor > > -ksp_type gmres > > -ksp_view > > -log_view > > -pc_bjacobi_local_blocks 3 > > -pc_sub_type lu > > -pc_type bjacobi > > #End of PETSc Option Table entries > > ... > > > > Does any of the setup/output ring a bell? > > > > BTW, out of curiosity - what is a ?I-node? routine? > > > > > > Cheers, > > Hao > > > > > > From: Smith, Barry F. > > Sent: Wednesday, February 5, 2020 9:42 PM > > To: Hao DONG > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? > > > > > > > > > On Feb 5, 2020, at 4:36 AM, Hao DONG wrote: > > > > > > Thanks a lot for your suggestions, Hong and Barry - > > > > > > As you suggested, I first tried the LU direct solvers (built-in and MUMPs) out this morning, which work perfectly, albeit slow. As my problem itself is a part of a PDE based optimization, the exact solution in the searching procedure is not necessary (I often set a relative tolerance of 1E-7/1E-8, etc. for Krylov subspace methods). Also tried BJACOBI with exact LU, the KSP just converges in one or two iterations, as expected. > > > > > > I added -kspview option for the D-ILU test (still with Block Jacobi as preconditioner and bcgs as KSP solver). The KSPview output from one of the examples in a toy model looks like: > > > > > > KSP Object: 1 MPI processes > > > type: bcgs > > > maximum iterations=120, nonzero initial guess > > > tolerances: relative=1e-07, absolute=1e-50, divergence=10000. 
> > > left preconditioning > > > using PRECONDITIONED norm type for convergence test > > > PC Object: 1 MPI processes > > > type: bjacobi > > > number of blocks = 3 > > > Local solve is same for all blocks, in the following KSP and PC objects: > > > KSP Object: (sub_) 1 MPI processes > > > type: preonly > > > maximum iterations=10000, initial guess is zero > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > > left preconditioning > > > using NONE norm type for convergence test > > > PC Object: (sub_) 1 MPI processes > > > type: ilu > > > out-of-place factorization > > > 0 levels of fill > > > tolerance for zero pivot 2.22045e-14 > > > matrix ordering: natural > > > factor fill ratio given 1., needed 1. > > > Factored matrix follows: > > > Mat Object: 1 MPI processes > > > type: seqaij > > > rows=11294, cols=11294 > > > package used to perform factorization: petsc > > > total: nonzeros=76008, allocated nonzeros=76008 > > > total number of mallocs used during MatSetValues calls=0 > > > not using I-node routines > > > linear system matrix = precond matrix: > > > Mat Object: 1 MPI processes > > > type: seqaij > > > rows=11294, cols=11294 > > > total: nonzeros=76008, allocated nonzeros=76008 > > > total number of mallocs used during MatSetValues calls=0 > > > not using I-node routines > > > linear system matrix = precond matrix: > > > Mat Object: 1 MPI processes > > > type: mpiaij > > > rows=33880, cols=33880 > > > total: nonzeros=436968, allocated nonzeros=436968 > > > total number of mallocs used during MatSetValues calls=0 > > > not using I-node (on process 0) routines > > > > > > do you see something wrong with my setup? > > > > > > I also tried a quick performance test with a small 278906 by 278906 matrix (3850990 nnz) with the following parameters: > > > > > > -ksp_type bcgs -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu -ksp_view > > > > > > Reducing the relative residual to 1E-7 > > > > > > Took 4.08s with 41 bcgs iterations. > > > > > > Merely changing the -pc_bjacobi_local_blocks to 6 > > > > > > Took 7.02s with 73 bcgs iterations. 9 blocks would further take 9.45s with 101 bcgs iterations. > > > > This is normal. more blocks slower convergence > > > > > > As a reference, my home-brew Fortran code solves the same problem (3-block D-ILU0) in > > > > > > 1.84s with 24 bcgs iterations (the bcgs code is also a home-brew one)? > > > > > Run the PETSc code with optimization ./configure --with-debugging=0 an run the code with -log_view this will show where the PETSc code is spending the time (send it to use) > > > > > > > > > > > > > Well, by saying ?use explicit L/U matrix as preconditioner?, I wonder if a user is allowed to provide his own (separate) L and U Mat for preconditioning (like how it works in Matlab solvers), e.g. > > > > > > x = qmr(A,b,Tol,MaxIter,L,U,x) > > > > > > As I already explicitly constructed the L and U matrices in Fortran, it is not hard to convert them to Mat in Petsc to test Petsc and my Fortran code head-to-head. In that case, the A, b, x, and L/U are all identical, it would be easier to see where the problem came from. > > > > > > > > No, we don't provide this kind of support > > > > > > > > > > BTW, there is another thing I wondered - is there a way to output residual in unpreconditioned norm? I tried to > > > > > > call KSPSetNormType(ksp_local, KSP_NORM_UNPRECONDITIONED, ierr) > > > > > > But always get an error that current ksp does not support unpreconditioned in LEFT/RIGHT (either way). 
Is it possible to do that (output unpreconditioned residual) in PETSc at all? > > > > -ksp_monitor_true_residual You can also run GMRES (and some other methods) with right preconditioning, -ksp_pc_side right then the residual computed is by the algorithm the unpreconditioned residual > > > > KSPSetNormType sets the norm used in the algorithm, it generally always has to left or right, only a couple algorithms support unpreconditioned directly. > > > > Barry > > > > > > > > > > Cheers, > > > Hao > > > > > > > > > From: Smith, Barry F. > > > Sent: Tuesday, February 4, 2020 8:27 PM > > > To: Hao DONG > > > Cc: petsc-users at mcs.anl.gov > > > Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? > > > > > > > > > > > > > On Feb 4, 2020, at 12:41 PM, Hao DONG wrote: > > > > > > > > Dear all, > > > > > > > > > > > > I have a few questions about the implementation of diagonal ILU PC in PETSc. I want to solve a very simple system with KSP (in parallel), the nature of the system (finite difference time-harmonic Maxwell) is probably not important to the question itself. Long story short, I just need to solve a set of equations of Ax = b with a block diagonal system matrix, like (not sure if the mono font works): > > > > > > > > |X | > > > > A =| Y | > > > > | Z| > > > > > > > > Note that A is not really block-diagonal, it?s just a multi-diagonal matrix determined by the FD mesh, where most elements are close to diagonal. So instead of a full ILU decomposition, a D-ILU is a good approximation as a preconditioner, and the number of blocks should not matter too much: > > > > > > > > |Lx | |Ux | > > > > L = | Ly | and U = | Uy | > > > > | Lz| | Uz| > > > > > > > > Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in my sequential Fortran/Matlab code. > > > > > > > > So like most users, I am looking for a parallel implement with this problem in PETSc. After looking through the manual, it seems that the most straightforward way to do it is through PCBJACOBI. Not sure I understand it right, I just setup a 3-block PCJACOBI and give each of the block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little block of fortran code would look like: > > > > ... > > > > call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & > > > > & isubs,ierr) > > > > call PCBJacobiSetLocalBlocks(pc_local, Nsub, & > > > > & isubs(istart:iend),ierr) > > > > ! set up the block jacobi structure > > > > call KSPSetup(ksp_local,ierr) > > > > ! allocate sub ksps > > > > allocate(ksp_sub(Nsub)) > > > > call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & > > > > & ksp_sub,ierr) > > > > do i=1,Nsub > > > > call KSPGetPC(ksp_sub(i),pc_sub,ierr) > > > > !ILU preconditioner > > > > call PCSetType(pc_sub,ptype,ierr) > > > > call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here > > > > call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] > > > > end do > > > > call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & > > > > & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) > > > > ? > > > > > > This code looks essentially right. You should call with -ksp_view to see exactly what PETSc is using for a solver. > > > > > > > > > > > I understand that the parallel performance may not be comparable, so I first set up a one-process test (with MPIAij, but all the rows are local since there is only one process). The system is solved without any problem (identical results within error). 
But the performance is actually a lot worse (code built without debugging flags in performance tests) than my own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), which is hard to believe. I suspect the difference is from the PC as the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to the same relative tol. > > > > > > PETSc uses GMRES by default with a restart of 30 and left preconditioning. It could be different exact choices in the solver (which is why -ksp_view is so useful) can explain the differences in the runs between your code and PETSc's > > > > > > > > This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks instead of 3). While my Fortran/Matlab codes see minimal performance difference (<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a performance decrease of more than 50% in sequential test. > > > > > > This is odd, the more blocks the smaller each block so the quicker the ILU set up should be. You can run various cases with -log_view and send us the output to see what is happening at each part of the computation in time. > > > > > > > So my implementation IS somewhat different in PETSc. Do I miss something in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how PCBJACOBI works in PETSc? > > > > > > Probably not. > > > > > > > > If this is not the right way to implement a block diagonal ILU as (parallel) PC, please kindly point me to the right direction. I searched through the mail list to find some answers, only to find a couple of similar questions... An example would be nice. > > > > > > You approach seems fundamentally right but I cannot be sure of possible bugs. > > > > > > > > On the other hand, does PETSc support a simple way to use explicit L/U matrix as a preconditioner? I can import the D-ILU matrix (I already converted my A matrix into Mat) constructed in my Fortran code to make a better comparison. Or do I have to construct my own PC using PCshell? If so, is there a good tutorial/example to learn about how to use PCSHELL (in a more advanced sense, like how to setup pc side and transpose)? > > > > > > Not sure what you mean by explicit L/U matrix as a preconditioner. As Hong said, yes you can use a parallel LU from MUMPS or SuperLU_DIST or Pastix as the solver. You do not need any shell code. You simply need to set the PCType to lu > > > > > > You can also set all this options from the command line and don't need to write the code specifically. So call KSPSetFromOptions() and then for example > > > > > > -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu (this last one is applied to each block so you could use -pc_type lu and it would use lu on each block.) > > > > > > -ksp_type_none -pc_type lu -pc_factor_mat_solver_type mumps (do parallel LU with mumps) > > > > > > By not hardwiring in the code and just using options you can test out different cases much quicker > > > > > > Use -ksp_view to make sure that is using the solver the way you expect. 
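To make the options-driven workflow described above concrete, here is a minimal sketch (not code from this thread; the routine name is made up, PetscInitialize is assumed to have been called already, and the matrix A and vectors b, x are assumed to be assembled elsewhere) in the same Fortran style as the snippets quoted in this thread:

      subroutine solve_with_options(A,b,x,ierr)
#include <petsc/finclude/petscksp.h>
      use petscksp
      implicit none
      Mat            A        ! assembled system matrix (assumption)
      Vec            b, x     ! right-hand side and solution vectors (assumption)
      PetscErrorCode ierr
      KSP            ksp

      call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
      call KSPSetOperators(ksp,A,A,ierr)
      ! all solver choices are deferred to the command line, e.g.
      !   -ksp_type bcgs -pc_type bjacobi -pc_bjacobi_local_blocks 3 -sub_pc_type ilu
      ! or
      !   -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type mumps
      call KSPSetFromOptions(ksp,ierr)
      call KSPSolve(ksp,b,x,ierr)
      call KSPDestroy(ksp,ierr)
      end subroutine solve_with_options

Running with -ksp_view, as advised above, then confirms which KSP and PC were actually selected at run time.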
> > > > > > Barry > > > > > > > > > > > > Barry > > > > > > > > > > > Thanks in advance, > > > > > > > > Hao > > From griesser.jan at googlemail.com Mon Feb 10 07:32:37 2020 From: griesser.jan at googlemail.com (=?UTF-8?B?SmFuIEdyaWXDn2Vy?=) Date: Mon, 10 Feb 2020 14:32:37 +0100 Subject: [petsc-users] Spectrum slicing, Cholesky factorization for positive semidefinite matrices Message-ID: Hello, everybody, i want to use the spectrum slicing method in Slepc4py to compute a subset of the eigenvalues and associated eigenvectors of my matrix. To do this I need a factorization that provids the Matrix Inertia. The Cholesky decomposition is given as an example in the user manual. The problem ist that my matrix is not positive definit but positive semidefinit (Three eigenvalues are zero). The PETSc user forum only states that for the Cholesky factorization a symmetric matrix is zero, but as far is i remember the Chosleky factorization is only numerical stable for positive definite matrices. Can i use an LU factorization for the spectrum slicing, although the PETSc user manual states that the Inertia is accessible when using Cholseky? Or can is still use Chollesky? Greetings Jan -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Mon Feb 10 07:41:50 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 10 Feb 2020 14:41:50 +0100 Subject: [petsc-users] Spectrum slicing, Cholesky factorization for positive semidefinite matrices In-Reply-To: References: Message-ID: <5ED476D2-A8FE-49A2-A95D-33A5383BAC2B@dsic.upv.es> The spectrum slicing method computes the Cholesky factorization of (A-sigma*B) or (A-sigma*I) for several values of sigma. This matrix is indefinite, it does not matter if your B matrix is semi-definite. If B is singular, the only precaution is that you have to use purification, but this option is turned on by default so no problem. Jose > El 10 feb 2020, a las 14:32, Jan Grie?er via petsc-users escribi?: > > Hello, everybody, > i want to use the spectrum slicing method in Slepc4py to compute a subset of the eigenvalues and associated eigenvectors of my matrix. To do this I need a factorization that provids the Matrix Inertia. The Cholesky decomposition is given as an example in the user manual. The problem ist that my matrix is not positive definit but positive semidefinit (Three eigenvalues are zero). The PETSc user forum only states that for the Cholesky factorization a symmetric matrix is zero, but as far is i remember the Chosleky factorization is only numerical stable for positive definite matrices. Can i use an LU factorization for the spectrum slicing, although the PETSc user manual states that the Inertia is accessible when using Cholseky? Or can is still use Chollesky? > Greetings Jan From dong-hao at outlook.com Mon Feb 10 08:47:50 2020 From: dong-hao at outlook.com (Hao DONG) Date: Mon, 10 Feb 2020 14:47:50 +0000 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? In-Reply-To: References: <264F91C4-8558-4850-9B4B-ABE4123C2A2C@anl.gov> <4A373D93-4018-45E0-B805-3ECC528472DD@mcs.anl.gov> Message-ID: <9DF1BA10-D81B-4BCF-98EE-0179B9A681BA@outlook.com> Hi Barry, Thank you for you suggestions (and insights)! Indeed my initial motivation to try out PETSc is the different methods. 
As my matrix pattern is relatively simple (3D time-harmonic Maxwell equation arises from stagger-grid finite difference), also considering the fact that I am not wealthy enough to utilize the direct solvers, I was looking for a fast Krylov subspace method / preconditioner that scale well with, say, tens of cpu cores. As a simple block-Jacobian preconditioner seems to lose its efficiency with more than a handful of blocks, I planned to look into other methods/preconditioners, e.g. multigrid (as preconditioner) or domain decomposition methods. But I will probably need to look through a number of literatures before laying my hands on those (or bother you with more questions!). Anyway, thanks again for your kind help. All the best, Hao > On Feb 8, 2020, at 8:02 AM, Smith, Barry F. wrote: > > > >> On Feb 7, 2020, at 7:44 AM, Hao DONG wrote: >> >> Thanks, Barry, I really appreciate your help - >> >> I removed the OpenMP flags and rebuilt PETSc. So the problem is from the BLAS lib I linked? > > Yes, the openmp causes it to run in parallel, but the problem is not big enough and the machine is not good enough for parallel BLAS to speed things up, instead it slows things down a lot. We see this often, parallel BLAS must be used with care > >> Not sure which version my BLAS is, though? But I also included the -download-Scalapack option. Shouldn?t that enable linking with PBLAS automatically? >> >> After looking at the bcgs code in PETSc, I suppose the iteration residual recorded is indeed recorded twice per one "actual iteration?. So that can explain the difference of iteration numbers. >> >> My laptop is indeed an old machine (MBP15 mid-2014). I just cannot work with vi without a physical ESC key... > > The latest has a physical ESC, I am stuff without the ESC for a couple more years. > >> I have attached the configure.log -didn?t know that it is so large! >> >> Anyway, it seems that the removal of -openmp changes quite a lot of things, the performance is indeed getting much better - the flop/sec increases by a factor of 3. Still, I am getting 20 percent of VecMDot, but no VecMDot in BCGS all (see output below), is that a feature of gmres method? > > Yes, GMRES orthogonalizes against the last restart directions which uses these routines while BCGS does not, this is why BCGS is cheaper per iteration. > > PETSc is no faster than your code because the algorithm is the same, the compilers the same, and the hardware the same. No way to have clever tricks for PETSc to be much faster. What PETS provides is a huge variety of tested algorithms that no single person could code on their own. Anything in PETSc you could code yourself if you had endless time and get basically the same performance. 
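The multigrid and domain-decomposition preconditioners mentioned a little further up are reachable through exactly this options mechanism; the following is assumed usage, not a recommendation made anywhere in this thread: -pc_type gamg selects PETSc's algebraic multigrid and -pc_type asm -sub_pc_type ilu an additive Schwarz method, both of which can simply be swapped in at run time and inspected with -ksp_view / -log_view. The equivalent calls in code would look like the sketch below, where ksp, pc and ierr are assumed to be declared as KSP, PC and PetscErrorCode with the operators already set:

      call KSPGetPC(ksp,pc,ierr)
      call PCSetType(pc,PCGAMG,ierr)     ! smoothed-aggregation algebraic multigrid
!     call PCSetType(pc,PCASM,ierr)      ! or: additive Schwarz domain decomposition
      call KSPSetFromOptions(ksp,ierr)   ! keep command-line overrides possible

Whether either pays off for a complex-valued time-harmonic Maxwell system is problem dependent and has to be tested case by case.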
> > Barry > > >> >> here is the output of the same problem with: >> >> -pc_type bjacobi -pc_bjacobi_local_blocks 3 -sub_pc_type ilu -ksp_type gmres -ksp_monitor -ksp_view >> >> >> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >> >> Mod3DMT.test on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Fri Feb 7 10:26:19 2020 >> Using Petsc Release Version 3.12.3, unknown >> >> Max Max/Min Avg Total >> Time (sec): 2.520e+00 1.000 2.520e+00 >> Objects: 1.756e+03 1.000 1.756e+03 >> Flop: 7.910e+09 1.000 7.910e+09 7.910e+09 >> Flop/sec: 3.138e+09 1.000 3.138e+09 3.138e+09 >> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 >> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 >> MPI Reductions: 0.000e+00 0.000 >> >> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N --> 2N flop >> and VecAXPY() for complex vectors of length N --> 8N flop >> >> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total Count %Total Avg %Total Count %Total >> 0: Main Stage: 2.5204e+00 100.0% 7.9096e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >> >> ------------------------------------------------------------------------------------------------------------------------ >> ? >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> BuildTwoSidedF 1 1.0 3.4000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatMult 75 1.0 6.2884e-01 1.0 1.88e+09 1.0 0.0e+00 0.0e+00 0.0e+00 25 24 0 0 0 25 24 0 0 0 2991 >> MatSolve 228 1.0 4.4164e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2445 >> MatLUFactorNum 3 1.0 4.1317e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 574 >> MatILUFactorSym 3 1.0 2.3858e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> MatAssemblyBegin 5 1.0 4.4000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 5 1.0 1.5067e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> MatGetRowIJ 3 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatCreateSubMats 1 1.0 2.4558e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> MatGetOrdering 3 1.0 1.3290e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatView 3 1.0 1.2800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecMDot 72 1.0 4.9875e-01 1.0 2.25e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 28 0 0 0 20 28 0 0 0 4509 >> VecNorm 76 1.0 6.6666e-02 1.0 1.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 2544 >> VecScale 75 1.0 1.7982e-02 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 4653 >> VecCopy 3 1.0 1.5080e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 276 1.0 9.6784e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 >> VecAXPY 6 1.0 3.6860e-03 1.0 1.34e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3632 >> VecMAXPY 75 1.0 
4.0490e-01 1.0 2.41e+09 1.0 0.0e+00 0.0e+00 0.0e+00 16 30 0 0 0 16 30 0 0 0 5951 >> VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecScatterBegin 76 1.0 5.3800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecNormalize 75 1.0 8.3690e-02 1.0 2.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 2999 >> KSPSetUp 4 1.0 1.1663e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSolve 1 1.0 2.2119e+00 1.0 7.91e+09 1.0 0.0e+00 0.0e+00 0.0e+00 88100 0 0 0 88100 0 0 0 3576 >> KSPGMRESOrthog 72 1.0 8.7843e-01 1.0 4.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 35 57 0 0 0 35 57 0 0 0 5121 >> PCSetUp 4 1.0 9.2448e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 257 >> PCSetUpOnBlocks 1 1.0 6.6597e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 356 >> PCApply 76 1.0 4.6281e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2333 >> PCApplyOnBlocks 228 1.0 4.6262e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2334 >> ------------------------------------------------------------------------------------------------------------------------ >> >> Average time to get PetscTime(): 1e-07 >> #PETSc Option Table entries: >> -I LBFGS >> -ksp_type gmres >> -ksp_view >> -log_view >> -pc_bjacobi_local_blocks 3 >> -pc_type bjacobi >> -sub_pc_type ilu >> #End of PETSc Option Table entries >> Compiled with FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 >> Configure options: --with-scalar-type=complex --download-mumps --download-scalapack --with-fortran-kernels=1 -- FOPTFLAGS="-O3 -ffree-line-length-0 -msse2" --COPTFLAGS="-O3 -msse2" --CXXOPTFLAGS="-O3 -msse2" --with-debugging=0 >> ----------------------------------------- >> Libraries compiled on 2020-02-07 10:07:42 on Haos-MBP >> Machine characteristics: Darwin-19.3.0-x86_64-i386-64bit >> Using PETSc directory: /Users/donghao/src/git/PETSc-current >> Using PETSc arch: arch-darwin-c-opt >> ----------------------------------------- >> >> Using C compiler: mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fno-stack- check -Qunused-arguments -fvisibility=hidden -O3 -msse2 >> Using Fortran compiler: mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -O3 -ffree-line-length-0 - msse2 >> ----------------------------------------- >> >> Using include paths: -I/Users/donghao/src/git/PETSc-current/include -I/Users/donghao/src/git/PETSc-current/arch-darwin- c-opt/include >> ----------------------------------------- >> >> Using C linker: mpicc >> Using Fortran linker: mpif90 >> Using libraries: -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/donghao/src/git/PETSc- current/arch-darwin-c-opt/lib -lpetsc -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/ donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -Wl,-rpath,/usr/local/opt/libevent/lib -L/usr/local/opt/libevent/ lib -Wl,-rpath,/usr/local/Cellar/open-mpi/4.0.2/lib -L/usr/local/Cellar/open-mpi/4.0.2/lib -Wl,-rpath,/usr/local/Cellar/ gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin19/9.2.0 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple- darwin19/9.2.0 -Wl,-rpath,/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 
-L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -lcmumps - ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lc++ -ldl -lmpi_usempif08 - lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lquadmath -lm -lc++ -ldl >> ----------------------------------------- >> >> >> >> The BCGS solver performance is now comparable to my own Fortran code (1.84s). Still, I feel that there is something wrong hidden somewhere in my setup - a professional lib should to perform better, I believe. Any other ideas that I can look into? Interestingly there is no VecMDot operation at all! Here is the output with the option of: >> >> -pc_type bjacobi -pc_bjacobi_local_blocks 3 -sub_pc_type ilu -ksp_type bcgs -ksp_monitor -ksp_view >> >> >> >> >> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >> >> Mod3DMT.test on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Fri Feb 7 10:38:00 2020 >> Using Petsc Release Version 3.12.3, unknown >> >> Max Max/Min Avg Total >> Time (sec): 2.187e+00 1.000 2.187e+00 >> Objects: 1.155e+03 1.000 1.155e+03 >> Flop: 4.311e+09 1.000 4.311e+09 4.311e+09 >> Flop/sec: 1.971e+09 1.000 1.971e+09 1.971e+09 >> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 >> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 >> MPI Reductions: 0.000e+00 0.000 >> >> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N --> 2N flop >> and VecAXPY() for complex vectors of length N --> 8N flop >> >> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total Count %Total Avg %Total Count %Total >> 0: Main Stage: 2.1870e+00 100.0% 4.3113e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> BuildTwoSidedF 1 1.0 2.2000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatMult 83 1.0 7.8726e-01 1.0 2.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 36 48 0 0 0 36 48 0 0 0 2644 >> MatSolve 252 1.0 5.5656e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 25 28 0 0 0 25 28 0 0 0 2144 >> MatLUFactorNum 3 1.0 4.5115e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 526 >> MatILUFactorSym 3 1.0 2.5103e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> MatAssemblyBegin 5 1.0 3.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 5 1.0 1.5709e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> MatGetRowIJ 3 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatCreateSubMats 1 1.0 2.8989e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> MatGetOrdering 3 1.0 1.1200e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatView 3 1.0 1.2600e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecDot 82 1.0 
8.9328e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 2048 >> VecDotNorm2 41 1.0 9.9019e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 4 0 0 0 5 4 0 0 0 1848 >> VecNorm 43 1.0 3.9988e-02 1.0 9.59e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 2399 >> VecCopy 2 1.0 1.1150e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 271 1.0 4.2833e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> VecAXPY 1 1.0 5.9200e-04 1.0 2.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3769 >> VecAXPBYCZ 82 1.0 1.1448e-01 1.0 3.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 8 0 0 0 5 8 0 0 0 3196 >> VecWAXPY 82 1.0 6.7460e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 4 0 0 0 3 4 0 0 0 2712 >> VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecScatterBegin 84 1.0 5.2800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSetUp 4 1.0 1.4765e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> KSPSolve 1 1.0 1.8514e+00 1.0 4.31e+09 1.0 0.0e+00 0.0e+00 0.0e+00 85100 0 0 0 85100 0 0 0 2329 >> PCSetUp 4 1.0 1.0193e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 5 1 0 0 0 5 1 0 0 0 233 >> PCSetUpOnBlocks 1 1.0 7.1421e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 332 >> PCApply 84 1.0 5.7927e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 26 28 0 0 0 26 28 0 0 0 2060 >> PCApplyOnBlocks 252 1.0 5.7902e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 26 28 0 0 0 26 28 0 0 0 2061 >> ------------------------------------------------------------------------------------------------------------------------ >> >> >> Cheers, >> Hao >> >> >> >> From: Smith, Barry F. >> Sent: Thursday, February 6, 2020 7:03 PM >> To: Hao DONG >> Cc: petsc-users at mcs.anl.gov >> Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? >> >> >> Read my comments ALL the way down, they go a long way. >> >>> On Feb 6, 2020, at 3:43 AM, Hao DONG wrote: >>> >>> Dear Hong and Barry, >>> >>> Thanks for the suggestions. So there could be some problems in my PETSc configuration? - but my PETSc lib was indeed compiled without the debug flags (--with-debugging=0). I use GCC/GFortran (Home-brew GCC 9.2.0) for the compiling and building of PETSc and my own fortran code. My Fortran compiling flags are simply like: >>> >>> -O3 -ffree-line-length-none -fastsse >>> >>> Which is also used for -FOPTFLAGS in PETSc (I added -openmp for PETSc, but not my fortran code, as I don?t have any OMP optimizations in my program). Note the performance test results I listed yesterday (e.g. 4.08s with 41 bcgs iterations.) are without any CSR-array->PETSc translation overhead (only include the set and solve part). >> >> PETSc doesn't use -openmp in any way for its solvers. Do not use this option, it may be slowing the code down. Please send configure.log >> >>> >>> I have two questions about the performance difference - >>> >>> 1. Is ilu only factorized once for each iteration, or ilu is performed at every outer ksp iteration steps? Sounds unlikely - but if so, this could cause some extra overheads. >> >> ILU is ONLY done if the matrix has changed (which seems wrong). >>> >>> 2. Some KSP solvers like BCGS or TFQMR has two ?half-iterations? for each iteration step. 
Not sure how it works in PETSc, but is that possible that both the two ?half" relative residuals are counted in the output array, doubling the number of iterations (but that cannot explain the extra time consumed)? >> >> Yes, PETSc might report them as two, you need to check the exact code. >> >>> >>> Anyway, the output with -log_view from the same 278906 by 278906 matrix with 3-block D-ILU in PETSc is as follows: >>> >>> >>> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >>> >>> MEMsolv.lu on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Thu Feb 6 09:07:35 2020 >>> Using Petsc Release Version 3.12.3, unknown >>> >>> Max Max/Min Avg Total >>> Time (sec): 4.443e+00 1.000 4.443e+00 >>> Objects: 1.155e+03 1.000 1.155e+03 >>> Flop: 4.311e+09 1.000 4.311e+09 4.311e+09 >>> Flop/sec: 9.703e+08 1.000 9.703e+08 9.703e+08 >>> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 >>> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 >>> MPI Reductions: 0.000e+00 0.000 >>> >>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >>> e.g., VecAXPY() for real vectors of length N --> 2N flop >>> and VecAXPY() for complex vectors of length N --> 8N flop >>> >>> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- >>> Avg %Total Avg %Total Count %Total Avg %Total Count %Total >>> 0: Main Stage: 4.4435e+00 100.0% 4.3113e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >>> >>> ???????????????????????????????????????????????????????????? >>> See the 'Profiling' chapter of the users' manual for details on interpreting output. >>> Phase summary info: >>> Count: number of times phase was executed >>> Time and Flop: Max - maximum over all processors >>> Ratio - ratio of maximum to minimum over all processors >>> Mess: number of messages sent >>> AvgLen: average message length (bytes) >>> Reduct: number of global reductions >>> Global: entire computation >>> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
>>> %T - percent time in this phase %F - percent flop in this phase >>> %M - percent messages in this phase %L - percent message lengths in this phase >>> %R - percent reductions in this phase >>> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) >>> ------------------------------------------------------------------------------------------------------------------------ >>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> --- Event Stage 0: Main Stage >>> >>> BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatMult 83 1.0 1.7815e+00 1.0 2.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40 48 0 0 0 40 48 0 0 0 1168 >>> MatSolve 252 1.0 1.2708e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 939 >>> MatLUFactorNum 3 1.0 7.9725e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 >>> MatILUFactorSym 3 1.0 2.6998e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatAssemblyBegin 5 1.0 3.6000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAssemblyEnd 5 1.0 3.1619e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatCreateSubMats 1 1.0 3.9659e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatGetOrdering 3 1.0 4.3070e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatView 3 1.0 1.3600e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecDot 82 1.0 1.8948e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 966 >>> VecDotNorm2 41 1.0 1.6812e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 1088 >>> VecNorm 43 1.0 9.5099e-02 1.0 9.59e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1009 >>> VecCopy 2 1.0 1.0120e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecSet 271 1.0 3.8922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> VecAXPY 1 1.0 7.7200e-04 1.0 2.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2890 >>> VecAXPBYCZ 82 1.0 2.4370e-01 1.0 3.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 8 0 0 0 5 8 0 0 0 1502 >>> VecWAXPY 82 1.0 1.4148e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 4 0 0 0 3 4 0 0 0 1293 >>> VecAssemblyBegin 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecScatterBegin 84 1.0 5.9300e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> KSPSetUp 4 1.0 1.4167e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> KSPSolve 1 1.0 4.0250e+00 1.0 4.31e+09 1.0 0.0e+00 0.0e+00 0.0e+00 91100 0 0 0 91100 0 0 0 1071 >>> PCSetUp 4 1.0 1.5207e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 156 >>> PCSetUpOnBlocks 1 1.0 1.1116e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 214 >>> PCApply 84 1.0 1.2912e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 >>> PCApplyOnBlocks 252 1.0 1.2909e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 >>> 
------------------------------------------------------------------------------------------------------------------------ >>> >>> # I skipped the memory part - the options (and compiler options) are as follows: >>> >>> #PETSc Option Table entries: >>> -ksp_type bcgs >>> -ksp_view >>> -log_view >>> -pc_bjacobi_local_blocks 3 >>> -pc_factor_levels 0 >>> -pc_sub_type ilu >>> -pc_type bjacobi >>> #End of PETSc Option Table entries >>> Compiled with FORTRAN kernels >>> Compiled with full precision matrices (default) >>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 >>> Configure options: --with-scalar-type=complex --download-mumps --download-scalapack --with-fortran-kernels=1 -- FOPTFLAGS=?-O3 -fastsse -mp -openmp? --COPTFLAGS=?-O3 -fastsse -mp -openmp? --CXXOPTFLAGS="-O3 -fastsse -mp -openmp" -- with-debugging=0 >>> ----------------------------------------- >>> Libraries compiled on 2020-02-03 10:44:31 on Haos-MBP >>> Machine characteristics: Darwin-19.2.0-x86_64-i386-64bit >>> Using PETSc directory: /Users/donghao/src/git/PETSc-current >>> Using PETSc arch: arch-darwin-c-opt >>> ----------------------------------------- >>> >>> Using C compiler: mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fno-stack- check -Qunused-arguments -fvisibility=hidden >>> Using Fortran compiler: mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument >>> >>> Using include paths: -I/Users/donghao/src/git/PETSc-current/include -I/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/include >>> ----------------------------------------- >>> >>> Using C linker: mpicc >>> Using Fortran linker: mpif90 >>> Using libraries: -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/donghao/src/git/PETSc- current/arch-darwin-c-opt/lib -lpetsc -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/ donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -Wl,-rpath,/usr/local/opt/libevent/lib -L/usr/local/opt/libevent/ lib -Wl,-rpath,/usr/local/Cellar/open-mpi/4.0.2/lib -L/usr/local/Cellar/open-mpi/4.0.2/lib -Wl,-rpath,/usr/local/Cellar/ gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin19/9.2.0 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple- darwin19/9.2.0 -Wl,-rpath,/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -lcmumps - ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lc++ -ldl -lmpi_usempif08 - lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lquadmath -lm -lc++ -ldl >>> >>> >>> On the other hand, running PETSc with >>> >>> -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type lu -ksp_type gmres -ksp_monitor -ksp_view -log_view >>> >>> For the same problem takes 5.37s and 72 GMRES iterations. Our previous testings show that BiCGstab (bcgs in PETSc) is almost always the most effective KSP solver for our non-symmetrical complex system. Strangely, the system is still using ilu instead of lu for sub blocks. 
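A note on the option table just above: the entry is spelled -pc_sub_type, but the inner solvers of PCBJACOBI take the prefix sub_, so the requested LU never reaches the blocks and they fall back to the default ILU; Barry's one-line reply below gives the corrected spelling. A minimal sketch of the options-driven setup (KSPSetFromOptions is a standard PETSc call; ksp_local is the handle name used in the Fortran snippet quoted later in this thread):

      ! let the command line choose the whole solver stack
      call KSPSetFromOptions(ksp_local, ierr)

run with, for example,

      -ksp_type gmres -pc_type bjacobi -pc_bjacobi_local_blocks 3 -sub_pc_type lu -ksp_view

and check in the -ksp_view output that the (sub_) PC object now reports type: lu.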
The output is like: >> >> -sub_pc_type lu >> >>> >>> 0 KSP Residual norm 2.480412407430e+02 >>> 1 KSP Residual norm 8.848059967835e+01 >>> 2 KSP Residual norm 3.415272863261e+01 >>> 3 KSP Residual norm 1.563045190939e+01 >>> 4 KSP Residual norm 6.241296940043e+00 >>> 5 KSP Residual norm 2.739710899854e+00 >>> 6 KSP Residual norm 1.391304148888e+00 >>> 7 KSP Residual norm 7.959262020849e-01 >>> 8 KSP Residual norm 4.828323055231e-01 >>> 9 KSP Residual norm 2.918529739200e-01 >>> 10 KSP Residual norm 1.905508589557e-01 >>> 11 KSP Residual norm 1.291541892702e-01 >>> 12 KSP Residual norm 8.827145774707e-02 >>> 13 KSP Residual norm 6.521331095889e-02 >>> 14 KSP Residual norm 5.095787952595e-02 >>> 15 KSP Residual norm 4.043060387395e-02 >>> 16 KSP Residual norm 3.232590200012e-02 >>> 17 KSP Residual norm 2.593944982216e-02 >>> 18 KSP Residual norm 2.064639483533e-02 >>> 19 KSP Residual norm 1.653916663492e-02 >>> 20 KSP Residual norm 1.334946415452e-02 >>> 21 KSP Residual norm 1.092886880597e-02 >>> 22 KSP Residual norm 8.988004105542e-03 >>> 23 KSP Residual norm 7.466501315240e-03 >>> 24 KSP Residual norm 6.284389135436e-03 >>> 25 KSP Residual norm 5.425231669964e-03 >>> 26 KSP Residual norm 4.766338253084e-03 >>> 27 KSP Residual norm 4.241238878242e-03 >>> 28 KSP Residual norm 3.808113525685e-03 >>> 29 KSP Residual norm 3.449383788116e-03 >>> 30 KSP Residual norm 3.126025526388e-03 >>> 31 KSP Residual norm 2.958328054299e-03 >>> 32 KSP Residual norm 2.802344900403e-03 >>> 33 KSP Residual norm 2.621993580492e-03 >>> 34 KSP Residual norm 2.430066269304e-03 >>> 35 KSP Residual norm 2.259043079597e-03 >>> 36 KSP Residual norm 2.104287972986e-03 >>> 37 KSP Residual norm 1.952916080045e-03 >>> 38 KSP Residual norm 1.804988937999e-03 >>> 39 KSP Residual norm 1.643302117377e-03 >>> 40 KSP Residual norm 1.471661332036e-03 >>> 41 KSP Residual norm 1.286445911163e-03 >>> 42 KSP Residual norm 1.127543025848e-03 >>> 43 KSP Residual norm 9.777148275484e-04 >>> 44 KSP Residual norm 8.293314450006e-04 >>> 45 KSP Residual norm 6.989331136622e-04 >>> 46 KSP Residual norm 5.852307780220e-04 >>> 47 KSP Residual norm 4.926715539762e-04 >>> 48 KSP Residual norm 4.215941372075e-04 >>> 49 KSP Residual norm 3.699489548162e-04 >>> 50 KSP Residual norm 3.293897163533e-04 >>> 51 KSP Residual norm 2.959954542998e-04 >>> 52 KSP Residual norm 2.700193032414e-04 >>> 53 KSP Residual norm 2.461789791204e-04 >>> 54 KSP Residual norm 2.218839085563e-04 >>> 55 KSP Residual norm 1.945154309976e-04 >>> 56 KSP Residual norm 1.661128781744e-04 >>> 57 KSP Residual norm 1.413198766258e-04 >>> 58 KSP Residual norm 1.213984003195e-04 >>> 59 KSP Residual norm 1.044317450754e-04 >>> 60 KSP Residual norm 8.919957502977e-05 >>> 61 KSP Residual norm 8.042584301275e-05 >>> 62 KSP Residual norm 7.292784493581e-05 >>> 63 KSP Residual norm 6.481935501872e-05 >>> 64 KSP Residual norm 5.718564652679e-05 >>> 65 KSP Residual norm 5.072589750116e-05 >>> 66 KSP Residual norm 4.487930741285e-05 >>> 67 KSP Residual norm 3.941040674119e-05 >>> 68 KSP Residual norm 3.492873281291e-05 >>> 69 KSP Residual norm 3.103798339845e-05 >>> 70 KSP Residual norm 2.822943237409e-05 >>> 71 KSP Residual norm 2.610615023776e-05 >>> 72 KSP Residual norm 2.441692671173e-05 >>> KSP Object: 1 MPI processes >>> type: gmres >>> restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement >>> happy breakdown tolerance 1e-30 >>> maximum iterations=150, nonzero initial guess >>> tolerances: relative=1e-07, absolute=1e-50, 
divergence=10000. >>> left preconditioning >>> using PRECONDITIONED norm type for convergence test >>> PC Object: 1 MPI processes >>> type: bjacobi >>> number of blocks = 3 >>> Local solve is same for all blocks, in the following KSP and PC objects: >>> KSP Object: (sub_) 1 MPI processes >>> type: preonly >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>> left preconditioning >>> using NONE norm type for convergence test >>> PC Object: (sub_) 1 MPI processes >>> type: ilu >>> out-of-place factorization >>> 0 levels of fill >>> tolerance for zero pivot 2.22045e-14 >>> matrix ordering: natural >>> factor fill ratio given 1., needed 1. >>> Factored matrix follows: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=92969, cols=92969 >>> package used to perform factorization: petsc >>> total: nonzeros=638417, allocated nonzeros=638417 >>> total number of mallocs used during MatSetValues calls=0 >>> not using I-node routines >>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=92969, cols=92969 >>> total: nonzeros=638417, allocated nonzeros=638417 >>> total number of mallocs used during MatSetValues calls=0 >>> not using I-node routines >>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: mpiaij >>> rows=278906, cols=278906 >>> total: nonzeros=3274027, allocated nonzeros=3274027 >>> total number of mallocs used during MatSetValues calls=0 >>> not using I-node (on process 0) routines >>> ... >>> ------------------------------------------------------------------------------------------------------------------------ >>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> --- Event Stage 0: Main Stage >>> >>> BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatMult 75 1.0 1.5812e+00 1.0 1.88e+09 1.0 0.0e+00 0.0e+00 0.0e+00 28 24 0 0 0 28 24 0 0 0 1189 >>> MatSolve 228 1.0 1.1442e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 944 >> >> These flop rates are ok, but not great. Perhaps an older machine. >> >>> MatLUFactorNum 3 1.0 8.1930e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 290 >>> MatILUFactorSym 3 1.0 2.7102e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAssemblyBegin 5 1.0 3.7000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAssemblyEnd 5 1.0 3.1895e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatCreateSubMats 1 1.0 4.0904e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatGetOrdering 3 1.0 4.2640e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatView 3 1.0 1.4400e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecMDot 72 1.0 1.1984e+00 1.0 2.25e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 28 0 0 0 21 28 0 0 0 1877 >> >> 21 percent of the time in VecMDOT this is huge for s sequential fun. I think maybe you are using a terrible OpenMP BLAS? 
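Since much of this exchange is about where KSPSolve spends its time, it may help to isolate the solve in its own logging stage, as the profiling legend above notes with PetscLogStagePush()/PetscLogStagePop(). A minimal Fortran sketch; ksp_local, b and x are placeholder handle names in the style of the user's snippet:

      PetscLogStage stage_solve

      call PetscLogStageRegister('KSPSolve only', stage_solve, ierr)
      call PetscLogStagePush(stage_solve, ierr)
      call KSPSolve(ksp_local, b, x, ierr)
      call PetscLogStagePop(ierr)

-log_view then reports the solve in a separate stage, which makes it easier to see whether VecMDot, MatSolve or the factorization dominates, without the assembly and setup events mixed in.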
>> >> Send configure.log >> >> >>> VecNorm 76 1.0 1.6841e-01 1.0 1.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 1007 >>> VecScale 75 1.0 1.8241e-02 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 4587 >>> VecCopy 3 1.0 1.4970e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecSet 276 1.0 9.1970e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>> VecAXPY 6 1.0 3.7450e-03 1.0 1.34e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3575 >>> VecMAXPY 75 1.0 1.0022e+00 1.0 2.41e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 30 0 0 0 18 30 0 0 0 2405 >>> VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecScatterBegin 76 1.0 5.5100e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecNormalize 75 1.0 1.8462e-01 1.0 2.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 1360 >>> KSPSetUp 4 1.0 1.1341e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> KSPSolve 1 1.0 5.3123e+00 1.0 7.91e+09 1.0 0.0e+00 0.0e+00 0.0e+00 93100 0 0 0 93100 0 0 0 1489 >>> KSPGMRESOrthog 72 1.0 2.1316e+00 1.0 4.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 57 0 0 0 37 57 0 0 0 2110 >>> PCSetUp 4 1.0 1.5531e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 153 >>> PCSetUpOnBlocks 1 1.0 1.1343e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 209 >>> PCApply 76 1.0 1.1671e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 >>> PCApplyOnBlocks 228 1.0 1.1668e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 >>> ???????????????????????????????????????????????????????????? >>> ... >>> #PETSc Option Table entries: >>> -ksp_monitor >>> -ksp_type gmres >>> -ksp_view >>> -log_view >>> -pc_bjacobi_local_blocks 3 >>> -pc_sub_type lu >>> -pc_type bjacobi >>> #End of PETSc Option Table entries >>> ... >>> >>> Does any of the setup/output ring a bell? >>> >>> BTW, out of curiosity - what is a ?I-node? routine? >>> >>> >>> Cheers, >>> Hao >>> >>> >>> From: Smith, Barry F. >>> Sent: Wednesday, February 5, 2020 9:42 PM >>> To: Hao DONG >>> Cc: petsc-users at mcs.anl.gov >>> Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? >>> >>> >>> >>>> On Feb 5, 2020, at 4:36 AM, Hao DONG wrote: >>>> >>>> Thanks a lot for your suggestions, Hong and Barry - >>>> >>>> As you suggested, I first tried the LU direct solvers (built-in and MUMPs) out this morning, which work perfectly, albeit slow. As my problem itself is a part of a PDE based optimization, the exact solution in the searching procedure is not necessary (I often set a relative tolerance of 1E-7/1E-8, etc. for Krylov subspace methods). Also tried BJACOBI with exact LU, the KSP just converges in one or two iterations, as expected. >>>> >>>> I added -kspview option for the D-ILU test (still with Block Jacobi as preconditioner and bcgs as KSP solver). The KSPview output from one of the examples in a toy model looks like: >>>> >>>> KSP Object: 1 MPI processes >>>> type: bcgs >>>> maximum iterations=120, nonzero initial guess >>>> tolerances: relative=1e-07, absolute=1e-50, divergence=10000. 
>>>> left preconditioning >>>> using PRECONDITIONED norm type for convergence test >>>> PC Object: 1 MPI processes >>>> type: bjacobi >>>> number of blocks = 3 >>>> Local solve is same for all blocks, in the following KSP and PC objects: >>>> KSP Object: (sub_) 1 MPI processes >>>> type: preonly >>>> maximum iterations=10000, initial guess is zero >>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>>> left preconditioning >>>> using NONE norm type for convergence test >>>> PC Object: (sub_) 1 MPI processes >>>> type: ilu >>>> out-of-place factorization >>>> 0 levels of fill >>>> tolerance for zero pivot 2.22045e-14 >>>> matrix ordering: natural >>>> factor fill ratio given 1., needed 1. >>>> Factored matrix follows: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=11294, cols=11294 >>>> package used to perform factorization: petsc >>>> total: nonzeros=76008, allocated nonzeros=76008 >>>> total number of mallocs used during MatSetValues calls=0 >>>> not using I-node routines >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=11294, cols=11294 >>>> total: nonzeros=76008, allocated nonzeros=76008 >>>> total number of mallocs used during MatSetValues calls=0 >>>> not using I-node routines >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: mpiaij >>>> rows=33880, cols=33880 >>>> total: nonzeros=436968, allocated nonzeros=436968 >>>> total number of mallocs used during MatSetValues calls=0 >>>> not using I-node (on process 0) routines >>>> >>>> do you see something wrong with my setup? >>>> >>>> I also tried a quick performance test with a small 278906 by 278906 matrix (3850990 nnz) with the following parameters: >>>> >>>> -ksp_type bcgs -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu -ksp_view >>>> >>>> Reducing the relative residual to 1E-7 >>>> >>>> Took 4.08s with 41 bcgs iterations. >>>> >>>> Merely changing the -pc_bjacobi_local_blocks to 6 >>>> >>>> Took 7.02s with 73 bcgs iterations. 9 blocks would further take 9.45s with 101 bcgs iterations. >>> >>> This is normal. more blocks slower convergence >>>> >>>> As a reference, my home-brew Fortran code solves the same problem (3-block D-ILU0) in >>>> >>>> 1.84s with 24 bcgs iterations (the bcgs code is also a home-brew one)? >>>> >>> Run the PETSc code with optimization ./configure --with-debugging=0 an run the code with -log_view this will show where the PETSc code is spending the time (send it to use) >>> >>> >>>> >>>> >>>> Well, by saying ?use explicit L/U matrix as preconditioner?, I wonder if a user is allowed to provide his own (separate) L and U Mat for preconditioning (like how it works in Matlab solvers), e.g. >>>> >>>> x = qmr(A,b,Tol,MaxIter,L,U,x) >>>> >>>> As I already explicitly constructed the L and U matrices in Fortran, it is not hard to convert them to Mat in Petsc to test Petsc and my Fortran code head-to-head. In that case, the A, b, x, and L/U are all identical, it would be easier to see where the problem came from. >>>> >>>> >>> No, we don't provide this kind of support >>> >>> >>>> >>>> BTW, there is another thing I wondered - is there a way to output residual in unpreconditioned norm? I tried to >>>> >>>> call KSPSetNormType(ksp_local, KSP_NORM_UNPRECONDITIONED, ierr) >>>> >>>> But always get an error that current ksp does not support unpreconditioned in LEFT/RIGHT (either way). Is it possible to do that (output unpreconditioned residual) in PETSc at all? 
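(Barry's answer just below is -ksp_monitor_true_residual, optionally combined with right preconditioning. For reference, the code-side way to request the latter is a single call; the sketch assumes the ksp_local handle name used earlier in the thread:)

      ! with right preconditioning, GMRES's own residual is the true residual
      call KSPSetPCSide(ksp_local, PC_RIGHT, ierr)

-ksp_monitor_true_residual works in either case and prints the true residual norm alongside the preconditioned one.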
>>> >>> -ksp_monitor_true_residual You can also run GMRES (and some other methods) with right preconditioning, -ksp_pc_side right then the residual computed is by the algorithm the unpreconditioned residual >>> >>> KSPSetNormType sets the norm used in the algorithm, it generally always has to left or right, only a couple algorithms support unpreconditioned directly. >>> >>> Barry >>> >>> >>>> >>>> Cheers, >>>> Hao >>>> >>>> >>>> From: Smith, Barry F. >>>> Sent: Tuesday, February 4, 2020 8:27 PM >>>> To: Hao DONG >>>> Cc: petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? >>>> >>>> >>>> >>>>> On Feb 4, 2020, at 12:41 PM, Hao DONG wrote: >>>>> >>>>> Dear all, >>>>> >>>>> >>>>> I have a few questions about the implementation of diagonal ILU PC in PETSc. I want to solve a very simple system with KSP (in parallel), the nature of the system (finite difference time-harmonic Maxwell) is probably not important to the question itself. Long story short, I just need to solve a set of equations of Ax = b with a block diagonal system matrix, like (not sure if the mono font works): >>>>> >>>>> |X | >>>>> A =| Y | >>>>> | Z| >>>>> >>>>> Note that A is not really block-diagonal, it?s just a multi-diagonal matrix determined by the FD mesh, where most elements are close to diagonal. So instead of a full ILU decomposition, a D-ILU is a good approximation as a preconditioner, and the number of blocks should not matter too much: >>>>> >>>>> |Lx | |Ux | >>>>> L = | Ly | and U = | Uy | >>>>> | Lz| | Uz| >>>>> >>>>> Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in my sequential Fortran/Matlab code. >>>>> >>>>> So like most users, I am looking for a parallel implement with this problem in PETSc. After looking through the manual, it seems that the most straightforward way to do it is through PCBJACOBI. Not sure I understand it right, I just setup a 3-block PCJACOBI and give each of the block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little block of fortran code would look like: >>>>> ... >>>>> call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & >>>>> & isubs,ierr) >>>>> call PCBJacobiSetLocalBlocks(pc_local, Nsub, & >>>>> & isubs(istart:iend),ierr) >>>>> ! set up the block jacobi structure >>>>> call KSPSetup(ksp_local,ierr) >>>>> ! allocate sub ksps >>>>> allocate(ksp_sub(Nsub)) >>>>> call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & >>>>> & ksp_sub,ierr) >>>>> do i=1,Nsub >>>>> call KSPGetPC(ksp_sub(i),pc_sub,ierr) >>>>> !ILU preconditioner >>>>> call PCSetType(pc_sub,ptype,ierr) >>>>> call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here >>>>> call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] >>>>> end do >>>>> call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & >>>>> & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) >>>>> ? >>>> >>>> This code looks essentially right. You should call with -ksp_view to see exactly what PETSc is using for a solver. >>>> >>>>> >>>>> I understand that the parallel performance may not be comparable, so I first set up a one-process test (with MPIAij, but all the rows are local since there is only one process). The system is solved without any problem (identical results within error). 
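One more aid for head-to-head comparisons like the one described next: rather than counting monitor lines (which, as noted above, BCGS may print at a finer granularity than one per full iteration), the iteration count and convergence reason can be read back from the KSP after the solve. A sketch with the same placeholder handle name:

      PetscInt its
      KSPConvergedReason reason

      call KSPGetIterationNumber(ksp_local, its, ierr)
      call KSPGetConvergedReason(ksp_local, reason, ierr)

Here its is the solver's own iteration counter and reason records why it stopped (converged on rtol, diverged, hit the iteration limit), which is the safer pair of numbers to compare against an external code.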
But the performance is actually a lot worse (code built without debugging flags in performance tests) than my own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), which is hard to believe. I suspect the difference is from the PC as the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to the same relative tol. >>>> >>>> PETSc uses GMRES by default with a restart of 30 and left preconditioning. It could be different exact choices in the solver (which is why -ksp_view is so useful) can explain the differences in the runs between your code and PETSc's >>>>> >>>>> This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks instead of 3). While my Fortran/Matlab codes see minimal performance difference (<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a performance decrease of more than 50% in sequential test. >>>> >>>> This is odd, the more blocks the smaller each block so the quicker the ILU set up should be. You can run various cases with -log_view and send us the output to see what is happening at each part of the computation in time. >>>> >>>>> So my implementation IS somewhat different in PETSc. Do I miss something in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how PCBJACOBI works in PETSc? >>>> >>>> Probably not. >>>>> >>>>> If this is not the right way to implement a block diagonal ILU as (parallel) PC, please kindly point me to the right direction. I searched through the mail list to find some answers, only to find a couple of similar questions... An example would be nice. >>>> >>>> You approach seems fundamentally right but I cannot be sure of possible bugs. >>>>> >>>>> On the other hand, does PETSc support a simple way to use explicit L/U matrix as a preconditioner? I can import the D-ILU matrix (I already converted my A matrix into Mat) constructed in my Fortran code to make a better comparison. Or do I have to construct my own PC using PCshell? If so, is there a good tutorial/example to learn about how to use PCSHELL (in a more advanced sense, like how to setup pc side and transpose)? >>>> >>>> Not sure what you mean by explicit L/U matrix as a preconditioner. As Hong said, yes you can use a parallel LU from MUMPS or SuperLU_DIST or Pastix as the solver. You do not need any shell code. You simply need to set the PCType to lu >>>> >>>> You can also set all this options from the command line and don't need to write the code specifically. So call KSPSetFromOptions() and then for example >>>> >>>> -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu (this last one is applied to each block so you could use -pc_type lu and it would use lu on each block.) >>>> >>>> -ksp_type_none -pc_type lu -pc_factor_mat_solver_type mumps (do parallel LU with mumps) >>>> >>>> By not hardwiring in the code and just using options you can test out different cases much quicker >>>> >>>> Use -ksp_view to make sure that is using the solver the way you expect. >>>> >>>> Barry >>>> >>>> >>>> >>>> Barry >>>> >>>>> >>>>> Thanks in advance, >>>>> >>>>> Hao >> >> > From bsmith at mcs.anl.gov Mon Feb 10 09:07:33 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Mon, 10 Feb 2020 15:07:33 +0000 Subject: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? 
In-Reply-To: <9DF1BA10-D81B-4BCF-98EE-0179B9A681BA@outlook.com> References: <264F91C4-8558-4850-9B4B-ABE4123C2A2C@anl.gov> <4A373D93-4018-45E0-B805-3ECC528472DD@mcs.anl.gov> <9DF1BA10-D81B-4BCF-98EE-0179B9A681BA@outlook.com> Message-ID: You should google for preconditioners/solvers for 3D time-harmonic Maxwell equation arises from stagger-grid finite difference. Maxwell has its own particular structure and difficulties with iterative solvers depending on the parameters. Also see page 87 in https://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf Barry -pc_type asm may work well for a handful of processors. > On Feb 10, 2020, at 8:47 AM, Hao DONG wrote: > > Hi Barry, > > Thank you for you suggestions (and insights)! Indeed my initial motivation to try out PETSc is the different methods. As my matrix pattern is relatively simple (3D time-harmonic Maxwell equation arises from stagger-grid finite difference), also considering the fact that I am not wealthy enough to utilize the direct solvers, I was looking for a fast Krylov subspace method / preconditioner that scale well with, say, tens of cpu cores. > > As a simple block-Jacobian preconditioner seems to lose its efficiency with more than a handful of blocks, I planned to look into other methods/preconditioners, e.g. multigrid (as preconditioner) or domain decomposition methods. But I will probably need to look through a number of literatures before laying my hands on those (or bother you with more questions!). Anyway, thanks again for your kind help. > > > All the best, > Hao > >> On Feb 8, 2020, at 8:02 AM, Smith, Barry F. wrote: >> >> >> >>> On Feb 7, 2020, at 7:44 AM, Hao DONG wrote: >>> >>> Thanks, Barry, I really appreciate your help - >>> >>> I removed the OpenMP flags and rebuilt PETSc. So the problem is from the BLAS lib I linked? >> >> Yes, the openmp causes it to run in parallel, but the problem is not big enough and the machine is not good enough for parallel BLAS to speed things up, instead it slows things down a lot. We see this often, parallel BLAS must be used with care >> >>> Not sure which version my BLAS is, though? But I also included the -download-Scalapack option. Shouldn?t that enable linking with PBLAS automatically? >>> >>> After looking at the bcgs code in PETSc, I suppose the iteration residual recorded is indeed recorded twice per one "actual iteration?. So that can explain the difference of iteration numbers. >>> >>> My laptop is indeed an old machine (MBP15 mid-2014). I just cannot work with vi without a physical ESC key... >> >> The latest has a physical ESC, I am stuff without the ESC for a couple more years. >> >>> I have attached the configure.log -didn?t know that it is so large! >>> >>> Anyway, it seems that the removal of -openmp changes quite a lot of things, the performance is indeed getting much better - the flop/sec increases by a factor of 3. Still, I am getting 20 percent of VecMDot, but no VecMDot in BCGS all (see output below), is that a feature of gmres method? >> >> Yes, GMRES orthogonalizes against the last restart directions which uses these routines while BCGS does not, this is why BCGS is cheaper per iteration. >> >> PETSc is no faster than your code because the algorithm is the same, the compilers the same, and the hardware the same. No way to have clever tricks for PETSc to be much faster. What PETS provides is a huge variety of tested algorithms that no single person could code on their own. 
Anything in PETSc you could code yourself if you had endless time and get basically the same performance. >> >> Barry >> >> >>> >>> here is the output of the same problem with: >>> >>> -pc_type bjacobi -pc_bjacobi_local_blocks 3 -sub_pc_type ilu -ksp_type gmres -ksp_monitor -ksp_view >>> >>> >>> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >>> >>> Mod3DMT.test on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Fri Feb 7 10:26:19 2020 >>> Using Petsc Release Version 3.12.3, unknown >>> >>> Max Max/Min Avg Total >>> Time (sec): 2.520e+00 1.000 2.520e+00 >>> Objects: 1.756e+03 1.000 1.756e+03 >>> Flop: 7.910e+09 1.000 7.910e+09 7.910e+09 >>> Flop/sec: 3.138e+09 1.000 3.138e+09 3.138e+09 >>> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 >>> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 >>> MPI Reductions: 0.000e+00 0.000 >>> >>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >>> e.g., VecAXPY() for real vectors of length N --> 2N flop >>> and VecAXPY() for complex vectors of length N --> 8N flop >>> >>> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- >>> Avg %Total Avg %Total Count %Total Avg %Total Count %Total >>> 0: Main Stage: 2.5204e+00 100.0% 7.9096e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> ? >>> ------------------------------------------------------------------------------------------------------------------------ >>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> --- Event Stage 0: Main Stage >>> >>> BuildTwoSidedF 1 1.0 3.4000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatMult 75 1.0 6.2884e-01 1.0 1.88e+09 1.0 0.0e+00 0.0e+00 0.0e+00 25 24 0 0 0 25 24 0 0 0 2991 >>> MatSolve 228 1.0 4.4164e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2445 >>> MatLUFactorNum 3 1.0 4.1317e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 574 >>> MatILUFactorSym 3 1.0 2.3858e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatAssemblyBegin 5 1.0 4.4000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAssemblyEnd 5 1.0 1.5067e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatGetRowIJ 3 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatCreateSubMats 1 1.0 2.4558e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatGetOrdering 3 1.0 1.3290e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatView 3 1.0 1.2800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecMDot 72 1.0 4.9875e-01 1.0 2.25e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 28 0 0 0 20 28 0 0 0 4509 >>> VecNorm 76 1.0 6.6666e-02 1.0 1.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 2544 >>> VecScale 75 1.0 1.7982e-02 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 4653 >>> VecCopy 3 1.0 1.5080e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecSet 276 1.0 9.6784e-02 1.0 0.00e+00 0.0 
0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 >>> VecAXPY 6 1.0 3.6860e-03 1.0 1.34e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3632 >>> VecMAXPY 75 1.0 4.0490e-01 1.0 2.41e+09 1.0 0.0e+00 0.0e+00 0.0e+00 16 30 0 0 0 16 30 0 0 0 5951 >>> VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecAssemblyEnd 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecScatterBegin 76 1.0 5.3800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecNormalize 75 1.0 8.3690e-02 1.0 2.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 2999 >>> KSPSetUp 4 1.0 1.1663e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> KSPSolve 1 1.0 2.2119e+00 1.0 7.91e+09 1.0 0.0e+00 0.0e+00 0.0e+00 88100 0 0 0 88100 0 0 0 3576 >>> KSPGMRESOrthog 72 1.0 8.7843e-01 1.0 4.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 35 57 0 0 0 35 57 0 0 0 5121 >>> PCSetUp 4 1.0 9.2448e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 257 >>> PCSetUpOnBlocks 1 1.0 6.6597e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 356 >>> PCApply 76 1.0 4.6281e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2333 >>> PCApplyOnBlocks 228 1.0 4.6262e-01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 14 0 0 0 18 14 0 0 0 2334 >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> Average time to get PetscTime(): 1e-07 >>> #PETSc Option Table entries: >>> -I LBFGS >>> -ksp_type gmres >>> -ksp_view >>> -log_view >>> -pc_bjacobi_local_blocks 3 >>> -pc_type bjacobi >>> -sub_pc_type ilu >>> #End of PETSc Option Table entries >>> Compiled with FORTRAN kernels >>> Compiled with full precision matrices (default) >>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 >>> Configure options: --with-scalar-type=complex --download-mumps --download-scalapack --with-fortran-kernels=1 -- FOPTFLAGS="-O3 -ffree-line-length-0 -msse2" --COPTFLAGS="-O3 -msse2" --CXXOPTFLAGS="-O3 -msse2" --with-debugging=0 >>> ----------------------------------------- >>> Libraries compiled on 2020-02-07 10:07:42 on Haos-MBP >>> Machine characteristics: Darwin-19.3.0-x86_64-i386-64bit >>> Using PETSc directory: /Users/donghao/src/git/PETSc-current >>> Using PETSc arch: arch-darwin-c-opt >>> ----------------------------------------- >>> >>> Using C compiler: mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fno-stack- check -Qunused-arguments -fvisibility=hidden -O3 -msse2 >>> Using Fortran compiler: mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -O3 -ffree-line-length-0 - msse2 >>> ----------------------------------------- >>> >>> Using include paths: -I/Users/donghao/src/git/PETSc-current/include -I/Users/donghao/src/git/PETSc-current/arch-darwin- c-opt/include >>> ----------------------------------------- >>> >>> Using C linker: mpicc >>> Using Fortran linker: mpif90 >>> Using libraries: -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/donghao/src/git/PETSc- current/arch-darwin-c-opt/lib -lpetsc -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/ donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -Wl,-rpath,/usr/local/opt/libevent/lib -L/usr/local/opt/libevent/ lib -Wl,-rpath,/usr/local/Cellar/open-mpi/4.0.2/lib -L/usr/local/Cellar/open-mpi/4.0.2/lib -Wl,-rpath,/usr/local/Cellar/ 
gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin19/9.2.0 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple- darwin19/9.2.0 -Wl,-rpath,/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -lcmumps - ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lc++ -ldl -lmpi_usempif08 - lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lquadmath -lm -lc++ -ldl >>> ----------------------------------------- >>> >>> >>> >>> The BCGS solver performance is now comparable to my own Fortran code (1.84s). Still, I feel that there is something wrong hidden somewhere in my setup - a professional lib should to perform better, I believe. Any other ideas that I can look into? Interestingly there is no VecMDot operation at all! Here is the output with the option of: >>> >>> -pc_type bjacobi -pc_bjacobi_local_blocks 3 -sub_pc_type ilu -ksp_type bcgs -ksp_monitor -ksp_view >>> >>> >>> >>> >>> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >>> >>> Mod3DMT.test on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Fri Feb 7 10:38:00 2020 >>> Using Petsc Release Version 3.12.3, unknown >>> >>> Max Max/Min Avg Total >>> Time (sec): 2.187e+00 1.000 2.187e+00 >>> Objects: 1.155e+03 1.000 1.155e+03 >>> Flop: 4.311e+09 1.000 4.311e+09 4.311e+09 >>> Flop/sec: 1.971e+09 1.000 1.971e+09 1.971e+09 >>> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 >>> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 >>> MPI Reductions: 0.000e+00 0.000 >>> >>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >>> e.g., VecAXPY() for real vectors of length N --> 2N flop >>> and VecAXPY() for complex vectors of length N --> 8N flop >>> >>> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- >>> Avg %Total Avg %Total Count %Total Avg %Total Count %Total >>> 0: Main Stage: 2.1870e+00 100.0% 4.3113e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> --- Event Stage 0: Main Stage >>> >>> BuildTwoSidedF 1 1.0 2.2000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatMult 83 1.0 7.8726e-01 1.0 2.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 36 48 0 0 0 36 48 0 0 0 2644 >>> MatSolve 252 1.0 5.5656e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 25 28 0 0 0 25 28 0 0 0 2144 >>> MatLUFactorNum 3 1.0 4.5115e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 526 >>> MatILUFactorSym 3 1.0 2.5103e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatAssemblyBegin 5 1.0 3.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAssemblyEnd 5 1.0 1.5709e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatGetRowIJ 3 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatCreateSubMats 1 1.0 2.8989e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 
1 0 0 0 0 1 0 0 0 0 0 >>> MatGetOrdering 3 1.0 1.1200e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatView 3 1.0 1.2600e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecDot 82 1.0 8.9328e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 2048 >>> VecDotNorm2 41 1.0 9.9019e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 4 0 0 0 5 4 0 0 0 1848 >>> VecNorm 43 1.0 3.9988e-02 1.0 9.59e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 2399 >>> VecCopy 2 1.0 1.1150e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecSet 271 1.0 4.2833e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>> VecAXPY 1 1.0 5.9200e-04 1.0 2.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3769 >>> VecAXPBYCZ 82 1.0 1.1448e-01 1.0 3.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 8 0 0 0 5 8 0 0 0 3196 >>> VecWAXPY 82 1.0 6.7460e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 4 0 0 0 3 4 0 0 0 2712 >>> VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecScatterBegin 84 1.0 5.2800e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> KSPSetUp 4 1.0 1.4765e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> KSPSolve 1 1.0 1.8514e+00 1.0 4.31e+09 1.0 0.0e+00 0.0e+00 0.0e+00 85100 0 0 0 85100 0 0 0 2329 >>> PCSetUp 4 1.0 1.0193e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 5 1 0 0 0 5 1 0 0 0 233 >>> PCSetUpOnBlocks 1 1.0 7.1421e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 332 >>> PCApply 84 1.0 5.7927e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 26 28 0 0 0 26 28 0 0 0 2060 >>> PCApplyOnBlocks 252 1.0 5.7902e-01 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 26 28 0 0 0 26 28 0 0 0 2061 >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> >>> Cheers, >>> Hao >>> >>> >>> >>> From: Smith, Barry F. >>> Sent: Thursday, February 6, 2020 7:03 PM >>> To: Hao DONG >>> Cc: petsc-users at mcs.anl.gov >>> Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? >>> >>> >>> Read my comments ALL the way down, they go a long way. >>> >>>> On Feb 6, 2020, at 3:43 AM, Hao DONG wrote: >>>> >>>> Dear Hong and Barry, >>>> >>>> Thanks for the suggestions. So there could be some problems in my PETSc configuration? - but my PETSc lib was indeed compiled without the debug flags (--with-debugging=0). I use GCC/GFortran (Home-brew GCC 9.2.0) for the compiling and building of PETSc and my own fortran code. My Fortran compiling flags are simply like: >>>> >>>> -O3 -ffree-line-length-none -fastsse >>>> >>>> Which is also used for -FOPTFLAGS in PETSc (I added -openmp for PETSc, but not my fortran code, as I don?t have any OMP optimizations in my program). Note the performance test results I listed yesterday (e.g. 4.08s with 41 bcgs iterations.) are without any CSR-array->PETSc translation overhead (only include the set and solve part). >>> >>> PETSc doesn't use -openmp in any way for its solvers. Do not use this option, it may be slowing the code down. Please send configure.log >>> >>>> >>>> I have two questions about the performance difference - >>>> >>>> 1. Is ilu only factorized once for each iteration, or ilu is performed at every outer ksp iteration steps? Sounds unlikely - but if so, this could cause some extra overheads. 
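(Barry's answer just below: the factorization is only redone when the operator changes. The pattern this implies, sketched with placeholder names A, b1, x1, b2, x2 for the matrix and vectors:)

      call KSPSetOperators(ksp_local, A, A, ierr)
      call KSPSolve(ksp_local, b1, x1, ierr)   ! ILU factors are built during this first solve
      call KSPSolve(ksp_local, b2, x2, ierr)   ! same operator, the factors are reused

so repeated right-hand sides with an unchanged matrix pay the ILU setup cost only once.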
>>> >>> ILU is ONLY done if the matrix has changed (which seems wrong). >>>> >>>> 2. Some KSP solvers like BCGS or TFQMR has two ?half-iterations? for each iteration step. Not sure how it works in PETSc, but is that possible that both the two ?half" relative residuals are counted in the output array, doubling the number of iterations (but that cannot explain the extra time consumed)? >>> >>> Yes, PETSc might report them as two, you need to check the exact code. >>> >>>> >>>> Anyway, the output with -log_view from the same 278906 by 278906 matrix with 3-block D-ILU in PETSc is as follows: >>>> >>>> >>>> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >>>> >>>> MEMsolv.lu on a arch-darwin-c-opt named Haos-MBP with 1 processor, by donghao Thu Feb 6 09:07:35 2020 >>>> Using Petsc Release Version 3.12.3, unknown >>>> >>>> Max Max/Min Avg Total >>>> Time (sec): 4.443e+00 1.000 4.443e+00 >>>> Objects: 1.155e+03 1.000 1.155e+03 >>>> Flop: 4.311e+09 1.000 4.311e+09 4.311e+09 >>>> Flop/sec: 9.703e+08 1.000 9.703e+08 9.703e+08 >>>> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 >>>> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 >>>> MPI Reductions: 0.000e+00 0.000 >>>> >>>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >>>> e.g., VecAXPY() for real vectors of length N --> 2N flop >>>> and VecAXPY() for complex vectors of length N --> 8N flop >>>> >>>> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- >>>> Avg %Total Avg %Total Count %Total Avg %Total Count %Total >>>> 0: Main Stage: 4.4435e+00 100.0% 4.3113e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >>>> >>>> ???????????????????????????????????????????????????????????? >>>> See the 'Profiling' chapter of the users' manual for details on interpreting output. >>>> Phase summary info: >>>> Count: number of times phase was executed >>>> Time and Flop: Max - maximum over all processors >>>> Ratio - ratio of maximum to minimum over all processors >>>> Mess: number of messages sent >>>> AvgLen: average message length (bytes) >>>> Reduct: number of global reductions >>>> Global: entire computation >>>> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
>>>> %T - percent time in this phase %F - percent flop in this phase >>>> %M - percent messages in this phase %L - percent message lengths in this phase >>>> %R - percent reductions in this phase >>>> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) >>>> ------------------------------------------------------------------------------------------------------------------------ >>>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >>>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>>> ------------------------------------------------------------------------------------------------------------------------ >>>> >>>> --- Event Stage 0: Main Stage >>>> >>>> BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatMult 83 1.0 1.7815e+00 1.0 2.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40 48 0 0 0 40 48 0 0 0 1168 >>>> MatSolve 252 1.0 1.2708e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 939 >>>> MatLUFactorNum 3 1.0 7.9725e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 >>>> MatILUFactorSym 3 1.0 2.6998e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>> MatAssemblyBegin 5 1.0 3.6000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatAssemblyEnd 5 1.0 3.1619e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>> MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatCreateSubMats 1 1.0 3.9659e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>> MatGetOrdering 3 1.0 4.3070e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatView 3 1.0 1.3600e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecDot 82 1.0 1.8948e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 966 >>>> VecDotNorm2 41 1.0 1.6812e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 1088 >>>> VecNorm 43 1.0 9.5099e-02 1.0 9.59e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 1009 >>>> VecCopy 2 1.0 1.0120e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecSet 271 1.0 3.8922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>> VecAXPY 1 1.0 7.7200e-04 1.0 2.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2890 >>>> VecAXPBYCZ 82 1.0 2.4370e-01 1.0 3.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 8 0 0 0 5 8 0 0 0 1502 >>>> VecWAXPY 82 1.0 1.4148e-01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 4 0 0 0 3 4 0 0 0 1293 >>>> VecAssemblyBegin 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecScatterBegin 84 1.0 5.9300e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> KSPSetUp 4 1.0 1.4167e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> KSPSolve 1 1.0 4.0250e+00 1.0 4.31e+09 1.0 0.0e+00 0.0e+00 0.0e+00 91100 0 0 0 91100 0 0 0 1071 >>>> PCSetUp 4 1.0 1.5207e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 156 >>>> PCSetUpOnBlocks 1 1.0 1.1116e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 214 >>>> PCApply 84 1.0 1.2912e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 >>>> PCApplyOnBlocks 252 1.0 1.2909e+00 1.0 1.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 28 0 0 0 29 28 0 0 0 924 >>>> 
------------------------------------------------------------------------------------------------------------------------ >>>> >>>> # I skipped the memory part - the options (and compiler options) are as follows: >>>> >>>> #PETSc Option Table entries: >>>> -ksp_type bcgs >>>> -ksp_view >>>> -log_view >>>> -pc_bjacobi_local_blocks 3 >>>> -pc_factor_levels 0 >>>> -pc_sub_type ilu >>>> -pc_type bjacobi >>>> #End of PETSc Option Table entries >>>> Compiled with FORTRAN kernels >>>> Compiled with full precision matrices (default) >>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 >>>> Configure options: --with-scalar-type=complex --download-mumps --download-scalapack --with-fortran-kernels=1 -- FOPTFLAGS=?-O3 -fastsse -mp -openmp? --COPTFLAGS=?-O3 -fastsse -mp -openmp? --CXXOPTFLAGS="-O3 -fastsse -mp -openmp" -- with-debugging=0 >>>> ----------------------------------------- >>>> Libraries compiled on 2020-02-03 10:44:31 on Haos-MBP >>>> Machine characteristics: Darwin-19.2.0-x86_64-i386-64bit >>>> Using PETSc directory: /Users/donghao/src/git/PETSc-current >>>> Using PETSc arch: arch-darwin-c-opt >>>> ----------------------------------------- >>>> >>>> Using C compiler: mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fno-stack- check -Qunused-arguments -fvisibility=hidden >>>> Using Fortran compiler: mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument >>>> >>>> Using include paths: -I/Users/donghao/src/git/PETSc-current/include -I/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/include >>>> ----------------------------------------- >>>> >>>> Using C linker: mpicc >>>> Using Fortran linker: mpif90 >>>> Using libraries: -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/donghao/src/git/PETSc- current/arch-darwin-c-opt/lib -lpetsc -Wl,-rpath,/Users/donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -L/Users/ donghao/src/git/PETSc-current/arch-darwin-c-opt/lib -Wl,-rpath,/usr/local/opt/libevent/lib -L/usr/local/opt/libevent/ lib -Wl,-rpath,/usr/local/Cellar/open-mpi/4.0.2/lib -L/usr/local/Cellar/open-mpi/4.0.2/lib -Wl,-rpath,/usr/local/Cellar/ gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin19/9.2.0 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple- darwin19/9.2.0 -Wl,-rpath,/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9 -lcmumps - ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lc++ -ldl -lmpi_usempif08 - lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lquadmath -lm -lc++ -ldl >>>> >>>> >>>> On the other hand, running PETSc with >>>> >>>> -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type lu -ksp_type gmres -ksp_monitor -ksp_view -log_view >>>> >>>> For the same problem takes 5.37s and 72 GMRES iterations. Our previous testings show that BiCGstab (bcgs in PETSc) is almost always the most effective KSP solver for our non-symmetrical complex system. Strangely, the system is still using ilu instead of lu for sub blocks. 
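The culprit here is almost certainly the option name: the sub-block solvers of PCBJACOBI take the "sub_" prefix, so the LU request has to be spelled -sub_pc_type lu (as the reply just below notes); an unrecognized -pc_sub_type is silently ignored, leaving the default ILU(0) on each block. A minimal petsc4py sketch of the intended configuration, assuming the system matrix A is already assembled (the thread itself uses Fortran, but the option names are the same):

    from petsc4py import PETSc

    opts = PETSc.Options()
    opts["ksp_type"] = "gmres"
    opts["pc_type"] = "bjacobi"
    opts["pc_bjacobi_local_blocks"] = 3
    opts["sub_pc_type"] = "lu"      # note the sub_ prefix, not -pc_sub_type

    ksp = PETSc.KSP().create()
    ksp.setOperators(A)             # A: assembled system matrix (assumed to exist)
    ksp.setFromOptions()            # verify the resulting solver with -ksp_view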
The output is like: >>> >>> -sub_pc_type lu >>> >>>> >>>> 0 KSP Residual norm 2.480412407430e+02 >>>> 1 KSP Residual norm 8.848059967835e+01 >>>> 2 KSP Residual norm 3.415272863261e+01 >>>> 3 KSP Residual norm 1.563045190939e+01 >>>> 4 KSP Residual norm 6.241296940043e+00 >>>> 5 KSP Residual norm 2.739710899854e+00 >>>> 6 KSP Residual norm 1.391304148888e+00 >>>> 7 KSP Residual norm 7.959262020849e-01 >>>> 8 KSP Residual norm 4.828323055231e-01 >>>> 9 KSP Residual norm 2.918529739200e-01 >>>> 10 KSP Residual norm 1.905508589557e-01 >>>> 11 KSP Residual norm 1.291541892702e-01 >>>> 12 KSP Residual norm 8.827145774707e-02 >>>> 13 KSP Residual norm 6.521331095889e-02 >>>> 14 KSP Residual norm 5.095787952595e-02 >>>> 15 KSP Residual norm 4.043060387395e-02 >>>> 16 KSP Residual norm 3.232590200012e-02 >>>> 17 KSP Residual norm 2.593944982216e-02 >>>> 18 KSP Residual norm 2.064639483533e-02 >>>> 19 KSP Residual norm 1.653916663492e-02 >>>> 20 KSP Residual norm 1.334946415452e-02 >>>> 21 KSP Residual norm 1.092886880597e-02 >>>> 22 KSP Residual norm 8.988004105542e-03 >>>> 23 KSP Residual norm 7.466501315240e-03 >>>> 24 KSP Residual norm 6.284389135436e-03 >>>> 25 KSP Residual norm 5.425231669964e-03 >>>> 26 KSP Residual norm 4.766338253084e-03 >>>> 27 KSP Residual norm 4.241238878242e-03 >>>> 28 KSP Residual norm 3.808113525685e-03 >>>> 29 KSP Residual norm 3.449383788116e-03 >>>> 30 KSP Residual norm 3.126025526388e-03 >>>> 31 KSP Residual norm 2.958328054299e-03 >>>> 32 KSP Residual norm 2.802344900403e-03 >>>> 33 KSP Residual norm 2.621993580492e-03 >>>> 34 KSP Residual norm 2.430066269304e-03 >>>> 35 KSP Residual norm 2.259043079597e-03 >>>> 36 KSP Residual norm 2.104287972986e-03 >>>> 37 KSP Residual norm 1.952916080045e-03 >>>> 38 KSP Residual norm 1.804988937999e-03 >>>> 39 KSP Residual norm 1.643302117377e-03 >>>> 40 KSP Residual norm 1.471661332036e-03 >>>> 41 KSP Residual norm 1.286445911163e-03 >>>> 42 KSP Residual norm 1.127543025848e-03 >>>> 43 KSP Residual norm 9.777148275484e-04 >>>> 44 KSP Residual norm 8.293314450006e-04 >>>> 45 KSP Residual norm 6.989331136622e-04 >>>> 46 KSP Residual norm 5.852307780220e-04 >>>> 47 KSP Residual norm 4.926715539762e-04 >>>> 48 KSP Residual norm 4.215941372075e-04 >>>> 49 KSP Residual norm 3.699489548162e-04 >>>> 50 KSP Residual norm 3.293897163533e-04 >>>> 51 KSP Residual norm 2.959954542998e-04 >>>> 52 KSP Residual norm 2.700193032414e-04 >>>> 53 KSP Residual norm 2.461789791204e-04 >>>> 54 KSP Residual norm 2.218839085563e-04 >>>> 55 KSP Residual norm 1.945154309976e-04 >>>> 56 KSP Residual norm 1.661128781744e-04 >>>> 57 KSP Residual norm 1.413198766258e-04 >>>> 58 KSP Residual norm 1.213984003195e-04 >>>> 59 KSP Residual norm 1.044317450754e-04 >>>> 60 KSP Residual norm 8.919957502977e-05 >>>> 61 KSP Residual norm 8.042584301275e-05 >>>> 62 KSP Residual norm 7.292784493581e-05 >>>> 63 KSP Residual norm 6.481935501872e-05 >>>> 64 KSP Residual norm 5.718564652679e-05 >>>> 65 KSP Residual norm 5.072589750116e-05 >>>> 66 KSP Residual norm 4.487930741285e-05 >>>> 67 KSP Residual norm 3.941040674119e-05 >>>> 68 KSP Residual norm 3.492873281291e-05 >>>> 69 KSP Residual norm 3.103798339845e-05 >>>> 70 KSP Residual norm 2.822943237409e-05 >>>> 71 KSP Residual norm 2.610615023776e-05 >>>> 72 KSP Residual norm 2.441692671173e-05 >>>> KSP Object: 1 MPI processes >>>> type: gmres >>>> restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement >>>> happy breakdown tolerance 1e-30 >>>> maximum iterations=150, 
nonzero initial guess >>>> tolerances: relative=1e-07, absolute=1e-50, divergence=10000. >>>> left preconditioning >>>> using PRECONDITIONED norm type for convergence test >>>> PC Object: 1 MPI processes >>>> type: bjacobi >>>> number of blocks = 3 >>>> Local solve is same for all blocks, in the following KSP and PC objects: >>>> KSP Object: (sub_) 1 MPI processes >>>> type: preonly >>>> maximum iterations=10000, initial guess is zero >>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>>> left preconditioning >>>> using NONE norm type for convergence test >>>> PC Object: (sub_) 1 MPI processes >>>> type: ilu >>>> out-of-place factorization >>>> 0 levels of fill >>>> tolerance for zero pivot 2.22045e-14 >>>> matrix ordering: natural >>>> factor fill ratio given 1., needed 1. >>>> Factored matrix follows: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=92969, cols=92969 >>>> package used to perform factorization: petsc >>>> total: nonzeros=638417, allocated nonzeros=638417 >>>> total number of mallocs used during MatSetValues calls=0 >>>> not using I-node routines >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=92969, cols=92969 >>>> total: nonzeros=638417, allocated nonzeros=638417 >>>> total number of mallocs used during MatSetValues calls=0 >>>> not using I-node routines >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: mpiaij >>>> rows=278906, cols=278906 >>>> total: nonzeros=3274027, allocated nonzeros=3274027 >>>> total number of mallocs used during MatSetValues calls=0 >>>> not using I-node (on process 0) routines >>>> ... >>>> ------------------------------------------------------------------------------------------------------------------------ >>>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >>>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>>> ------------------------------------------------------------------------------------------------------------------------ >>>> >>>> --- Event Stage 0: Main Stage >>>> >>>> BuildTwoSidedF 1 1.0 2.3000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatMult 75 1.0 1.5812e+00 1.0 1.88e+09 1.0 0.0e+00 0.0e+00 0.0e+00 28 24 0 0 0 28 24 0 0 0 1189 >>>> MatSolve 228 1.0 1.1442e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 944 >>> >>> These flop rates are ok, but not great. Perhaps an older machine. >>> >>>> MatLUFactorNum 3 1.0 8.1930e-02 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 290 >>>> MatILUFactorSym 3 1.0 2.7102e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatAssemblyBegin 5 1.0 3.7000e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatAssemblyEnd 5 1.0 3.1895e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>> MatGetRowIJ 3 1.0 2.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatCreateSubMats 1 1.0 4.0904e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>> MatGetOrdering 3 1.0 4.2640e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatView 3 1.0 1.4400e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecMDot 72 1.0 1.1984e+00 1.0 2.25e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 28 0 0 0 21 28 0 0 0 1877 >>> >>> 21 percent of the time in VecMDOT this is huge for s sequential fun. I think maybe you are using a terrible OpenMP BLAS? 
>>> >>> Send configure.log >>> >>> >>>> VecNorm 76 1.0 1.6841e-01 1.0 1.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 1007 >>>> VecScale 75 1.0 1.8241e-02 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 4587 >>>> VecCopy 3 1.0 1.4970e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecSet 276 1.0 9.1970e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>>> VecAXPY 6 1.0 3.7450e-03 1.0 1.34e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3575 >>>> VecMAXPY 75 1.0 1.0022e+00 1.0 2.41e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 30 0 0 0 18 30 0 0 0 2405 >>>> VecAssemblyBegin 2 1.0 1.0000e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecScatterBegin 76 1.0 5.5100e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecNormalize 75 1.0 1.8462e-01 1.0 2.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 1360 >>>> KSPSetUp 4 1.0 1.1341e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> KSPSolve 1 1.0 5.3123e+00 1.0 7.91e+09 1.0 0.0e+00 0.0e+00 0.0e+00 93100 0 0 0 93100 0 0 0 1489 >>>> KSPGMRESOrthog 72 1.0 2.1316e+00 1.0 4.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 57 0 0 0 37 57 0 0 0 2110 >>>> PCSetUp 4 1.0 1.5531e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 153 >>>> PCSetUpOnBlocks 1 1.0 1.1343e-01 1.0 2.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 209 >>>> PCApply 76 1.0 1.1671e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 >>>> PCApplyOnBlocks 228 1.0 1.1668e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 20 14 0 0 0 20 14 0 0 0 925 >>>> ???????????????????????????????????????????????????????????? >>>> ... >>>> #PETSc Option Table entries: >>>> -ksp_monitor >>>> -ksp_type gmres >>>> -ksp_view >>>> -log_view >>>> -pc_bjacobi_local_blocks 3 >>>> -pc_sub_type lu >>>> -pc_type bjacobi >>>> #End of PETSc Option Table entries >>>> ... >>>> >>>> Does any of the setup/output ring a bell? >>>> >>>> BTW, out of curiosity - what is a ?I-node? routine? >>>> >>>> >>>> Cheers, >>>> Hao >>>> >>>> >>>> From: Smith, Barry F. >>>> Sent: Wednesday, February 5, 2020 9:42 PM >>>> To: Hao DONG >>>> Cc: petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? >>>> >>>> >>>> >>>>> On Feb 5, 2020, at 4:36 AM, Hao DONG wrote: >>>>> >>>>> Thanks a lot for your suggestions, Hong and Barry - >>>>> >>>>> As you suggested, I first tried the LU direct solvers (built-in and MUMPs) out this morning, which work perfectly, albeit slow. As my problem itself is a part of a PDE based optimization, the exact solution in the searching procedure is not necessary (I often set a relative tolerance of 1E-7/1E-8, etc. for Krylov subspace methods). Also tried BJACOBI with exact LU, the KSP just converges in one or two iterations, as expected. >>>>> >>>>> I added -kspview option for the D-ILU test (still with Block Jacobi as preconditioner and bcgs as KSP solver). The KSPview output from one of the examples in a toy model looks like: >>>>> >>>>> KSP Object: 1 MPI processes >>>>> type: bcgs >>>>> maximum iterations=120, nonzero initial guess >>>>> tolerances: relative=1e-07, absolute=1e-50, divergence=10000. 
>>>>> left preconditioning >>>>> using PRECONDITIONED norm type for convergence test >>>>> PC Object: 1 MPI processes >>>>> type: bjacobi >>>>> number of blocks = 3 >>>>> Local solve is same for all blocks, in the following KSP and PC objects: >>>>> KSP Object: (sub_) 1 MPI processes >>>>> type: preonly >>>>> maximum iterations=10000, initial guess is zero >>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>>>> left preconditioning >>>>> using NONE norm type for convergence test >>>>> PC Object: (sub_) 1 MPI processes >>>>> type: ilu >>>>> out-of-place factorization >>>>> 0 levels of fill >>>>> tolerance for zero pivot 2.22045e-14 >>>>> matrix ordering: natural >>>>> factor fill ratio given 1., needed 1. >>>>> Factored matrix follows: >>>>> Mat Object: 1 MPI processes >>>>> type: seqaij >>>>> rows=11294, cols=11294 >>>>> package used to perform factorization: petsc >>>>> total: nonzeros=76008, allocated nonzeros=76008 >>>>> total number of mallocs used during MatSetValues calls=0 >>>>> not using I-node routines >>>>> linear system matrix = precond matrix: >>>>> Mat Object: 1 MPI processes >>>>> type: seqaij >>>>> rows=11294, cols=11294 >>>>> total: nonzeros=76008, allocated nonzeros=76008 >>>>> total number of mallocs used during MatSetValues calls=0 >>>>> not using I-node routines >>>>> linear system matrix = precond matrix: >>>>> Mat Object: 1 MPI processes >>>>> type: mpiaij >>>>> rows=33880, cols=33880 >>>>> total: nonzeros=436968, allocated nonzeros=436968 >>>>> total number of mallocs used during MatSetValues calls=0 >>>>> not using I-node (on process 0) routines >>>>> >>>>> do you see something wrong with my setup? >>>>> >>>>> I also tried a quick performance test with a small 278906 by 278906 matrix (3850990 nnz) with the following parameters: >>>>> >>>>> -ksp_type bcgs -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu -ksp_view >>>>> >>>>> Reducing the relative residual to 1E-7 >>>>> >>>>> Took 4.08s with 41 bcgs iterations. >>>>> >>>>> Merely changing the -pc_bjacobi_local_blocks to 6 >>>>> >>>>> Took 7.02s with 73 bcgs iterations. 9 blocks would further take 9.45s with 101 bcgs iterations. >>>> >>>> This is normal. more blocks slower convergence >>>>> >>>>> As a reference, my home-brew Fortran code solves the same problem (3-block D-ILU0) in >>>>> >>>>> 1.84s with 24 bcgs iterations (the bcgs code is also a home-brew one)? >>>>> >>>> Run the PETSc code with optimization ./configure --with-debugging=0 an run the code with -log_view this will show where the PETSc code is spending the time (send it to use) >>>> >>>> >>>>> >>>>> >>>>> Well, by saying ?use explicit L/U matrix as preconditioner?, I wonder if a user is allowed to provide his own (separate) L and U Mat for preconditioning (like how it works in Matlab solvers), e.g. >>>>> >>>>> x = qmr(A,b,Tol,MaxIter,L,U,x) >>>>> >>>>> As I already explicitly constructed the L and U matrices in Fortran, it is not hard to convert them to Mat in Petsc to test Petsc and my Fortran code head-to-head. In that case, the A, b, x, and L/U are all identical, it would be easier to see where the problem came from. >>>>> >>>>> >>>> No, we don't provide this kind of support >>>> >>>> >>>>> >>>>> BTW, there is another thing I wondered - is there a way to output residual in unpreconditioned norm? I tried to >>>>> >>>>> call KSPSetNormType(ksp_local, KSP_NORM_UNPRECONDITIONED, ierr) >>>>> >>>>> But always get an error that current ksp does not support unpreconditioned in LEFT/RIGHT (either way). 
Is it possible to do that (output unpreconditioned residual) in PETSc at all? >>>> >>>> -ksp_monitor_true_residual You can also run GMRES (and some other methods) with right preconditioning, -ksp_pc_side right then the residual computed is by the algorithm the unpreconditioned residual >>>> >>>> KSPSetNormType sets the norm used in the algorithm, it generally always has to left or right, only a couple algorithms support unpreconditioned directly. >>>> >>>> Barry >>>> >>>> >>>>> >>>>> Cheers, >>>>> Hao >>>>> >>>>> >>>>> From: Smith, Barry F. >>>>> Sent: Tuesday, February 4, 2020 8:27 PM >>>>> To: Hao DONG >>>>> Cc: petsc-users at mcs.anl.gov >>>>> Subject: Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC? >>>>> >>>>> >>>>> >>>>>> On Feb 4, 2020, at 12:41 PM, Hao DONG wrote: >>>>>> >>>>>> Dear all, >>>>>> >>>>>> >>>>>> I have a few questions about the implementation of diagonal ILU PC in PETSc. I want to solve a very simple system with KSP (in parallel), the nature of the system (finite difference time-harmonic Maxwell) is probably not important to the question itself. Long story short, I just need to solve a set of equations of Ax = b with a block diagonal system matrix, like (not sure if the mono font works): >>>>>> >>>>>> |X | >>>>>> A =| Y | >>>>>> | Z| >>>>>> >>>>>> Note that A is not really block-diagonal, it?s just a multi-diagonal matrix determined by the FD mesh, where most elements are close to diagonal. So instead of a full ILU decomposition, a D-ILU is a good approximation as a preconditioner, and the number of blocks should not matter too much: >>>>>> >>>>>> |Lx | |Ux | >>>>>> L = | Ly | and U = | Uy | >>>>>> | Lz| | Uz| >>>>>> >>>>>> Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in my sequential Fortran/Matlab code. >>>>>> >>>>>> So like most users, I am looking for a parallel implement with this problem in PETSc. After looking through the manual, it seems that the most straightforward way to do it is through PCBJACOBI. Not sure I understand it right, I just setup a 3-block PCJACOBI and give each of the block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little block of fortran code would look like: >>>>>> ... >>>>>> call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & >>>>>> & isubs,ierr) >>>>>> call PCBJacobiSetLocalBlocks(pc_local, Nsub, & >>>>>> & isubs(istart:iend),ierr) >>>>>> ! set up the block jacobi structure >>>>>> call KSPSetup(ksp_local,ierr) >>>>>> ! allocate sub ksps >>>>>> allocate(ksp_sub(Nsub)) >>>>>> call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & >>>>>> & ksp_sub,ierr) >>>>>> do i=1,Nsub >>>>>> call KSPGetPC(ksp_sub(i),pc_sub,ierr) >>>>>> !ILU preconditioner >>>>>> call PCSetType(pc_sub,ptype,ierr) >>>>>> call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here >>>>>> call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] >>>>>> end do >>>>>> call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & >>>>>> & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) >>>>>> ? >>>>> >>>>> This code looks essentially right. You should call with -ksp_view to see exactly what PETSc is using for a solver. >>>>> >>>>>> >>>>>> I understand that the parallel performance may not be comparable, so I first set up a one-process test (with MPIAij, but all the rows are local since there is only one process). The system is solved without any problem (identical results within error). 
But the performance is actually a lot worse (code built without debugging flags in performance tests) than my own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), which is hard to believe. I suspect the difference is from the PC as the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to the same relative tol. >>>>> >>>>> PETSc uses GMRES by default with a restart of 30 and left preconditioning. It could be different exact choices in the solver (which is why -ksp_view is so useful) can explain the differences in the runs between your code and PETSc's >>>>>> >>>>>> This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks instead of 3). While my Fortran/Matlab codes see minimal performance difference (<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a performance decrease of more than 50% in sequential test. >>>>> >>>>> This is odd, the more blocks the smaller each block so the quicker the ILU set up should be. You can run various cases with -log_view and send us the output to see what is happening at each part of the computation in time. >>>>> >>>>>> So my implementation IS somewhat different in PETSc. Do I miss something in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how PCBJACOBI works in PETSc? >>>>> >>>>> Probably not. >>>>>> >>>>>> If this is not the right way to implement a block diagonal ILU as (parallel) PC, please kindly point me to the right direction. I searched through the mail list to find some answers, only to find a couple of similar questions... An example would be nice. >>>>> >>>>> You approach seems fundamentally right but I cannot be sure of possible bugs. >>>>>> >>>>>> On the other hand, does PETSc support a simple way to use explicit L/U matrix as a preconditioner? I can import the D-ILU matrix (I already converted my A matrix into Mat) constructed in my Fortran code to make a better comparison. Or do I have to construct my own PC using PCshell? If so, is there a good tutorial/example to learn about how to use PCSHELL (in a more advanced sense, like how to setup pc side and transpose)? >>>>> >>>>> Not sure what you mean by explicit L/U matrix as a preconditioner. As Hong said, yes you can use a parallel LU from MUMPS or SuperLU_DIST or Pastix as the solver. You do not need any shell code. You simply need to set the PCType to lu >>>>> >>>>> You can also set all this options from the command line and don't need to write the code specifically. So call KSPSetFromOptions() and then for example >>>>> >>>>> -pc_type bjacobi -pc_bjacobi_local_blocks 3 -pc_sub_type ilu (this last one is applied to each block so you could use -pc_type lu and it would use lu on each block.) >>>>> >>>>> -ksp_type_none -pc_type lu -pc_factor_mat_solver_type mumps (do parallel LU with mumps) >>>>> >>>>> By not hardwiring in the code and just using options you can test out different cases much quicker >>>>> >>>>> Use -ksp_view to make sure that is using the solver the way you expect. >>>>> >>>>> Barry >>>>> >>>>> >>>>> >>>>> Barry >>>>> >>>>>> >>>>>> Thanks in advance, >>>>>> >>>>>> Hao >>> >>> >> > From aan2 at princeton.edu Mon Feb 10 11:09:22 2020 From: aan2 at princeton.edu (Olek Niewiarowski) Date: Mon, 10 Feb 2020 17:09:22 +0000 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? 
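On the recurring question above about reusing an externally built D-ILU factorization as the preconditioner: a PCSHELL is the usual route when the factors live outside PETSc. A rough petsc4py sketch follows (the thread's own code is Fortran, where the analogous calls are PCShellSetApply and PCShellSetContext). Here apply_dilu stands for the user's own forward/backward substitution with the prebuilt block L/U factors; it is a placeholder, not a PETSc routine, and A is the already assembled system matrix:

    from petsc4py import PETSc

    class DILUShell(object):
        """Apply a user-supplied D-ILU factorization as a preconditioner."""
        def __init__(self, apply_dilu):
            # apply_dilu: callable mapping a numpy array r to M^{-1} r,
            # i.e. the forward/backward solves with the prebuilt block factors
            self.apply_dilu = apply_dilu

        def apply(self, pc, x, y):
            y.setArray(self.apply_dilu(x.getArray()))   # y <- M^{-1} x

    ksp = PETSc.KSP().create()
    ksp.setType(PETSc.KSP.Type.BCGS)
    ksp.setOperators(A)
    pc = ksp.getPC()
    pc.setType(PETSc.PC.Type.PYTHON)
    pc.setPythonContext(DILUShell(apply_dilu))          # user-supplied callable
    ksp.setFromOptions()

With this in place the solver can still be steered from the command line as usual, for example with -ksp_monitor_true_residual to watch the unpreconditioned residual discussed above.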
In-Reply-To: <35929586-4D5D-4B31-A34E-8D8D266FEA0A@mcs.anl.gov> References: <20E8B18C-F71E-4B10-958B-6CD24DA869A3@mcs.anl.gov> , <35929586-4D5D-4B31-A34E-8D8D266FEA0A@mcs.anl.gov> Message-ID: Barry, Thank you for your help and detailed suggestions. I will try to implement what you proposed and will follow-up with any questions. In the meantime, I just want to make sure I understand the use of SNESSetPicard: r - vector to store function value b - function evaluation routine - my F(u) function Amat - matrix with which A(x) x - b(x) is to be computed - a MatCreateLRC() -- what's the best way of passing in scalar k? Pmat - matrix from which preconditioner is computed (usually the same as Amat) - a regular Mat() J - function to compute matrix value, see SNESJacobianFunction for details on its calling sequence -- computes K + kaa' By the way, the manual page states "we do not recommend using this routine. It is far better to provide the nonlinear function F() and some approximation to the Jacobian and use an approximate Newton solver :-)" Thanks again! Alexander (Olek) Niewiarowski PhD Candidate, Civil & Environmental Engineering Princeton University, 2020 Cell: +1 (610) 393-2978 ________________________________ From: Smith, Barry F. Sent: Thursday, February 6, 2020 13:25 To: Olek Niewiarowski Cc: Matthew Knepley ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? If I remember your problem is K(u) + kaa' = F(u) You should start by creating a SNES and calling SNESSetPicard. Read its manual page. Your matrix should be a MatCreateLRC() for the Mat argument to SNESSetPicard and the Peat should be just your K matrix. If you run with -ksp_fd_operator -pc_type lu will be using K to precondition K + kaa' + d F(U)/dU . Newton's method should converge at quadratic order. You can use -ksp_fd_operator -pc_type anything else to use an iterative linear solver as the preconditioner of K. If you really want to use Sherman Morisson formula then you would create a PC shell and do typedef struct{ KSP innerksp; Vec u_1,u_2; } YourStruct; SNESGetKSP(&ksp); KSPGetPC(&pc); PCSetType(pc,PCSHELL); PCShellSetApply(pc,YourPCApply) PetscMemclear(yourstruct,si PCShellSetContext(pc,&yourstruct); Then YourPCApply(PC pc,Vec in, Vec out) { YourStruct *yourstruct; PCShellGetContext(pc,(void**)&yourstruct) if (!yourstruct->ksp) { PCCreate(comm,&yourstruct->ksp); KSPSetPrefix(yourstruct->ksp,"yourpc_"); Mat A,B; KSPGetOperators(ksp,&A,&B); KSPSetOperators(yourstruct->ksp,A,B); create work vectors } Apply the solve as you do for the linear case with Sherman Morisson formula } This you can run with for example -yourpc_pc_type cholesky Barry Looks complicated, conceptually simple. > 2) Call KSPSetOperators(ksp, K, K,) > > 3) Solve the first system KSPSolve(ksp, -F, u_1) > > 4) Solve the second system KSPSolve(ksp, a, u_2) > > 5) Calculate beta VecDot(a, u_2, &gamma); beta = 1./(1. + k*gamma); > > 6) Update the guess VecDot(u_2, F, &delta); VecAXPBYPCZ(u, 1.0, beta*delta, 1.0, u_1, u_2) No > On Feb 6, 2020, at 9:02 AM, Olek Niewiarowski wrote: > > Hi Matt, > > What you suggested in your last email was exactly what I did on my very first attempt at the problem, and while it "worked," convergence was not satisfactory due to the Newton step being fixed in step 6. This is the reason I would like to use the linesearch in SNES instead. 
Indeed in your manual you "recommend most PETSc users work directly with SNES, rather than using PETSc for the linear problem within a nonlinear solver." Ideally I'd like to create a SNES solver, pass in the functions to evaluate K, F, a, and k, and set up the underlying KSP object as in my first message. Is this possible? > > Thanks, > > Alexander (Olek) Niewiarowski > PhD Candidate, Civil & Environmental Engineering > Princeton University, 2020 > Cell: +1 (610) 393-2978 > From: Matthew Knepley > Sent: Thursday, February 6, 2020 5:33 > To: Olek Niewiarowski > Cc: Smith, Barry F. ; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? > > On Wed, Feb 5, 2020 at 8:53 PM Olek Niewiarowski wrote: > Hi Barry and Matt, > > Thank you for your input and for creating a new issue in the repo. > My initial question was more basic (how to configure the SNES's KSP solver as in my first message with a and k), but now I see there's more to the implementation. To reiterate, for my problem's structure, a good solution algorithm (on the algebraic level) is the following "double back-substitution": > For each nonlinear iteration: > ? define intermediate vectors u_1 and u_2 > ? solve Ku_1 = -F --> u_1 = -K^{-1}F (this solve is cheap, don't actually need K^-1) > ? solve Ku_2 = -a --> u_2 = -K^{-1}a (ditto) > ? define \beta = 1/(1 + k a^Tu_2) > ? \Delta u = u_1 + \beta k u_2^T F u_2 > ? u = u + \Delta u > This is very easy to setup: > > 1) Create a KSP object KSPCreate(comm, &ksp) > > 2) Call KSPSetOperators(ksp, K, K,) > > 3) Solve the first system KSPSolve(ksp, -F, u_1) > > 4) Solve the second system KSPSolve(ksp, a, u_2) > > 5) Calculate beta VecDot(a, u_2, &gamma); beta = 1./(1. + k*gamma); > > 6) Update the guess VecDot(u_2, F, &delta); VecAXPBYPCZ(u, 1.0, beta*delta, 1.0, u_1, u_2) > > Thanks, > > Matt > > I don't need the Jacobian inverse, [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) just the solution ?u = [K?kaaT]-1F = K-1F - (kK-1 aFK-1a)/(1 + kaTK-1a) > = u_1 + beta k u_2^T F u_2 (so I never need to invert K either). (To Matt's point on gitlab, K is a symmetric sparse matrix arising from a bilinear form. ) Also, eventually, I want to have more than one low-rank updates to K, but again, Sherman Morrisson Woodbury should still work. > > Being new to PETSc, I don't know if this algorithm directly translates into an efficient numerical solver. (I'm also not sure if Picard iteration would be useful here.) What would it take to set up the KSP solver in SNES like I did below? Is it possible "out of the box"? I looked at MatCreateLRC() - what would I pass this to? (A pointer to demo/tutorial would be very appreciated.) If there's a better way to go about all of this, I'm open to any and all ideas. My only limitation is that I use petsc4py exclusively since I/future users of my code will not be comfortable with C. > > Thanks again for your help! > > > Alexander (Olek) Niewiarowski > PhD Candidate, Civil & Environmental Engineering > Princeton University, 2020 > Cell: +1 (610) 393-2978 > From: Smith, Barry F. > Sent: Wednesday, February 5, 2020 15:46 > To: Matthew Knepley > Cc: Olek Niewiarowski ; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? 
> > > https://gitlab.com/petsc/petsc/issues/557 > > > > On Feb 5, 2020, at 7:35 AM, Matthew Knepley wrote: > > > > Perhaps Barry is right that you want Picard, but suppose you really want Newton. > > > > "This problem can be solved efficiently using the Sherman-Morrison formula" Well, maybe. The main assumption here is that inverting K is cheap. I see two things you can do in a straightforward way: > > > > 1) Use MatCreateLRC() https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateLRC.html to create the Jacobian > > and solve using an iterative method. If you pass just K was the preconditioning matrix, you can use common PCs. > > > > 2) We only implemented MatMult() for LRC, but you could stick your SMW code in for MatSolve_LRC if you really want to factor K. We would > > of course help you do this. > > > > Thanks, > > > > Matt > > > > On Wed, Feb 5, 2020 at 1:36 AM Smith, Barry F. via petsc-users wrote: > > > > I am not sure of everything in your email but it sounds like you want to use a "Picard" iteration to solve [K(u)?kaaT]?u=?F(u). That is solve > > > > A(u^{n}) (u^{n+1} - u^{n}) = F(u^{n}) - A(u^{n})u^{n} where A(u) = K(u) - kaaT > > > > PETSc provides code to this with SNESSetPicard() (see the manual pages) I don't know if Petsc4py has bindings for this. > > > > Adding missing python bindings is not terribly difficult and you may be able to do it yourself if this is the approach you want. > > > > Barry > > > > > > > > > On Feb 4, 2020, at 5:07 PM, Olek Niewiarowski wrote: > > > > > > Hello, > > > I am a FEniCS user but new to petsc4py. I am trying to modify the KSP solver through the SNES object to implement the Sherman-Morrison formula(e.g. http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html ). I am solving a nonlinear system of the form [K(u)?kaaT]?u=?F(u). Here the jacobian matrix K is modified by the term kaaT, where k is a scalar. Notably, K is sparse, while the term kaaT results in a full matrix. This problem can be solved efficiently using the Sherman-Morrison formula : > > > > > > [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) > > > I have managed to successfully implement this at the linear solve level (by modifying the KSP solver) inside a custom Newton solver in python by following an incomplete tutorial at https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner : > > > ? while (norm(delU) > alpha): # while not converged > > > ? > > > ? self.update_F() # call to method to update r.h.s form > > > ? self.update_K() # call to update the jacobian form > > > ? K = assemble(self.K) # assemble the jacobian matrix > > > ? F = assemble(self.F) # assemble the r.h.s vector > > > ? a = assemble(self.a_form) # assemble the a_form (see Sherman Morrison formula) > > > ? > > > ? for bc in self.mem.bc: # apply boundary conditions > > > ? bc.apply(K, F) > > > ? bc.apply(K, a) > > > ? > > > ? B = PETSc.Mat().create() > > > ? > > > ? # Assemble the bilinear form that defines A and get the concrete > > > ? # PETSc matrix > > > ? A = as_backend_type(K).mat() # get the PETSc objects for K and a > > > ? u = as_backend_type(a).vec() > > > ? > > > ? # Build the matrix "context" # see firedrake docs > > > ? Bctx = MatrixFreeB(A, u, u, self.k) > > > ? > > > ? # Set up B > > > ? # B is the same size as A > > > ? B.setSizes(*A.getSizes()) > > > ? B.setType(B.Type.PYTHON) > > > ? B.setPythonContext(Bctx) > > > ? B.setUp() > > > ? > > > ? > > > ? ksp = PETSc.KSP().create() # create the KSP linear solver object > > > ? ksp.setOperators(B) > > > ? 
ksp.setUp() > > > ? pc = ksp.pc > > > ? pc.setType(pc.Type.PYTHON) > > > ? pc.setPythonContext(MatrixFreePC()) > > > ? ksp.setFromOptions() > > > ? > > > ? solution = delU # the incremental displacement at this iteration > > > ? > > > ? b = as_backend_type(-F).vec() > > > ? delu = solution.vector().vec() > > > ? > > > ? ksp.solve(b, delu) > > > > > > ? self.mem.u.vector().axpy(0.25, self.delU.vector()) # poor man's linesearch > > > ? counter += 1 > > > Here is the corresponding petsc4py code adapted from the firedrake docs: > > > > > > ? class MatrixFreePC(object): > > > ? > > > ? def setUp(self, pc): > > > ? B, P = pc.getOperators() > > > ? # extract the MatrixFreeB object from B > > > ? ctx = B.getPythonContext() > > > ? self.A = ctx.A > > > ? self.u = ctx.u > > > ? self.v = ctx.v > > > ? self.k = ctx.k > > > ? # Here we build the PC object that uses the concrete, > > > ? # assembled matrix A. We will use this to apply the action > > > ? # of A^{-1} > > > ? self.pc = PETSc.PC().create() > > > ? self.pc.setOptionsPrefix("mf_") > > > ? self.pc.setOperators(self.A) > > > ? self.pc.setFromOptions() > > > ? # Since u and v do not change, we can build the denominator > > > ? # and the action of A^{-1} on u only once, in the setup > > > ? # phase. > > > ? tmp = self.A.createVecLeft() > > > ? self.pc.apply(self.u, tmp) > > > ? self._Ainvu = tmp > > > ? self._denom = 1 + self.k*self.v.dot(self._Ainvu) > > > ? > > > ? def apply(self, pc, x, y): > > > ? # y <- A^{-1}x > > > ? self.pc.apply(x, y) > > > ? # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u) > > > ? alpha = (self.k*self.v.dot(y)) / self._denom > > > ? # y <- y - alpha * A^{-1}u > > > ? y.axpy(-alpha, self._Ainvu) > > > ? > > > ? > > > ? class MatrixFreeB(object): > > > ? > > > ? def __init__(self, A, u, v, k): > > > ? self.A = A > > > ? self.u = u > > > ? self.v = v > > > ? self.k = k > > > ? > > > ? def mult(self, mat, x, y): > > > ? # y <- A x > > > ? self.A.mult(x, y) > > > ? > > > ? # alpha <- v^T x > > > ? alpha = self.v.dot(x) > > > ? > > > ? # y <- y + alpha*u > > > ? y.axpy(alpha, self.u) > > > However, this approach is not efficient as it requires many iterations due to the Newton step being fixed, so I would like to implement it using SNES and use line search. Unfortunately, I have not been able to find any documentation/tutorial on how to do so. Provided I have the FEniCS forms for F, K, and a, I'd like to do something along the lines of: > > > solver = PETScSNESSolver() # the FEniCS SNES wrapper > > > snes = solver.snes() # the petsc4py SNES object > > > ## ?? > > > ksp = snes.getKSP() > > > # set ksp option similar to above > > > solver.solve() > > > > > > I would be very grateful if anyone could could help or point me to a reference or demo that does something similar (or maybe a completely different way of solving the problem!). > > > Many thanks in advance! > > > Alex > > > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
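Written out explicitly, the identity used above is the standard Sherman-Morrison formula (the archived text flattens the superscripts and the Delta). The thread applies it with A = K, u = k a, v = a; the sign in front of the k a a^T term differs between messages, so only the plus convention is spelled out here, with u_1, u_2 defined in the last line:

    \[
    (A + u v^{T})^{-1} = A^{-1} - \frac{A^{-1} u\, v^{T} A^{-1}}{1 + v^{T} A^{-1} u},
    \qquad 1 + v^{T} A^{-1} u \neq 0 .
    \]

    With $A = K$, $u = k\,a$, $v = a$ this gives
    \[
    (K + k\,a a^{T})^{-1} = K^{-1} - \frac{k\, K^{-1} a\, a^{T} K^{-1}}{1 + k\, a^{T} K^{-1} a},
    \]
    so the update only ever solves with $K$:
    \[
    \Delta u = (K + k\,a a^{T})^{-1}(-F)
             = u_1 - \frac{k\,(a^{T} u_1)}{1 + k\, a^{T} u_2}\, u_2,
    \qquad u_1 = -K^{-1} F, \quad u_2 = K^{-1} a .
    \]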
URL: From knepley at gmail.com Mon Feb 10 11:13:05 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 10 Feb 2020 09:13:05 -0800 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? In-Reply-To: References: <20E8B18C-F71E-4B10-958B-6CD24DA869A3@mcs.anl.gov> <35929586-4D5D-4B31-A34E-8D8D266FEA0A@mcs.anl.gov> Message-ID: On Mon, Feb 10, 2020 at 9:09 AM Olek Niewiarowski wrote: > Barry, > Thank you for your help and detailed suggestions. I will try to implement > what you proposed and will follow-up with any questions. In the meantime, I > just want to make sure I understand the use of SNESSetPicard: > *r* - vector to store function value > *b* - function evaluation routine - *my F(u) function * > *Amat* - matrix with which A(x) x - b(x) is to be computed -* a > MatCreateLRC() -- what's the best way of passing in scalar k?* > *Pmat* - matrix from which preconditioner is computed (usually the same > as Amat) - * a regular Mat()* > *J* - function to compute matrix value, see SNESJacobianFunction > > for details on its calling sequence -- * computes K + kaa' * > By the way, the manual page states "we do not recommend using this > routine. It is far better to provide the nonlinear function F() and some > approximation to the Jacobian and use an approximate Newton solver :-)" > This is still correct. Barry is suggesting Picard since the implementation for you is a little bit simpler. However, the Picard solver is missing the K' piece of the Jacobian, so it will only ever convergence linearly, as opposed to quadratically for Newton in the convergence basin. However, once you have Picard going, it will be simple to switch to Newton by just providing that extra piece. Thanks, Matt > Thanks again! > > *Alexander (Olek) Niewiarowski* > PhD Candidate, Civil & Environmental Engineering > Princeton University, 2020 > Cell: +1 (610) 393-2978 > ------------------------------ > *From:* Smith, Barry F. > *Sent:* Thursday, February 6, 2020 13:25 > *To:* Olek Niewiarowski > *Cc:* Matthew Knepley ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] Implementing the Sherman Morisson formula > (low rank update) in petsc4py and FEniCS? > > > If I remember your problem is K(u) + kaa' = F(u) > > You should start by creating a SNES and calling SNESSetPicard. Read its > manual page. Your matrix should be a MatCreateLRC() for the Mat argument > to SNESSetPicard and the Peat should be just your K matrix. > > If you run with -ksp_fd_operator -pc_type lu will be using K to > precondition K + kaa' + d F(U)/dU . Newton's method should converge at > quadratic order. You can use -ksp_fd_operator -pc_type anything else to use > an iterative linear solver as the preconditioner of K. 
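The two-solve update listed as steps 1) through 6) elsewhere in this thread translates almost line for line into petsc4py. A minimal sketch for a real-valued problem, assuming K, F, a are already assembled petsc4py objects, k is a Python float, and u is the current iterate (all of these come from the user's own assembly, as in the messages above); note the factor of k in the final correction:

    from petsc4py import PETSc

    ksp = PETSc.KSP().create()
    ksp.setOperators(K, K)            # K: assembled tangent matrix (assumed)
    ksp.setFromOptions()

    u1 = K.createVecLeft()
    u2 = K.createVecLeft()
    rhs = F.copy()
    rhs.scale(-1.0)                   # -F
    ksp.solve(rhs, u1)                # u1 = -K^{-1} F
    ksp.solve(a, u2)                  # u2 =  K^{-1} a

    gamma = a.dot(u2)                 # a^T K^{-1} a
    delta = a.dot(u1)                 # a^T u1
    du = u1.copy()
    du.axpy(-k*delta/(1.0 + k*gamma), u2)   # du = u1 - k (a^T u1)/(1 + k a^T u2) u2
    u.axpy(1.0, du)                   # update the current iterate

As the surrounding discussion points out, wrapping this kind of update inside SNES, whether via SNESSetPicard with a MatCreateLRC operator or via a shell preconditioner, is what gives access to PETSc's line searches.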
> > If you really want to use Sherman Morisson formula then you would > create a PC shell and do > > typedef struct{ > KSP innerksp; > Vec u_1,u_2; > } YourStruct; > > SNESGetKSP(&ksp); > KSPGetPC(&pc); > PCSetType(pc,PCSHELL); > PCShellSetApply(pc,YourPCApply) > PetscMemclear(yourstruct,si > PCShellSetContext(pc,&yourstruct); > > Then > > YourPCApply(PC pc,Vec in, Vec out) > { > YourStruct *yourstruct; > > PCShellGetContext(pc,(void**)&yourstruct) > > if (!yourstruct->ksp) { > PCCreate(comm,&yourstruct->ksp); > KSPSetPrefix(yourstruct->ksp,"yourpc_"); > Mat A,B; > KSPGetOperators(ksp,&A,&B); > KSPSetOperators(yourstruct->ksp,A,B); > create work vectors > } > Apply the solve as you do for the linear case with Sherman Morisson > formula > } > > This you can run with for example -yourpc_pc_type cholesky > > Barry > > Looks complicated, conceptually simple. > > > > 2) Call KSPSetOperators(ksp, K, K,) > > > > 3) Solve the first system KSPSolve(ksp, -F, u_1) > > > > 4) Solve the second system KSPSolve(ksp, a, u_2) > > > > 5) Calculate beta VecDot(a, u_2, &gamma); beta = 1./(1. + k*gamma); > > > > 6) Update the guess VecDot(u_2, F, &delta); VecAXPBYPCZ(u, 1.0, > beta*delta, 1.0, u_1, u_2) > > No > > > On Feb 6, 2020, at 9:02 AM, Olek Niewiarowski > wrote: > > > > Hi Matt, > > > > What you suggested in your last email was exactly what I did on my very > first attempt at the problem, and while it "worked," convergence was not > satisfactory due to the Newton step being fixed in step 6. This is the > reason I would like to use the linesearch in SNES instead. Indeed in your > manual you "recommend most PETSc users work directly with SNES, rather than > using PETSc for the linear problem within a nonlinear solver." Ideally I'd > like to create a SNES solver, pass in the functions to evaluate K, F, a, > and k, and set up the underlying KSP object as in my first message. Is this > possible? > > > > Thanks, > > > > Alexander (Olek) Niewiarowski > > PhD Candidate, Civil & Environmental Engineering > > Princeton University, 2020 > > Cell: +1 (610) 393-2978 > > From: Matthew Knepley > > Sent: Thursday, February 6, 2020 5:33 > > To: Olek Niewiarowski > > Cc: Smith, Barry F. ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > > Subject: Re: [petsc-users] Implementing the Sherman Morisson formula > (low rank update) in petsc4py and FEniCS? > > > > On Wed, Feb 5, 2020 at 8:53 PM Olek Niewiarowski > wrote: > > Hi Barry and Matt, > > > > Thank you for your input and for creating a new issue in the repo. > > My initial question was more basic (how to configure the SNES's KSP > solver as in my first message with a and k), but now I see there's more to > the implementation. To reiterate, for my problem's structure, a good > solution algorithm (on the algebraic level) is the following "double > back-substitution": > > For each nonlinear iteration: > > ? define intermediate vectors u_1 and u_2 > > ? solve Ku_1 = -F --> u_1 = -K^{-1}F (this solve is cheap, don't > actually need K^-1) > > ? solve Ku_2 = -a --> u_2 = -K^{-1}a (ditto) > > ? define \beta = 1/(1 + k a^Tu_2) > > ? \Delta u = u_1 + \beta k u_2^T F u_2 > > ? u = u + \Delta u > > This is very easy to setup: > > > > 1) Create a KSP object KSPCreate(comm, &ksp) > > > > 2) Call KSPSetOperators(ksp, K, K,) > > > > 3) Solve the first system KSPSolve(ksp, -F, u_1) > > > > 4) Solve the second system KSPSolve(ksp, a, u_2) > > > > 5) Calculate beta VecDot(a, u_2, &gamma); beta = 1./(1. 
+ k*gamma); > > > > 6) Update the guess VecDot(u_2, F, &delta); VecAXPBYPCZ(u, 1.0, > beta*delta, 1.0, u_1, u_2) > > > > Thanks, > > > > Matt > > > > I don't need the Jacobian inverse, [K?kaaT]-1 = K-1 - (kK-1 > aaTK-1)/(1+kaTK-1a) just the solution ?u = [K?kaaT]-1F = K-1F - (kK-1 > aFK-1a)/(1 + kaTK-1a) > > = u_1 + beta k u_2^T F u_2 (so I never need to invert K either). (To > Matt's point on gitlab, K is a symmetric sparse matrix arising from a > bilinear form. ) Also, eventually, I want to have more than one low-rank > updates to K, but again, Sherman Morrisson Woodbury should still work. > > > > Being new to PETSc, I don't know if this algorithm directly translates > into an efficient numerical solver. (I'm also not sure if Picard iteration > would be useful here.) What would it take to set up the KSP solver in SNES > like I did below? Is it possible "out of the box"? I looked at > MatCreateLRC() - what would I pass this to? (A pointer to demo/tutorial > would be very appreciated.) If there's a better way to go about all of > this, I'm open to any and all ideas. My only limitation is that I use > petsc4py exclusively since I/future users of my code will not be > comfortable with C. > > > > Thanks again for your help! > > > > > > Alexander (Olek) Niewiarowski > > PhD Candidate, Civil & Environmental Engineering > > Princeton University, 2020 > > Cell: +1 (610) 393-2978 > > From: Smith, Barry F. > > Sent: Wednesday, February 5, 2020 15:46 > > To: Matthew Knepley > > Cc: Olek Niewiarowski ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > > Subject: Re: [petsc-users] Implementing the Sherman Morisson formula > (low rank update) in petsc4py and FEniCS? > > > > > > https://gitlab.com/petsc/petsc/issues/557 > > > > > > > On Feb 5, 2020, at 7:35 AM, Matthew Knepley wrote: > > > > > > Perhaps Barry is right that you want Picard, but suppose you really > want Newton. > > > > > > "This problem can be solved efficiently using the Sherman-Morrison > formula" Well, maybe. The main assumption here is that inverting K is > cheap. I see two things you can do in a straightforward way: > > > > > > 1) Use MatCreateLRC() > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateLRC.html > to create the Jacobian > > > and solve using an iterative method. If you pass just K was the > preconditioning matrix, you can use common PCs. > > > > > > 2) We only implemented MatMult() for LRC, but you could stick your > SMW code in for MatSolve_LRC if you really want to factor K. We would > > > of course help you do this. > > > > > > Thanks, > > > > > > Matt > > > > > > On Wed, Feb 5, 2020 at 1:36 AM Smith, Barry F. via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > > > I am not sure of everything in your email but it sounds like you > want to use a "Picard" iteration to solve [K(u)?kaaT]?u=?F(u). That is solve > > > > > > A(u^{n}) (u^{n+1} - u^{n}) = F(u^{n}) - A(u^{n})u^{n} where A(u) = > K(u) - kaaT > > > > > > PETSc provides code to this with SNESSetPicard() (see the manual > pages) I don't know if Petsc4py has bindings for this. > > > > > > Adding missing python bindings is not terribly difficult and you may > be able to do it yourself if this is the approach you want. > > > > > > Barry > > > > > > > > > > > > > On Feb 4, 2020, at 5:07 PM, Olek Niewiarowski > wrote: > > > > > > > > Hello, > > > > I am a FEniCS user but new to petsc4py. I am trying to modify the > KSP solver through the SNES object to implement the Sherman-Morrison > formula(e.g. 
http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html > ). I am solving a nonlinear system of the form [K(u)?kaaT]?u=?F(u). Here > the jacobian matrix K is modified by the term kaaT, where k is a scalar. > Notably, K is sparse, while the term kaaT results in a full matrix. This > problem can be solved efficiently using the Sherman-Morrison formula : > > > > > > > > [K?kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) > > > > I have managed to successfully implement this at the linear solve > level (by modifying the KSP solver) inside a custom Newton solver in python > by following an incomplete tutorial at > https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner > : > > > > ? while (norm(delU) > alpha): # while not converged > > > > ? > > > > ? self.update_F() # call to method to update r.h.s > form > > > > ? self.update_K() # call to update the jacobian form > > > > ? K = assemble(self.K) # assemble the jacobian > matrix > > > > ? F = assemble(self.F) # assemble the r.h.s vector > > > > ? a = assemble(self.a_form) # assemble the a_form > (see Sherman Morrison formula) > > > > ? > > > > ? for bc in self.mem.bc: # apply boundary conditions > > > > ? bc.apply(K, F) > > > > ? bc.apply(K, a) > > > > ? > > > > ? B = PETSc.Mat().create() > > > > ? > > > > ? # Assemble the bilinear form that defines A and > get the concrete > > > > ? # PETSc matrix > > > > ? A = as_backend_type(K).mat() # get the PETSc > objects for K and a > > > > ? u = as_backend_type(a).vec() > > > > ? > > > > ? # Build the matrix "context" # see firedrake docs > > > > ? Bctx = MatrixFreeB(A, u, u, self.k) > > > > ? > > > > ? # Set up B > > > > ? # B is the same size as A > > > > ? B.setSizes(*A.getSizes()) > > > > ? B.setType(B.Type.PYTHON) > > > > ? B.setPythonContext(Bctx) > > > > ? B.setUp() > > > > ? > > > > ? > > > > ? ksp = PETSc.KSP().create() # create the KSP > linear solver object > > > > ? ksp.setOperators(B) > > > > ? ksp.setUp() > > > > ? pc = ksp.pc > > > > ? pc.setType(pc.Type.PYTHON) > > > > ? pc.setPythonContext(MatrixFreePC()) > > > > ? ksp.setFromOptions() > > > > ? > > > > ? solution = delU # the incremental displacement > at this iteration > > > > ? > > > > ? b = as_backend_type(-F).vec() > > > > ? delu = solution.vector().vec() > > > > ? > > > > ? ksp.solve(b, delu) > > > > > > > > ? self.mem.u.vector().axpy(0.25, > self.delU.vector()) # poor man's linesearch > > > > ? counter += 1 > > > > Here is the corresponding petsc4py code adapted from the firedrake > docs: > > > > > > > > ? class MatrixFreePC(object): > > > > ? > > > > ? def setUp(self, pc): > > > > ? B, P = pc.getOperators() > > > > ? # extract the MatrixFreeB object from B > > > > ? ctx = B.getPythonContext() > > > > ? self.A = ctx.A > > > > ? self.u = ctx.u > > > > ? self.v = ctx.v > > > > ? self.k = ctx.k > > > > ? # Here we build the PC object that uses the > concrete, > > > > ? # assembled matrix A. We will use this to apply the > action > > > > ? # of A^{-1} > > > > ? self.pc = PETSc.PC().create() > > > > ? self.pc.setOptionsPrefix("mf_") > > > > ? self.pc.setOperators(self.A) > > > > ? self.pc.setFromOptions() > > > > ? # Since u and v do not change, we can build the > denominator > > > > ? # and the action of A^{-1} on u only once, in the > setup > > > > ? # phase. > > > > ? tmp = self.A.createVecLeft() > > > > ? self.pc.apply(self.u, tmp) > > > > ? self._Ainvu = tmp > > > > ? self._denom = 1 + self.k*self.v.dot(self._Ainvu) > > > > ? > > > > ? def apply(self, pc, x, y): > > > > ? # y <- A^{-1}x > > > > ? 
self.pc.apply(x, y) > > > > ? # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u) > > > > ? alpha = (self.k*self.v.dot(y)) / self._denom > > > > ? # y <- y - alpha * A^{-1}u > > > > ? y.axpy(-alpha, self._Ainvu) > > > > ? > > > > ? > > > > ? class MatrixFreeB(object): > > > > ? > > > > ? def __init__(self, A, u, v, k): > > > > ? self.A = A > > > > ? self.u = u > > > > ? self.v = v > > > > ? self.k = k > > > > ? > > > > ? def mult(self, mat, x, y): > > > > ? # y <- A x > > > > ? self.A.mult(x, y) > > > > ? > > > > ? # alpha <- v^T x > > > > ? alpha = self.v.dot(x) > > > > ? > > > > ? # y <- y + alpha*u > > > > ? y.axpy(alpha, self.u) > > > > However, this approach is not efficient as it requires many > iterations due to the Newton step being fixed, so I would like to implement > it using SNES and use line search. Unfortunately, I have not been able to > find any documentation/tutorial on how to do so. Provided I have the FEniCS > forms for F, K, and a, I'd like to do something along the lines of: > > > > solver = PETScSNESSolver() # the FEniCS SNES wrapper > > > > snes = solver.snes() # the petsc4py SNES object > > > > ## ?? > > > > ksp = snes.getKSP() > > > > # set ksp option similar to above > > > > solver.solve() > > > > > > > > I would be very grateful if anyone could could help or point me to a > reference or demo that does something similar (or maybe a completely > different way of solving the problem!). > > > > Many thanks in advance! > > > > Alex > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > > -- Norbert Wiener > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Feb 10 11:16:26 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 10 Feb 2020 10:16:26 -0700 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? In-Reply-To: References: <20E8B18C-F71E-4B10-958B-6CD24DA869A3@mcs.anl.gov> <35929586-4D5D-4B31-A34E-8D8D266FEA0A@mcs.anl.gov> Message-ID: <875zge8is5.fsf@jedbrown.org> Olek Niewiarowski writes: > Barry, > Thank you for your help and detailed suggestions. I will try to implement what you proposed and will follow-up with any questions. In the meantime, I just want to make sure I understand the use of SNESSetPicard: > r - vector to store function value > b - function evaluation routine - my F(u) function > Amat - matrix with which A(x) x - b(x) is to be computed - a MatCreateLRC() -- what's the best way of passing in scalar k? Typically via the context argument, similar to any SNES example. > Pmat - matrix from which preconditioner is computed (usually the same as Amat) - a regular Mat() > J - function to compute matrix value, see SNESJacobianFunction for details on its calling sequence -- computes K + kaa' > > By the way, the manual page states "we do not recommend using this routine. 
It is far better to provide the nonlinear function F() and some approximation to the Jacobian and use an approximate Newton solver :-)" Yep, this is mainly for when someone has legacy code to compute a matrix as a nonlinear function of the state U, but not a matrix-free way to compute a residual. Implementing as Newton (with inexact matrix/preconditioner) is more flexible and often enables faster convergence. From bsmith at mcs.anl.gov Mon Feb 10 13:11:29 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Mon, 10 Feb 2020 19:11:29 +0000 Subject: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS? In-Reply-To: <875zge8is5.fsf@jedbrown.org> References: <20E8B18C-F71E-4B10-958B-6CD24DA869A3@mcs.anl.gov> <35929586-4D5D-4B31-A34E-8D8D266FEA0A@mcs.anl.gov> <875zge8is5.fsf@jedbrown.org> Message-ID: Note that you can add -snes_fd_operator and get Newton's method with a preconditioner built from the Picard matrix. Barry > On Feb 10, 2020, at 11:16 AM, Jed Brown wrote: > > Olek Niewiarowski writes: > >> Barry, >> Thank you for your help and detailed suggestions. I will try to implement what you proposed and will follow-up with any questions. In the meantime, I just want to make sure I understand the use of SNESSetPicard: >> r - vector to store function value >> b - function evaluation routine - my F(u) function >> Amat - matrix with which A(x) x - b(x) is to be computed - a MatCreateLRC() -- what's the best way of passing in scalar k? > > Typically via the context argument, similar to any SNES example. > >> Pmat - matrix from which preconditioner is computed (usually the same as Amat) - a regular Mat() >> J - function to compute matrix value, see SNESJacobianFunction for details on its calling sequence -- computes K + kaa' >> >> By the way, the manual page states "we do not recommend using this routine. It is far better to provide the nonlinear function F() and some approximation to the Jacobian and use an approximate Newton solver :-)" > > Yep, this is mainly for when someone has legacy code to compute a matrix > as a nonlinear function of the state U, but not a matrix-free way to > compute a residual. Implementing as Newton (with inexact > matrix/preconditioner) is more flexible and often enables faster > convergence. From mcdanielbt at ornl.gov Tue Feb 11 15:10:46 2020 From: mcdanielbt at ornl.gov (McDaniel, Tyler) Date: Tue, 11 Feb 2020 21:10:46 +0000 Subject: [petsc-users] Iterative solvers, MPI+GPU Message-ID: Hello, Our team at Oak Ridge National Lab requires a distributed and GPU-enabled (ideally) iterative solver as part of a new, high-dimensional PDE solver. We are exploring options for software packages with this capability vs. rolling our own, i.e. having some of our team members write one. Our code already has a distributed, GPU-enabled matrix-vector multiply that we'd like to use for the core of GMRES or similar technique. I've looked through the PETSc API and found that matrix-free methods are supported, and this: https://www.mcs.anl.gov/petsc/features/gpus.html seems to indicate that GPU acceleration is available for iterative solvers. My question is: does PETSc support all of these things together? E.g. is it possible for me to use a distributed, matrix free iterative solver with a preconditioner shell on the GPU with PETSc? Best, Tyler McDaniel -------------- next part -------------- An HTML attachment was scrubbed... 
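For concreteness, the combination being asked about here (distributed Krylov solve with a user-provided mat-vec and a shell preconditioner) maps onto petsc4py's shell-operator pattern, shown in rough form below. The mult and apply bodies are identity placeholders for the user's own distributed, GPU-resident routines; in a CUDA-enabled PETSc build the vectors themselves can additionally be placed on the GPU with options such as -vec_type cuda, which is not shown here:

    from petsc4py import PETSc

    class UserOperator(object):
        """Matrix-free action y = A x; the identity below is a placeholder
        for the existing distributed, GPU-enabled matrix-vector product."""
        def mult(self, mat, x, y):
            x.copy(y)                          # placeholder: y <- x

    class UserPC(object):
        """Shell preconditioner y = M^{-1} x; identity placeholder."""
        def apply(self, pc, x, y):
            x.copy(y)                          # placeholder: y <- x

    n_local = 1000                             # illustrative local size
    A = PETSc.Mat().create(comm=PETSc.COMM_WORLD)
    A.setSizes(((n_local, PETSc.DECIDE), (n_local, PETSc.DECIDE)))
    A.setType(PETSc.Mat.Type.PYTHON)
    A.setPythonContext(UserOperator())
    A.setUp()

    ksp = PETSc.KSP().create(comm=PETSc.COMM_WORLD)
    ksp.setType(PETSc.KSP.Type.GMRES)
    ksp.setOperators(A)
    pc = ksp.getPC()
    pc.setType(PETSc.PC.Type.PYTHON)
    pc.setPythonContext(UserPC())
    ksp.setFromOptions()

    x, b = A.createVecs()
    b.set(1.0)
    ksp.solve(b, x)                            # trivially converges with these placeholders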
URL: From jed at jedbrown.org Tue Feb 11 15:45:57 2020 From: jed at jedbrown.org (Jed Brown) Date: Tue, 11 Feb 2020 14:45:57 -0700 Subject: [petsc-users] Iterative solvers, MPI+GPU In-Reply-To: References: Message-ID: <87o8u4vluy.fsf@jedbrown.org> The short answer is yes, this works great, and your vectors never need to leave the GPU (except via send/receive buffers that can hit the network directly with GPU-aware MPI). If you have a shell preconditioner, you're all set. If you want to use PETSc preconditioners, we have some that run on GPUs, but not all are well-suited to GPU architectures, and there is ongoing work to improve performance for some important methods, such as algebraic multigrid (for which setup is harder than the solve). "McDaniel, Tyler via petsc-users" writes: > Hello, > > > Our team at Oak Ridge National Lab requires a distributed and GPU-enabled (ideally) iterative solver as part of a new, high-dimensional PDE solver. We are exploring options for software packages with this capability vs. rolling our own, i.e. having some of our team members write one. > > > Our code already has a distributed, GPU-enabled matrix-vector multiply that we'd like to use for the core of GMRES or similar technique. I've looked through the PETSc API and found that matrix-free methods are supported, and this: https://www.mcs.anl.gov/petsc/features/gpus.html seems to indicate that GPU acceleration is available for iterative solvers. > > > My question is: does PETSc support all of these things together? E.g. is it possible for me to use a distributed, matrix free iterative solver with a preconditioner shell on the GPU with PETSc? > > > Best, > > Tyler McDaniel From reuben.hill10 at imperial.ac.uk Wed Feb 12 05:53:52 2020 From: reuben.hill10 at imperial.ac.uk (Hill, Reuben) Date: Wed, 12 Feb 2020 11:53:52 +0000 Subject: [petsc-users] Vertex only unstructured mesh with DMPlex and DMSWARM Message-ID: I'm a new Firedrake developer working on getting my head around PETSc. As far as I'm aware, all our PETSc calls are done via petsc4py. I'm after general help and advise on two fronts: 1: I?m trying to represent a point cloud as a vertex-only mesh in an attempt to play nicely with the firedrake stack. If I try to do this using firedrake I manage to cause a PETSc seg fault error at the point of calling PETSc.DMPlex().createFromCellList(dim, cells, coords, comm=comm) with dim=0, cells=[[0]], coords=[[1., 2.]] and comm=COMM_WORLD. Output: [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run [0]PETSC ERROR: to get more information on the crash. application called MPI_Abort(MPI_COMM_WORLD, 50152059) - process 0 I?m now looking into getting firedrake to make a DMSWARM which seems to have been designed for doing something closer to this and allows nice things such as being able to make the particles (for me - the vertices of the mesh) move and jump between MPI ranks a-la particle in cell. 
I note DMSwarm docs don't suggest there is an equivalent of the plex.distribute() method in petsc4py (which I believe calls DMPlexDistribute) that a DMPlex has. In firedrake we create empty DMPlexes on each rank except 0, then call plex.distribute() to take care of parallel partitioning. How, therefore, am I meant to go about distributing particles across MPI ranks? 2: I'm aware these questions may be very naive. Any advice for learning about relevant bits of PETSc would be very much appreciated. I'm in chapter 2 of the excellent manual (https://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf) and I'm also attempting to understand the DMSwarm example. I presume in order to understand DMSWARM I really ought to understand DMs more generally (i.e. read the whole manual)? Many thanks Reuben Hill 1. -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrick.sanan at gmail.com Wed Feb 12 09:29:13 2020 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Wed, 12 Feb 2020 16:29:13 +0100 Subject: [petsc-users] Vertex only unstructured mesh with DMPlex and DMSWARM In-Reply-To: References: Message-ID: DMSwarm has DMSwarmMigrate() which might be the closest thing to DMPlexDistribute(). https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMSWARM/DMSwarmMigrate.html Of course, it's good to create particles in parallel when possible. On Wed, 12 Feb 2020 at 12:56, Hill, Reuben < reuben.hill10 at imperial.ac.uk> wrote: > I'm a new Firedrake developer working on getting my head around PETSc. As > far as I'm aware, all our PETSc calls are done via petsc4py. > > I'm after general help and advise on two fronts: > > > *1*: > > I'm trying to represent a point cloud as a vertex-only mesh in an attempt > to play nicely with the firedrake stack. If I try to do this using > firedrake I manage to cause a PETSc seg fault error at the point of calling > > PETSc.DMPlex().createFromCellList(dim, cells, coords, comm=comm) > > > with dim=0, cells=[[0]], coords=[[1., 2.]] and comm=COMM_WORLD. > > Output: > > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and > run > [0]PETSC ERROR: to get more information on the crash. > application called MPI_Abort(MPI_COMM_WORLD, 50152059) - process 0 > > > I'm now looking into getting firedrake to make a DMSWARM which seems to > have been designed for doing something closer to this and allows nice > things such as being able to make the particles (for me - the vertices of > the mesh) move and jump between MPI ranks a-la particle in cell. I note > DMSwarm docs don't suggest there is an equivalent of the plex.distribute() > method in petsc4py (which I believe calls DMPlexDistribute) that a DMPlex > has. In firedrake we create empty DMPlexes on each rank except 0, then call > plex.distribute() to take care of parallel partitioning. How, therefore, > am I meant to go about distributing particles across MPI ranks? > > > *2*: > > I'm aware these questions may be very naive.
Any advice for learning about > relevant bits of PETSc for would be very much appreciated. I'm in chapter 2 > of the excellent manual ( > https://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf) and I'm also > attempting to understand the DMSwarm example. I presume in oder to > understand DMSWARM I really ought to understand DMs more generally (i.e. > read the whole manual)? > > > Many thanks > Reuben Hill > > > 1. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Feb 12 10:52:42 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 12 Feb 2020 08:52:42 -0800 Subject: [petsc-users] Vertex only unstructured mesh with DMPlex and DMSWARM In-Reply-To: References: Message-ID: On Wed, Feb 12, 2020 at 3:56 AM Hill, Reuben wrote: > I'm a new Firedrake developer working on getting my head around PETSc. As > far as I'm aware, all our PETSc calls are done via petsc4py. > > I'm after general help and advise on two fronts: > > > *1*: > > I?m trying to represent a point cloud as a vertex-only mesh in an attempt > to play nicely with the firedrake stack. If I try to do this using > firedrake I manage to cause a PETSc seg fault error at the point of calling > > PETSc.DMPlex().createFromCellList(dim, cells, coords, comm=comm) > > > with dim=0, cells=[[0]], coords=[[1., 2.]] and comm=COMM_WORLD. > > Output: > > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and > run > [0]PETSC ERROR: to get more information on the crash. > application called MPI_Abort(MPI_COMM_WORLD, 50152059) - process 0 > > > I?m now looking into getting firedrake to make a DMSWARM which seems to > have been designed for doing something closer to this and allows nice > things such as being able to make the particles (for me - the vertices of > the mesh) move and jump between MPI ranks a-la particle in cell. I note > DMSwarm docks don't suggest there is an equivalent of the plex.distribute() > method in petsc4py (which I believe calls DMPlexDistribute) that a DMPlex > has. In firedrake we create empty DMPlexes on each rank except 0, then call > plex.distribute() to take care of parallel partitioning. How, therefore, > should I meant to go about distributing particles across MPI ranks? > Patrick is right. Here is an example: https://gitlab.com/petsc/petsc/-/blob/master/src/snes/examples/tutorials/ex63.c Thanks Matt > *2*: > > I'm aware these questions may be very naive. Any advice for learning about > relevant bits of PETSc for would be very much appreciated. I'm in chapter 2 > of the excellent manual ( > https://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf) and I'm also > attempting to understand the DMSwarm example. I presume in oder to > understand DMSWARM I really ought to understand DMs more generally (i.e. > read the whole manual)? > > > Many thanks > Reuben Hill > > > 1. > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From perceval.desforges at polytechnique.edu Thu Feb 13 07:53:03 2020 From: perceval.desforges at polytechnique.edu (Perceval Desforges) Date: Thu, 13 Feb 2020 14:53:03 +0100 Subject: [petsc-users] DMUMPS_LOAD_RECV_MSGS Message-ID: <856dbeb7c0711975cc582cfa01f364f3@polytechnique.edu> Hello all, I have been running in a strange issue with petsc, and more specifically I believe Mumps is the problem. In my program, I run a loop where at the beginning of the loop I create an eps object, calculate the eigenvalues in a certain interval using the spectrum slicing method, store them, and then destroy the eps object. For some reason, whatever the problem size, if my loop has too many iterations (over 2040 I believe), the program will crash giving me this error : Internal error 1 in DMUMPS_LOAD_RECV_MSGS 0 application called MPI_Abort(MPI_COMM_WORLD, -99) - process 0 I am running the program in MPI over 20 processes. I don't really understand what this message means, does anybody know? Best regards, Perceval -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Feb 13 08:31:43 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 13 Feb 2020 06:31:43 -0800 Subject: [petsc-users] DMUMPS_LOAD_RECV_MSGS In-Reply-To: <856dbeb7c0711975cc582cfa01f364f3@polytechnique.edu> References: <856dbeb7c0711975cc582cfa01f364f3@polytechnique.edu> Message-ID: On Thu, Feb 13, 2020 at 5:53 AM Perceval Desforges < perceval.desforges at polytechnique.edu> wrote: > Hello all, > > I have been running in a strange issue with petsc, and more specifically I > believe Mumps is the problem. > > In my program, I run a loop where at the beginning of the loop I create an > eps object, calculate the eigenvalues in a certain interval using the > spectrum slicing method, store them, and then destroy the eps object. For > some reason, whatever the problem size, if my loop has too many iterations > (over 2040 I believe), the program will crash giving me this error : > > Internal error 1 in DMUMPS_LOAD_RECV_MSGS 0 > > application called MPI_Abort(MPI_COMM_WORLD, -99) - process 0 > > I am running the program in MPI over 20 processes. > > I don't really understand what this message means, does anybody know? > An easy test would be to replace MUMPS with SuperLU and see if the error persists. Thanks, Matt > Best regards, > > Perceval > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Feb 13 09:09:34 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 13 Feb 2020 15:09:34 +0000 Subject: [petsc-users] DMUMPS_LOAD_RECV_MSGS In-Reply-To: <856dbeb7c0711975cc582cfa01f364f3@polytechnique.edu> References: <856dbeb7c0711975cc582cfa01f364f3@polytechnique.edu> Message-ID: <1AC9C095-472E-45A4-9619-9D5B64F2D1C8@anl.gov> Given the 2040 either you or MUMPS is running out of communicators. Do you use your own communicators in your code and are you freeing them when you don't need them? 
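For illustration of how a communicator pool can be exhausted (a sketch under assumptions, not the poster's code): many MPI implementations have a finite number of communicator context ids, on the order of a couple of thousand for MPICH-derived libraries, which is suggestively close to the roughly 2040 iterations reported. Duplicating a communicator every iteration without freeing it fails in much the same way:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  /* return errors instead of aborting so the failure point is visible */
  MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
  for (int i = 0; i < 100000; i++) {
    MPI_Comm dup;
    if (MPI_Comm_dup(MPI_COMM_WORLD, &dup) != MPI_SUCCESS) {
      printf("MPI_Comm_dup failed after %d leaked duplicates\n", i);
      break;
    }
    /* a correct code would call MPI_Comm_free(&dup) once it is done with it */
  }
  MPI_Finalize();
  return 0;
}

If each pass of the eigenvalue loop leaves one duplicated communicator behind, whether in user code or in a library it calls, the count printed above is roughly the number of loop iterations that can complete before the abort.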
If it is not your code then it is MUMPS that is running out and you should contact them directly:

      RECURSIVE SUBROUTINE DMUMPS_LOAD_RECV_MSGS(COMM)
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INCLUDE 'mumps_tags.h'
      INTEGER IERR, MSGTAG, MSGLEN, MSGSOU,COMM
      INTEGER :: STATUS(MPI_STATUS_SIZE)
      LOGICAL FLAG
 10   CONTINUE
      CALL MPI_IPROBE( MPI_ANY_SOURCE, MPI_ANY_TAG, COMM,
     &     FLAG, STATUS, IERR )
      IF (FLAG) THEN
        KEEP_LOAD(65)=KEEP_LOAD(65)+1
        KEEP_LOAD(267)=KEEP_LOAD(267)-1
        MSGTAG = STATUS( MPI_TAG )
        MSGSOU = STATUS( MPI_SOURCE )
        IF ( MSGTAG .NE. UPDATE_LOAD) THEN
          write(*,*) "Internal error 1 in DMUMPS_LOAD_RECV_MSGS",
     &    MSGTAG
          CALL MUMPS_ABORT()

> On Feb 13, 2020, at 7:53 AM, Perceval Desforges wrote: > > Hello all, > > I have been running in a strange issue with petsc, and more specifically I believe Mumps is the problem. > > In my program, I run a loop where at the beginning of the loop I create an eps object, calculate the eigenvalues in a certain interval using the spectrum slicing method, store them, and then destroy the eps object. For some reason, whatever the problem size, if my loop has too many iterations (over 2040 I believe), the program will crash giving me this error : > > Internal error 1 in DMUMPS_LOAD_RECV_MSGS 0 > > application called MPI_Abort(MPI_COMM_WORLD, -99) - process 0 > > I am running the program in MPI over 20 processes. > > I don't really understand what this message means, does anybody know? > > Best regards, > > Perceval > From pranayreddy865 at gmail.com Thu Feb 13 14:23:33 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Thu, 13 Feb 2020 13:23:33 -0700 Subject: [petsc-users] Question regarding the EPSSetDimensions routine Message-ID: Hello PETSc Users, I am trying to find the lowest 'n' eigenvalues of a hermitian eigenvalue problem. The size of the operator matrix (hamiltonian in my case) is dependent on the mesh spacing provided by the user (which is expected). However I have the following issue: The number of eigenvalues given by the solver is not consistent with what is given as input in the EPSSetDimensions routine. For example, for a 12000x12000 matrix, the solver gives 20 correct eigenvalues if nev=20, but fails to give any eigenvalue if nev=10. I am using the following lines of code to solve the problem:

call EPSCreate(PETSC_COMM_WORLD,eps,ierr)
call EPSSetOperators(eps,ham,PETSC_NULL_MAT,ierr)
call EPSSetProblemType(eps,EPS_HEP,ierr)
call EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE,ierr)
call EPSSetDimensions(eps,n_sub,n_sub*2,PETSC_DEFAULT_INTEGER,ierr)
call EPSSetTolerances(eps,1D-10,5000,ierr)
call EPSSolve(eps,ierr)

After the EPSSolve, I am calling EPSGetEigenPair and other relevant routines to get the eigenvector and eigenvalues. Any lead as to how to solve this problem would be greatly helpful to us. Please let me know if I need to provide any further information. Thank you for your time. Sincerely, Pranay. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Thu Feb 13 14:53:54 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Thu, 13 Feb 2020 21:53:54 +0100 Subject: [petsc-users] Question regarding the EPSSetDimensions routine In-Reply-To: References: Message-ID: For nev=10 you are using a subspace of size 20. This may be too small. Check convergence with a monitor and increase ncv if necessary. Jose > El 13 feb 2020, a las 21:25, baikadi pranay escribió: > > > Hello PETSc Users, > > I am trying to find the lowest 'n' eigenvalues of a hermitian eigenvalue problem.
The size of the operator matrix (hamiltonian in my case) is dependent on the mesh spacing provided by the user (which is expected). However I have the following issue: > > The number of eigenvalues given by the solver is not consistent with what is given as input in the EPSSetDimensions routine. For example, for a 12000x12000 matrix, the solver gives 20 correct eigenvalues if nev=20, but fails to give any eigenvalue if nev=10. > > I am using the following lines of code to solve the problem: > > call EPSCreate(PETSC_COMM_WORLD,eps,ierr) > call EPSSetOperators(eps,ham,PETSC_NULL_MAT,ierr) > call EPSSetProblemType(eps,EPS_HEP,ierr) > call EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE,ierr) > call EPSSetDimensions(eps,n_sub,n_sub*2,PETSC_DEFAULT_INTEGER,ierr) > call EPSSetTolerances(eps,1D-10,5000,ierr) > call EPSSolve(eps,ierr) > > After the EPSSolve, I am calling EPSGetEigenPair and other relevant routines to get the eigenvector and eigenvalues. > > Any lead as to how to solve this problem would be greatly helpful to us. Please let me know if I need to provide any further information. > > Thank you for your time. > > Sincerely, > Pranay. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pranayreddy865 at gmail.com Thu Feb 13 15:03:37 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Thu, 13 Feb 2020 14:03:37 -0700 Subject: [petsc-users] Question regarding the EPSSetDimensions routine In-Reply-To: References: Message-ID: Thank you Jose for the reply. If I set PETSC_DEFAULT_INTEGER for ncv as suggested in the EPSSetDimensions documentation, I am still running into the same problem. Also, could you elaborate on what you mean by checking convergence with a monitor. Do you mean comparing the eigenvalues for ith and (i+1)th iterations and plotting the difference to see convergence? Sincerely, Pranay. ? On Thu, Feb 13, 2020 at 1:54 PM Jose E. Roman wrote: > For nev=10 you are using a subspace of size 20. This may be too small. > Check convergence with a monitor and increase ncv if necessary. > > Jose > > El 13 feb 2020, a las 21:25, baikadi pranay > escribi?: > > ? > Hello PETSc Users, > > I am trying to find the lowest 'n' eigenvalues of a hermitian eigenvalue > problem. The size of the operator matrix (hamiltonian in my case) is > dependent on the mesh spacing provided by the user (which is expected). > However I have the following issue: > > The number of eigenvalues given by the solver is not consistent with what > is given as input in the EPSSetDimensions routine. For example, for a > 12000x12000 matrix, the solver gives 20 correct eigenvalues if nev=20, but > fails to give any eigenvalue if nev=10. > > I am using the following lines of code to solve the problem: > > > *call EPSCreate(PETSC_COMM_WORLD,eps,ierr)* > > > > *call EPSSetOperators(eps,ham,PETSC_NULL_MAT,ierr)call > EPSSetProblemType(eps,EPS_HEP,ierr)call > EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE,ierr)call > EPSSetDimensions(eps,n_sub,n_sub*2,PETSC_DEFAULT_INTEGER,ierr)* > > > *call EPSSetTolerances(eps,1D-10,5000,ierr)call EPSSolve(eps,ierr)* > > After the EPSSolve, I am calling EPSGetEigenPair and other relevant > routines to get the eigenvector and eigenvalues. > > Any lead as to how to solve this problem would be greatly helpful to us. > Please let me know if I need to provide any further information. > > Thank you for your time. > > Sincerely, > Pranay. > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From skavou1 at lsu.edu Thu Feb 13 15:05:34 2020 From: skavou1 at lsu.edu (Sepideh Kavousi) Date: Thu, 13 Feb 2020 21:05:34 +0000 Subject: [petsc-users] TSMonitorSolutionVTK Message-ID: Dear Petsc users, I am trying to write VTK output file every few steps. When I was using the petsc 3.7 I used the following lines: if (user->ts_write % 500 ==0) { ierr= PetscSNPrintf(user->filename,sizeof(user->filename),"one-%03d.vts",user->ts_write);CHKERRQ(ierr); ierr= TSMonitorSolutionVTK(user->ts,user->ts_write,t,user->sol_old,&user->filename);CHKERRQ(ierr);} user->ts_write+=1; } and it worked fine, but when I use the same line in Petsc 3.9 and 3.10, using these lines it still outputs the files but when I open them with Visit and try to visualize the individual field component, it does not show them. I was wondering does anything change in "TSMonitorSolutionVTK" between the petsc 3.7 and the newer versions? Best, Sepideh -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Thu Feb 13 15:17:20 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Thu, 13 Feb 2020 22:17:20 +0100 Subject: [petsc-users] Question regarding the EPSSetDimensions routine In-Reply-To: References: Message-ID: I mean run with -eps_monitor (see section 2.5.3) and you will see if residuals are decreasing. Either increase the maximum number of iterations or the size of the subspace. > El 13 feb 2020, a las 22:03, baikadi pranay escribi?: > > Thank you Jose for the reply. > > If I set PETSC_DEFAULT_INTEGER for ncv as suggested in the EPSSetDimensions documentation, I am still running into the same problem. Also, could you elaborate on what you mean by checking convergence with a monitor. Do you mean comparing the eigenvalues for ith and (i+1)th iterations and plotting the difference to see convergence? > > Sincerely, > Pranay. > ? > > On Thu, Feb 13, 2020 at 1:54 PM Jose E. Roman wrote: > For nev=10 you are using a subspace of size 20. This may be too small. Check convergence with a monitor and increase ncv if necessary. > > Jose > >> El 13 feb 2020, a las 21:25, baikadi pranay escribi?: >> >> ? >> Hello PETSc Users, >> >> I am trying to find the lowest 'n' eigenvalues of a hermitian eigenvalue problem. The size of the operator matrix (hamiltonian in my case) is dependent on the mesh spacing provided by the user (which is expected). However I have the following issue: >> >> The number of eigenvalues given by the solver is not consistent with what is given as input in the EPSSetDimensions routine. For example, for a 12000x12000 matrix, the solver gives 20 correct eigenvalues if nev=20, but fails to give any eigenvalue if nev=10. >> >> I am using the following lines of code to solve the problem: >> >> call EPSCreate(PETSC_COMM_WORLD,eps,ierr) >> call EPSSetOperators(eps,ham,PETSC_NULL_MAT,ierr) >> call EPSSetProblemType(eps,EPS_HEP,ierr) >> call EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE,ierr) >> call EPSSetDimensions(eps,n_sub,n_sub*2,PETSC_DEFAULT_INTEGER,ierr) >> call EPSSetTolerances(eps,1D-10,5000,ierr) >> call EPSSolve(eps,ierr) >> >> After the EPSSolve, I am calling EPSGetEigenPair and other relevant routines to get the eigenvector and eigenvalues. >> >> Any lead as to how to solve this problem would be greatly helpful to us. Please let me know if I need to provide any further information. >> >> Thank you for your time. >> >> Sincerely, >> Pranay. 
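To make the suggestion above concrete, here is a hedged C sketch of the same solve with a larger working subspace and the convergence monitor left to the options database. The matrix A and the requested number of eigenvalues nev are assumed to come from the application (the original code is Fortran; this is only an illustration):

#include <slepceps.h>

PetscErrorCode SolveLowest(Mat A, PetscInt nev)
{
  EPS            eps;
  PetscErrorCode ierr;

  ierr = EPSCreate(PETSC_COMM_WORLD, &eps);CHKERRQ(ierr);
  ierr = EPSSetOperators(eps, A, NULL);CHKERRQ(ierr);
  ierr = EPSSetProblemType(eps, EPS_HEP);CHKERRQ(ierr);
  ierr = EPSSetWhichEigenpairs(eps, EPS_SMALLEST_MAGNITUDE);CHKERRQ(ierr);
  /* nev wanted, ncv = 4*nev working vectors (instead of the 2*nev that was
     too small), mpd left at the default */
  ierr = EPSSetDimensions(eps, nev, 4*nev, PETSC_DEFAULT);CHKERRQ(ierr);
  ierr = EPSSetTolerances(eps, 1e-10, 10000);CHKERRQ(ierr);
  /* picks up -eps_monitor / -eps_monitor_conv and any other run-time options */
  ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);
  ierr = EPSSolve(eps);CHKERRQ(ierr);
  ierr = EPSDestroy(&eps);CHKERRQ(ierr);
  return 0;
}

Running with -eps_monitor (or -eps_monitor_conv) shows the residual norms as the iteration proceeds; if they stall, increasing ncv or the maximum number of iterations, as suggested above, is the first thing to try.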
>> From jed at jedbrown.org Thu Feb 13 16:06:26 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 13 Feb 2020 15:06:26 -0700 Subject: [petsc-users] TSMonitorSolutionVTK In-Reply-To: References: Message-ID: <87imkarvkt.fsf@jedbrown.org> Sepideh Kavousi writes: > Dear Petsc users, > I am trying to write VTK output file every few steps. When I was using the petsc 3.7 I used the following lines: > if (user->ts_write % 500 ==0) { > ierr= PetscSNPrintf(user->filename,sizeof(user->filename),"one-%03d.vts",user->ts_write);CHKERRQ(ierr); > ierr= TSMonitorSolutionVTK(user->ts,user->ts_write,t,user->sol_old,&user->filename);CHKERRQ(ierr);} > user->ts_write+=1; > } > and it worked fine, but when I use the same line in Petsc 3.9 and 3.10, using these lines it still outputs the files but when I open them with Visit and try to visualize the individual field component, it does not show them. Do they not exist or are just unnamed? I believe we had a couple regressions with this output style circa 3.10 (side-effect of better handling vector fields), but I believe it was fixed by 3.11 or certainly 3.12. Would it be possible for you to upgrade? Note that you can (and should) pass the format specifier directly, as in: TSMonitorSolutionVTK(user->ts,user->ts_write,t,user->sol_old,"one-%03D.vts"); Visit won't care about the numbers not being contiguous. If you want to name them manually, just handle the viewer yourself; body of TSMonitorSolutionVTK: ierr = PetscSNPrintf(filename,sizeof(filename),(const char*)filenametemplate,step);CHKERRQ(ierr); ierr = PetscViewerVTKOpen(PetscObjectComm((PetscObject)ts),filename,FILE_MODE_WRITE,&viewer);CHKERRQ(ierr); ierr = VecView(u,viewer);CHKERRQ(ierr); ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr); From jed at jedbrown.org Thu Feb 13 17:28:35 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 13 Feb 2020 16:28:35 -0700 Subject: [petsc-users] TSMonitorSolutionVTK In-Reply-To: References: <87imkarvkt.fsf@jedbrown.org> Message-ID: <87blq2rrrw.fsf@jedbrown.org> Good to hear it works now. And here's the relevant merge request (from April 2019). https://gitlab.com/petsc/petsc/-/merge_requests/1602 Sepideh Kavousi writes: > Dear Dr. Brown, > In the VTS files the field components did not exist. I checked 3.11 and 3.12, and it seems 3.11 still has the problem but for the 3.12 the problem is solved. > Thanks, > Sepideh > ________________________________ > From: Jed Brown > Sent: Thursday, February 13, 2020 4:06 PM > To: Sepideh Kavousi ; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] TSMonitorSolutionVTK > > Sepideh Kavousi writes: > >> Dear Petsc users, >> I am trying to write VTK output file every few steps. When I was using the petsc 3.7 I used the following lines: >> if (user->ts_write % 500 ==0) { >> ierr= PetscSNPrintf(user->filename,sizeof(user->filename),"one-%03d.vts",user->ts_write);CHKERRQ(ierr); >> ierr= TSMonitorSolutionVTK(user->ts,user->ts_write,t,user->sol_old,&user->filename);CHKERRQ(ierr);} >> user->ts_write+=1; >> } >> and it worked fine, but when I use the same line in Petsc 3.9 and 3.10, using these lines it still outputs the files but when I open them with Visit and try to visualize the individual field component, it does not show them. > > Do they not exist or are just unnamed? I believe we had a couple > regressions with this output style circa 3.10 (side-effect of better > handling vector fields), but I believe it was fixed by 3.11 or certainly > 3.12. Would it be possible for you to upgrade? 
> > > Note that you can (and should) pass the format specifier directly, as in: > > TSMonitorSolutionVTK(user->ts,user->ts_write,t,user->sol_old,"one-%03D.vts"); > > Visit won't care about the numbers not being contiguous. If you want to > name them manually, just handle the viewer yourself; body of TSMonitorSolutionVTK: > > ierr = PetscSNPrintf(filename,sizeof(filename),(const char*)filenametemplate,step);CHKERRQ(ierr); > ierr = PetscViewerVTKOpen(PetscObjectComm((PetscObject)ts),filename,FILE_MODE_WRITE,&viewer);CHKERRQ(ierr); > ierr = VecView(u,viewer);CHKERRQ(ierr); > ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr); From richard.beare at monash.edu Thu Feb 13 23:43:46 2020 From: richard.beare at monash.edu (Richard Beare) Date: Fri, 14 Feb 2020 16:43:46 +1100 Subject: [petsc-users] Crash caused by strange error in KSPSetUp Message-ID: Hi Everyone, I am experimenting with the Simlul at trophy tool ( https://github.com/Inria-Asclepios/simul-atrophy) that uses petsc to simulate brain atrophy based on segmented MRI data. I am not the author. I have this running on most of a dataset of about 50 scans, but experience crashes with several that I am trying to track down. However I am out of ideas. The problem images are slightly bigger than some of the successful ones, but not substantially so, and I have experimented on machines with sufficient RAM. The error happens very quickly, as part of setup - see the valgrind report below. I haven't managed to get the sgcheck tool to work yet. I can only guess that the ksp object is somehow becoming corrupted during the setup process, but the array sizes that I can track (which derive from image sizes), appear correct at every point I can check. Any suggestions as to how I can check what might go wrong in the setup of the ksp object? Thankyou. valgrind tells me: ==18175== Argument 'size' of function memalign has a fishy (possibly negative) value: -17152038144 ==18175== at 0x4C320A6: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==18175== by 0x4F0F1F2: PetscMallocAlign(unsigned long, int, char const*, char const*, void**) (mal.c:28) ==18175== by 0x56B43CA: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595) ==18175== by 0x56B39BD: MatSeqAIJSetPreallocation (aij.c:3539) ==18175== by 0x59A9B44: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*) (fdda.c:1085) ==18175== by 0x59A4C71: DMCreateMatrix_DA(_p_DM*, _p_Mat**) (fdda.c:759) ==18175== by 0x58BBD29: DMCreateMatrix (dm.c:956) ==18175== by 0x5E509D5: KSPSetUp (itfunc.c:262) ==18175== by 0x40A3DE: PetscAdLemTaras3D::solveModel(bool) (PetscAdLemTaras3D.hxx:269) ==18175== by 0x42413F: AdLem3D<3u>::solveModel(bool, bool, bool) (AdLem3D.hxx:552) ==18175== by 0x41C25C: main (PetscAdLemMain.cxx:349) ==18175== -- -- A/Prof Richard Beare Imaging and Bioinformatics, Peninsula Clinical School orcid.org/0000-0002-7530-5664 Richard.Beare at monash.edu +61 3 9788 1724 Geospatial Research: https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Feb 14 00:07:24 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 14 Feb 2020 06:07:24 +0000 Subject: [petsc-users] Crash caused by strange error in KSPSetUp In-Reply-To: References: Message-ID: Richard, It is likely that for these problems some of the integers become too large for the int variable to hold them, thus they overflow and become negative. 
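One hedged reading of the 'fishy' size, consistent with this diagnosis but not a proof of it: -17152038144 = -2144004768 * 8, and -2144004768 is what a signed 32-bit count shows after a true value of 2150962528 (= -2144004768 + 2^32) has wrapped around. Since 2150962528 is just past the 32-bit limit of 2147483647, a nonzero count slightly over 2^31 multiplied by sizeof(PetscScalar) = 8 bytes reproduces the negative allocation size exactly. Whether that is the actual failure mode here is an assumption; it is simply the kind of arithmetic that a 64-bit-indices build is meant to make safe.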
You should make a new PETSC_ARCH configuration of PETSc that uses the configure option --with-64-bit-indices, this will change PETSc to use 64 bit integers which will not overflow. Good luck and let us know how it works out Barry Probably the code is built with an older version of PETSc; the later versions should produce a more useful error message. > On Feb 13, 2020, at 11:43 PM, Richard Beare via petsc-users wrote: > > Hi Everyone, > I am experimenting with the Simlul at trophy tool (https://github.com/Inria-Asclepios/simul-atrophy) that uses petsc to simulate brain atrophy based on segmented MRI data. I am not the author. I have this running on most of a dataset of about 50 scans, but experience crashes with several that I am trying to track down. However I am out of ideas. The problem images are slightly bigger than some of the successful ones, but not substantially so, and I have experimented on machines with sufficient RAM. The error happens very quickly, as part of setup - see the valgrind report below. I haven't managed to get the sgcheck tool to work yet. I can only guess that the ksp object is somehow becoming corrupted during the setup process, but the array sizes that I can track (which derive from image sizes), appear correct at every point I can check. Any suggestions as to how I can check what might go wrong in the setup of the ksp object? > Thankyou. > > valgrind tells me: > > ==18175== Argument 'size' of function memalign has a fishy (possibly negative) value: -17152038144 > ==18175== at 0x4C320A6: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) > ==18175== by 0x4F0F1F2: PetscMallocAlign(unsigned long, int, char const*, char const*, void**) (mal.c:28) > ==18175== by 0x56B43CA: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595) > ==18175== by 0x56B39BD: MatSeqAIJSetPreallocation (aij.c:3539) > ==18175== by 0x59A9B44: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*) (fdda.c:1085) > ==18175== by 0x59A4C71: DMCreateMatrix_DA(_p_DM*, _p_Mat**) (fdda.c:759) > ==18175== by 0x58BBD29: DMCreateMatrix (dm.c:956) > ==18175== by 0x5E509D5: KSPSetUp (itfunc.c:262) > ==18175== by 0x40A3DE: PetscAdLemTaras3D::solveModel(bool) (PetscAdLemTaras3D.hxx:269) > ==18175== by 0x42413F: AdLem3D<3u>::solveModel(bool, bool, bool) (AdLem3D.hxx:552) > ==18175== by 0x41C25C: main (PetscAdLemMain.cxx:349) > ==18175== > > -- > -- > A/Prof Richard Beare > Imaging and Bioinformatics, Peninsula Clinical School > orcid.org/0000-0002-7530-5664 > Richard.Beare at monash.edu > +61 3 9788 1724 > > > > Geospatial Research: https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis From richard.beare at monash.edu Fri Feb 14 05:10:53 2020 From: richard.beare at monash.edu (Richard Beare) Date: Fri, 14 Feb 2020 22:10:53 +1100 Subject: [petsc-users] Crash caused by strange error in KSPSetUp In-Reply-To: References: Message-ID: No luck - exactly the same error after including the --with-64-bit-indicies=yes --download-mpich=yes options ==8674== Argument 'size' of function memalign has a fishy (possibly negative) value: -17152036540 ==8674== at 0x4C320A6: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==8674== by 0x4F0CFF2: PetscMallocAlign(unsigned long, int, char const*, char const*, void**) (mal.c:28) ==8674== by 0x4F0F716: PetscTrMallocDefault(unsigned long, int, char const*, char const*, void**) (mtr.c:188) ==8674== by 0x569AF3E: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595) ==8674== by 0x569A531: MatSeqAIJSetPreallocation (aij.c:3539) ==8674== by 0x599080A: 
DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*) (fdda.c:1085) ==8674== by 0x598B937: DMCreateMatrix_DA(_p_DM*, _p_Mat**) (fdda.c:759) ==8674== by 0x58A2BF2: DMCreateMatrix (dm.c:956) ==8674== by 0x5E377B3: KSPSetUp (itfunc.c:262) ==8674== by 0x409FFC: PetscAdLemTaras3D::solveModel(bool) (PetscAdLemTaras3D.hxx:255) ==8674== by 0x4239FB: AdLem3D<3u>::solveModel(bool, bool, bool) (AdLem3D.hxx:551) ==8674== by 0x41BD17: main (PetscAdLemMain.cxx:344) ==8674== On Fri, 14 Feb 2020 at 17:07, Smith, Barry F. wrote: > > Richard, > > It is likely that for these problems some of the integers become too > large for the int variable to hold them, thus they overflow and become > negative. > > You should make a new PETSC_ARCH configuration of PETSc that uses the > configure option --with-64-bit-indices, this will change PETSc to use 64 > bit integers which will not overflow. > > Good luck and let us know how it works out > > Barry > > Probably the code is built with an older version of PETSc; the later > versions should produce a more useful error message. > > > On Feb 13, 2020, at 11:43 PM, Richard Beare via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > Hi Everyone, > > I am experimenting with the Simlul at trophy tool ( > https://github.com/Inria-Asclepios/simul-atrophy) that uses petsc to > simulate brain atrophy based on segmented MRI data. I am not the author. I > have this running on most of a dataset of about 50 scans, but experience > crashes with several that I am trying to track down. However I am out of > ideas. The problem images are slightly bigger than some of the successful > ones, but not substantially so, and I have experimented on machines with > sufficient RAM. The error happens very quickly, as part of setup - see the > valgrind report below. I haven't managed to get the sgcheck tool to work > yet. I can only guess that the ksp object is somehow becoming corrupted > during the setup process, but the array sizes that I can track (which > derive from image sizes), appear correct at every point I can check. Any > suggestions as to how I can check what might go wrong in the setup of the > ksp object? > > Thankyou. 
> > > > valgrind tells me: > > > > ==18175== Argument 'size' of function memalign has a fishy (possibly > negative) value: -17152038144 > > ==18175== at 0x4C320A6: memalign (in > /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) > > ==18175== by 0x4F0F1F2: PetscMallocAlign(unsigned long, int, char > const*, char const*, void**) (mal.c:28) > > ==18175== by 0x56B43CA: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595) > > ==18175== by 0x56B39BD: MatSeqAIJSetPreallocation (aij.c:3539) > > ==18175== by 0x59A9B44: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*) > (fdda.c:1085) > > ==18175== by 0x59A4C71: DMCreateMatrix_DA(_p_DM*, _p_Mat**) > (fdda.c:759) > > ==18175== by 0x58BBD29: DMCreateMatrix (dm.c:956) > > ==18175== by 0x5E509D5: KSPSetUp (itfunc.c:262) > > ==18175== by 0x40A3DE: PetscAdLemTaras3D::solveModel(bool) > (PetscAdLemTaras3D.hxx:269) > > ==18175== by 0x42413F: AdLem3D<3u>::solveModel(bool, bool, bool) > (AdLem3D.hxx:552) > > ==18175== by 0x41C25C: main (PetscAdLemMain.cxx:349) > > ==18175== > > > > -- > > -- > > A/Prof Richard Beare > > Imaging and Bioinformatics, Peninsula Clinical School > > orcid.org/0000-0002-7530-5664 > > Richard.Beare at monash.edu > > +61 3 9788 1724 > > > > > > > > Geospatial Research: > https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis > > -- -- A/Prof Richard Beare Imaging and Bioinformatics, Peninsula Clinical School orcid.org/0000-0002-7530-5664 Richard.Beare at monash.edu +61 3 9788 1724 Geospatial Research: https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis -------------- next part -------------- An HTML attachment was scrubbed... URL: From jczhang at mcs.anl.gov Fri Feb 14 09:47:04 2020 From: jczhang at mcs.anl.gov (Junchao Zhang) Date: Fri, 14 Feb 2020 09:47:04 -0600 Subject: [petsc-users] Crash caused by strange error in KSPSetUp In-Reply-To: References: Message-ID: Which petsc version do you use? In aij.c of the master branch, I saw Barry recently added a useful check to catch number of nonzero overflow, ierr = PetscIntCast(nz64,&nz);CHKERRQ(ierr); But you mentioned using 64-bit indices did not solve the problem, it might not be the reason. You should try the master branch if feasible. Also, vary number of MPI ranks to see if error stack changes. 
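The guard mentioned here can be sketched as follows; this is a schematic of the idea, not the actual aij.c code, and the helper name and arguments are invented for illustration:

#include <petscsys.h>

/* Accumulate a nonzero count in 64 bits and only narrow it with PetscIntCast(),
   which raises a PETSc error instead of silently wrapping negative. */
PetscErrorCode CheckNonzeroCount(PetscInt m, const PetscInt nnz[], PetscInt *nz)
{
  PetscInt64     nz64 = 0;
  PetscInt       i;
  PetscErrorCode ierr;

  for (i = 0; i < m; i++) nz64 += nnz[i];
  ierr = PetscIntCast(nz64, nz);CHKERRQ(ierr); /* fails cleanly if nz64 > PETSC_MAX_INT */
  return 0;
}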
--Junchao Zhang On Fri, Feb 14, 2020 at 5:12 AM Richard Beare via petsc-users < petsc-users at mcs.anl.gov> wrote: > No luck - exactly the same error after including the > --with-64-bit-indicies=yes --download-mpich=yes options > > ==8674== Argument 'size' of function memalign has a fishy (possibly > negative) value: -17152036540 > ==8674== at 0x4C320A6: memalign (in > /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) > ==8674== by 0x4F0CFF2: PetscMallocAlign(unsigned long, int, char > const*, char const*, void**) (mal.c:28) > ==8674== by 0x4F0F716: PetscTrMallocDefault(unsigned long, int, char > const*, char const*, void**) (mtr.c:188) > ==8674== by 0x569AF3E: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595) > ==8674== by 0x569A531: MatSeqAIJSetPreallocation (aij.c:3539) > ==8674== by 0x599080A: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*) > (fdda.c:1085) > ==8674== by 0x598B937: DMCreateMatrix_DA(_p_DM*, _p_Mat**) (fdda.c:759) > ==8674== by 0x58A2BF2: DMCreateMatrix (dm.c:956) > ==8674== by 0x5E377B3: KSPSetUp (itfunc.c:262) > ==8674== by 0x409FFC: PetscAdLemTaras3D::solveModel(bool) > (PetscAdLemTaras3D.hxx:255) > ==8674== by 0x4239FB: AdLem3D<3u>::solveModel(bool, bool, bool) > (AdLem3D.hxx:551) > ==8674== by 0x41BD17: main (PetscAdLemMain.cxx:344) > ==8674== > On Fri, 14 Feb 2020 at 17:07, Smith, Barry F. wrote: > >> >> Richard, >> >> It is likely that for these problems some of the integers become too >> large for the int variable to hold them, thus they overflow and become >> negative. >> >> You should make a new PETSC_ARCH configuration of PETSc that uses >> the configure option --with-64-bit-indices, this will change PETSc to use >> 64 bit integers which will not overflow. >> >> Good luck and let us know how it works out >> >> Barry >> >> Probably the code is built with an older version of PETSc; the later >> versions should produce a more useful error message. >> >> > On Feb 13, 2020, at 11:43 PM, Richard Beare via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> > >> > Hi Everyone, >> > I am experimenting with the Simlul at trophy tool ( >> https://github.com/Inria-Asclepios/simul-atrophy) that uses petsc to >> simulate brain atrophy based on segmented MRI data. I am not the author. I >> have this running on most of a dataset of about 50 scans, but experience >> crashes with several that I am trying to track down. However I am out of >> ideas. The problem images are slightly bigger than some of the successful >> ones, but not substantially so, and I have experimented on machines with >> sufficient RAM. The error happens very quickly, as part of setup - see the >> valgrind report below. I haven't managed to get the sgcheck tool to work >> yet. I can only guess that the ksp object is somehow becoming corrupted >> during the setup process, but the array sizes that I can track (which >> derive from image sizes), appear correct at every point I can check. Any >> suggestions as to how I can check what might go wrong in the setup of the >> ksp object? >> > Thankyou. 
>> > >> > valgrind tells me: >> > >> > ==18175== Argument 'size' of function memalign has a fishy (possibly >> negative) value: -17152038144 >> > ==18175== at 0x4C320A6: memalign (in >> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >> > ==18175== by 0x4F0F1F2: PetscMallocAlign(unsigned long, int, char >> const*, char const*, void**) (mal.c:28) >> > ==18175== by 0x56B43CA: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595) >> > ==18175== by 0x56B39BD: MatSeqAIJSetPreallocation (aij.c:3539) >> > ==18175== by 0x59A9B44: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*) >> (fdda.c:1085) >> > ==18175== by 0x59A4C71: DMCreateMatrix_DA(_p_DM*, _p_Mat**) >> (fdda.c:759) >> > ==18175== by 0x58BBD29: DMCreateMatrix (dm.c:956) >> > ==18175== by 0x5E509D5: KSPSetUp (itfunc.c:262) >> > ==18175== by 0x40A3DE: PetscAdLemTaras3D::solveModel(bool) >> (PetscAdLemTaras3D.hxx:269) >> > ==18175== by 0x42413F: AdLem3D<3u>::solveModel(bool, bool, bool) >> (AdLem3D.hxx:552) >> > ==18175== by 0x41C25C: main (PetscAdLemMain.cxx:349) >> > ==18175== >> > >> > -- >> > -- >> > A/Prof Richard Beare >> > Imaging and Bioinformatics, Peninsula Clinical School >> > orcid.org/0000-0002-7530-5664 >> > Richard.Beare at monash.edu >> > +61 3 9788 1724 >> > >> > >> > >> > Geospatial Research: >> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >> >> > > -- > -- > A/Prof Richard Beare > Imaging and Bioinformatics, Peninsula Clinical School > orcid.org/0000-0002-7530-5664 > Richard.Beare at monash.edu > +61 3 9788 1724 > > > > Geospatial Research: > https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eijkhout at tacc.utexas.edu Fri Feb 14 09:49:12 2020 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Fri, 14 Feb 2020 15:49:12 +0000 Subject: [petsc-users] Crash caused by strange error in KSPSetUp In-Reply-To: References: Message-ID: On , 2020Feb14, at 09:47, Junchao Zhang via petsc-users > wrote: Barry recently added a useful check to catch number of nonzero overflow, ierr = PetscIntCast(nz64,&nz); Is that only activated in debug versions of the installation? Victor. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jczhang at mcs.anl.gov Fri Feb 14 09:52:32 2020 From: jczhang at mcs.anl.gov (Junchao Zhang) Date: Fri, 14 Feb 2020 09:52:32 -0600 Subject: [petsc-users] Crash caused by strange error in KSPSetUp In-Reply-To: References: Message-ID: On Fri, Feb 14, 2020 at 9:49 AM Victor Eijkhout wrote: > > > On , 2020Feb14, at 09:47, Junchao Zhang via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Barry recently added a useful check to catch number of nonzero overflow, > ierr = PetscIntCast(nz64,&nz); > > > Is that only activated in debug versions of the installation? > No. It is activated for all builds. > > Victor. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.beare at monash.edu Fri Feb 14 15:03:49 2020 From: richard.beare at monash.edu (Richard Beare) Date: Sat, 15 Feb 2020 08:03:49 +1100 Subject: [petsc-users] Crash caused by strange error in KSPSetUp In-Reply-To: References: Message-ID: I will see if I can build with master. The docs for simulatrophy say 3.6.3.1. On Sat, 15 Feb 2020 at 02:47, Junchao Zhang wrote: > Which petsc version do you use? 
In aij.c of the master branch, I saw Barry > recently added a useful check to catch number of nonzero overflow, ierr = > PetscIntCast(nz64,&nz);CHKERRQ(ierr); But you mentioned using 64-bit > indices did not solve the problem, it might not be the reason. You should > try the master branch if feasible. Also, vary number of MPI ranks to see if > error stack changes. > > --Junchao Zhang > > > On Fri, Feb 14, 2020 at 5:12 AM Richard Beare via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> No luck - exactly the same error after including the >> --with-64-bit-indicies=yes --download-mpich=yes options >> >> ==8674== Argument 'size' of function memalign has a fishy (possibly >> negative) value: -17152036540 >> ==8674== at 0x4C320A6: memalign (in >> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==8674== by 0x4F0CFF2: PetscMallocAlign(unsigned long, int, char >> const*, char const*, void**) (mal.c:28) >> ==8674== by 0x4F0F716: PetscTrMallocDefault(unsigned long, int, char >> const*, char const*, void**) (mtr.c:188) >> ==8674== by 0x569AF3E: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595) >> ==8674== by 0x569A531: MatSeqAIJSetPreallocation (aij.c:3539) >> ==8674== by 0x599080A: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*) >> (fdda.c:1085) >> ==8674== by 0x598B937: DMCreateMatrix_DA(_p_DM*, _p_Mat**) (fdda.c:759) >> ==8674== by 0x58A2BF2: DMCreateMatrix (dm.c:956) >> ==8674== by 0x5E377B3: KSPSetUp (itfunc.c:262) >> ==8674== by 0x409FFC: PetscAdLemTaras3D::solveModel(bool) >> (PetscAdLemTaras3D.hxx:255) >> ==8674== by 0x4239FB: AdLem3D<3u>::solveModel(bool, bool, bool) >> (AdLem3D.hxx:551) >> ==8674== by 0x41BD17: main (PetscAdLemMain.cxx:344) >> ==8674== >> On Fri, 14 Feb 2020 at 17:07, Smith, Barry F. wrote: >> >>> >>> Richard, >>> >>> It is likely that for these problems some of the integers become >>> too large for the int variable to hold them, thus they overflow and become >>> negative. >>> >>> You should make a new PETSC_ARCH configuration of PETSc that uses >>> the configure option --with-64-bit-indices, this will change PETSc to use >>> 64 bit integers which will not overflow. >>> >>> Good luck and let us know how it works out >>> >>> Barry >>> >>> Probably the code is built with an older version of PETSc; the >>> later versions should produce a more useful error message. >>> >>> > On Feb 13, 2020, at 11:43 PM, Richard Beare via petsc-users < >>> petsc-users at mcs.anl.gov> wrote: >>> > >>> > Hi Everyone, >>> > I am experimenting with the Simlul at trophy tool ( >>> https://github.com/Inria-Asclepios/simul-atrophy) that uses petsc to >>> simulate brain atrophy based on segmented MRI data. I am not the author. I >>> have this running on most of a dataset of about 50 scans, but experience >>> crashes with several that I am trying to track down. However I am out of >>> ideas. The problem images are slightly bigger than some of the successful >>> ones, but not substantially so, and I have experimented on machines with >>> sufficient RAM. The error happens very quickly, as part of setup - see the >>> valgrind report below. I haven't managed to get the sgcheck tool to work >>> yet. I can only guess that the ksp object is somehow becoming corrupted >>> during the setup process, but the array sizes that I can track (which >>> derive from image sizes), appear correct at every point I can check. Any >>> suggestions as to how I can check what might go wrong in the setup of the >>> ksp object? >>> > Thankyou. 
>>> > >>> > valgrind tells me: >>> > >>> > ==18175== Argument 'size' of function memalign has a fishy (possibly >>> negative) value: -17152038144 >>> > ==18175== at 0x4C320A6: memalign (in >>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >>> > ==18175== by 0x4F0F1F2: PetscMallocAlign(unsigned long, int, char >>> const*, char const*, void**) (mal.c:28) >>> > ==18175== by 0x56B43CA: MatSeqAIJSetPreallocation_SeqAIJ >>> (aij.c:3595) >>> > ==18175== by 0x56B39BD: MatSeqAIJSetPreallocation (aij.c:3539) >>> > ==18175== by 0x59A9B44: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, >>> _p_Mat*) (fdda.c:1085) >>> > ==18175== by 0x59A4C71: DMCreateMatrix_DA(_p_DM*, _p_Mat**) >>> (fdda.c:759) >>> > ==18175== by 0x58BBD29: DMCreateMatrix (dm.c:956) >>> > ==18175== by 0x5E509D5: KSPSetUp (itfunc.c:262) >>> > ==18175== by 0x40A3DE: PetscAdLemTaras3D::solveModel(bool) >>> (PetscAdLemTaras3D.hxx:269) >>> > ==18175== by 0x42413F: AdLem3D<3u>::solveModel(bool, bool, bool) >>> (AdLem3D.hxx:552) >>> > ==18175== by 0x41C25C: main (PetscAdLemMain.cxx:349) >>> > ==18175== >>> > >>> > -- >>> > -- >>> > A/Prof Richard Beare >>> > Imaging and Bioinformatics, Peninsula Clinical School >>> > orcid.org/0000-0002-7530-5664 >>> > Richard.Beare at monash.edu >>> > +61 3 9788 1724 >>> > >>> > >>> > >>> > Geospatial Research: >>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>> >>> >> >> -- >> -- >> A/Prof Richard Beare >> Imaging and Bioinformatics, Peninsula Clinical School >> orcid.org/0000-0002-7530-5664 >> Richard.Beare at monash.edu >> +61 3 9788 1724 >> >> >> >> Geospatial Research: >> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >> > -- -- A/Prof Richard Beare Imaging and Bioinformatics, Peninsula Clinical School orcid.org/0000-0002-7530-5664 Richard.Beare at monash.edu +61 3 9788 1724 Geospatial Research: https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.beare at monash.edu Fri Feb 14 20:32:57 2020 From: richard.beare at monash.edu (Richard Beare) Date: Sat, 15 Feb 2020 13:32:57 +1100 Subject: [petsc-users] Crash caused by strange error in KSPSetUp In-Reply-To: References: Message-ID: It doesn't compile out of the box with master. singularity def file attached. On Sat, 15 Feb 2020 at 08:03, Richard Beare wrote: > I will see if I can build with master. The docs for simulatrophy say > 3.6.3.1. > > On Sat, 15 Feb 2020 at 02:47, Junchao Zhang wrote: > >> Which petsc version do you use? In aij.c of the master branch, I saw >> Barry recently added a useful check to catch number of nonzero overflow, >> ierr = PetscIntCast(nz64,&nz);CHKERRQ(ierr); But you mentioned using >> 64-bit indices did not solve the problem, it might not be the reason. You >> should try the master branch if feasible. Also, vary number of MPI ranks to >> see if error stack changes. 
>> >> --Junchao Zhang >> >> >> On Fri, Feb 14, 2020 at 5:12 AM Richard Beare via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >>> No luck - exactly the same error after including the >>> --with-64-bit-indicies=yes --download-mpich=yes options >>> >>> ==8674== Argument 'size' of function memalign has a fishy (possibly >>> negative) value: -17152036540 >>> ==8674== at 0x4C320A6: memalign (in >>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==8674== by 0x4F0CFF2: PetscMallocAlign(unsigned long, int, char >>> const*, char const*, void**) (mal.c:28) >>> ==8674== by 0x4F0F716: PetscTrMallocDefault(unsigned long, int, char >>> const*, char const*, void**) (mtr.c:188) >>> ==8674== by 0x569AF3E: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595) >>> ==8674== by 0x569A531: MatSeqAIJSetPreallocation (aij.c:3539) >>> ==8674== by 0x599080A: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*) >>> (fdda.c:1085) >>> ==8674== by 0x598B937: DMCreateMatrix_DA(_p_DM*, _p_Mat**) >>> (fdda.c:759) >>> ==8674== by 0x58A2BF2: DMCreateMatrix (dm.c:956) >>> ==8674== by 0x5E377B3: KSPSetUp (itfunc.c:262) >>> ==8674== by 0x409FFC: PetscAdLemTaras3D::solveModel(bool) >>> (PetscAdLemTaras3D.hxx:255) >>> ==8674== by 0x4239FB: AdLem3D<3u>::solveModel(bool, bool, bool) >>> (AdLem3D.hxx:551) >>> ==8674== by 0x41BD17: main (PetscAdLemMain.cxx:344) >>> ==8674== >>> On Fri, 14 Feb 2020 at 17:07, Smith, Barry F. >>> wrote: >>> >>>> >>>> Richard, >>>> >>>> It is likely that for these problems some of the integers become >>>> too large for the int variable to hold them, thus they overflow and become >>>> negative. >>>> >>>> You should make a new PETSC_ARCH configuration of PETSc that uses >>>> the configure option --with-64-bit-indices, this will change PETSc to use >>>> 64 bit integers which will not overflow. >>>> >>>> Good luck and let us know how it works out >>>> >>>> Barry >>>> >>>> Probably the code is built with an older version of PETSc; the >>>> later versions should produce a more useful error message. >>>> >>>> > On Feb 13, 2020, at 11:43 PM, Richard Beare via petsc-users < >>>> petsc-users at mcs.anl.gov> wrote: >>>> > >>>> > Hi Everyone, >>>> > I am experimenting with the Simlul at trophy tool ( >>>> https://github.com/Inria-Asclepios/simul-atrophy) that uses petsc to >>>> simulate brain atrophy based on segmented MRI data. I am not the author. I >>>> have this running on most of a dataset of about 50 scans, but experience >>>> crashes with several that I am trying to track down. However I am out of >>>> ideas. The problem images are slightly bigger than some of the successful >>>> ones, but not substantially so, and I have experimented on machines with >>>> sufficient RAM. The error happens very quickly, as part of setup - see the >>>> valgrind report below. I haven't managed to get the sgcheck tool to work >>>> yet. I can only guess that the ksp object is somehow becoming corrupted >>>> during the setup process, but the array sizes that I can track (which >>>> derive from image sizes), appear correct at every point I can check. Any >>>> suggestions as to how I can check what might go wrong in the setup of the >>>> ksp object? >>>> > Thankyou. 
>>>> > >>>> > valgrind tells me: >>>> > >>>> > ==18175== Argument 'size' of function memalign has a fishy (possibly >>>> negative) value: -17152038144 >>>> > ==18175== at 0x4C320A6: memalign (in >>>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> > ==18175== by 0x4F0F1F2: PetscMallocAlign(unsigned long, int, char >>>> const*, char const*, void**) (mal.c:28) >>>> > ==18175== by 0x56B43CA: MatSeqAIJSetPreallocation_SeqAIJ >>>> (aij.c:3595) >>>> > ==18175== by 0x56B39BD: MatSeqAIJSetPreallocation (aij.c:3539) >>>> > ==18175== by 0x59A9B44: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, >>>> _p_Mat*) (fdda.c:1085) >>>> > ==18175== by 0x59A4C71: DMCreateMatrix_DA(_p_DM*, _p_Mat**) >>>> (fdda.c:759) >>>> > ==18175== by 0x58BBD29: DMCreateMatrix (dm.c:956) >>>> > ==18175== by 0x5E509D5: KSPSetUp (itfunc.c:262) >>>> > ==18175== by 0x40A3DE: PetscAdLemTaras3D::solveModel(bool) >>>> (PetscAdLemTaras3D.hxx:269) >>>> > ==18175== by 0x42413F: AdLem3D<3u>::solveModel(bool, bool, bool) >>>> (AdLem3D.hxx:552) >>>> > ==18175== by 0x41C25C: main (PetscAdLemMain.cxx:349) >>>> > ==18175== >>>> > >>>> > -- >>>> > -- >>>> > A/Prof Richard Beare >>>> > Imaging and Bioinformatics, Peninsula Clinical School >>>> > orcid.org/0000-0002-7530-5664 >>>> > Richard.Beare at monash.edu >>>> > +61 3 9788 1724 >>>> > >>>> > >>>> > >>>> > Geospatial Research: >>>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>>> >>>> >>> >>> -- >>> -- >>> A/Prof Richard Beare >>> Imaging and Bioinformatics, Peninsula Clinical School >>> orcid.org/0000-0002-7530-5664 >>> Richard.Beare at monash.edu >>> +61 3 9788 1724 >>> >>> >>> >>> Geospatial Research: >>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>> >> > > -- > -- > A/Prof Richard Beare > Imaging and Bioinformatics, Peninsula Clinical School > orcid.org/0000-0002-7530-5664 > Richard.Beare at monash.edu > +61 3 9788 1724 > > > > Geospatial Research: > https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis > -- -- A/Prof Richard Beare Imaging and Bioinformatics, Peninsula Clinical School orcid.org/0000-0002-7530-5664 Richard.Beare at monash.edu +61 3 9788 1724 Geospatial Research: https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: sa.def Type: application/octet-stream Size: 5804 bytes Desc: not available URL: From hongzhang at anl.gov Sat Feb 15 11:14:35 2020 From: hongzhang at anl.gov (Zhang, Hong) Date: Sat, 15 Feb 2020 17:14:35 +0000 Subject: [petsc-users] Flagging the solver to restart In-Reply-To: References: <4246EF6F-6B7A-4202-806B-6D334E5B9427@anl.gov> Message-ID: <9071818E-8A2A-4206-88B9-1C383CC4238B@anl.gov> Please make sure your replies go to the maillist. On Feb 15, 2020, at 4:35 AM, Mohammed Ashour > wrote: Dear Mr. Hong, Thanks for your reply and clarification. I have a follow-up question. If TSRollBack() is to be called within a TSPostStep, that would set the ts->steprollback to PETSC_TRUE. And since there is a falsity test on TSPreStep in TSSolve, that would prevent TSPreStep from being engaged as long as ts->steprollback is PETSC_TRUE, which it is after being set so in the TSPostStep. So I'm wondering, why is there a falsity test on TSPreStep knowing that it would not be accessible if TSRollBack is called within TSPostStep. This guarantees TSPreStep is called only once before each successful step. 
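A minimal C sketch of the pattern being discussed, assuming hypothetical application routines UpdateVariables() and CriteriaMet(): the variables are updated in a pre-step hook (which, as noted above, runs only once per accepted step because of the steprollback test), and the acceptance test plus TSRollBack() live in a post-step hook.

#include <petscts.h>

extern PetscErrorCode UpdateVariables(Vec); /* assumed to exist in the application */
extern PetscBool      CriteriaMet(Vec);     /* assumed to exist in the application */

static PetscErrorCode MyPreStep(TS ts)
{
  Vec            u;
  PetscErrorCode ierr;
  ierr = TSGetSolution(ts, &u);CHKERRQ(ierr);
  ierr = UpdateVariables(u);CHKERRQ(ierr); /* runs once per accepted step */
  return 0;
}

static PetscErrorCode MyPostStep(TS ts)
{
  Vec            u;
  PetscErrorCode ierr;
  ierr = TSGetSolution(ts, &u);CHKERRQ(ierr);
  if (!CriteriaMet(u)) {
    ierr = TSRollBack(ts);CHKERRQ(ierr);   /* reject and redo the current step */
  }
  return 0;
}

/* registration: TSSetPreStep(ts, MyPreStep); TSSetPostStep(ts, MyPostStep); */

If the post-step hook rolls back without changing anything that affects the residual, the same step will simply be repeated, so the update logic has to make the retry different from the rejected attempt.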
Hong Thanks in advance. Yours sincerely On Tue, Feb 4, 2020 at 5:32 PM Zhang, Hong > wrote: > On Feb 2, 2020, at 11:24 AM, Mohammed Ashour > wrote: > > Dear All, > I'm solving a constraint phase-field problem using PetIGA. This question i'm having is more relevant to PETSc, so I'm posting here. > > I have an algorithm involving iterating on the solution vector until certain criteria are met before moving forward for the next time step. The sequence inside the TSSolve is to call TSMonitor first, to print a user-defined set of values and the move to solve at TSStep and then call TSPostEvaluate. > > So I'm using the TSMonitor to update some variables at time n , those variables are used the in the residual and jacobian calculations at time n+1, and then solving and then check if those criteria are met or not in a function assigned to TS via TSSetPostEvaluate, if the criteria are met, it'll move forward, if not, it'll engaged the routine TSRollBack(), which based on my understanding is the proper way yo flag the solver to recalculate n+1. My question is, is this the proper way to do it? what is the difference between TSRollBack and TSRestart? You are right that TSRollBack() recalculates the current time step. But I would not suggest to use TSPostEvaluate in your case. Presumably you are not using the PETSc adaptor (e.g. via -ts_adapt_type none) and want to control the stepsize yourself. You can check the criteria in TSPostStep, call TSRollBack() if the criteria are not met and update the variables accordingly. The variables can also be updated in TSPreStep(), but TSMonitor should not be used since it is designed for read-only operations. TSRestart may be needed when you are using non-self-starting integration methods such as multiple step methods and FSAL RK methods (-ts_rk_type <3bs,5dp,5bs,6vr,7vr,8vr>). These methods rely on solutions or residuals from previous time steps, thus need a flag to hard restart the time integration whenever discontinuity is introduced (e.g. a parameter in the RHS function is changed). So TSRestart sets the flag to tell the integrator to treat the next time step like the first time step in a time integration. Hong (Mr.) > Thanks a lot > > -- -- Mohammed Ashour, M.Sc. PhD Scholar Bauhaus-Universit?t Weimar Institute of Structural Mechanics (ISM) Marienstra?e 7 99423 Weimar, Germany Mobile: +(49) 176 58834667 -------------- next part -------------- An HTML attachment was scrubbed... URL: From shrirang.abhyankar at pnnl.gov Sat Feb 15 12:20:14 2020 From: shrirang.abhyankar at pnnl.gov (Abhyankar, Shrirang G) Date: Sat, 15 Feb 2020 18:20:14 +0000 Subject: [petsc-users] Flagging the solver to restart In-Reply-To: <9071818E-8A2A-4206-88B9-1C383CC4238B@anl.gov> References: <4246EF6F-6B7A-4202-806B-6D334E5B9427@anl.gov> <9071818E-8A2A-4206-88B9-1C383CC4238B@anl.gov> Message-ID: <86076CCE-DF62-4805-9F3C-FBE441CDEAFF@pnnl.gov> if the criteria are met, it'll move forward, if not, it'll engaged the routine TSRollBack(), which based on my understanding is the proper way yo flag the solver to recalculate n+1. Are you trying to do some kind of zero crossing event or root-finding here? If so, using TSSetEventHandler would be a better way than to write your own code. You merely have to define the criteria/condition in an event function and how to handle it in a posteventfunction. PETSc will manage the event location and rollback part for you. See an example here https://www.mcs.anl.gov/petsc/petsc-current/src/ts/examples/tutorials/ex40.c.html. 
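A condensed sketch of the event-handler setup along the lines of ex40.c. The norm-threshold criterion and the EventCtx struct are only illustrative stand-ins for the user's condition; the callback signatures and TSSetEventHandler are as in PETSc.

typedef struct { PetscReal threshold; } EventCtx;   /* hypothetical user context */

static PetscErrorCode EventFunction(TS ts,PetscReal t,Vec U,PetscScalar fvalue[],void *ctx)
{
  EventCtx       *ectx = (EventCtx*)ctx;
  PetscReal      unorm;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = VecNorm(U,NORM_2,&unorm);CHKERRQ(ierr);
  fvalue[0] = unorm - ectx->threshold;   /* a sign change here marks the event */
  PetscFunctionReturn(0);
}

static PetscErrorCode PostEventFunction(TS ts,PetscInt nevents,PetscInt event_list[],PetscReal t,Vec U,PetscBool forwardsolve,void *ctx)
{
  PetscFunctionBeginUser;
  /* react to the located event, e.g. modify parameters stored in ctx */
  PetscFunctionReturn(0);
}

/* in main(): one event, detected on zero crossings in either direction, non-terminating */
PetscInt  direction[1] = {0};
PetscBool terminate[1] = {PETSC_FALSE};
ierr = TSSetEventHandler(ts,1,direction,terminate,EventFunction,PostEventFunction,&ectx);CHKERRQ(ierr);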
Thanks, Shri From: petsc-users on behalf of "Zhang, Hong via petsc-users" Reply-To: "Zhang, Hong" Date: Saturday, February 15, 2020 at 11:14 AM To: Mohammed Ashour Cc: PETSc users list Subject: Re: [petsc-users] Flagging the solver to restart Please make sure your replies go to the maillist. On Feb 15, 2020, at 4:35 AM, Mohammed Ashour > wrote: Dear Mr. Hong, Thanks for your reply and clarification. I have a follow-up question. If TSRollBack() is to be called within a TSPostStep, that would set the ts->steprollback to PETSC_TRUE. And since there is a falsity test on TSPreStep in TSSolve, that would prevent TSPreStep from being engaged as long as ts->steprollback is PETSC_TRUE, which it is after being set so in the TSPostStep. So I'm wondering, why is there a falsity test on TSPreStep knowing that it would not be accessible if TSRollBack is called within TSPostStep. This guarantees TSPreStep is called only once before each successful step. Hong Thanks in advance. Yours sincerely On Tue, Feb 4, 2020 at 5:32 PM Zhang, Hong > wrote: > On Feb 2, 2020, at 11:24 AM, Mohammed Ashour > wrote: > > Dear All, > I'm solving a constraint phase-field problem using PetIGA. This question i'm having is more relevant to PETSc, so I'm posting here. > > I have an algorithm involving iterating on the solution vector until certain criteria are met before moving forward for the next time step. The sequence inside the TSSolve is to call TSMonitor first, to print a user-defined set of values and the move to solve at TSStep and then call TSPostEvaluate. > > So I'm using the TSMonitor to update some variables at time n , those variables are used the in the residual and jacobian calculations at time n+1, and then solving and then check if those criteria are met or not in a function assigned to TS via TSSetPostEvaluate, if the criteria are met, it'll move forward, if not, it'll engaged the routine TSRollBack(), which based on my understanding is the proper way yo flag the solver to recalculate n+1. My question is, is this the proper way to do it? what is the difference between TSRollBack and TSRestart? You are right that TSRollBack() recalculates the current time step. But I would not suggest to use TSPostEvaluate in your case. Presumably you are not using the PETSc adaptor (e.g. via -ts_adapt_type none) and want to control the stepsize yourself. You can check the criteria in TSPostStep, call TSRollBack() if the criteria are not met and update the variables accordingly. The variables can also be updated in TSPreStep(), but TSMonitor should not be used since it is designed for read-only operations. TSRestart may be needed when you are using non-self-starting integration methods such as multiple step methods and FSAL RK methods (-ts_rk_type <3bs,5dp,5bs,6vr,7vr,8vr>). These methods rely on solutions or residuals from previous time steps, thus need a flag to hard restart the time integration whenever discontinuity is introduced (e.g. a parameter in the RHS function is changed). So TSRestart sets the flag to tell the integrator to treat the next time step like the first time step in a time integration. Hong (Mr.) > Thanks a lot > > -- -- Mohammed Ashour, M.Sc. PhD Scholar Bauhaus-Universit?t Weimar Institute of Structural Mechanics (ISM) Marienstra?e 7 99423 Weimar, Germany Mobile: +(49) 176 58834667 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yyang85 at stanford.edu Sat Feb 15 21:42:10 2020 From: yyang85 at stanford.edu (Yuyun Yang) Date: Sun, 16 Feb 2020 03:42:10 +0000 Subject: [petsc-users] Matrix-free method in PETSc Message-ID: Hello team, I wanted to apply the Krylov subspace method to a matrix-free implementation of a stencil, such that the iterative method acts on the operation without ever constructing the matrix explicitly (for example, when doing backward Euler). I'm not sure whether there is already an example for that somewhere. If so, could you point me to a relevant example? Thank you! Best regards, Yuyun -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Sat Feb 15 22:51:40 2020 From: jed at jedbrown.org (Jed Brown) Date: Sat, 15 Feb 2020 21:51:40 -0700 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: References: Message-ID: <878sl3w2w3.fsf@jedbrown.org> Take most any example, say in src/ts/examples/tutorials/, and run with -ts_type beuler -snes_mf The -snes_mf says to run the nonlinear solver (Newton by default) with an unpreconditioned Krylov method for which the action of the Jacobian is given by matrix-free finite differencing of the residual (which is nonlinear when the dynamical system is). Yuyun Yang writes: > Hello team, > > I wanted to apply the Krylov subspace method to a matrix-free implementation of a stencil, such that the iterative method acts on the operation without ever constructing the matrix explicitly (for example, when doing backward Euler). > > I'm not sure whether there is already an example for that somewhere. If so, could you point me to a relevant example? > > Thank you! > > Best regards, > Yuyun From bsmith at mcs.anl.gov Sun Feb 16 00:02:10 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Sun, 16 Feb 2020 06:02:10 +0000 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: References: Message-ID: Yuyun, If you are speaking about using a finite difference stencil on a structured grid where you provide the Jacobian vector products yourself by looping over the grid doing the stencil operation we unfortunately do not have exactly that kind of example. But it is actually not difficult. I suggest starting with src/ts/examples/tests/ex22.c It computes the sparse matrix explicitly with FormIJacobian() What you need to do is instead in main() use MatCreateShell() and MatShellSetOperation(,MATOP_MULT,(void (*)(void))MyMatMult) then provide the routine MyMatMult() to do your stencil based matrix free product; note that you can create this new routine by taking the structure of IFunction() and reorganizing it to do the Jacobian product instead. You will need to get the information about the shell matrix size on each process by calling DMDAGetCorners(). You will then remove the explicit computation of the Jacobian, and also remove the Event stuff since you don't need it. Extending to 2 and 3d is straight forward. Any questions let us know. Barry If you like this would make a great merge request with your code to improve our examples. > On Feb 15, 2020, at 9:42 PM, Yuyun Yang wrote: > > Hello team, > > I wanted to apply the Krylov subspace method to a matrix-free implementation of a stencil, such that the iterative method acts on the operation without ever constructing the matrix explicitly (for example, when doing backward Euler). > > I'm not sure whether there is already an example for that somewhere. If so, could you point me to a relevant example? > > Thank you! 
> > Best regards, > Yuyun From yyang85 at stanford.edu Sun Feb 16 05:12:13 2020 From: yyang85 at stanford.edu (Yuyun Yang) Date: Sun, 16 Feb 2020 11:12:13 +0000 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: References: , Message-ID: Thank you, that is very helpful information indeed! I will try it and send you my code when it works. Best regards, Yuyun ________________________________ From: Smith, Barry F. Sent: Saturday, February 15, 2020 10:02 PM To: Yuyun Yang Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Matrix-free method in PETSc Yuyun, If you are speaking about using a finite difference stencil on a structured grid where you provide the Jacobian vector products yourself by looping over the grid doing the stencil operation we unfortunately do not have exactly that kind of example. But it is actually not difficult. I suggest starting with src/ts/examples/tests/ex22.c It computes the sparse matrix explicitly with FormIJacobian() What you need to do is instead in main() use MatCreateShell() and MatShellSetOperation(,MATOP_MULT,(void (*)(void))MyMatMult) then provide the routine MyMatMult() to do your stencil based matrix free product; note that you can create this new routine by taking the structure of IFunction() and reorganizing it to do the Jacobian product instead. You will need to get the information about the shell matrix size on each process by calling DMDAGetCorners(). You will then remove the explicit computation of the Jacobian, and also remove the Event stuff since you don't need it. Extending to 2 and 3d is straight forward. Any questions let us know. Barry If you like this would make a great merge request with your code to improve our examples. > On Feb 15, 2020, at 9:42 PM, Yuyun Yang wrote: > > Hello team, > > I wanted to apply the Krylov subspace method to a matrix-free implementation of a stencil, such that the iterative method acts on the operation without ever constructing the matrix explicitly (for example, when doing backward Euler). > > I'm not sure whether there is already an example for that somewhere. If so, could you point me to a relevant example? > > Thank you! > > Best regards, > Yuyun -------------- next part -------------- An HTML attachment was scrubbed... URL: From jczhang at mcs.anl.gov Sun Feb 16 23:37:02 2020 From: jczhang at mcs.anl.gov (Junchao Zhang) Date: Sun, 16 Feb 2020 23:37:02 -0600 Subject: [petsc-users] Crash caused by strange error in KSPSetUp In-Reply-To: References: Message-ID: Richard, I managed to get the code Simlul at trophy built. Could you tell me how to run your test? I want to see if I can reproduce the error. Thanks --Junchao Zhang On Fri, Feb 14, 2020 at 8:34 PM Richard Beare wrote: > It doesn't compile out of the box with master. > > singularity def file attached. > > On Sat, 15 Feb 2020 at 08:03, Richard Beare > wrote: > >> I will see if I can build with master. The docs for simulatrophy say >> 3.6.3.1. >> >> On Sat, 15 Feb 2020 at 02:47, Junchao Zhang wrote: >> >>> Which petsc version do you use? In aij.c of the master branch, I saw >>> Barry recently added a useful check to catch number of nonzero overflow, >>> ierr = PetscIntCast(nz64,&nz);CHKERRQ(ierr); But you mentioned using >>> 64-bit indices did not solve the problem, it might not be the reason. You >>> should try the master branch if feasible. Also, vary number of MPI ranks to >>> see if error stack changes. 
>>> >>> --Junchao Zhang >>> >>> >>> On Fri, Feb 14, 2020 at 5:12 AM Richard Beare via petsc-users < >>> petsc-users at mcs.anl.gov> wrote: >>> >>>> No luck - exactly the same error after including the >>>> --with-64-bit-indicies=yes --download-mpich=yes options >>>> >>>> ==8674== Argument 'size' of function memalign has a fishy (possibly >>>> negative) value: -17152036540 >>>> ==8674== at 0x4C320A6: memalign (in >>>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==8674== by 0x4F0CFF2: PetscMallocAlign(unsigned long, int, char >>>> const*, char const*, void**) (mal.c:28) >>>> ==8674== by 0x4F0F716: PetscTrMallocDefault(unsigned long, int, char >>>> const*, char const*, void**) (mtr.c:188) >>>> ==8674== by 0x569AF3E: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595) >>>> ==8674== by 0x569A531: MatSeqAIJSetPreallocation (aij.c:3539) >>>> ==8674== by 0x599080A: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*) >>>> (fdda.c:1085) >>>> ==8674== by 0x598B937: DMCreateMatrix_DA(_p_DM*, _p_Mat**) >>>> (fdda.c:759) >>>> ==8674== by 0x58A2BF2: DMCreateMatrix (dm.c:956) >>>> ==8674== by 0x5E377B3: KSPSetUp (itfunc.c:262) >>>> ==8674== by 0x409FFC: PetscAdLemTaras3D::solveModel(bool) >>>> (PetscAdLemTaras3D.hxx:255) >>>> ==8674== by 0x4239FB: AdLem3D<3u>::solveModel(bool, bool, bool) >>>> (AdLem3D.hxx:551) >>>> ==8674== by 0x41BD17: main (PetscAdLemMain.cxx:344) >>>> ==8674== >>>> On Fri, 14 Feb 2020 at 17:07, Smith, Barry F. >>>> wrote: >>>> >>>>> >>>>> Richard, >>>>> >>>>> It is likely that for these problems some of the integers become >>>>> too large for the int variable to hold them, thus they overflow and become >>>>> negative. >>>>> >>>>> You should make a new PETSC_ARCH configuration of PETSc that uses >>>>> the configure option --with-64-bit-indices, this will change PETSc to use >>>>> 64 bit integers which will not overflow. >>>>> >>>>> Good luck and let us know how it works out >>>>> >>>>> Barry >>>>> >>>>> Probably the code is built with an older version of PETSc; the >>>>> later versions should produce a more useful error message. >>>>> >>>>> > On Feb 13, 2020, at 11:43 PM, Richard Beare via petsc-users < >>>>> petsc-users at mcs.anl.gov> wrote: >>>>> > >>>>> > Hi Everyone, >>>>> > I am experimenting with the Simlul at trophy tool ( >>>>> https://github.com/Inria-Asclepios/simul-atrophy) that uses petsc to >>>>> simulate brain atrophy based on segmented MRI data. I am not the author. I >>>>> have this running on most of a dataset of about 50 scans, but experience >>>>> crashes with several that I am trying to track down. However I am out of >>>>> ideas. The problem images are slightly bigger than some of the successful >>>>> ones, but not substantially so, and I have experimented on machines with >>>>> sufficient RAM. The error happens very quickly, as part of setup - see the >>>>> valgrind report below. I haven't managed to get the sgcheck tool to work >>>>> yet. I can only guess that the ksp object is somehow becoming corrupted >>>>> during the setup process, but the array sizes that I can track (which >>>>> derive from image sizes), appear correct at every point I can check. Any >>>>> suggestions as to how I can check what might go wrong in the setup of the >>>>> ksp object? >>>>> > Thankyou. 
>>>>> > >>>>> > valgrind tells me: >>>>> > >>>>> > ==18175== Argument 'size' of function memalign has a fishy (possibly >>>>> negative) value: -17152038144 >>>>> > ==18175== at 0x4C320A6: memalign (in >>>>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >>>>> > ==18175== by 0x4F0F1F2: PetscMallocAlign(unsigned long, int, char >>>>> const*, char const*, void**) (mal.c:28) >>>>> > ==18175== by 0x56B43CA: MatSeqAIJSetPreallocation_SeqAIJ >>>>> (aij.c:3595) >>>>> > ==18175== by 0x56B39BD: MatSeqAIJSetPreallocation (aij.c:3539) >>>>> > ==18175== by 0x59A9B44: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, >>>>> _p_Mat*) (fdda.c:1085) >>>>> > ==18175== by 0x59A4C71: DMCreateMatrix_DA(_p_DM*, _p_Mat**) >>>>> (fdda.c:759) >>>>> > ==18175== by 0x58BBD29: DMCreateMatrix (dm.c:956) >>>>> > ==18175== by 0x5E509D5: KSPSetUp (itfunc.c:262) >>>>> > ==18175== by 0x40A3DE: PetscAdLemTaras3D::solveModel(bool) >>>>> (PetscAdLemTaras3D.hxx:269) >>>>> > ==18175== by 0x42413F: AdLem3D<3u>::solveModel(bool, bool, bool) >>>>> (AdLem3D.hxx:552) >>>>> > ==18175== by 0x41C25C: main (PetscAdLemMain.cxx:349) >>>>> > ==18175== >>>>> > >>>>> > -- >>>>> > -- >>>>> > A/Prof Richard Beare >>>>> > Imaging and Bioinformatics, Peninsula Clinical School >>>>> > orcid.org/0000-0002-7530-5664 >>>>> > Richard.Beare at monash.edu >>>>> > +61 3 9788 1724 >>>>> > >>>>> > >>>>> > >>>>> > Geospatial Research: >>>>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>>>> >>>>> >>>> >>>> -- >>>> -- >>>> A/Prof Richard Beare >>>> Imaging and Bioinformatics, Peninsula Clinical School >>>> orcid.org/0000-0002-7530-5664 >>>> Richard.Beare at monash.edu >>>> +61 3 9788 1724 >>>> >>>> >>>> >>>> Geospatial Research: >>>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>>> >>> >> >> -- >> -- >> A/Prof Richard Beare >> Imaging and Bioinformatics, Peninsula Clinical School >> orcid.org/0000-0002-7530-5664 >> Richard.Beare at monash.edu >> +61 3 9788 1724 >> >> >> >> Geospatial Research: >> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >> > > > -- > -- > A/Prof Richard Beare > Imaging and Bioinformatics, Peninsula Clinical School > orcid.org/0000-0002-7530-5664 > Richard.Beare at monash.edu > +61 3 9788 1724 > > > > Geospatial Research: > https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eda.oktay at metu.edu.tr Mon Feb 17 02:35:59 2020 From: eda.oktay at metu.edu.tr (Eda Oktay) Date: Mon, 17 Feb 2020 11:35:59 +0300 Subject: [petsc-users] Forming a matrix from vectors Message-ID: Hello all, I am trying to form a matrix whose columns are eigenvectors I have calculated before U = [v1,v2,v3]. Is there any easy way of forming this matrix? My matrix should be parallel and I have created vectors as below, where nev i s the number of requested eigenvalues. So each V[i] represents an eigenvector and I should form a matrix by using V. Vec *V; VecDuplicateVecs(vr,nev,&V); for (i=0; i From jroman at dsic.upv.es Mon Feb 17 03:24:30 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 17 Feb 2020 10:24:30 +0100 Subject: [petsc-users] Forming a matrix from vectors In-Reply-To: References: Message-ID: I would use MatDenseGetColumn() and VecGetArrayRead() to get the two pointers and then copy the values with a loop. 
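A possible sketch of that loop, using the V[0..nev-1] vectors from the question and letting the dense matrix take its row layout from V[0]; the variable names are illustrative only.

Mat               U;
PetscInt          i,j,nlocal;
PetscScalar       *col;
const PetscScalar *va;

ierr = VecGetLocalSize(V[0],&nlocal);CHKERRQ(ierr);
ierr = MatCreateDense(PETSC_COMM_WORLD,nlocal,PETSC_DECIDE,PETSC_DETERMINE,nev,NULL,&U);CHKERRQ(ierr);
for (j=0; j<nev; j++) {
  ierr = MatDenseGetColumn(U,j,&col);CHKERRQ(ierr);
  ierr = VecGetArrayRead(V[j],&va);CHKERRQ(ierr);
  for (i=0; i<nlocal; i++) col[i] = va[i];   /* copy the locally owned rows of eigenvector j */
  ierr = VecRestoreArrayRead(V[j],&va);CHKERRQ(ierr);
  ierr = MatDenseRestoreColumn(U,&col);CHKERRQ(ierr);
}
ierr = MatAssemblyBegin(U,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
ierr = MatAssemblyEnd(U,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

Taking the local row size from V[0] keeps the parallel layout of U consistent with the eigenvectors, so the plain element-by-element copy over the local rows is valid on each process.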
Jose > El 17 feb 2020, a las 9:35, Eda Oktay escribi?: > > Hello all, > > I am trying to form a matrix whose columns are eigenvectors I have calculated before U = [v1,v2,v3]. Is there any easy way of forming this matrix? My matrix should be parallel and I have created vectors as below, where nev i s the number of requested eigenvalues. So each V[i] represents an eigenvector and I should form a matrix by using V. > > Vec *V; > VecDuplicateVecs(vr,nev,&V); > for (i=0; i ierr = EPSGetEigenpair(eps,i,&kr,NULL,V[i],NULL); > } > > Thanks! > > Eda From knepley at gmail.com Mon Feb 17 06:01:42 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 17 Feb 2020 07:01:42 -0500 Subject: [petsc-users] Forming a matrix from vectors In-Reply-To: References: Message-ID: On Mon, Feb 17, 2020 at 4:24 AM Jose E. Roman wrote: > I would use MatDenseGetColumn() and VecGetArrayRead() to get the two > pointers and then copy the values with a loop. > I would do as Jose says to get it working. After you verify it, we could show you how to avoid a copy. Thanks, Matt > Jose > > > El 17 feb 2020, a las 9:35, Eda Oktay escribi?: > > > > Hello all, > > > > I am trying to form a matrix whose columns are eigenvectors I have > calculated before U = [v1,v2,v3]. Is there any easy way of forming this > matrix? My matrix should be parallel and I have created vectors as below, > where nev i s the number of requested eigenvalues. So each V[i] represents > an eigenvector and I should form a matrix by using V. > > > > Vec *V; > > VecDuplicateVecs(vr,nev,&V); > > for (i=0; i > ierr = EPSGetEigenpair(eps,i,&kr,NULL,V[i],NULL); > > } > > > > Thanks! > > > > Eda > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yyang85 at stanford.edu Mon Feb 17 07:56:43 2020 From: yyang85 at stanford.edu (Yuyun Yang) Date: Mon, 17 Feb 2020 13:56:43 +0000 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: References: , , Message-ID: Hello, I actually have a question about the usage of DMDA since I'm quite new to this. I wonder if the DMDA suite of functions can be directly called on vectors created from VecCreate? Or the vectors have to be formed by DMDACreateGlobalVector? I'm also not sure about what the dof and stencil width arguments do. I'm still unsure about the usage of MatCreateShell and MatShellSetOperation, since it seems that MyMatMult should still have 3 inputs just like MatMult (the matrix and two vectors). Since I'm not forming the matrix, does that mean the matrix input is meaningless but still needs to exist for the sake of this format? After I create such a shell matrix, can I use it like a regular matrix in KSP and utilize preconditioners? Thanks! Yuyun ________________________________ From: petsc-users on behalf of Yuyun Yang Sent: Sunday, February 16, 2020 3:12 AM To: Smith, Barry F. Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Matrix-free method in PETSc Thank you, that is very helpful information indeed! I will try it and send you my code when it works. Best regards, Yuyun ________________________________ From: Smith, Barry F. 
Sent: Saturday, February 15, 2020 10:02 PM To: Yuyun Yang Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Matrix-free method in PETSc Yuyun, If you are speaking about using a finite difference stencil on a structured grid where you provide the Jacobian vector products yourself by looping over the grid doing the stencil operation we unfortunately do not have exactly that kind of example. But it is actually not difficult. I suggest starting with src/ts/examples/tests/ex22.c It computes the sparse matrix explicitly with FormIJacobian() What you need to do is instead in main() use MatCreateShell() and MatShellSetOperation(,MATOP_MULT,(void (*)(void))MyMatMult) then provide the routine MyMatMult() to do your stencil based matrix free product; note that you can create this new routine by taking the structure of IFunction() and reorganizing it to do the Jacobian product instead. You will need to get the information about the shell matrix size on each process by calling DMDAGetCorners(). You will then remove the explicit computation of the Jacobian, and also remove the Event stuff since you don't need it. Extending to 2 and 3d is straight forward. Any questions let us know. Barry If you like this would make a great merge request with your code to improve our examples. > On Feb 15, 2020, at 9:42 PM, Yuyun Yang wrote: > > Hello team, > > I wanted to apply the Krylov subspace method to a matrix-free implementation of a stencil, such that the iterative method acts on the operation without ever constructing the matrix explicitly (for example, when doing backward Euler). > > I'm not sure whether there is already an example for that somewhere. If so, could you point me to a relevant example? > > Thank you! > > Best regards, > Yuyun -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 17 09:19:21 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 17 Feb 2020 10:19:21 -0500 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: References: Message-ID: On Mon, Feb 17, 2020 at 8:56 AM Yuyun Yang wrote: > Hello, > > I actually have a question about the usage of DMDA since I'm quite new to > this. I wonder if the DMDA suite of functions can be directly called on > vectors created from VecCreate? Or the vectors have to be formed by > DMDACreateGlobalVector? > Most things work the same. About the only thing that is different is that we set a special viewer for vectors from DMDACreateGlobalVector() which puts it in lexicographic order on output. > I'm also not sure about what the dof and stencil width arguments do. > 'dof' is how many unknowns lie at each vertex. 'sw' is the width of the ghost region for local vectors. > I'm still unsure about the usage of MatCreateShell and > MatShellSetOperation, since it seems that MyMatMult should still have 3 > inputs just like MatMult (the matrix and two vectors). > MatShell is a type where you provide your own function implementations, rather than using those for a particular storage format, like AIJ. > Since I'm not forming the matrix, does that mean the matrix input is > meaningless but still needs to exist for the sake of this format? > It means you calculate the output yourself, using the input. > After I create such a shell matrix, can I use it like a regular matrix in > KSP and utilize preconditioners? 
> Many preconditioners want access to individual elements of the matrix, which usually will not work with shell matrices, since the user just wants to provide the multiply routine. Thanks, Matt > Thanks! > Yuyun > ------------------------------ > *From:* petsc-users on behalf of Yuyun > Yang > *Sent:* Sunday, February 16, 2020 3:12 AM > *To:* Smith, Barry F. > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Matrix-free method in PETSc > > Thank you, that is very helpful information indeed! I will try it and send > you my code when it works. > > Best regards, > Yuyun > ------------------------------ > *From:* Smith, Barry F. > *Sent:* Saturday, February 15, 2020 10:02 PM > *To:* Yuyun Yang > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Matrix-free method in PETSc > > Yuyun, > > If you are speaking about using a finite difference stencil on a > structured grid where you provide the Jacobian vector products yourself by > looping over the grid doing the stencil operation we unfortunately do not > have exactly that kind of example. > > But it is actually not difficult. I suggest starting with > src/ts/examples/tests/ex22.c It computes the sparse matrix explicitly with > FormIJacobian() > > What you need to do is instead in main() use MatCreateShell() and > MatShellSetOperation(,MATOP_MULT,(void (*)(void))MyMatMult) then provide > the routine MyMatMult() to do your stencil based matrix free product; note > that you can create this new routine by taking the structure of IFunction() > and reorganizing it to do the Jacobian product instead. You will need to > get the information about the shell matrix size on each process by calling > DMDAGetCorners(). > > You will then remove the explicit computation of the Jacobian, and > also remove the Event stuff since you don't need it. > > Extending to 2 and 3d is straight forward. > > Any questions let us know. > > Barry > > If you like this would make a great merge request with your code to > improve our examples. > > > > On Feb 15, 2020, at 9:42 PM, Yuyun Yang wrote: > > > > Hello team, > > > > I wanted to apply the Krylov subspace method to a matrix-free > implementation of a stencil, such that the iterative method acts on the > operation without ever constructing the matrix explicitly (for example, > when doing backward Euler). > > > > I'm not sure whether there is already an example for that somewhere. If > so, could you point me to a relevant example? > > > > Thank you! > > > > Best regards, > > Yuyun > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jczhang at mcs.anl.gov Mon Feb 17 10:19:15 2020 From: jczhang at mcs.anl.gov (Junchao Zhang) Date: Mon, 17 Feb 2020 10:19:15 -0600 Subject: [petsc-users] Crash caused by strange error in KSPSetUp In-Reply-To: References: Message-ID: Hi, Richard, I tested the case you sent over and found it did fail due to the 32-bit overflow on number of non-zeros, and with a 64-bit built petsc it passed. You had a typo when you reported that --with-64-bit-indicies=yes failed. It should be --with-64-bit-indices=yes. 
You can go with a 64-bit built petsc, or you can go with parallel computing and run with multiple MPI ranks so that each rank has less non-zeros and it is faster (but you need to make sure that code is correctly parallelized). Barry's recent fix ierr = PetscIntCast(nz64,&nz);CHKERRQ(ierr); would print more useful error messages in this case. Barry, should we patch it back to 3.6.3? --Junchao Zhang On Sun, Feb 16, 2020 at 11:37 PM Junchao Zhang wrote: > Richard, > I managed to get the code Simlul at trophy built. Could you tell me how to > run your test? I want to see if I can reproduce the error. Thanks > > --Junchao Zhang > > > On Fri, Feb 14, 2020 at 8:34 PM Richard Beare > wrote: > >> It doesn't compile out of the box with master. >> >> singularity def file attached. >> >> On Sat, 15 Feb 2020 at 08:03, Richard Beare >> wrote: >> >>> I will see if I can build with master. The docs for simulatrophy say >>> 3.6.3.1. >>> >>> On Sat, 15 Feb 2020 at 02:47, Junchao Zhang wrote: >>> >>>> Which petsc version do you use? In aij.c of the master branch, I saw >>>> Barry recently added a useful check to catch number of nonzero overflow, >>>> ierr = PetscIntCast(nz64,&nz);CHKERRQ(ierr); But you mentioned using >>>> 64-bit indices did not solve the problem, it might not be the reason. You >>>> should try the master branch if feasible. Also, vary number of MPI ranks to >>>> see if error stack changes. >>>> >>>> --Junchao Zhang >>>> >>>> >>>> On Fri, Feb 14, 2020 at 5:12 AM Richard Beare via petsc-users < >>>> petsc-users at mcs.anl.gov> wrote: >>>> >>>>> No luck - exactly the same error after including the >>>>> --with-64-bit-indicies=yes --download-mpich=yes options >>>>> >>>>> ==8674== Argument 'size' of function memalign has a fishy (possibly >>>>> negative) value: -17152036540 >>>>> ==8674== at 0x4C320A6: memalign (in >>>>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >>>>> ==8674== by 0x4F0CFF2: PetscMallocAlign(unsigned long, int, char >>>>> const*, char const*, void**) (mal.c:28) >>>>> ==8674== by 0x4F0F716: PetscTrMallocDefault(unsigned long, int, >>>>> char const*, char const*, void**) (mtr.c:188) >>>>> ==8674== by 0x569AF3E: MatSeqAIJSetPreallocation_SeqAIJ (aij.c:3595) >>>>> ==8674== by 0x569A531: MatSeqAIJSetPreallocation (aij.c:3539) >>>>> ==8674== by 0x599080A: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, _p_Mat*) >>>>> (fdda.c:1085) >>>>> ==8674== by 0x598B937: DMCreateMatrix_DA(_p_DM*, _p_Mat**) >>>>> (fdda.c:759) >>>>> ==8674== by 0x58A2BF2: DMCreateMatrix (dm.c:956) >>>>> ==8674== by 0x5E377B3: KSPSetUp (itfunc.c:262) >>>>> ==8674== by 0x409FFC: PetscAdLemTaras3D::solveModel(bool) >>>>> (PetscAdLemTaras3D.hxx:255) >>>>> ==8674== by 0x4239FB: AdLem3D<3u>::solveModel(bool, bool, bool) >>>>> (AdLem3D.hxx:551) >>>>> ==8674== by 0x41BD17: main (PetscAdLemMain.cxx:344) >>>>> ==8674== >>>>> On Fri, 14 Feb 2020 at 17:07, Smith, Barry F. >>>>> wrote: >>>>> >>>>>> >>>>>> Richard, >>>>>> >>>>>> It is likely that for these problems some of the integers become >>>>>> too large for the int variable to hold them, thus they overflow and become >>>>>> negative. >>>>>> >>>>>> You should make a new PETSC_ARCH configuration of PETSc that >>>>>> uses the configure option --with-64-bit-indices, this will change PETSc to >>>>>> use 64 bit integers which will not overflow. >>>>>> >>>>>> Good luck and let us know how it works out >>>>>> >>>>>> Barry >>>>>> >>>>>> Probably the code is built with an older version of PETSc; the >>>>>> later versions should produce a more useful error message. 
>>>>>> >>>>>> > On Feb 13, 2020, at 11:43 PM, Richard Beare via petsc-users < >>>>>> petsc-users at mcs.anl.gov> wrote: >>>>>> > >>>>>> > Hi Everyone, >>>>>> > I am experimenting with the Simlul at trophy tool ( >>>>>> https://github.com/Inria-Asclepios/simul-atrophy) that uses petsc to >>>>>> simulate brain atrophy based on segmented MRI data. I am not the author. I >>>>>> have this running on most of a dataset of about 50 scans, but experience >>>>>> crashes with several that I am trying to track down. However I am out of >>>>>> ideas. The problem images are slightly bigger than some of the successful >>>>>> ones, but not substantially so, and I have experimented on machines with >>>>>> sufficient RAM. The error happens very quickly, as part of setup - see the >>>>>> valgrind report below. I haven't managed to get the sgcheck tool to work >>>>>> yet. I can only guess that the ksp object is somehow becoming corrupted >>>>>> during the setup process, but the array sizes that I can track (which >>>>>> derive from image sizes), appear correct at every point I can check. Any >>>>>> suggestions as to how I can check what might go wrong in the setup of the >>>>>> ksp object? >>>>>> > Thankyou. >>>>>> > >>>>>> > valgrind tells me: >>>>>> > >>>>>> > ==18175== Argument 'size' of function memalign has a fishy >>>>>> (possibly negative) value: -17152038144 >>>>>> > ==18175== at 0x4C320A6: memalign (in >>>>>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >>>>>> > ==18175== by 0x4F0F1F2: PetscMallocAlign(unsigned long, int, >>>>>> char const*, char const*, void**) (mal.c:28) >>>>>> > ==18175== by 0x56B43CA: MatSeqAIJSetPreallocation_SeqAIJ >>>>>> (aij.c:3595) >>>>>> > ==18175== by 0x56B39BD: MatSeqAIJSetPreallocation (aij.c:3539) >>>>>> > ==18175== by 0x59A9B44: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, >>>>>> _p_Mat*) (fdda.c:1085) >>>>>> > ==18175== by 0x59A4C71: DMCreateMatrix_DA(_p_DM*, _p_Mat**) >>>>>> (fdda.c:759) >>>>>> > ==18175== by 0x58BBD29: DMCreateMatrix (dm.c:956) >>>>>> > ==18175== by 0x5E509D5: KSPSetUp (itfunc.c:262) >>>>>> > ==18175== by 0x40A3DE: PetscAdLemTaras3D::solveModel(bool) >>>>>> (PetscAdLemTaras3D.hxx:269) >>>>>> > ==18175== by 0x42413F: AdLem3D<3u>::solveModel(bool, bool, bool) >>>>>> (AdLem3D.hxx:552) >>>>>> > ==18175== by 0x41C25C: main (PetscAdLemMain.cxx:349) >>>>>> > ==18175== >>>>>> > >>>>>> > -- >>>>>> > -- >>>>>> > A/Prof Richard Beare >>>>>> > Imaging and Bioinformatics, Peninsula Clinical School >>>>>> > orcid.org/0000-0002-7530-5664 >>>>>> > Richard.Beare at monash.edu >>>>>> > +61 3 9788 1724 >>>>>> > >>>>>> > >>>>>> > >>>>>> > Geospatial Research: >>>>>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>>>>> >>>>>> >>>>> >>>>> -- >>>>> -- >>>>> A/Prof Richard Beare >>>>> Imaging and Bioinformatics, Peninsula Clinical School >>>>> orcid.org/0000-0002-7530-5664 >>>>> Richard.Beare at monash.edu >>>>> +61 3 9788 1724 >>>>> >>>>> >>>>> >>>>> Geospatial Research: >>>>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>>>> >>>> >>> >>> -- >>> -- >>> A/Prof Richard Beare >>> Imaging and Bioinformatics, Peninsula Clinical School >>> orcid.org/0000-0002-7530-5664 >>> Richard.Beare at monash.edu >>> +61 3 9788 1724 >>> >>> >>> >>> Geospatial Research: >>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>> >> >> >> -- >> -- >> A/Prof Richard Beare >> Imaging and Bioinformatics, Peninsula Clinical School >> orcid.org/0000-0002-7530-5664 >> Richard.Beare at monash.edu >> +61 3 9788 1724 >> >> 
>> >> Geospatial Research: >> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From juaneah at gmail.com Mon Feb 17 12:19:07 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Mon, 17 Feb 2020 12:19:07 -0600 Subject: [petsc-users] SLEPc: The inner product is not well defined In-Reply-To: References: Message-ID: Thank you very much for the answer. This error appears when computing the B-norm of a vector x, as > sqrt(x'*B*x). Probably your B matrix is semi-definite, and due to > floating-point error the value x'*B*x becomes negative for a certain vector > x. The code uses a tolerance of 10*PETSC_MACHINE_EPSILON, but it seems the > rounding errors are larger in your case. Or maybe your B-matrix is > indefinite, in which case you should solve the problem as non-symmetric (or > as symmetric-indefinite GHIEP). > > Do you get the same problem with the Krylov-Schur solver? > > After check the input matrices, the problem was solved using GHIEP. > A workaround is to edit the source code and remove the check or increase > the tolerance, but this may be catastrophic if your B is indefinite. A > better solution is to reformulate the problem, solving the matrix pair > (A,C) where C=alpha*A+beta*B is positive definite (note that then the > eigenvalues become lambda/(beta+alpha*lambda)). > > Ok, there is a rule to choose the values for alpha and beta? Kind regards. Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From juaneah at gmail.com Mon Feb 17 12:35:38 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Mon, 17 Feb 2020 12:35:38 -0600 Subject: [petsc-users] BCs for a EPS solver Message-ID: Hi everyone, I have an eigenvalue problem where I need to apply BCs to the stiffness and mass matrix. Usually, for KSP solver, it is enough to set to zero the rows and columns related to the boundary conditions. I used to apply it with MatZeroRowsColumns, with a 1s on the diagonal. Then the solver works well. There is something similar to KSP for EPS solver ? I already used MatZeroRowsColumns (for EPS solver), with a 1s on the diagonal, and I got wrong result. Kind regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeremy at seamplex.com Mon Feb 17 12:39:47 2020 From: jeremy at seamplex.com (Jeremy Theler) Date: Mon, 17 Feb 2020 15:39:47 -0300 Subject: [petsc-users] BCs for a EPS solver In-Reply-To: References: Message-ID: The usual trick is to set ones in one matrix and zeros in the other one. On Mon, 2020-02-17 at 12:35 -0600, Emmanuel Ayala wrote: > Hi everyone, > > I have an eigenvalue problem where I need to apply BCs to the > stiffness and mass matrix. > > Usually, for KSP solver, it is enough to set to zero the rows and > columns related to the boundary conditions. I used to apply it with > MatZeroRowsColumns, with a 1s on the diagonal. Then the solver works > well. > > There is something similar to KSP for EPS solver ? > > I already used MatZeroRowsColumns (for EPS solver), with a 1s on the > diagonal, and I got wrong result. > > Kind regards. > > > > From juaneah at gmail.com Mon Feb 17 12:57:50 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Mon, 17 Feb 2020 12:57:50 -0600 Subject: [petsc-users] BCs for a EPS solver In-Reply-To: References: Message-ID: Hi, thanks for the quick answer. I just did it, and it does not work. My problem is GNHEP and I use the default solver (Krylov-Schur). 
Moreover I run the code with the options: -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_type mumps Any other suggestions? Kind regards. El lun., 17 de feb. de 2020 a la(s) 12:39, Jeremy Theler ( jeremy at seamplex.com) escribi?: > The usual trick is to set ones in one matrix and zeros in the other > one. > > > On Mon, 2020-02-17 at 12:35 -0600, Emmanuel Ayala wrote: > > Hi everyone, > > > > I have an eigenvalue problem where I need to apply BCs to the > > stiffness and mass matrix. > > > > Usually, for KSP solver, it is enough to set to zero the rows and > > columns related to the boundary conditions. I used to apply it with > > MatZeroRowsColumns, with a 1s on the diagonal. Then the solver works > > well. > > > > There is something similar to KSP for EPS solver ? > > > > I already used MatZeroRowsColumns (for EPS solver), with a 1s on the > > diagonal, and I got wrong result. > > > > Kind regards. > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 17 13:20:26 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 17 Feb 2020 14:20:26 -0500 Subject: [petsc-users] BCs for a EPS solver In-Reply-To: References: Message-ID: On Mon, Feb 17, 2020 at 1:59 PM Emmanuel Ayala wrote: > Hi, thanks for the quick answer. > > I just did it, and it does not work. My problem is GNHEP and I use the > default solver (Krylov-Schur). Moreover I run the code with the options: > -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_type mumps > I guess a better question is: What do you expect to work? For a linear solve, A x = b if a row i is 0 except for a one on the diagonal, then I get x_i = b_i so hopefully you put the correct boundary value in b_i. For the generalized eigenproblem A x = \lambda B x if you set row i to the identity in A, and zero in B, we get x_i = 0 and you must put the boundary values into x after you have finished the solve. Is this what you did? Thanks, Matt > Any other suggestions? > Kind regards. > > El lun., 17 de feb. de 2020 a la(s) 12:39, Jeremy Theler ( > jeremy at seamplex.com) escribi?: > >> The usual trick is to set ones in one matrix and zeros in the other >> one. >> >> >> On Mon, 2020-02-17 at 12:35 -0600, Emmanuel Ayala wrote: >> > Hi everyone, >> > >> > I have an eigenvalue problem where I need to apply BCs to the >> > stiffness and mass matrix. >> > >> > Usually, for KSP solver, it is enough to set to zero the rows and >> > columns related to the boundary conditions. I used to apply it with >> > MatZeroRowsColumns, with a 1s on the diagonal. Then the solver works >> > well. >> > >> > There is something similar to KSP for EPS solver ? >> > >> > I already used MatZeroRowsColumns (for EPS solver), with a 1s on the >> > diagonal, and I got wrong result. >> > >> > Kind regards. >> > >> > >> > >> > >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Mon Feb 17 13:44:14 2020 From: hongzhang at anl.gov (Zhang, Hong) Date: Mon, 17 Feb 2020 19:44:14 +0000 Subject: [petsc-users] Forming a matrix from vectors In-Reply-To: References: Message-ID: <74DB1A51-307D-4A44-9FD5-196FCAB435D0@anl.gov> You can create a dense matrix and use VecPlaceArray() to take a column out of the matrix as a vector. 
For example, MatCreateDense() MatDenseGetColumn(A,0,col) VecPlaceArray(v,col) ? // fill in the vector with values VecResetArray(v) MatDenseRestoreColumn(A,&col) Hong (Mr.) > On Feb 17, 2020, at 2:35 AM, Eda Oktay wrote: > > Hello all, > > I am trying to form a matrix whose columns are eigenvectors I have calculated before U = [v1,v2,v3]. Is there any easy way of forming this matrix? My matrix should be parallel and I have created vectors as below, where nev i s the number of requested eigenvalues. So each V[i] represents an eigenvector and I should form a matrix by using V. > > Vec *V; > VecDuplicateVecs(vr,nev,&V); > for (i=0; i ierr = EPSGetEigenpair(eps,i,&kr,NULL,V[i],NULL); > } > > Thanks! > > Eda From richard.beare at monash.edu Mon Feb 17 14:16:04 2020 From: richard.beare at monash.edu (Richard Beare) Date: Tue, 18 Feb 2020 07:16:04 +1100 Subject: [petsc-users] Crash caused by strange error in KSPSetUp In-Reply-To: References: Message-ID: Awesome - thanks for that. I will check it out. I will also look at what needs to be done to bring simulatrophy to a more recent version of petsc. On Tue, 18 Feb 2020 at 03:19, Junchao Zhang wrote: > Hi, Richard, > I tested the case you sent over and found it did fail due to the 32-bit > overflow on number of non-zeros, and with a 64-bit built petsc it passed. > You had a typo when you reported that --with-64-bit-indicies=yes failed. It > should be --with-64-bit-indices=yes. > You can go with a 64-bit built petsc, or you can go with parallel > computing and run with multiple MPI ranks so that each rank has less > non-zeros and it is faster (but you need to make sure that code is > correctly parallelized). > Barry's recent fix ierr = PetscIntCast(nz64,&nz);CHKERRQ(ierr); would > print more useful error messages in this case. Barry, should we patch it > back to 3.6.3? > > --Junchao Zhang > > > On Sun, Feb 16, 2020 at 11:37 PM Junchao Zhang > wrote: > >> Richard, >> I managed to get the code Simlul at trophy built. Could you tell me how >> to run your test? I want to see if I can reproduce the error. Thanks >> >> --Junchao Zhang >> >> >> On Fri, Feb 14, 2020 at 8:34 PM Richard Beare >> wrote: >> >>> It doesn't compile out of the box with master. >>> >>> singularity def file attached. >>> >>> On Sat, 15 Feb 2020 at 08:03, Richard Beare >>> wrote: >>> >>>> I will see if I can build with master. The docs for simulatrophy say >>>> 3.6.3.1. >>>> >>>> On Sat, 15 Feb 2020 at 02:47, Junchao Zhang >>>> wrote: >>>> >>>>> Which petsc version do you use? In aij.c of the master branch, I saw >>>>> Barry recently added a useful check to catch number of nonzero overflow, >>>>> ierr = PetscIntCast(nz64,&nz);CHKERRQ(ierr); But you mentioned using >>>>> 64-bit indices did not solve the problem, it might not be the reason. You >>>>> should try the master branch if feasible. Also, vary number of MPI ranks to >>>>> see if error stack changes. 
>>>>> >>>>> --Junchao Zhang >>>>> >>>>> >>>>> On Fri, Feb 14, 2020 at 5:12 AM Richard Beare via petsc-users < >>>>> petsc-users at mcs.anl.gov> wrote: >>>>> >>>>>> No luck - exactly the same error after including the >>>>>> --with-64-bit-indicies=yes --download-mpich=yes options >>>>>> >>>>>> ==8674== Argument 'size' of function memalign has a fishy (possibly >>>>>> negative) value: -17152036540 >>>>>> ==8674== at 0x4C320A6: memalign (in >>>>>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >>>>>> ==8674== by 0x4F0CFF2: PetscMallocAlign(unsigned long, int, char >>>>>> const*, char const*, void**) (mal.c:28) >>>>>> ==8674== by 0x4F0F716: PetscTrMallocDefault(unsigned long, int, >>>>>> char const*, char const*, void**) (mtr.c:188) >>>>>> ==8674== by 0x569AF3E: MatSeqAIJSetPreallocation_SeqAIJ >>>>>> (aij.c:3595) >>>>>> ==8674== by 0x569A531: MatSeqAIJSetPreallocation (aij.c:3539) >>>>>> ==8674== by 0x599080A: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, >>>>>> _p_Mat*) (fdda.c:1085) >>>>>> ==8674== by 0x598B937: DMCreateMatrix_DA(_p_DM*, _p_Mat**) >>>>>> (fdda.c:759) >>>>>> ==8674== by 0x58A2BF2: DMCreateMatrix (dm.c:956) >>>>>> ==8674== by 0x5E377B3: KSPSetUp (itfunc.c:262) >>>>>> ==8674== by 0x409FFC: PetscAdLemTaras3D::solveModel(bool) >>>>>> (PetscAdLemTaras3D.hxx:255) >>>>>> ==8674== by 0x4239FB: AdLem3D<3u>::solveModel(bool, bool, bool) >>>>>> (AdLem3D.hxx:551) >>>>>> ==8674== by 0x41BD17: main (PetscAdLemMain.cxx:344) >>>>>> ==8674== >>>>>> On Fri, 14 Feb 2020 at 17:07, Smith, Barry F. >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> Richard, >>>>>>> >>>>>>> It is likely that for these problems some of the integers >>>>>>> become too large for the int variable to hold them, thus they overflow and >>>>>>> become negative. >>>>>>> >>>>>>> You should make a new PETSC_ARCH configuration of PETSc that >>>>>>> uses the configure option --with-64-bit-indices, this will change PETSc to >>>>>>> use 64 bit integers which will not overflow. >>>>>>> >>>>>>> Good luck and let us know how it works out >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> Probably the code is built with an older version of PETSc; the >>>>>>> later versions should produce a more useful error message. >>>>>>> >>>>>>> > On Feb 13, 2020, at 11:43 PM, Richard Beare via petsc-users < >>>>>>> petsc-users at mcs.anl.gov> wrote: >>>>>>> > >>>>>>> > Hi Everyone, >>>>>>> > I am experimenting with the Simlul at trophy tool ( >>>>>>> https://github.com/Inria-Asclepios/simul-atrophy) that uses petsc >>>>>>> to simulate brain atrophy based on segmented MRI data. I am not the author. >>>>>>> I have this running on most of a dataset of about 50 scans, but experience >>>>>>> crashes with several that I am trying to track down. However I am out of >>>>>>> ideas. The problem images are slightly bigger than some of the successful >>>>>>> ones, but not substantially so, and I have experimented on machines with >>>>>>> sufficient RAM. The error happens very quickly, as part of setup - see the >>>>>>> valgrind report below. I haven't managed to get the sgcheck tool to work >>>>>>> yet. I can only guess that the ksp object is somehow becoming corrupted >>>>>>> during the setup process, but the array sizes that I can track (which >>>>>>> derive from image sizes), appear correct at every point I can check. Any >>>>>>> suggestions as to how I can check what might go wrong in the setup of the >>>>>>> ksp object? >>>>>>> > Thankyou. 
>>>>>>> > >>>>>>> > valgrind tells me: >>>>>>> > >>>>>>> > ==18175== Argument 'size' of function memalign has a fishy >>>>>>> (possibly negative) value: -17152038144 >>>>>>> > ==18175== at 0x4C320A6: memalign (in >>>>>>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >>>>>>> > ==18175== by 0x4F0F1F2: PetscMallocAlign(unsigned long, int, >>>>>>> char const*, char const*, void**) (mal.c:28) >>>>>>> > ==18175== by 0x56B43CA: MatSeqAIJSetPreallocation_SeqAIJ >>>>>>> (aij.c:3595) >>>>>>> > ==18175== by 0x56B39BD: MatSeqAIJSetPreallocation (aij.c:3539) >>>>>>> > ==18175== by 0x59A9B44: DMCreateMatrix_DA_3d_MPIAIJ(_p_DM*, >>>>>>> _p_Mat*) (fdda.c:1085) >>>>>>> > ==18175== by 0x59A4C71: DMCreateMatrix_DA(_p_DM*, _p_Mat**) >>>>>>> (fdda.c:759) >>>>>>> > ==18175== by 0x58BBD29: DMCreateMatrix (dm.c:956) >>>>>>> > ==18175== by 0x5E509D5: KSPSetUp (itfunc.c:262) >>>>>>> > ==18175== by 0x40A3DE: PetscAdLemTaras3D::solveModel(bool) >>>>>>> (PetscAdLemTaras3D.hxx:269) >>>>>>> > ==18175== by 0x42413F: AdLem3D<3u>::solveModel(bool, bool, >>>>>>> bool) (AdLem3D.hxx:552) >>>>>>> > ==18175== by 0x41C25C: main (PetscAdLemMain.cxx:349) >>>>>>> > ==18175== >>>>>>> > >>>>>>> > -- >>>>>>> > -- >>>>>>> > A/Prof Richard Beare >>>>>>> > Imaging and Bioinformatics, Peninsula Clinical School >>>>>>> > orcid.org/0000-0002-7530-5664 >>>>>>> > Richard.Beare at monash.edu >>>>>>> > +61 3 9788 1724 >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > Geospatial Research: >>>>>>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> -- >>>>>> A/Prof Richard Beare >>>>>> Imaging and Bioinformatics, Peninsula Clinical School >>>>>> orcid.org/0000-0002-7530-5664 >>>>>> Richard.Beare at monash.edu >>>>>> +61 3 9788 1724 >>>>>> >>>>>> >>>>>> >>>>>> Geospatial Research: >>>>>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>>>>> >>>>> >>>> >>>> -- >>>> -- >>>> A/Prof Richard Beare >>>> Imaging and Bioinformatics, Peninsula Clinical School >>>> orcid.org/0000-0002-7530-5664 >>>> Richard.Beare at monash.edu >>>> +61 3 9788 1724 >>>> >>>> >>>> >>>> Geospatial Research: >>>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>>> >>> >>> >>> -- >>> -- >>> A/Prof Richard Beare >>> Imaging and Bioinformatics, Peninsula Clinical School >>> orcid.org/0000-0002-7530-5664 >>> Richard.Beare at monash.edu >>> +61 3 9788 1724 >>> >>> >>> >>> Geospatial Research: >>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>> >> -- -- A/Prof Richard Beare Imaging and Bioinformatics, Peninsula Clinical School orcid.org/0000-0002-7530-5664 Richard.Beare at monash.edu +61 3 9788 1724 Geospatial Research: https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis -------------- next part -------------- An HTML attachment was scrubbed... URL: From juaneah at gmail.com Mon Feb 17 15:33:26 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Mon, 17 Feb 2020 15:33:26 -0600 Subject: [petsc-users] BCs for a EPS solver In-Reply-To: References: Message-ID: Hi, Thank you for the clarification, now I understand what means change those values, and I tried to do that. 
But if I put the row i to the identity in A, and zero in B, the solver crash: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Error in external library [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: Error in external library [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-10, INFO(2)=0 [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [1]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 [1]PETSC ERROR: [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [2]PETSC ERROR: Error in external library [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-10, INFO(2)=0 [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [2]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 [2]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala Mon Feb 17 15:28:01 2020 [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [3]PETSC ERROR: Error in external library [3]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-10, INFO(2)=0 [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [3]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 [3]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala Mon Feb 17 15:28:01 2020 [3]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-mumps --download-scalapack --download-parmetis --download-metis --download-superlu_dist --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-10, INFO(2)=33 [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 [0]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala Mon Feb 17 15:28:01 2020 [0]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-mumps --download-scalapack --download-parmetis --download-metis --download-superlu_dist --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala Mon Feb 17 15:28:01 2020 [1]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-mumps --download-scalapack --download-parmetis --download-metis --download-superlu_dist --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c [1]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c [2]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-mumps --download-scalapack --download-parmetis --download-metis --download-superlu_dist --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c [2]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c [2]PETSC ERROR: [3]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c [3]PETSC ERROR: #3 PCSetUp_LU() line 126 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c [3]PETSC ERROR: #4 PCSetUp() line 894 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c [0]PETSC ERROR: #3 PCSetUp_LU() line 126 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c [0]PETSC ERROR: [1]PETSC ERROR: #3 PCSetUp_LU() line 126 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c [1]PETSC ERROR: #4 PCSetUp() line 894 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c [1]PETSC ERROR: #3 PCSetUp_LU() line 126 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c [2]PETSC ERROR: #4 PCSetUp() line 894 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c [2]PETSC ERROR: #5 KSPSetUp() line 376 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c [3]PETSC ERROR: #5 KSPSetUp() line 376 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c [3]PETSC ERROR: #6 STSetUp_Shift() line 120 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c #4 PCSetUp() line 894 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #5 
KSPSetUp() line 376 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #6 STSetUp_Shift() line 120 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c [0]PETSC ERROR: #5 KSPSetUp() line 376 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c [1]PETSC ERROR: #6 STSetUp_Shift() line 120 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c [1]PETSC ERROR: #7 STSetUp() line 271 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/interface/stsolve.c [1]PETSC ERROR: #8 EPSSetUp() line 273 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/interface/epssetup.c [2]PETSC ERROR: #6 STSetUp_Shift() line 120 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c [2]PETSC ERROR: #7 STSetUp() line 271 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/interface/stsolve.c [2]PETSC ERROR: #8 EPSSetUp() line 273 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/interface/epssetup.c [2]PETSC ERROR: #9 FourBar_NaturalPulsation() line 3937 in /home/ayala/Nextcloud/cpp_projects/2020-02-13-muboto-balancing-v17-mma/Multibody.cc El lun., 17 de feb. de 2020 a la(s) 13:20, Matthew Knepley ( knepley at gmail.com) escribi?: > On Mon, Feb 17, 2020 at 1:59 PM Emmanuel Ayala wrote: > >> Hi, thanks for the quick answer. >> >> I just did it, and it does not work. My problem is GNHEP and I use the >> default solver (Krylov-Schur). Moreover I run the code with the options: >> -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_type mumps >> > > I guess a better question is: What do you expect to work? > > For a linear solve, > > A x = b > > if a row i is 0 except for a one on the diagonal, then I get > > x_i = b_i > > so hopefully you put the correct boundary value in b_i. For the > generalized eigenproblem > > A x = \lambda B x > > if you set row i to the identity in A, and zero in B, we get > > x_i = 0 > > and you must put the boundary values into x after you have finished the > solve. Is this what you did? > > Thanks, > > Matt > > >> Any other suggestions? >> Kind regards. >> >> El lun., 17 de feb. de 2020 a la(s) 12:39, Jeremy Theler ( >> jeremy at seamplex.com) escribi?: >> >>> The usual trick is to set ones in one matrix and zeros in the other >>> one. >>> >>> >>> On Mon, 2020-02-17 at 12:35 -0600, Emmanuel Ayala wrote: >>> > Hi everyone, >>> > >>> > I have an eigenvalue problem where I need to apply BCs to the >>> > stiffness and mass matrix. >>> > >>> > Usually, for KSP solver, it is enough to set to zero the rows and >>> > columns related to the boundary conditions. I used to apply it with >>> > MatZeroRowsColumns, with a 1s on the diagonal. Then the solver works >>> > well. >>> > >>> > There is something similar to KSP for EPS solver ? >>> > >>> > I already used MatZeroRowsColumns (for EPS solver), with a 1s on the >>> > diagonal, and I got wrong result. >>> > >>> > Kind regards. >>> > >>> > >>> > >>> > >>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Feb 17 17:34:32 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) 
Date: Mon, 17 Feb 2020 23:34:32 +0000 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: References: Message-ID: <624EDFBF-E105-4F70-87C6-58BE940D73C9@mcs.anl.gov> > On Feb 17, 2020, at 7:56 AM, Yuyun Yang wrote: > > Hello, > > I actually have a question about the usage of DMDA since I'm quite new to this. I wonder if the DMDA suite of functions can be directly called on vectors created from VecCreate? Yes, but you have to make sure the ones you create have the same sizes and parallel layouts. Generally best to get them from the DMDA or VecDuplicate() than the hassle of figuring out sizes. > Or the vectors have to be formed by DMDACreateGlobalVector? I'm also not sure about what the dof and stencil width arguments do. > > I'm still unsure about the usage of MatCreateShell and MatShellSetOperation, since it seems that MyMatMult should still have 3 inputs just like MatMult (the matrix and two vectors). Since I'm not forming the matrix, does that mean the matrix input is meaningless but still needs to exist for the sake of this format? Well the matrix input is your shell matrix so it likely has information you need to do your multiply routine. MatShellGetContext() (No you do not want to put your information about the matrix stencil inside global variables!) > > After I create such a shell matrix, can I use it like a regular matrix in KSP and utilize preconditioners? > > Thanks! > Yuyun > From: petsc-users on behalf of Yuyun Yang > Sent: Sunday, February 16, 2020 3:12 AM > To: Smith, Barry F. > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Matrix-free method in PETSc > > Thank you, that is very helpful information indeed! I will try it and send you my code when it works. > > Best regards, > Yuyun > From: Smith, Barry F. > Sent: Saturday, February 15, 2020 10:02 PM > To: Yuyun Yang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Matrix-free method in PETSc > > Yuyun, > > If you are speaking about using a finite difference stencil on a structured grid where you provide the Jacobian vector products yourself by looping over the grid doing the stencil operation we unfortunately do not have exactly that kind of example. > > But it is actually not difficult. I suggest starting with src/ts/examples/tests/ex22.c It computes the sparse matrix explicitly with FormIJacobian() > > What you need to do is instead in main() use MatCreateShell() and MatShellSetOperation(,MATOP_MULT,(void (*)(void))MyMatMult) then provide the routine MyMatMult() to do your stencil based matrix free product; note that you can create this new routine by taking the structure of IFunction() and reorganizing it to do the Jacobian product instead. You will need to get the information about the shell matrix size on each process by calling DMDAGetCorners(). > > You will then remove the explicit computation of the Jacobian, and also remove the Event stuff since you don't need it. > > Extending to 2 and 3d is straight forward. > > Any questions let us know. > > Barry > > If you like this would make a great merge request with your code to improve our examples. > > > > On Feb 15, 2020, at 9:42 PM, Yuyun Yang wrote: > > > > Hello team, > > > > I wanted to apply the Krylov subspace method to a matrix-free implementation of a stencil, such that the iterative method acts on the operation without ever constructing the matrix explicitly (for example, when doing backward Euler). > > > > I'm not sure whether there is already an example for that somewhere. If so, could you point me to a relevant example? 
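A minimal sketch of the MatCreateShell()/MatShellSetOperation() setup described above, assuming a 1d DMDA with dof=1 and stencil width 1. The names AppCtx, MyMatMult, user and the coefficient dx are illustrative, not anything PETSc defines, and error checking (ierr/CHKERRQ) is left out to keep it short:

  typedef struct {        /* user context carried by the shell matrix (illustrative) */
    DM        da;         /* DMDA describing the structured grid                     */
    PetscReal dx;         /* grid spacing used by the stencil                        */
  } AppCtx;

  /* y = A*x, applying the stencil on the fly instead of storing A */
  PetscErrorCode MyMatMult(Mat A, Vec x, Vec y)
  {
    AppCtx      *ctx;
    Vec         xloc;
    PetscScalar *xa, *ya;
    PetscInt    i, xs, xm, Mx;

    MatShellGetContext(A, &ctx);
    DMGetLocalVector(ctx->da, &xloc);
    DMGlobalToLocalBegin(ctx->da, x, INSERT_VALUES, xloc);
    DMGlobalToLocalEnd(ctx->da, x, INSERT_VALUES, xloc);
    DMDAGetInfo(ctx->da, NULL, &Mx, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL);
    DMDAGetCorners(ctx->da, &xs, NULL, NULL, &xm, NULL, NULL);
    DMDAVecGetArray(ctx->da, xloc, &xa);
    DMDAVecGetArray(ctx->da, y, &ya);
    for (i = xs; i < xs + xm; i++) {
      if (i == 0 || i == Mx-1) ya[i] = xa[i];                            /* boundary rows: identity        */
      else ya[i] = (-xa[i-1] + 2.0*xa[i] - xa[i+1])/(ctx->dx*ctx->dx);   /* interior: 1d Laplacian stencil */
    }
    DMDAVecRestoreArray(ctx->da, xloc, &xa);
    DMDAVecRestoreArray(ctx->da, y, &ya);
    DMRestoreLocalVector(ctx->da, &xloc);
    return 0;
  }

  /* in main(), once the DMDA exists and user.da/user.dx are filled: */
  Mat      A;
  PetscInt xs, xm;
  DMDAGetCorners(user.da, &xs, NULL, NULL, &xm, NULL, NULL);
  MatCreateShell(PETSC_COMM_WORLD, xm, xm, PETSC_DETERMINE, PETSC_DETERMINE, &user, &A);
  MatShellSetOperation(A, MATOP_MULT, (void (*)(void))MyMatMult);

The shell matrix A can then be handed to KSPSetOperators() like any assembled matrix; since it has no stored entries, the natural starting point is -pc_type none or a PCSHELL of your own.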
> > > > Thank you! > > > > Best regards, > > Yuyun From knepley at gmail.com Mon Feb 17 17:41:15 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 17 Feb 2020 18:41:15 -0500 Subject: [petsc-users] BCs for a EPS solver In-Reply-To: References: Message-ID: On Mon, Feb 17, 2020 at 4:33 PM Emmanuel Ayala wrote: > Hi, > > Thank you for the clarification, now I understand what means change those > values, and I tried to do that. > > But if I put the row i to the identity in A, and zero in B, the solver > crash: > So if you need to factor B, maybe reverse it? Thanks, Matt > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Error in external library > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: Error in external library > [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: > INFOG(1)=-10, INFO(2)=0 > > [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 > [1]PETSC ERROR: [2]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [2]PETSC ERROR: Error in external library > [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: > INFOG(1)=-10, INFO(2)=0 > > [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [2]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 > [2]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala > Mon Feb 17 15:28:01 2020 > [3]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [3]PETSC ERROR: Error in external library > [3]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: > INFOG(1)=-10, INFO(2)=0 > > [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [3]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 > [3]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala > Mon Feb 17 15:28:01 2020 > [3]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 > -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" > FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich > --download-mumps --download-scalapack --download-parmetis --download-metis > --download-superlu_dist --download-cmake --download-fblaslapack=1 > --with-cxx-dialect=C++11 > [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c > [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: > INFOG(1)=-10, INFO(2)=33 > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 > [0]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala > Mon Feb 17 15:28:01 2020 > [0]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 > -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" > FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich > --download-mumps --download-scalapack --download-parmetis --download-metis > --download-superlu_dist --download-cmake --download-fblaslapack=1 > --with-cxx-dialect=C++11 > [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c > ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala Mon Feb 17 > 15:28:01 2020 > [1]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 > -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" > FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich > --download-mumps --download-scalapack --download-parmetis --download-metis > --download-superlu_dist --download-cmake --download-fblaslapack=1 > --with-cxx-dialect=C++11 > [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c > [1]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c > [2]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 > -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" > FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich > --download-mumps --download-scalapack --download-parmetis --download-metis > --download-superlu_dist --download-cmake --download-fblaslapack=1 > --with-cxx-dialect=C++11 > [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c > [2]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c > [2]PETSC ERROR: [3]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c > [3]PETSC ERROR: #3 PCSetUp_LU() line 126 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c > [3]PETSC ERROR: #4 PCSetUp() line 894 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c > [0]PETSC ERROR: #3 PCSetUp_LU() line 126 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c > [0]PETSC ERROR: [1]PETSC ERROR: #3 PCSetUp_LU() line 126 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c > [1]PETSC ERROR: #4 PCSetUp() line 894 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c > [1]PETSC ERROR: #3 PCSetUp_LU() line 126 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c > [2]PETSC ERROR: #4 PCSetUp() line 894 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c > [2]PETSC ERROR: #5 KSPSetUp() line 376 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: #5 KSPSetUp() line 376 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: #6 STSetUp_Shift() line 120 in > /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c > #4 
PCSetUp() line 894 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #5 KSPSetUp() line 376 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #6 STSetUp_Shift() line 120 in > /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c > [0]PETSC ERROR: #5 KSPSetUp() line 376 in > /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c > [1]PETSC ERROR: #6 STSetUp_Shift() line 120 in > /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c > [1]PETSC ERROR: #7 STSetUp() line 271 in > /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/interface/stsolve.c > [1]PETSC ERROR: #8 EPSSetUp() line 273 in > /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/interface/epssetup.c > [2]PETSC ERROR: #6 STSetUp_Shift() line 120 in > /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c > [2]PETSC ERROR: #7 STSetUp() line 271 in > /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/interface/stsolve.c > [2]PETSC ERROR: #8 EPSSetUp() line 273 in > /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/interface/epssetup.c > [2]PETSC ERROR: #9 FourBar_NaturalPulsation() line 3937 in > /home/ayala/Nextcloud/cpp_projects/2020-02-13-muboto-balancing-v17-mma/Multibody.cc > > El lun., 17 de feb. de 2020 a la(s) 13:20, Matthew Knepley ( > knepley at gmail.com) escribi?: > >> On Mon, Feb 17, 2020 at 1:59 PM Emmanuel Ayala wrote: >> >>> Hi, thanks for the quick answer. >>> >>> I just did it, and it does not work. My problem is GNHEP and I use the >>> default solver (Krylov-Schur). Moreover I run the code with the options: >>> -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_type mumps >>> >> >> I guess a better question is: What do you expect to work? >> >> For a linear solve, >> >> A x = b >> >> if a row i is 0 except for a one on the diagonal, then I get >> >> x_i = b_i >> >> so hopefully you put the correct boundary value in b_i. For the >> generalized eigenproblem >> >> A x = \lambda B x >> >> if you set row i to the identity in A, and zero in B, we get >> >> x_i = 0 >> >> and you must put the boundary values into x after you have finished the >> solve. Is this what you did? >> >> Thanks, >> >> Matt >> >> >>> Any other suggestions? >>> Kind regards. >>> >>> El lun., 17 de feb. de 2020 a la(s) 12:39, Jeremy Theler ( >>> jeremy at seamplex.com) escribi?: >>> >>>> The usual trick is to set ones in one matrix and zeros in the other >>>> one. >>>> >>>> >>>> On Mon, 2020-02-17 at 12:35 -0600, Emmanuel Ayala wrote: >>>> > Hi everyone, >>>> > >>>> > I have an eigenvalue problem where I need to apply BCs to the >>>> > stiffness and mass matrix. >>>> > >>>> > Usually, for KSP solver, it is enough to set to zero the rows and >>>> > columns related to the boundary conditions. I used to apply it with >>>> > MatZeroRowsColumns, with a 1s on the diagonal. Then the solver works >>>> > well. >>>> > >>>> > There is something similar to KSP for EPS solver ? >>>> > >>>> > I already used MatZeroRowsColumns (for EPS solver), with a 1s on the >>>> > diagonal, and I got wrong result. >>>> > >>>> > Kind regards. >>>> > >>>> > >>>> > >>>> > >>>> >>>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. 
>> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Tue Feb 18 03:17:46 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 18 Feb 2020 10:17:46 +0100 Subject: [petsc-users] SLEPc: The inner product is not well defined In-Reply-To: References: Message-ID: <5772B337-D031-434F-B2B9-AC6EA7D19783@dsic.upv.es> > El 17 feb 2020, a las 19:19, Emmanuel Ayala escribi?: > > Thank you very much for the answer. > > This error appears when computing the B-norm of a vector x, as sqrt(x'*B*x). Probably your B matrix is semi-definite, and due to floating-point error the value x'*B*x becomes negative for a certain vector x. The code uses a tolerance of 10*PETSC_MACHINE_EPSILON, but it seems the rounding errors are larger in your case. Or maybe your B-matrix is indefinite, in which case you should solve the problem as non-symmetric (or as symmetric-indefinite GHIEP). > > Do you get the same problem with the Krylov-Schur solver? > > > After check the input matrices, the problem was solved using GHIEP. > > A workaround is to edit the source code and remove the check or increase the tolerance, but this may be catastrophic if your B is indefinite. A better solution is to reformulate the problem, solving the matrix pair (A,C) where C=alpha*A+beta*B is positive definite (note that then the eigenvalues become lambda/(beta+alpha*lambda)). > > > Ok, there is a rule to choose the values for alpha and beta? For instance take alpha=1 and beta=-sigma, where sigma is a lower bound of the leftmost eigenvalue of B (the most negative one). This assumes that A is positive definite. Jose > > Kind regards. > Thanks. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Tue Feb 18 03:51:36 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 18 Feb 2020 10:51:36 +0100 Subject: [petsc-users] BCs for a EPS solver In-Reply-To: References: Message-ID: <9286DD37-1487-4F54-BBC9-4E6133FEC916@dsic.upv.es> You put alpha on the diagonal of A and beta on the diagonal of B to get an eigenvalue lambda=alpha/beta. If you set beta=0 then lambda=Inf. The choice depends on where your wanted eigenvalues are and how you are solving the eigenproblem. The choice of lambda=Inf suggested by Jeremy avoids inserting eigenvalues that may interfere with the problem's eigenvalues, but this is good for shift-and-invert, not for the case where you solve linear systems with B. Anyway, this kind of manipulation may have an impact on convergence of the eigensolver or on conditioning of the linear solves. A possibly better approach is just to get rid of the BC unknowns by creating smaller A, B matrices, e.g. with MatGetSubMatrix(). Jose > El 18 feb 2020, a las 0:41, Matthew Knepley escribi?: > > On Mon, Feb 17, 2020 at 4:33 PM Emmanuel Ayala > wrote: > Hi, > > Thank you for the clarification, now I understand what means change those values, and I tried to do that. > > But if I put the row i to the identity in A, and zero in B, the solver crash: > > So if you need to factor B, maybe reverse it? 
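A minimal sketch of the submatrix approach suggested above, assuming nfree and freeIdx hold the number and global indices of the unconstrained dofs owned by this process, and that A, B and eps are the objects already in the discussion (note that MatGetSubMatrix() is spelled MatCreateSubMatrix() in recent PETSc releases):

  IS  isfree;
  Mat Ared, Bred;
  ISCreateGeneral(PETSC_COMM_WORLD, nfree, freeIdx, PETSC_COPY_VALUES, &isfree);

  /* extract the free-free blocks of the stiffness and mass matrices */
  MatCreateSubMatrix(A, isfree, isfree, MAT_INITIAL_MATRIX, &Ared);
  MatCreateSubMatrix(B, isfree, isfree, MAT_INITIAL_MATRIX, &Bred);

  EPSSetOperators(eps, Ared, Bred);
  EPSSetProblemType(eps, EPS_GHEP);  /* the reduced pair stays symmetric definite if A and B were */
  EPSSolve(eps);

The computed eigenvectors then live in the reduced numbering, so they have to be scattered back into full-length vectors (for instance with a VecScatter built from the same index set) before postprocessing.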
> > Thanks, > > Matt > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Error in external library > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: Error in external library > [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-10, INFO(2)=0 > > [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 > [1]PETSC ERROR: [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [2]PETSC ERROR: Error in external library > [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-10, INFO(2)=0 > > [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [2]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 > [2]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala Mon Feb 17 15:28:01 2020 > [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [3]PETSC ERROR: Error in external library > [3]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-10, INFO(2)=0 > > [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [3]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 > [3]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala Mon Feb 17 15:28:01 2020 > [3]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-mumps --download-scalapack --download-parmetis --download-metis --download-superlu_dist --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 > [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c > [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-10, INFO(2)=33 > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 > [0]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala Mon Feb 17 15:28:01 2020 > [0]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-mumps --download-scalapack --download-parmetis --download-metis --download-superlu_dist --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 > [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c > ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala Mon Feb 17 15:28:01 2020 > [1]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-mumps --download-scalapack --download-parmetis --download-metis --download-superlu_dist --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 > [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c > [1]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c > [2]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-mumps --download-scalapack --download-parmetis --download-metis --download-superlu_dist --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 > [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c > [2]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c > [2]PETSC ERROR: [3]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c > [3]PETSC ERROR: #3 PCSetUp_LU() line 126 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c > [3]PETSC ERROR: #4 PCSetUp() line 894 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c > [0]PETSC ERROR: #3 PCSetUp_LU() line 126 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c > [0]PETSC ERROR: [1]PETSC ERROR: #3 PCSetUp_LU() line 126 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c > [1]PETSC ERROR: #4 PCSetUp() line 894 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c > [1]PETSC ERROR: #3 PCSetUp_LU() line 126 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c > [2]PETSC ERROR: #4 PCSetUp() line 894 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c > [2]PETSC ERROR: #5 KSPSetUp() line 376 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: #5 KSPSetUp() line 376 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c > [3]PETSC ERROR: #6 STSetUp_Shift() line 120 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c > #4 PCSetUp() line 894 in 
/home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #5 KSPSetUp() line 376 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #6 STSetUp_Shift() line 120 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c > [0]PETSC ERROR: #5 KSPSetUp() line 376 in /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c > [1]PETSC ERROR: #6 STSetUp_Shift() line 120 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c > [1]PETSC ERROR: #7 STSetUp() line 271 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/interface/stsolve.c > [1]PETSC ERROR: #8 EPSSetUp() line 273 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/interface/epssetup.c > [2]PETSC ERROR: #6 STSetUp_Shift() line 120 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c > [2]PETSC ERROR: #7 STSetUp() line 271 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/interface/stsolve.c > [2]PETSC ERROR: #8 EPSSetUp() line 273 in /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/interface/epssetup.c > [2]PETSC ERROR: #9 FourBar_NaturalPulsation() line 3937 in /home/ayala/Nextcloud/cpp_projects/2020-02-13-muboto-balancing-v17-mma/Multibody.cc > > El lun., 17 de feb. de 2020 a la(s) 13:20, Matthew Knepley (knepley at gmail.com ) escribi?: > On Mon, Feb 17, 2020 at 1:59 PM Emmanuel Ayala > wrote: > Hi, thanks for the quick answer. > > I just did it, and it does not work. My problem is GNHEP and I use the default solver (Krylov-Schur). Moreover I run the code with the options: -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_type mumps > > I guess a better question is: What do you expect to work? > > For a linear solve, > > A x = b > > if a row i is 0 except for a one on the diagonal, then I get > > x_i = b_i > > so hopefully you put the correct boundary value in b_i. For the generalized eigenproblem > > A x = \lambda B x > > if you set row i to the identity in A, and zero in B, we get > > x_i = 0 > > and you must put the boundary values into x after you have finished the solve. Is this what you did? > > Thanks, > > Matt > > Any other suggestions? > Kind regards. > > El lun., 17 de feb. de 2020 a la(s) 12:39, Jeremy Theler (jeremy at seamplex.com ) escribi?: > The usual trick is to set ones in one matrix and zeros in the other > one. > > > On Mon, 2020-02-17 at 12:35 -0600, Emmanuel Ayala wrote: > > Hi everyone, > > > > I have an eigenvalue problem where I need to apply BCs to the > > stiffness and mass matrix. > > > > Usually, for KSP solver, it is enough to set to zero the rows and > > columns related to the boundary conditions. I used to apply it with > > MatZeroRowsColumns, with a 1s on the diagonal. Then the solver works > > well. > > > > There is something similar to KSP for EPS solver ? > > > > I already used MatZeroRowsColumns (for EPS solver), with a 1s on the > > diagonal, and I got wrong result. > > > > Kind regards. > > > > > > > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.scott at epcc.ed.ac.uk Tue Feb 18 05:03:07 2020 From: d.scott at epcc.ed.ac.uk (David Scott) Date: Tue, 18 Feb 2020 11:03:07 +0000 Subject: [petsc-users] DM_BOUNDARY_GHOSTED Message-ID: <0fb1334e-13f7-23f3-1c42-79286ed164c9@epcc.ed.ac.uk> Hello, I wish to solve a channel flow problem with different boundary conditions. In the streamwise direction I may have periodic or inlet/outlet BCs. I would like to make my code for the two cases as similar as possible. If I use DM_BOUNDARY_PERIODIC then when performing a linear solve the ghost values will be set automatically. For the inlet/outlet case can I use DM_BOUNDARY_GHOSTED instead and somehow arrange for values that I specify to be placed in the ghost locations? Thanks, David The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From knepley at gmail.com Tue Feb 18 06:42:22 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 18 Feb 2020 07:42:22 -0500 Subject: [petsc-users] DM_BOUNDARY_GHOSTED In-Reply-To: <0fb1334e-13f7-23f3-1c42-79286ed164c9@epcc.ed.ac.uk> References: <0fb1334e-13f7-23f3-1c42-79286ed164c9@epcc.ed.ac.uk> Message-ID: On Tue, Feb 18, 2020 at 6:03 AM David Scott wrote: > Hello, > > I wish to solve a channel flow problem with different boundary > conditions. In the streamwise direction I may have periodic or > inlet/outlet BCs. I would like to make my code for the two cases as > similar as possible. If I use DM_BOUNDARY_PERIODIC then when performing > a linear solve the ghost values will be set automatically. For the > inlet/outlet case can I use DM_BOUNDARY_GHOSTED instead and somehow > arrange for values that I specify to be placed in the ghost locations? > Yes, that is the intent. Thanks, Matt > Thanks, > > David > > The University of Edinburgh is a charitable body, registered in Scotland, > with registration number SC005336. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yyang85 at stanford.edu Tue Feb 18 07:19:52 2020 From: yyang85 at stanford.edu (Yuyun Yang) Date: Tue, 18 Feb 2020 13:19:52 +0000 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: <624EDFBF-E105-4F70-87C6-58BE940D73C9@mcs.anl.gov> References: <624EDFBF-E105-4F70-87C6-58BE940D73C9@mcs.anl.gov> Message-ID: <1D793649-96CD-4807-8330-53D9CA176348@stanford.edu> Thanks for the clarification. Got one more question: if I have variable coefficients, my stencil will be updated at every time step, so will the coefficients in myMatMult. In that case, is it necessary to destroy the shell matrix and create it all over again, or can I use it as it is, only calling the stencil update function, assuming the result will be passed into the matrix operation automatically? Thanks, Yuyun ?On 2/18/20, 7:34 AM, "Smith, Barry F." wrote: > On Feb 17, 2020, at 7:56 AM, Yuyun Yang wrote: > > Hello, > > I actually have a question about the usage of DMDA since I'm quite new to this. I wonder if the DMDA suite of functions can be directly called on vectors created from VecCreate? Yes, but you have to make sure the ones you create have the same sizes and parallel layouts. 
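For the DM_BOUNDARY_GHOSTED question above, a minimal 1d sketch of supplying the ghost values by hand; Mx, uin and uout are user-supplied and purely illustrative, and dof=1 with stencil width 1 is assumed:

  DM          da;
  Vec         uglobal, uloc;
  PetscScalar *u;
  PetscInt    xs, xm;

  DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_GHOSTED, Mx, 1, 1, NULL, &da);
  DMSetFromOptions(da);
  DMSetUp(da);
  DMCreateGlobalVector(da, &uglobal);
  /* ... fill uglobal with the current solution ... */

  DMGetLocalVector(da, &uloc);
  DMGlobalToLocalBegin(da, uglobal, INSERT_VALUES, uloc);
  DMGlobalToLocalEnd(da, uglobal, INSERT_VALUES, uloc);

  DMDAVecGetArray(da, uloc, &u);
  DMDAGetCorners(da, &xs, NULL, NULL, &xm, NULL, NULL);
  if (xs == 0)       u[-1] = uin;    /* ghost slot left of the inlet   */
  if (xs + xm == Mx) u[Mx] = uout;   /* ghost slot right of the outlet */
  DMDAVecRestoreArray(da, uloc, &u);
  /* the stencil code can now read u[i-1] and u[i+1] uniformly, just as in
     the periodic case, using the ghost values supplied above             */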
Generally best to get them from the DMDA or VecDuplicate() than the hassle of figuring out sizes. > Or the vectors have to be formed by DMDACreateGlobalVector? I'm also not sure about what the dof and stencil width arguments do. > > I'm still unsure about the usage of MatCreateShell and MatShellSetOperation, since it seems that MyMatMult should still have 3 inputs just like MatMult (the matrix and two vectors). Since I'm not forming the matrix, does that mean the matrix input is meaningless but still needs to exist for the sake of this format? Well the matrix input is your shell matrix so it likely has information you need to do your multiply routine. MatShellGetContext() (No you do not want to put your information about the matrix stencil inside global variables!) > > After I create such a shell matrix, can I use it like a regular matrix in KSP and utilize preconditioners? > > Thanks! > Yuyun > From: petsc-users on behalf of Yuyun Yang > Sent: Sunday, February 16, 2020 3:12 AM > To: Smith, Barry F. > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Matrix-free method in PETSc > > Thank you, that is very helpful information indeed! I will try it and send you my code when it works. > > Best regards, > Yuyun > From: Smith, Barry F. > Sent: Saturday, February 15, 2020 10:02 PM > To: Yuyun Yang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Matrix-free method in PETSc > > Yuyun, > > If you are speaking about using a finite difference stencil on a structured grid where you provide the Jacobian vector products yourself by looping over the grid doing the stencil operation we unfortunately do not have exactly that kind of example. > > But it is actually not difficult. I suggest starting with src/ts/examples/tests/ex22.c It computes the sparse matrix explicitly with FormIJacobian() > > What you need to do is instead in main() use MatCreateShell() and MatShellSetOperation(,MATOP_MULT,(void (*)(void))MyMatMult) then provide the routine MyMatMult() to do your stencil based matrix free product; note that you can create this new routine by taking the structure of IFunction() and reorganizing it to do the Jacobian product instead. You will need to get the information about the shell matrix size on each process by calling DMDAGetCorners(). > > You will then remove the explicit computation of the Jacobian, and also remove the Event stuff since you don't need it. > > Extending to 2 and 3d is straight forward. > > Any questions let us know. > > Barry > > If you like this would make a great merge request with your code to improve our examples. > > > > On Feb 15, 2020, at 9:42 PM, Yuyun Yang wrote: > > > > Hello team, > > > > I wanted to apply the Krylov subspace method to a matrix-free implementation of a stencil, such that the iterative method acts on the operation without ever constructing the matrix explicitly (for example, when doing backward Euler). > > > > I'm not sure whether there is already an example for that somewhere. If so, could you point me to a relevant example? > > > > Thank you! > > > > Best regards, > > Yuyun From knepley at gmail.com Tue Feb 18 07:23:03 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 18 Feb 2020 08:23:03 -0500 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: <1D793649-96CD-4807-8330-53D9CA176348@stanford.edu> References: <624EDFBF-E105-4F70-87C6-58BE940D73C9@mcs.anl.gov> <1D793649-96CD-4807-8330-53D9CA176348@stanford.edu> Message-ID: On Tue, Feb 18, 2020 at 8:20 AM Yuyun Yang wrote: > Thanks for the clarification. 
> > Got one more question: if I have variable coefficients, my stencil will be > updated at every time step, so will the coefficients in myMatMult. In that > case, is it necessary to destroy the shell matrix and create it all over > again, or can I use it as it is, only calling the stencil update function, > assuming the result will be passed into the matrix operation automatically? > You update the information in the context associated with the shell matrix. No need to destroy it. Thanks, Matt > Thanks, > Yuyun > > ?On 2/18/20, 7:34 AM, "Smith, Barry F." wrote: > > > > > On Feb 17, 2020, at 7:56 AM, Yuyun Yang > wrote: > > > > Hello, > > > > I actually have a question about the usage of DMDA since I'm quite > new to this. I wonder if the DMDA suite of functions can be directly called > on vectors created from VecCreate? > > Yes, but you have to make sure the ones you create have the same > sizes and parallel layouts. Generally best to get them from the DMDA or > VecDuplicate() than the hassle of figuring out sizes. > > > Or the vectors have to be formed by DMDACreateGlobalVector? I'm also > not sure about what the dof and stencil width arguments do. > > > > I'm still unsure about the usage of MatCreateShell and > MatShellSetOperation, since it seems that MyMatMult should still have 3 > inputs just like MatMult (the matrix and two vectors). Since I'm not > forming the matrix, does that mean the matrix input is meaningless but > still needs to exist for the sake of this format? > > Well the matrix input is your shell matrix so it likely has > information you need to do your multiply routine. MatShellGetContext() (No > you do not want to put your information about the matrix stencil inside > global variables!) > > > > > > After I create such a shell matrix, can I use it like a regular > matrix in KSP and utilize preconditioners? > > > > Thanks! > > Yuyun > > From: petsc-users on behalf of > Yuyun Yang > > Sent: Sunday, February 16, 2020 3:12 AM > > To: Smith, Barry F. > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix-free method in PETSc > > > > Thank you, that is very helpful information indeed! I will try it > and send you my code when it works. > > > > Best regards, > > Yuyun > > From: Smith, Barry F. > > Sent: Saturday, February 15, 2020 10:02 PM > > To: Yuyun Yang > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix-free method in PETSc > > > > Yuyun, > > > > If you are speaking about using a finite difference stencil on a > structured grid where you provide the Jacobian vector products yourself by > looping over the grid doing the stencil operation we unfortunately do not > have exactly that kind of example. > > > > But it is actually not difficult. I suggest starting with > src/ts/examples/tests/ex22.c It computes the sparse matrix explicitly with > FormIJacobian() > > > > What you need to do is instead in main() use MatCreateShell() > and MatShellSetOperation(,MATOP_MULT,(void (*)(void))MyMatMult) then > provide the routine MyMatMult() to do your stencil based matrix free > product; note that you can create this new routine by taking the structure > of IFunction() and reorganizing it to do the Jacobian product instead. You > will need to get the information about the shell matrix size on each > process by calling DMDAGetCorners(). > > > > You will then remove the explicit computation of the Jacobian, > and also remove the Event stuff since you don't need it. > > > > Extending to 2 and 3d is straight forward. 
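Concretely, updating the context could look like the following fragment; AppCtx, A, t and UpdateStencilCoefficients() are illustrative user code and data, not PETSc symbols:

  /* ctx is the same user struct that was handed to MatCreateShell(A,...) */
  AppCtx *ctx;
  MatShellGetContext(A, &ctx);
  UpdateStencilCoefficients(ctx, t);   /* illustrative user routine, not a PETSc call */
  /* nothing to destroy or recreate: the next MatMult(A,x,y), i.e. MyMatMult(),
     sees the refreshed coefficients through the same context pointer          */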
> > > > Any questions let us know. > > > > Barry > > > > If you like this would make a great merge request with your code > to improve our examples. > > > > > > > On Feb 15, 2020, at 9:42 PM, Yuyun Yang > wrote: > > > > > > Hello team, > > > > > > I wanted to apply the Krylov subspace method to a matrix-free > implementation of a stencil, such that the iterative method acts on the > operation without ever constructing the matrix explicitly (for example, > when doing backward Euler). > > > > > > I'm not sure whether there is already an example for that > somewhere. If so, could you point me to a relevant example? > > > > > > Thank you! > > > > > > Best regards, > > > Yuyun > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yyang85 at stanford.edu Tue Feb 18 08:26:11 2020 From: yyang85 at stanford.edu (Yuyun Yang) Date: Tue, 18 Feb 2020 14:26:11 +0000 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: References: <624EDFBF-E105-4F70-87C6-58BE940D73C9@mcs.anl.gov> <1D793649-96CD-4807-8330-53D9CA176348@stanford.edu> Message-ID: Thanks. Also, when using KSP, would the syntax be KSPSetOperators(ksp,A,A)? Since you mentioned preconditioners are not generally used for matrix-free operators, I wasn?t sure whether I should still put ?A? in the Pmat field. Is it still possible to use TS in conjunction with the matrix-free operator? I?d like to create a simple test case that solves the 1d heat equation implicitly with variable coefficients, but didn?t know how the time stepping can be set up. Thanks, Yuyun From: Matthew Knepley Date: Tuesday, February 18, 2020 at 9:23 PM To: Yuyun Yang Cc: "Smith, Barry F." , "petsc-users at mcs.anl.gov" Subject: Re: [petsc-users] Matrix-free method in PETSc On Tue, Feb 18, 2020 at 8:20 AM Yuyun Yang > wrote: Thanks for the clarification. Got one more question: if I have variable coefficients, my stencil will be updated at every time step, so will the coefficients in myMatMult. In that case, is it necessary to destroy the shell matrix and create it all over again, or can I use it as it is, only calling the stencil update function, assuming the result will be passed into the matrix operation automatically? You update the information in the context associated with the shell matrix. No need to destroy it. Thanks, Matt Thanks, Yuyun On 2/18/20, 7:34 AM, "Smith, Barry F." > wrote: > On Feb 17, 2020, at 7:56 AM, Yuyun Yang > wrote: > > Hello, > > I actually have a question about the usage of DMDA since I'm quite new to this. I wonder if the DMDA suite of functions can be directly called on vectors created from VecCreate? Yes, but you have to make sure the ones you create have the same sizes and parallel layouts. Generally best to get them from the DMDA or VecDuplicate() than the hassle of figuring out sizes. > Or the vectors have to be formed by DMDACreateGlobalVector? I'm also not sure about what the dof and stencil width arguments do. > > I'm still unsure about the usage of MatCreateShell and MatShellSetOperation, since it seems that MyMatMult should still have 3 inputs just like MatMult (the matrix and two vectors). Since I'm not forming the matrix, does that mean the matrix input is meaningless but still needs to exist for the sake of this format? 
Well the matrix input is your shell matrix so it likely has information you need to do your multiply routine. MatShellGetContext() (No you do not want to put your information about the matrix stencil inside global variables!) > > After I create such a shell matrix, can I use it like a regular matrix in KSP and utilize preconditioners? > > Thanks! > Yuyun > From: petsc-users > on behalf of Yuyun Yang > > Sent: Sunday, February 16, 2020 3:12 AM > To: Smith, Barry F. > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix-free method in PETSc > > Thank you, that is very helpful information indeed! I will try it and send you my code when it works. > > Best regards, > Yuyun > From: Smith, Barry F. > > Sent: Saturday, February 15, 2020 10:02 PM > To: Yuyun Yang > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix-free method in PETSc > > Yuyun, > > If you are speaking about using a finite difference stencil on a structured grid where you provide the Jacobian vector products yourself by looping over the grid doing the stencil operation we unfortunately do not have exactly that kind of example. > > But it is actually not difficult. I suggest starting with src/ts/examples/tests/ex22.c It computes the sparse matrix explicitly with FormIJacobian() > > What you need to do is instead in main() use MatCreateShell() and MatShellSetOperation(,MATOP_MULT,(void (*)(void))MyMatMult) then provide the routine MyMatMult() to do your stencil based matrix free product; note that you can create this new routine by taking the structure of IFunction() and reorganizing it to do the Jacobian product instead. You will need to get the information about the shell matrix size on each process by calling DMDAGetCorners(). > > You will then remove the explicit computation of the Jacobian, and also remove the Event stuff since you don't need it. > > Extending to 2 and 3d is straight forward. > > Any questions let us know. > > Barry > > If you like this would make a great merge request with your code to improve our examples. > > > > On Feb 15, 2020, at 9:42 PM, Yuyun Yang > wrote: > > > > Hello team, > > > > I wanted to apply the Krylov subspace method to a matrix-free implementation of a stencil, such that the iterative method acts on the operation without ever constructing the matrix explicitly (for example, when doing backward Euler). > > > > I'm not sure whether there is already an example for that somewhere. If so, could you point me to a relevant example? > > > > Thank you! > > > > Best regards, > > Yuyun -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Tue Feb 18 09:31:57 2020 From: hongzhang at anl.gov (Zhang, Hong) Date: Tue, 18 Feb 2020 15:31:57 +0000 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: References: <624EDFBF-E105-4F70-87C6-58BE940D73C9@mcs.anl.gov> <1D793649-96CD-4807-8330-53D9CA176348@stanford.edu> Message-ID: Here is an TS example using DMDA and matrix-free Jacobians. Though the matrix-free part is faked, it demonstrates the workflow. https://gitlab.com/petsc/petsc/-/blob/hongzh/ts-matshell-example/src/ts/examples/tutorials/advection-diffusion-reaction/ex5_mf.c Hong (Mr.) On Feb 18, 2020, at 8:26 AM, Yuyun Yang > wrote: Thanks. 
Also, when using KSP, would the syntax be KSPSetOperators(ksp,A,A)? Since you mentioned preconditioners are not generally used for matrix-free operators, I wasn?t sure whether I should still put ?A? in the Pmat field. Is it still possible to use TS in conjunction with the matrix-free operator? I?d like to create a simple test case that solves the 1d heat equation implicitly with variable coefficients, but didn?t know how the time stepping can be set up. Thanks, Yuyun From: Matthew Knepley > Date: Tuesday, February 18, 2020 at 9:23 PM To: Yuyun Yang > Cc: "Smith, Barry F." >, "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] Matrix-free method in PETSc On Tue, Feb 18, 2020 at 8:20 AM Yuyun Yang > wrote: Thanks for the clarification. Got one more question: if I have variable coefficients, my stencil will be updated at every time step, so will the coefficients in myMatMult. In that case, is it necessary to destroy the shell matrix and create it all over again, or can I use it as it is, only calling the stencil update function, assuming the result will be passed into the matrix operation automatically? You update the information in the context associated with the shell matrix. No need to destroy it. Thanks, Matt Thanks, Yuyun On 2/18/20, 7:34 AM, "Smith, Barry F." > wrote: > On Feb 17, 2020, at 7:56 AM, Yuyun Yang > wrote: > > Hello, > > I actually have a question about the usage of DMDA since I'm quite new to this. I wonder if the DMDA suite of functions can be directly called on vectors created from VecCreate? Yes, but you have to make sure the ones you create have the same sizes and parallel layouts. Generally best to get them from the DMDA or VecDuplicate() than the hassle of figuring out sizes. > Or the vectors have to be formed by DMDACreateGlobalVector? I'm also not sure about what the dof and stencil width arguments do. > > I'm still unsure about the usage of MatCreateShell and MatShellSetOperation, since it seems that MyMatMult should still have 3 inputs just like MatMult (the matrix and two vectors). Since I'm not forming the matrix, does that mean the matrix input is meaningless but still needs to exist for the sake of this format? Well the matrix input is your shell matrix so it likely has information you need to do your multiply routine. MatShellGetContext() (No you do not want to put your information about the matrix stencil inside global variables!) > > After I create such a shell matrix, can I use it like a regular matrix in KSP and utilize preconditioners? > > Thanks! > Yuyun > From: petsc-users > on behalf of Yuyun Yang > > Sent: Sunday, February 16, 2020 3:12 AM > To: Smith, Barry F. > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix-free method in PETSc > > Thank you, that is very helpful information indeed! I will try it and send you my code when it works. > > Best regards, > Yuyun > From: Smith, Barry F. > > Sent: Saturday, February 15, 2020 10:02 PM > To: Yuyun Yang > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix-free method in PETSc > > Yuyun, > > If you are speaking about using a finite difference stencil on a structured grid where you provide the Jacobian vector products yourself by looping over the grid doing the stencil operation we unfortunately do not have exactly that kind of example. > > But it is actually not difficult. 
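For the TS question above, a sketch along the lines of the ex5_mf.c example linked earlier; MyIFunction, MyIJacobian, ctx, da and u0 are user code and data rather than PETSc symbols, and with a shell matrix in both Jacobian slots the run needs -pc_type none (or a PCSHELL), since standard preconditioners expect assembled entries:

  TS ts;
  TSCreate(PETSC_COMM_WORLD, &ts);
  TSSetDM(ts, da);
  TSSetType(ts, TSBEULER);                      /* backward Euler for the 1d heat equation    */
  TSSetIFunction(ts, NULL, MyIFunction, &ctx);  /* residual F(t,u,u_t) evaluated stencil-wise */
  TSSetIJacobian(ts, A, A, MyIJacobian, &ctx);  /* A is the MatShell; MyIJacobian mainly
                                                   records the shift parameter in the context */
  TSSetFromOptions(ts);
  TSSolve(ts, u0);

For a plain linear solve the analogous call is KSPSetOperators(ksp, A, A), again with -pc_type none, since the shell matrix can sit in the Pmat slot as long as no factorization-based preconditioner tries to read its entries.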
I suggest starting with src/ts/examples/tests/ex22.c It computes the sparse matrix explicitly with FormIJacobian() > > What you need to do is instead in main() use MatCreateShell() and MatShellSetOperation(,MATOP_MULT,(void (*)(void))MyMatMult) then provide the routine MyMatMult() to do your stencil based matrix free product; note that you can create this new routine by taking the structure of IFunction() and reorganizing it to do the Jacobian product instead. You will need to get the information about the shell matrix size on each process by calling DMDAGetCorners(). > > You will then remove the explicit computation of the Jacobian, and also remove the Event stuff since you don't need it. > > Extending to 2 and 3d is straight forward. > > Any questions let us know. > > Barry > > If you like this would make a great merge request with your code to improve our examples. > > > > On Feb 15, 2020, at 9:42 PM, Yuyun Yang > wrote: > > > > Hello team, > > > > I wanted to apply the Krylov subspace method to a matrix-free implementation of a stencil, such that the iterative method acts on the operation without ever constructing the matrix explicitly (for example, when doing backward Euler). > > > > I'm not sure whether there is already an example for that somewhere. If so, could you point me to a relevant example? > > > > Thank you! > > > > Best regards, > > Yuyun -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From juaneah at gmail.com Tue Feb 18 13:15:57 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Tue, 18 Feb 2020 13:15:57 -0600 Subject: [petsc-users] SLEPc: The inner product is not well defined In-Reply-To: <5772B337-D031-434F-B2B9-AC6EA7D19783@dsic.upv.es> References: <5772B337-D031-434F-B2B9-AC6EA7D19783@dsic.upv.es> Message-ID: Ok, thank you! Kind regards. El mar., 18 de feb. de 2020 a la(s) 03:17, Jose E. Roman (jroman at dsic.upv.es) escribi?: > > > El 17 feb 2020, a las 19:19, Emmanuel Ayala escribi?: > > Thank you very much for the answer. > > This error appears when computing the B-norm of a vector x, as >> sqrt(x'*B*x). Probably your B matrix is semi-definite, and due to >> floating-point error the value x'*B*x becomes negative for a certain vector >> x. The code uses a tolerance of 10*PETSC_MACHINE_EPSILON, but it seems the >> rounding errors are larger in your case. Or maybe your B-matrix is >> indefinite, in which case you should solve the problem as non-symmetric (or >> as symmetric-indefinite GHIEP). >> >> Do you get the same problem with the Krylov-Schur solver? >> >> > After check the input matrices, the problem was solved using GHIEP. > > >> A workaround is to edit the source code and remove the check or increase >> the tolerance, but this may be catastrophic if your B is indefinite. A >> better solution is to reformulate the problem, solving the matrix pair >> (A,C) where C=alpha*A+beta*B is positive definite (note that then the >> eigenvalues become lambda/(beta+alpha*lambda)). >> >> > Ok, there is a rule to choose the values for alpha and beta? > > > For instance take alpha=1 and beta=-sigma, where sigma is a lower bound of > the leftmost eigenvalue of B (the most negative one). This assumes that A > is positive definite. > > Jose > > > > Kind regards. > Thanks. 
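A minimal sketch of the reformulation described above, with alpha=1 and beta=-sigma; sigma is a user-chosen lower bound on the most negative eigenvalue of B, and A, B and eps are the objects already being discussed:

  Mat C;
  MatDuplicate(B, MAT_COPY_VALUES, &C);
  MatScale(C, -sigma);                            /* C = -sigma*B    */
  MatAXPY(C, 1.0, A, DIFFERENT_NONZERO_PATTERN);  /* C = A - sigma*B */

  EPSSetOperators(eps, A, C);        /* solve A x = theta * C x            */
  EPSSetProblemType(eps, EPS_GHEP);  /* C should now be positive definite  */
  EPSSolve(eps);
  /* each converged theta maps back to an eigenvalue of (A,B) via
     lambda = sigma*theta/(theta - 1)                                */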
> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From juaneah at gmail.com Tue Feb 18 13:18:35 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Tue, 18 Feb 2020 13:18:35 -0600 Subject: [petsc-users] BCs for a EPS solver In-Reply-To: <9286DD37-1487-4F54-BBC9-4E6133FEC916@dsic.upv.es> References: <9286DD37-1487-4F54-BBC9-4E6133FEC916@dsic.upv.es> Message-ID: Thanks for the answer. Finally I generate a submatrix and It worked. Kind regards. El mar., 18 de feb. de 2020 a la(s) 03:51, Jose E. Roman (jroman at dsic.upv.es) escribi?: > You put alpha on the diagonal of A and beta on the diagonal of B to get an > eigenvalue lambda=alpha/beta. If you set beta=0 then lambda=Inf. The choice > depends on where your wanted eigenvalues are and how you are solving the > eigenproblem. The choice of lambda=Inf suggested by Jeremy avoids inserting > eigenvalues that may interfere with the problem's eigenvalues, but this is > good for shift-and-invert, not for the case where you solve linear systems > with B. > > Anyway, this kind of manipulation may have an impact on convergence of the > eigensolver or on conditioning of the linear solves. A possibly better > approach is just to get rid of the BC unknowns by creating smaller A, B > matrices, e.g. with MatGetSubMatrix(). > > Jose > > > El 18 feb 2020, a las 0:41, Matthew Knepley escribi?: > > On Mon, Feb 17, 2020 at 4:33 PM Emmanuel Ayala wrote: > >> Hi, >> >> Thank you for the clarification, now I understand what means change those >> values, and I tried to do that. >> >> But if I put the row i to the identity in A, and zero in B, the solver >> crash: >> > > So if you need to factor B, maybe reverse it? > > Thanks, > > Matt > > >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: Error in external library >> [1]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [1]PETSC ERROR: Error in external library >> [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: >> INFOG(1)=-10, INFO(2)=0 >> >> [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for >> trouble shooting. >> [1]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 >> [1]PETSC ERROR: [2]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [2]PETSC ERROR: Error in external library >> [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: >> INFOG(1)=-10, INFO(2)=0 >> >> [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for >> trouble shooting. >> [2]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 >> [2]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by >> ayala Mon Feb 17 15:28:01 2020 >> [3]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [3]PETSC ERROR: Error in external library >> [3]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: >> INFOG(1)=-10, INFO(2)=0 >> >> [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for >> trouble shooting. 
>> [3]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 >> [3]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by >> ayala Mon Feb 17 15:28:01 2020 >> [3]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 >> -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" >> FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich >> --download-mumps --download-scalapack --download-parmetis --download-metis >> --download-superlu_dist --download-cmake --download-fblaslapack=1 >> --with-cxx-dialect=C++11 >> [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c >> [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: >> INFOG(1)=-10, INFO(2)=33 >> >> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for >> trouble shooting. >> [0]PETSC ERROR: Petsc Release Version 3.12.3, Jan, 03, 2020 >> [0]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-mumps named ayala by >> ayala Mon Feb 17 15:28:01 2020 >> [0]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 >> -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" >> FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich >> --download-mumps --download-scalapack --download-parmetis --download-metis >> --download-superlu_dist --download-cmake --download-fblaslapack=1 >> --with-cxx-dialect=C++11 >> [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c >> ./comp on a arch-linux-c-opt-O2-mumps named ayala by ayala Mon Feb 17 >> 15:28:01 2020 >> [1]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 >> -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" >> FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich >> --download-mumps --download-scalapack --download-parmetis --download-metis >> --download-superlu_dist --download-cmake --download-fblaslapack=1 >> --with-cxx-dialect=C++11 >> [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c >> [1]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c >> [2]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 >> -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" >> FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich >> --download-mumps --download-scalapack --download-parmetis --download-metis >> --download-superlu_dist --download-cmake --download-fblaslapack=1 >> --with-cxx-dialect=C++11 >> [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1365 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/impls/aij/mpi/mumps/mumps.c >> [2]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c >> [2]PETSC ERROR: [3]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c >> [3]PETSC ERROR: #3 PCSetUp_LU() line 126 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c >> [3]PETSC ERROR: #4 PCSetUp() line 894 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c >> [0]PETSC ERROR: #2 MatLUFactorNumeric() line 3057 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/mat/interface/matrix.c >> [0]PETSC ERROR: #3 PCSetUp_LU() line 
126 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c >> [0]PETSC ERROR: [1]PETSC ERROR: #3 PCSetUp_LU() line 126 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c >> [1]PETSC ERROR: #4 PCSetUp() line 894 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c >> [1]PETSC ERROR: #3 PCSetUp_LU() line 126 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/impls/factor/lu/lu.c >> [2]PETSC ERROR: #4 PCSetUp() line 894 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c >> [2]PETSC ERROR: #5 KSPSetUp() line 376 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c >> [3]PETSC ERROR: #5 KSPSetUp() line 376 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c >> [3]PETSC ERROR: #6 STSetUp_Shift() line 120 in >> /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c >> #4 PCSetUp() line 894 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/pc/interface/precon.c >> [0]PETSC ERROR: #5 KSPSetUp() line 376 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c >> [0]PETSC ERROR: #6 STSetUp_Shift() line 120 in >> /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c >> [0]PETSC ERROR: #5 KSPSetUp() line 376 in >> /home/ayala/Documents/PETSc/petsc-3.12.3/src/ksp/ksp/interface/itfunc.c >> [1]PETSC ERROR: #6 STSetUp_Shift() line 120 in >> /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c >> [1]PETSC ERROR: #7 STSetUp() line 271 in >> /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/interface/stsolve.c >> [1]PETSC ERROR: #8 EPSSetUp() line 273 in >> /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/interface/epssetup.c >> [2]PETSC ERROR: #6 STSetUp_Shift() line 120 in >> /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/impls/shift/shift.c >> [2]PETSC ERROR: #7 STSetUp() line 271 in >> /home/ayala/Documents/SLEPc/slepc-3.12.2/src/sys/classes/st/interface/stsolve.c >> [2]PETSC ERROR: #8 EPSSetUp() line 273 in >> /home/ayala/Documents/SLEPc/slepc-3.12.2/src/eps/interface/epssetup.c >> [2]PETSC ERROR: #9 FourBar_NaturalPulsation() line 3937 in >> /home/ayala/Nextcloud/cpp_projects/2020-02-13-muboto-balancing-v17-mma/ >> Multibody.cc >> >> El lun., 17 de feb. de 2020 a la(s) 13:20, Matthew Knepley ( >> knepley at gmail.com) escribi?: >> >>> On Mon, Feb 17, 2020 at 1:59 PM Emmanuel Ayala >>> wrote: >>> >>>> Hi, thanks for the quick answer. >>>> >>>> I just did it, and it does not work. My problem is GNHEP and I use the >>>> default solver (Krylov-Schur). Moreover I run the code with the options: >>>> -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_type mumps >>>> >>> >>> I guess a better question is: What do you expect to work? >>> >>> For a linear solve, >>> >>> A x = b >>> >>> if a row i is 0 except for a one on the diagonal, then I get >>> >>> x_i = b_i >>> >>> so hopefully you put the correct boundary value in b_i. For the >>> generalized eigenproblem >>> >>> A x = \lambda B x >>> >>> if you set row i to the identity in A, and zero in B, we get >>> >>> x_i = 0 >>> >>> and you must put the boundary values into x after you have finished the >>> solve. Is this what you did? >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Any other suggestions? >>>> Kind regards. >>>> >>>> El lun., 17 de feb. 
de 2020 a la(s) 12:39, Jeremy Theler ( >>>> jeremy at seamplex.com) escribi?: >>>> >>>>> The usual trick is to set ones in one matrix and zeros in the other >>>>> one. >>>>> >>>>> >>>>> On Mon, 2020-02-17 at 12:35 -0600, Emmanuel Ayala wrote: >>>>> > Hi everyone, >>>>> > >>>>> > I have an eigenvalue problem where I need to apply BCs to the >>>>> > stiffness and mass matrix. >>>>> > >>>>> > Usually, for KSP solver, it is enough to set to zero the rows and >>>>> > columns related to the boundary conditions. I used to apply it with >>>>> > MatZeroRowsColumns, with a 1s on the diagonal. Then the solver works >>>>> > well. >>>>> > >>>>> > There is something similar to KSP for EPS solver ? >>>>> > >>>>> > I already used MatZeroRowsColumns (for EPS solver), with a 1s on the >>>>> > diagonal, and I got wrong result. >>>>> > >>>>> > Kind regards. >>>>> > >>>>> > >>>>> > >>>>> > >>>>> >>>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hectorb at utexas.edu Tue Feb 18 14:30:44 2020 From: hectorb at utexas.edu (Hector E Barrios Molano) Date: Tue, 18 Feb 2020 14:30:44 -0600 Subject: [petsc-users] Efficiently move matrix from single processor to multiple Message-ID: Dear PETSc Experts! Do you know if there is an efficient way to move a matrix from a single processor (MatCreateSeqBAIJ) to a matrix contained in all processors? As a little bit of context, I have a code in which only one processor creates a matrix and a vector for a linear system of equations. Then we want to use a parallel solver to get the solution and give it back to a single processor I tried MatView to create a binary file and MatLoad to load the matrix in parallel. This seems to work but performance is significantly decreased independent of the number of processors used. I have some questions: Can I share the matrix without having to write it to a file, for example, through a buffer? Is there a way to efficiently avoid the overhead of writing, reading loading matrices to and from processors? Thanks for your comments, Hector -- *Hector Barrios* PhD Student, Graduate Research Assistant Hildebrand Department of Petroleum and Geosystems Engineering The University of Texas at Austin hectorb at utexas.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Feb 18 14:49:32 2020 From: jed at jedbrown.org (Jed Brown) Date: Tue, 18 Feb 2020 13:49:32 -0700 Subject: [petsc-users] Efficiently move matrix from single processor to multiple In-Reply-To: References: Message-ID: <87blpv39k3.fsf@jedbrown.org> Hector E Barrios Molano writes: > Dear PETSc Experts! > > Do you know if there is an efficient way to move a matrix from a single > processor (MatCreateSeqBAIJ) to a matrix contained in all processors? How did you create the original SeqBAIJ? Could you just call MatSetValues on a parallel matrix instead (even if only setting values from rank 0)? Many of our tutorials suggest this mode if you can't parallelize your assembly. 
> As a little bit of context, I have a code in which only one processor > creates a matrix and a vector for a linear system of equations. Then we > want to use a parallel solver to get the solution and give it back to a > single processor > > I tried MatView to create a binary file and MatLoad to load the matrix > in parallel. This seems to work but performance is significantly > decreased independent of the number of processors used. > > I have some questions: > > Can I share the matrix without having to write it to a file, for > example, through a buffer? > Is there a way to efficiently avoid the overhead of writing, reading > loading matrices to and from processors? > > Thanks for your comments, > > Hector > -- > *Hector Barrios* > PhD Student, Graduate Research Assistant > Hildebrand Department of Petroleum and Geosystems Engineering > The University of Texas at Austin > hectorb at utexas.edu From yyang85 at stanford.edu Tue Feb 18 17:56:47 2020 From: yyang85 at stanford.edu (Yuyun Yang) Date: Tue, 18 Feb 2020 23:56:47 +0000 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: References: <624EDFBF-E105-4F70-87C6-58BE940D73C9@mcs.anl.gov> <1D793649-96CD-4807-8330-53D9CA176348@stanford.edu> Message-ID: <5F6B7FE1-AF86-4F37-ACDD-84408ADA9A9C@stanford.edu> Thanks a lot for the example! From: "Zhang, Hong" Date: Tuesday, February 18, 2020 at 11:32 PM To: Yuyun Yang Cc: Matthew Knepley , "petsc-users at mcs.anl.gov" Subject: Re: [petsc-users] Matrix-free method in PETSc Here is an TS example using DMDA and matrix-free Jacobians. Though the matrix-free part is faked, it demonstrates the workflow. https://gitlab.com/petsc/petsc/-/blob/hongzh/ts-matshell-example/src/ts/examples/tutorials/advection-diffusion-reaction/ex5_mf.c Hong (Mr.) On Feb 18, 2020, at 8:26 AM, Yuyun Yang > wrote: Thanks. Also, when using KSP, would the syntax be KSPSetOperators(ksp,A,A)? Since you mentioned preconditioners are not generally used for matrix-free operators, I wasn?t sure whether I should still put ?A? in the Pmat field. Is it still possible to use TS in conjunction with the matrix-free operator? I?d like to create a simple test case that solves the 1d heat equation implicitly with variable coefficients, but didn?t know how the time stepping can be set up. Thanks, Yuyun From: Matthew Knepley > Date: Tuesday, February 18, 2020 at 9:23 PM To: Yuyun Yang > Cc: "Smith, Barry F." >, "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] Matrix-free method in PETSc On Tue, Feb 18, 2020 at 8:20 AM Yuyun Yang > wrote: Thanks for the clarification. Got one more question: if I have variable coefficients, my stencil will be updated at every time step, so will the coefficients in myMatMult. In that case, is it necessary to destroy the shell matrix and create it all over again, or can I use it as it is, only calling the stencil update function, assuming the result will be passed into the matrix operation automatically? You update the information in the context associated with the shell matrix. No need to destroy it. Thanks, Matt Thanks, Yuyun On 2/18/20, 7:34 AM, "Smith, Barry F." > wrote: > On Feb 17, 2020, at 7:56 AM, Yuyun Yang > wrote: > > Hello, > > I actually have a question about the usage of DMDA since I'm quite new to this. I wonder if the DMDA suite of functions can be directly called on vectors created from VecCreate? Yes, but you have to make sure the ones you create have the same sizes and parallel layouts. 
Generally best to get them from the DMDA or VecDuplicate() than the hassle of figuring out sizes. > Or the vectors have to be formed by DMDACreateGlobalVector? I'm also not sure about what the dof and stencil width arguments do. > > I'm still unsure about the usage of MatCreateShell and MatShellSetOperation, since it seems that MyMatMult should still have 3 inputs just like MatMult (the matrix and two vectors). Since I'm not forming the matrix, does that mean the matrix input is meaningless but still needs to exist for the sake of this format? Well the matrix input is your shell matrix so it likely has information you need to do your multiply routine. MatShellGetContext() (No you do not want to put your information about the matrix stencil inside global variables!) > > After I create such a shell matrix, can I use it like a regular matrix in KSP and utilize preconditioners? > > Thanks! > Yuyun > From: petsc-users > on behalf of Yuyun Yang > > Sent: Sunday, February 16, 2020 3:12 AM > To: Smith, Barry F. > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix-free method in PETSc > > Thank you, that is very helpful information indeed! I will try it and send you my code when it works. > > Best regards, > Yuyun > From: Smith, Barry F. > > Sent: Saturday, February 15, 2020 10:02 PM > To: Yuyun Yang > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix-free method in PETSc > > Yuyun, > > If you are speaking about using a finite difference stencil on a structured grid where you provide the Jacobian vector products yourself by looping over the grid doing the stencil operation we unfortunately do not have exactly that kind of example. > > But it is actually not difficult. I suggest starting with src/ts/examples/tests/ex22.c It computes the sparse matrix explicitly with FormIJacobian() > > What you need to do is instead in main() use MatCreateShell() and MatShellSetOperation(,MATOP_MULT,(void (*)(void))MyMatMult) then provide the routine MyMatMult() to do your stencil based matrix free product; note that you can create this new routine by taking the structure of IFunction() and reorganizing it to do the Jacobian product instead. You will need to get the information about the shell matrix size on each process by calling DMDAGetCorners(). > > You will then remove the explicit computation of the Jacobian, and also remove the Event stuff since you don't need it. > > Extending to 2 and 3d is straight forward. > > Any questions let us know. > > Barry > > If you like this would make a great merge request with your code to improve our examples. > > > > On Feb 15, 2020, at 9:42 PM, Yuyun Yang > wrote: > > > > Hello team, > > > > I wanted to apply the Krylov subspace method to a matrix-free implementation of a stencil, such that the iterative method acts on the operation without ever constructing the matrix explicitly (for example, when doing backward Euler). > > > > I'm not sure whether there is already an example for that somewhere. If so, could you point me to a relevant example? > > > > Thank you! > > > > Best regards, > > Yuyun -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Feb 18 21:09:04 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) 
Date: Wed, 19 Feb 2020 03:09:04 +0000 Subject: [petsc-users] Matrix-free method in PETSc In-Reply-To: References: <624EDFBF-E105-4F70-87C6-58BE940D73C9@mcs.anl.gov> <1D793649-96CD-4807-8330-53D9CA176348@stanford.edu> Message-ID: <264F2C78-2D0D-46CC-A161-6161D97B433E@mcs.anl.gov> > On Feb 18, 2020, at 8:26 AM, Yuyun Yang wrote: > > Thanks. Also, when using KSP, would the syntax be KSPSetOperators(ksp,A,A)? Since you mentioned preconditioners are not generally used for matrix-free operators, I wasn?t sure whether I should still put ?A? in the Pmat field. > > Is it still possible to use TS in conjunction with the matrix-free operator? I?d like to create a simple test case that solves the 1d heat equation implicitly with variable coefficients, but didn?t know how the time stepping can be set up. On Feb 15, 2020, at 9:42 PM you asked about "(for example, when doing backward Euler)" on Saturday, February 15, 2020 10:02 PM I suggested you start with the example src/ts/examples/tests/ex22.c I outlined how you could change it to be matrix free. The example clearly uses TS Now three days later you are asking about how time-stepping can be set up with a matrix-free operator? If you are going to ignore answers we provide to your questions maybe we won't bother answering in the future. > > Thanks, > Yuyun > > From: Matthew Knepley > Date: Tuesday, February 18, 2020 at 9:23 PM > To: Yuyun Yang > Cc: "Smith, Barry F." , "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] Matrix-free method in PETSc > > On Tue, Feb 18, 2020 at 8:20 AM Yuyun Yang wrote: > Thanks for the clarification. > > Got one more question: if I have variable coefficients, my stencil will be updated at every time step, so will the coefficients in myMatMult. In that case, is it necessary to destroy the shell matrix and create it all over again, or can I use it as it is, only calling the stencil update function, assuming the result will be passed into the matrix operation automatically? > > You update the information in the context associated with the shell matrix. No need to destroy it. > > Thanks, > > Matt > > Thanks, > Yuyun > > On 2/18/20, 7:34 AM, "Smith, Barry F." wrote: > > > > > On Feb 17, 2020, at 7:56 AM, Yuyun Yang wrote: > > > > Hello, > > > > I actually have a question about the usage of DMDA since I'm quite new to this. I wonder if the DMDA suite of functions can be directly called on vectors created from VecCreate? > > Yes, but you have to make sure the ones you create have the same sizes and parallel layouts. Generally best to get them from the DMDA or VecDuplicate() than the hassle of figuring out sizes. > > > Or the vectors have to be formed by DMDACreateGlobalVector? I'm also not sure about what the dof and stencil width arguments do. > > > > I'm still unsure about the usage of MatCreateShell and MatShellSetOperation, since it seems that MyMatMult should still have 3 inputs just like MatMult (the matrix and two vectors). Since I'm not forming the matrix, does that mean the matrix input is meaningless but still needs to exist for the sake of this format? > > Well the matrix input is your shell matrix so it likely has information you need to do your multiply routine. MatShellGetContext() (No you do not want to put your information about the matrix stencil inside global variables!) > > > > > > After I create such a shell matrix, can I use it like a regular matrix in KSP and utilize preconditioners? > > > > Thanks! 
> > Yuyun > > From: petsc-users on behalf of Yuyun Yang > > Sent: Sunday, February 16, 2020 3:12 AM > > To: Smith, Barry F. > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix-free method in PETSc > > > > Thank you, that is very helpful information indeed! I will try it and send you my code when it works. > > > > Best regards, > > Yuyun > > From: Smith, Barry F. > > Sent: Saturday, February 15, 2020 10:02 PM > > To: Yuyun Yang > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix-free method in PETSc > > > > Yuyun, > > > > If you are speaking about using a finite difference stencil on a structured grid where you provide the Jacobian vector products yourself by looping over the grid doing the stencil operation we unfortunately do not have exactly that kind of example. > > > > But it is actually not difficult. I suggest starting with src/ts/examples/tests/ex22.c It computes the sparse matrix explicitly with FormIJacobian() > > > > What you need to do is instead in main() use MatCreateShell() and MatShellSetOperation(,MATOP_MULT,(void (*)(void))MyMatMult) then provide the routine MyMatMult() to do your stencil based matrix free product; note that you can create this new routine by taking the structure of IFunction() and reorganizing it to do the Jacobian product instead. You will need to get the information about the shell matrix size on each process by calling DMDAGetCorners(). > > > > You will then remove the explicit computation of the Jacobian, and also remove the Event stuff since you don't need it. > > > > Extending to 2 and 3d is straight forward. > > > > Any questions let us know. > > > > Barry > > > > If you like this would make a great merge request with your code to improve our examples. > > > > > > > On Feb 15, 2020, at 9:42 PM, Yuyun Yang wrote: > > > > > > Hello team, > > > > > > I wanted to apply the Krylov subspace method to a matrix-free implementation of a stencil, such that the iterative method acts on the operation without ever constructing the matrix explicitly (for example, when doing backward Euler). > > > > > > I'm not sure whether there is already an example for that somewhere. If so, could you point me to a relevant example? > > > > > > Thank you! > > > > > > Best regards, > > > Yuyun > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From mfadams at lbl.gov Wed Feb 19 16:07:58 2020 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 19 Feb 2020 17:07:58 -0500 Subject: [petsc-users] ParMetis error Message-ID: We have a code that works with v3.7.7 but with newer versions we get what looks like an internal ParMetis error ('Failed during initial partitioning'). See attached output. I've never seen this message ... any ideas? Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Petsc3-11_xgc_28263517_.log Type: application/octet-stream Size: 473109 bytes Desc: not available URL: From bsmith at mcs.anl.gov Wed Feb 19 17:58:17 2020 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 19 Feb 2020 23:58:17 +0000 Subject: [petsc-users] ParMetis error In-Reply-To: References: Message-ID: <37372288-90BA-4020-99F3-4E8541F0A5E2@anl.gov> Mark, It may be best to try jumping to the latest PETSc 3.12. 
ParMETIS had some difficult issues with matrices we started to provide to it in the last year and the code to handle the problems may not be in 3.11 If the problem persists in 3.12 then I would start with checking with valgrind. Barry > On Feb 19, 2020, at 4:07 PM, Mark Adams wrote: > > We have a code that works with v3.7.7 but with newer versions we get what looks like an internal ParMetis error ('Failed during initial partitioning'). See attached output. > > I've never seen this message ... any ideas? > > Thanks, > Mark > From jourdon_anthony at hotmail.fr Thu Feb 20 02:03:18 2020 From: jourdon_anthony at hotmail.fr (Anthony Jourdon) Date: Thu, 20 Feb 2020 08:03:18 +0000 Subject: [petsc-users] DMDA Error In-Reply-To: References: , Message-ID: Hello, After tests and discussions with the computer admins the problem is solved ! It appears that the bug indeed comes from intel mpi 2019 and all of its updates. For reasons that I do not understand it seems that intel mpi 2019 gives strange MPI errors when inter-nodes communication is required for computers using infiniband. Apparently this is a known error and indeed I found topics on forums talking about that. I switch to intel mpi 2018 Update 3 and no problem, code runs normally on 1024 mpi ranks. Thank you for your attention and your time ! Sincerly, Anthony Jourdon ________________________________ De : Zhang, Junchao Envoy? : vendredi 24 janvier 2020 16:52 ? : Anthony Jourdon Cc : petsc-users at mcs.anl.gov Objet : Re: [petsc-users] DMDA Error Hello, Anthony I tried petsc-3.8.4 + icc/gcc + Intel MPI 2019 update 5 + optimized/debug build, and ran with 1024 ranks, but I could not reproduce the error. Maybe you can try these: * Use the latest petsc + your test example, run with AND without -vecscatter_type mpi1, to see if they can report useful messages. * Or, use Intel MPI 2019 update 6 to see if this is an Intel MPI bug. 
$ cat ex50.c #include #include int main(int argc,char **argv) { PetscErrorCode ierr; PetscInt size; PetscInt X = 1024,Y = 128,Z=512; //PetscInt X = 512,Y = 64, Z=256; DM da; ierr = PetscInitialize(&argc,&argv,(char*)0,NULL);if (ierr) return ierr; ierr = MPI_Comm_size(PETSC_COMM_WORLD,&size);CHKERRQ(ierr); ierr = DMDACreate3d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DMDA_STENCIL_BOX,2*X+1,2*Y+1,2*Z+1,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,3,2,NULL,NULL,NULL,&da);CHKERRQ(ierr); ierr = DMSetFromOptions(da);CHKERRQ(ierr); ierr = DMSetUp(da);CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD,"Running with %D MPI ranks\n",size);CHKERRQ(ierr); ierr = DMDestroy(&da);CHKERRQ(ierr); ierr = PetscFinalize(); return ierr; } $ldd ex50 linux-vdso.so.1 => (0x00007ffdbcd43000) libpetsc.so.3.8 => /home/jczhang/petsc/linux-intel-opt/lib/libpetsc.so.3.8 (0x00002afd27e51000) libX11.so.6 => /lib64/libX11.so.6 (0x00002afd2a811000) libifport.so.5 => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/compiler/lib/intel64_lin/libifport.so.5 (0x00002afd2ab4f000) libmpicxx.so.12 => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/mpi/intel64/lib/libmpicxx.so.12 (0x00002afd2ad7d000) libdl.so.2 => /lib64/libdl.so.2 (0x00002afd2af9d000) libmpifort.so.12 => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/mpi/intel64/lib/libmpifort.so.12 (0x00002afd2b1a1000) libmpi.so.12 => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/mpi/intel64/lib/release/libmpi.so.12 (0x00002afd2b55f000) librt.so.1 => /lib64/librt.so.1 (0x00002afd2c564000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00002afd2c76c000) libimf.so => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/compiler/lib/intel64_lin/libimf.so (0x00002afd2c988000) libsvml.so => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/compiler/lib/intel64_lin/libsvml.so (0x00002afd2d00d000) libirng.so => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/compiler/lib/intel64_lin/libirng.so (0x00002afd2ea99000) libm.so.6 => /lib64/libm.so.6 (0x00002afd2ee04000) libcilkrts.so.5 => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/compiler/lib/intel64_lin/libcilkrts.so.5 (0x00002afd2f106000) libstdc++.so.6 => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/clck/2019.5/lib/intel64/libstdc++.so.6 (0x00002afd2f343000) libgcc_s.so.1 => 
/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/clck/2019.5/lib/intel64/libgcc_s.so.1 (0x00002afd2f655000) libirc.so => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/compiler/lib/intel64_lin/libirc.so (0x00002afd2f86b000) libc.so.6 => /lib64/libc.so.6 (0x00002afd2fadd000) libintlc.so.5 => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/compiler/lib/intel64_lin/libintlc.so.5 (0x00002afd2feaa000) libxcb.so.1 => /lib64/libxcb.so.1 (0x00002afd3011c000) /lib64/ld-linux-x86-64.so.2 (0x00002afd27c2d000) libfabric.so.1 => /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.5-zqvneipqa4u52iwlyy5kx4hbsfnspz6g/compilers_and_libraries_2019.5.281/linux/mpi/intel64/libfabric/lib/libfabric.so.1 (0x00002afd30344000) libXau.so.6 => /lib64/libXau.so.6 (0x00002afd3057c000) --Junchao Zhang On Tue, Jan 21, 2020 at 2:25 AM Anthony Jourdon > wrote: Hello, I made a test to try to reproduce the error. To do so I modified the file $PETSC_DIR/src/dm/examples/tests/ex35.c I attach the file in case of need. The same error is reproduced for 1024 mpi ranks. I tested two problem sizes (2*512+1x2*64+1x2*256+1 and 2*1024+1x2*128+1x2*512+1) and the error occured for both cases, the first case is also the one I used to run before the OS and mpi updates. I also run the code with -malloc_debug and nothing more appeared. I attached the configure command I used to build a debug version of petsc. Thank you for your time, Sincerly. Anthony Jourdon ________________________________ De : Zhang, Junchao > Envoy? : jeudi 16 janvier 2020 16:49 ? : Anthony Jourdon > Cc : petsc-users at mcs.anl.gov > Objet : Re: [petsc-users] DMDA Error It seems the problem is triggered by DMSetUp. You can write a small test creating the DMDA with the same size as your code, to see if you can reproduce the problem. If yes, it would be much easier for us to debug it. --Junchao Zhang On Thu, Jan 16, 2020 at 7:38 AM Anthony Jourdon > wrote: Dear Petsc developer, I need assistance with an error. I run a code that uses the DMDA related functions. I'm using petsc-3.8.4. This code used to run very well on a super computer with the OS SLES11. Petsc was built using an intel mpi 5.1.3.223 module and intel mkl version 2016.0.2.181 The code was running with no problem on 1024 and more mpi ranks. Recently, the OS of the computer has been updated to RHEL7 I rebuilt Petsc using new available versions of intel mpi (2019U5) and mkl (2019.0.5.281) which are the same versions for compilers and mkl. Since then I tested to run the exact same code on 8, 16, 24, 48, 512 and 1024 mpi ranks. Until 1024 mpi ranks no problem, but for 1024 an error related to DMDA appeared. I snip the first lines of the error stack here and the full error stack is attached. 
[534]PETSC ERROR: #1 PetscGatherMessageLengths() line 120 in /scratch2/dlp/appli_local/SCR/OROGEN/petsc3.8.4_MPI/petsc-3.8.4/src/sys/utils/mpimesg.c [534]PETSC ERROR: #2 VecScatterCreate_PtoS() line 2288 in /scratch2/dlp/appli_local/SCR/OROGEN/petsc3.8.4_MPI/petsc-3.8.4/src/vec/vec/utils/vpscat.c [534]PETSC ERROR: #3 VecScatterCreate() line 1462 in /scratch2/dlp/appli_local/SCR/OROGEN/petsc3.8.4_MPI/petsc-3.8.4/src/vec/vec/utils/vscat.c [534]PETSC ERROR: #4 DMSetUp_DA_3D() line 1042 in /scratch2/dlp/appli_local/SCR/OROGEN/petsc3.8.4_MPI/petsc-3.8.4/src/dm/impls/da/da3.c [534]PETSC ERROR: #5 DMSetUp_DA() line 25 in /scratch2/dlp/appli_local/SCR/OROGEN/petsc3.8.4_MPI/petsc-3.8.4/src/dm/impls/da/dareg.c [534]PETSC ERROR: #6 DMSetUp() line 720 in /scratch2/dlp/appli_local/SCR/OROGEN/petsc3.8.4_MPI/petsc-3.8.4/src/dm/interface/dm.c Thank you for your time, Sincerly, Anthony Jourdon -------------- next part -------------- An HTML attachment was scrubbed... URL: From eijkhout at tacc.utexas.edu Thu Feb 20 08:47:22 2020 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Thu, 20 Feb 2020 14:47:22 +0000 Subject: [petsc-users] DMDA Error In-Reply-To: References: Message-ID: On , 2020Feb20, at 02:03, Anthony Jourdon > wrote: It appears that the bug indeed comes from intel mpi 2019 and all of its updates. Try export I_MPI_ADJUST_ALLREDUCE=1 (Please check spelling) For some reason the default allreduce algorithm is broken. This setting with 19.0.6 has solved many problems. Don?t use 19.0.5 or earlier at all. Victor. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Feb 20 13:22:07 2020 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 20 Feb 2020 14:22:07 -0500 Subject: [petsc-users] static libs Message-ID: We are having problems linking with at Cray static library environment variable, that is required to link Adios, and IO package. How does one build with static PETSc libs? Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu Feb 20 13:24:14 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 20 Feb 2020 13:24:14 -0600 Subject: [petsc-users] static libs In-Reply-To: References: Message-ID: You can build PETSc statically with configure option: --with-shared-libraries=0 Satish On Thu, 20 Feb 2020, Mark Adams wrote: > We are having problems linking with at Cray static library environment > variable, that is required to link Adios, and IO package. How does one > build with static PETSc libs? > Thanks, > Mark > From balay at mcs.anl.gov Thu Feb 20 13:29:49 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 20 Feb 2020 13:29:49 -0600 Subject: [petsc-users] static libs In-Reply-To: References: Message-ID: BTW: What do you mean by 'Cray static library environment variable'? Is it CRAYPE_LINK_TYPE? What is set to? What problems are you having? One can get shared library build of PETSc working with: export CRAYPE_LINK_TYPE=dynamic Satish On Thu, 20 Feb 2020, Satish Balay via petsc-users wrote: > You can build PETSc statically with configure option: > > --with-shared-libraries=0 > > Satish > > On Thu, 20 Feb 2020, Mark Adams wrote: > > > We are having problems linking with at Cray static library environment > > variable, that is required to link Adios, and IO package. How does one > > build with static PETSc libs? 
> > Thanks, > > Mark > > > From asharma at pppl.gov Thu Feb 20 13:52:04 2020 From: asharma at pppl.gov (Amil Sharma) Date: Thu, 20 Feb 2020 14:52:04 -0500 Subject: [petsc-users] static libs In-Reply-To: References: Message-ID: We need static linking in order to link an existing static IO library, but we did not know the PETSc static build configure option. On Thu, Feb 20, 2020 at 2:30 PM Satish Balay wrote: > BTW: What do you mean by 'Cray static library environment variable'? Is it > CRAYPE_LINK_TYPE? What is set to? What problems are you having? > > One can get shared library build of PETSc working with: > > export CRAYPE_LINK_TYPE=dynamic > > Satish > > On Thu, 20 Feb 2020, Satish Balay via petsc-users wrote: > > > You can build PETSc statically with configure option: > > > > --with-shared-libraries=0 > > > > Satish > > > > On Thu, 20 Feb 2020, Mark Adams wrote: > > > > > We are having problems linking with at Cray static library environment > > > variable, that is required to link Adios, and IO package. How does one > > > build with static PETSc libs? > > > Thanks, > > > Mark > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From asharma at pppl.gov Thu Feb 20 14:01:57 2020 From: asharma at pppl.gov (Amil Sharma) Date: Thu, 20 Feb 2020 15:01:57 -0500 Subject: [petsc-users] static libs In-Reply-To: References: Message-ID: Just wondering if static linking is better for performance? On Thu, Feb 20, 2020 at 2:58 PM Albert Mollen wrote: > Hi Mark, > I'm trying to rebuild Adios2 with dynamic linking on cori. Hopefully we > can move over to that. > > Best regards > ---------- > Albert Moll?n > Associate Research Physicist > > Theory Department > Princeton Plasma Physics Laboratory > P.O. Box 451 > Princeton, NJ 08543-0451 > USA > > Tel. +1 609-243-3909 > E-mail: amollen at pppl.gov > > > On Thu, Feb 20, 2020 at 2:52 PM Amil Sharma wrote: > >> We need static linking in order to link an existing static IO library, >> but we did not know the PETSc static build configure option. >> >> On Thu, Feb 20, 2020 at 2:30 PM Satish Balay wrote: >> >>> BTW: What do you mean by 'Cray static library environment variable'? Is >>> it CRAYPE_LINK_TYPE? What is set to? What problems are you having? >>> >>> One can get shared library build of PETSc working with: >>> >>> export CRAYPE_LINK_TYPE=dynamic >>> >>> Satish >>> >>> On Thu, 20 Feb 2020, Satish Balay via petsc-users wrote: >>> >>> > You can build PETSc statically with configure option: >>> > >>> > --with-shared-libraries=0 >>> > >>> > Satish >>> > >>> > On Thu, 20 Feb 2020, Mark Adams wrote: >>> > >>> > > We are having problems linking with at Cray static library >>> environment >>> > > variable, that is required to link Adios, and IO package. How does >>> one >>> > > build with static PETSc libs? >>> > > Thanks, >>> > > Mark >>> > > >>> > >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From amollen at pppl.gov Thu Feb 20 13:57:48 2020 From: amollen at pppl.gov (Albert Mollen) Date: Thu, 20 Feb 2020 14:57:48 -0500 Subject: [petsc-users] static libs In-Reply-To: References: Message-ID: Hi Mark, I'm trying to rebuild Adios2 with dynamic linking on cori. Hopefully we can move over to that. Best regards ---------- Albert Moll?n Associate Research Physicist Theory Department Princeton Plasma Physics Laboratory P.O. Box 451 Princeton, NJ 08543-0451 USA Tel. 
+1 609-243-3909 E-mail: amollen at pppl.gov On Thu, Feb 20, 2020 at 2:52 PM Amil Sharma wrote: > We need static linking in order to link an existing static IO library, but > we did not know the PETSc static build configure option. > > On Thu, Feb 20, 2020 at 2:30 PM Satish Balay wrote: > >> BTW: What do you mean by 'Cray static library environment variable'? Is >> it CRAYPE_LINK_TYPE? What is set to? What problems are you having? >> >> One can get shared library build of PETSc working with: >> >> export CRAYPE_LINK_TYPE=dynamic >> >> Satish >> >> On Thu, 20 Feb 2020, Satish Balay via petsc-users wrote: >> >> > You can build PETSc statically with configure option: >> > >> > --with-shared-libraries=0 >> > >> > Satish >> > >> > On Thu, 20 Feb 2020, Mark Adams wrote: >> > >> > > We are having problems linking with at Cray static library environment >> > > variable, that is required to link Adios, and IO package. How does one >> > > build with static PETSc libs? >> > > Thanks, >> > > Mark >> > > >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From amollen at pppl.gov Thu Feb 20 13:57:48 2020 From: amollen at pppl.gov (Albert Mollen) Date: Thu, 20 Feb 2020 14:57:48 -0500 Subject: [petsc-users] static libs In-Reply-To: References: Message-ID: Hi Mark, I'm trying to rebuild Adios2 with dynamic linking on cori. Hopefully we can move over to that. Best regards ---------- Albert Moll?n Associate Research Physicist Theory Department Princeton Plasma Physics Laboratory P.O. Box 451 Princeton, NJ 08543-0451 USA Tel. +1 609-243-3909 E-mail: amollen at pppl.gov On Thu, Feb 20, 2020 at 2:52 PM Amil Sharma wrote: > We need static linking in order to link an existing static IO library, but > we did not know the PETSc static build configure option. > > On Thu, Feb 20, 2020 at 2:30 PM Satish Balay wrote: > >> BTW: What do you mean by 'Cray static library environment variable'? Is >> it CRAYPE_LINK_TYPE? What is set to? What problems are you having? >> >> One can get shared library build of PETSc working with: >> >> export CRAYPE_LINK_TYPE=dynamic >> >> Satish >> >> On Thu, 20 Feb 2020, Satish Balay via petsc-users wrote: >> >> > You can build PETSc statically with configure option: >> > >> > --with-shared-libraries=0 >> > >> > Satish >> > >> > On Thu, 20 Feb 2020, Mark Adams wrote: >> > >> > > We are having problems linking with at Cray static library environment >> > > variable, that is required to link Adios, and IO package. How does one >> > > build with static PETSc libs? >> > > Thanks, >> > > Mark >> > > >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jczhang at mcs.anl.gov Thu Feb 20 15:26:28 2020 From: jczhang at mcs.anl.gov (Junchao Zhang) Date: Thu, 20 Feb 2020 15:26:28 -0600 Subject: [petsc-users] static libs In-Reply-To: References: Message-ID: Copy & paste from a Cray paper: "The main disadvantage of dynamic shared libraries is the runtime performance costs of dynamic linking. Every time the program is executed it has to perform a large part of its linking process. The lookup of symbols in a dynamic shared library is much less efficient than in static libraries. The loading of a dynamic shared library during an application?s execution may result in a ?jitter? effect where a single process holds up the forward progress of other processes of the application while it is loading a library. " BTW, Cori's default is changed from static to dynamic. I heard Frontier will also use dynamic. 
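As a rough sketch, the two knobs discussed in this thread are set like this on a Cray system (all remaining configure options are site specific and omitted):

  # shared/dynamic build (the current default on Cori):
  export CRAYPE_LINK_TYPE=dynamic
  ./configure --with-shared-libraries=1 ...

  # fully static build, e.g. to link against a static-only IO library:
  export CRAYPE_LINK_TYPE=static
  ./configure --with-shared-libraries=0 ...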
--Junchao Zhang On Thu, Feb 20, 2020 at 2:03 PM Amil Sharma via petsc-users < petsc-users at mcs.anl.gov> wrote: > Just wondering if static linking is better for performance? > > On Thu, Feb 20, 2020 at 2:58 PM Albert Mollen wrote: > >> Hi Mark, >> I'm trying to rebuild Adios2 with dynamic linking on cori. Hopefully we >> can move over to that. >> >> Best regards >> ---------- >> Albert Moll?n >> Associate Research Physicist >> >> Theory Department >> Princeton Plasma Physics Laboratory >> P.O. Box 451 >> Princeton, NJ 08543-0451 >> USA >> >> Tel. +1 609-243-3909 >> E-mail: amollen at pppl.gov >> >> >> On Thu, Feb 20, 2020 at 2:52 PM Amil Sharma wrote: >> >>> We need static linking in order to link an existing static IO library, >>> but we did not know the PETSc static build configure option. >>> >>> On Thu, Feb 20, 2020 at 2:30 PM Satish Balay wrote: >>> >>>> BTW: What do you mean by 'Cray static library environment variable'? Is >>>> it CRAYPE_LINK_TYPE? What is set to? What problems are you having? >>>> >>>> One can get shared library build of PETSc working with: >>>> >>>> export CRAYPE_LINK_TYPE=dynamic >>>> >>>> Satish >>>> >>>> On Thu, 20 Feb 2020, Satish Balay via petsc-users wrote: >>>> >>>> > You can build PETSc statically with configure option: >>>> > >>>> > --with-shared-libraries=0 >>>> > >>>> > Satish >>>> > >>>> > On Thu, 20 Feb 2020, Mark Adams wrote: >>>> > >>>> > > We are having problems linking with at Cray static library >>>> environment >>>> > > variable, that is required to link Adios, and IO package. How does >>>> one >>>> > > build with static PETSc libs? >>>> > > Thanks, >>>> > > Mark >>>> > > >>>> > >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Feb 20 15:46:36 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 20 Feb 2020 14:46:36 -0700 Subject: [petsc-users] static libs In-Reply-To: References: Message-ID: <87o8ttvsn7.fsf@jedbrown.org> Yeah, this startup is typically not bad for normal compiled programs, but can be substantial if you have many libraries or are using a language like Python, which may search hundreds or thousands of paths. In any case, it's mainly a filesystem metadata scalability stress, and one reason some people like to use containers. Making private symbols private helps improve the performance of symbol relocation (a sequential operation). People rarely care about this in scientific software since the actual symbol relocation rarely takes more than a second or two. Those interested in symbol visibility and optimizing startup time for shared libraries can check out this classic guid. https://www.akkadia.org/drepper/dsohowto.pdf Junchao Zhang via petsc-users writes: > Copy & paste from a Cray paper: > "The main disadvantage of dynamic shared libraries is the runtime > performance costs of dynamic linking. Every time the program is executed it > has to perform a large part of its linking process. The lookup of symbols > in a dynamic shared library is much less efficient than in static > libraries. The loading of a dynamic shared library during an application?s > execution may result in a ?jitter? effect where a single process holds up > the forward progress of other processes of the application while it is > loading a library. " > > BTW, Cori's default is changed from static to dynamic. I heard Frontier > will also use dynamic. 
> > --Junchao Zhang > > > On Thu, Feb 20, 2020 at 2:03 PM Amil Sharma via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Just wondering if static linking is better for performance? >> >> On Thu, Feb 20, 2020 at 2:58 PM Albert Mollen wrote: >> >>> Hi Mark, >>> I'm trying to rebuild Adios2 with dynamic linking on cori. Hopefully we >>> can move over to that. >>> >>> Best regards >>> ---------- >>> Albert Moll?n >>> Associate Research Physicist >>> >>> Theory Department >>> Princeton Plasma Physics Laboratory >>> P.O. Box 451 >>> Princeton, NJ 08543-0451 >>> USA >>> >>> Tel. +1 609-243-3909 >>> E-mail: amollen at pppl.gov >>> >>> >>> On Thu, Feb 20, 2020 at 2:52 PM Amil Sharma wrote: >>> >>>> We need static linking in order to link an existing static IO library, >>>> but we did not know the PETSc static build configure option. >>>> >>>> On Thu, Feb 20, 2020 at 2:30 PM Satish Balay wrote: >>>> >>>>> BTW: What do you mean by 'Cray static library environment variable'? Is >>>>> it CRAYPE_LINK_TYPE? What is set to? What problems are you having? >>>>> >>>>> One can get shared library build of PETSc working with: >>>>> >>>>> export CRAYPE_LINK_TYPE=dynamic >>>>> >>>>> Satish >>>>> >>>>> On Thu, 20 Feb 2020, Satish Balay via petsc-users wrote: >>>>> >>>>> > You can build PETSc statically with configure option: >>>>> > >>>>> > --with-shared-libraries=0 >>>>> > >>>>> > Satish >>>>> > >>>>> > On Thu, 20 Feb 2020, Mark Adams wrote: >>>>> > >>>>> > > We are having problems linking with at Cray static library >>>>> environment >>>>> > > variable, that is required to link Adios, and IO package. How does >>>>> one >>>>> > > build with static PETSc libs? >>>>> > > Thanks, >>>>> > > Mark >>>>> > > >>>>> > >>>>> >>>>> From mfadams at lbl.gov Thu Feb 20 16:47:37 2020 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 20 Feb 2020 17:47:37 -0500 Subject: [petsc-users] static libs In-Reply-To: References: Message-ID: On Thu, Feb 20, 2020 at 2:24 PM Satish Balay wrote: > You can build PETSc statically with configure option: > > --with-shared-libraries=0 > Thanks, I had forgotten this, was searching for 'static' > > Satish > > On Thu, 20 Feb 2020, Mark Adams wrote: > > > We are having problems linking with at Cray static library environment > > variable, that is required to link Adios, and IO package. How does one > > build with static PETSc libs? > > Thanks, > > Mark > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Feb 20 16:51:26 2020 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 20 Feb 2020 17:51:26 -0500 Subject: [petsc-users] static libs In-Reply-To: References: Message-ID: On Thu, Feb 20, 2020 at 2:30 PM Satish Balay wrote: > BTW: What do you mean by 'Cray static library environment variable'? Is it > CRAYPE_LINK_TYPE? Yes, that was it. > What is set to? What problems are you having? > I think they were using 'static' for Adios but are going to try to make it work with dynamic. Otherwise I can configure with static libs. Thanks, > > One can get shared library build of PETSc working with: > > export CRAYPE_LINK_TYPE=dynamic > > Satish > > On Thu, 20 Feb 2020, Satish Balay via petsc-users wrote: > > > You can build PETSc statically with configure option: > > > > --with-shared-libraries=0 > > > > Satish > > > > On Thu, 20 Feb 2020, Mark Adams wrote: > > > > > We are having problems linking with at Cray static library environment > > > variable, that is required to link Adios, and IO package. 
How does one > > > build with static PETSc libs? > > > Thanks, > > > Mark > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eijkhout at tacc.utexas.edu Thu Feb 20 16:53:27 2020 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Thu, 20 Feb 2020 22:53:27 +0000 Subject: [petsc-users] static libs In-Reply-To: References: Message-ID: <1147D371-7873-417E-9E31-A6DAE09465B7@tacc.utexas.edu> On , 2020Feb20, at 15:26, Junchao Zhang via petsc-users > wrote: The main disadvantage of dynamic shared libraries is the runtime performance costs of dynamic linking. Every time the program is executed it has to perform a large part of its linking process The main disadvantage of static linked libraries is the program load time. Each processor that executes the program has to load the executable from disk. Static => large executables => disk hit. Victor. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Feb 20 18:29:29 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 20 Feb 2020 17:29:29 -0700 Subject: [petsc-users] static libs In-Reply-To: <1147D371-7873-417E-9E31-A6DAE09465B7@tacc.utexas.edu> References: <1147D371-7873-417E-9E31-A6DAE09465B7@tacc.utexas.edu> Message-ID: <87d0a8wzo6.fsf@jedbrown.org> Victor Eijkhout writes: > On , 2020Feb20, at 15:26, Junchao Zhang via petsc-users > wrote: > > The main disadvantage of dynamic shared libraries is the runtime performance costs of dynamic linking. Every time the program is executed it has to perform a large part of its linking process > > The main disadvantage of static linked libraries is the program load time. Each processor that executes the program has to load the executable from disk. > > Static => large executables => disk hit. I mean, that code is loaded one way or another, be it in a shared library or a static executable. One advantage of shared libraries is that code and read-only data can be shared between processes. So when you mpiexec -n 64 on your fat node, only one copy of the code and read-only data needs to be resident in memory. From mfadams at lbl.gov Fri Feb 21 13:09:17 2020 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 21 Feb 2020 14:09:17 -0500 Subject: [petsc-users] [petsc-maint] "make ... all" failure In-Reply-To: References: Message-ID: Please send the error that you see with --with-64-bit-indices=0. This should not be hard to fix. And yes, always start with a clean environment when you get a strange error. Compile time errors are very rare. On Fri, Feb 21, 2020 at 1:26 PM Jin Chen wrote: > I tried and found > > --with-64-bit-indices=1 > > works. > > But mumps and superlu don't have 64-bit support. > > -- Jin > > On Fri, Feb 21, 2020 at 12:07 PM Mark Adams wrote: > >> Cool, a compiler error. >> >> First, delete the "arch" directory and configure again. This a deep 'make >> clean'. Something you need to do this when you switch branches. >> >> On Fri, Feb 21, 2020 at 11:30 AM Jin Chen via petsc-maint < >> petsc-maint at mcs.anl.gov> wrote: >> >>> Hi, >>> >>> I'm installing petsc branch >>> >>> barry/fix-superlu_dist-py-for-gpus >>> >>> on another computer for testing. It passed configure, but failed at >>> "make .... all". >>> >>> Would you please take a look? Both configure.log and make.log are >>> attached. >>> >>> Thanks, >>> >>> -- Jin >>> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jchen at pppl.gov Fri Feb 21 13:20:22 2020 From: jchen at pppl.gov (Jin Chen) Date: Fri, 21 Feb 2020 14:20:22 -0500 Subject: [petsc-users] [petsc-maint] "make ... all" failure In-Reply-To: References: Message-ID: errors from setting --with-64-bit-indices=0 : /opt/pgi/19.5/linux86-64-llvm/19.5/include/edg/xmmintrin.h(2514): internal error: assertion failed at: "/dvs/p4/build/sw/rel/gpu_drv/r440/TC440_70/drivers/compiler/edg/EDG_5.0/src/sys_predef.c", line 574 1 catastrophic error detected in the compilation of "/tmp/tmpxft_00004945_00000000-4_aijcusparse.cpp4.ii". Compilation aborted. nvcc error : 'cudafe++' died due to signal 6 gmake[2]: *** [tigergpu-pgi195-openmpi/obj/mat/impls/aij/seq/seqcusparse/aijcusparse.o] Error 6 On Fri, Feb 21, 2020 at 2:09 PM Mark Adams wrote: > Please send the error that you see with --with-64-bit-indices=0. This > should not be hard to fix. > > And yes, always start with a clean environment when you get a strange > error. Compile time errors are very rare. > > On Fri, Feb 21, 2020 at 1:26 PM Jin Chen wrote: > >> I tried and found >> >> --with-64-bit-indices=1 >> >> works. >> >> But mumps and superlu don't have 64-bit support. >> >> -- Jin >> >> On Fri, Feb 21, 2020 at 12:07 PM Mark Adams wrote: >> >>> Cool, a compiler error. >>> >>> First, delete the "arch" directory and configure again. This a deep >>> 'make clean'. Something you need to do this when you switch branches. >>> >>> On Fri, Feb 21, 2020 at 11:30 AM Jin Chen via petsc-maint < >>> petsc-maint at mcs.anl.gov> wrote: >>> >>>> Hi, >>>> >>>> I'm installing petsc branch >>>> >>>> barry/fix-superlu_dist-py-for-gpus >>>> >>>> on another computer for testing. It passed configure, but failed at >>>> "make .... all". >>>> >>>> Would you please take a look? Both configure.log and make.log are >>>> attached. >>>> >>>> Thanks, >>>> >>>> -- Jin >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Feb 21 13:36:08 2020 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 21 Feb 2020 14:36:08 -0500 Subject: [petsc-users] [petsc-maint] "make ... all" failure In-Reply-To: References: Message-ID: OK, the compiler is really failing on 64 bit integers. This is out of my expertise. It is possible that when we were doing this last year that we used 64 bit integers and never encountered this. On Fri, Feb 21, 2020 at 2:20 PM Jin Chen wrote: > errors from setting --with-64-bit-indices=0 : > > > /opt/pgi/19.5/linux86-64-llvm/19.5/include/edg/xmmintrin.h(2514): internal > error: assertion failed at: > "/dvs/p4/build/sw/rel/gpu_drv/r440/TC440_70/drivers/compiler/edg/EDG_5.0/src/sys_predef.c", > line 574 > 1 catastrophic error detected in the compilation of > "/tmp/tmpxft_00004945_00000000-4_aijcusparse.cpp4.ii". > Compilation aborted. > nvcc error : 'cudafe++' died due to signal 6 > gmake[2]: *** > [tigergpu-pgi195-openmpi/obj/mat/impls/aij/seq/seqcusparse/aijcusparse.o] > Error 6 > > On Fri, Feb 21, 2020 at 2:09 PM Mark Adams wrote: > >> Please send the error that you see with --with-64-bit-indices=0. This >> should not be hard to fix. >> >> And yes, always start with a clean environment when you get a strange >> error. Compile time errors are very rare. >> >> On Fri, Feb 21, 2020 at 1:26 PM Jin Chen wrote: >> >>> I tried and found >>> >>> --with-64-bit-indices=1 >>> >>> works. >>> >>> But mumps and superlu don't have 64-bit support. >>> >>> -- Jin >>> >>> On Fri, Feb 21, 2020 at 12:07 PM Mark Adams wrote: >>> >>>> Cool, a compiler error. 
>>>> >>>> First, delete the "arch" directory and configure again. This a deep >>>> 'make clean'. Something you need to do this when you switch branches. >>>> >>>> On Fri, Feb 21, 2020 at 11:30 AM Jin Chen via petsc-maint < >>>> petsc-maint at mcs.anl.gov> wrote: >>>> >>>>> Hi, >>>>> >>>>> I'm installing petsc branch >>>>> >>>>> barry/fix-superlu_dist-py-for-gpus >>>>> >>>>> on another computer for testing. It passed configure, but failed at >>>>> "make .... all". >>>>> >>>>> Would you please take a look? Both configure.log and make.log are >>>>> attached. >>>>> >>>>> Thanks, >>>>> >>>>> -- Jin >>>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From shrirang.abhyankar at pnnl.gov Sat Feb 22 08:18:57 2020 From: shrirang.abhyankar at pnnl.gov (Abhyankar, Shrirang G) Date: Sat, 22 Feb 2020 14:18:57 +0000 Subject: [petsc-users] Using PETSc with GPU supported SuperLU_Dist Message-ID: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> Hi, I want to install PETSc with GPU supported SuperLU_Dist. What are the configure options I should be using? Thanks, Shri -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Sat Feb 22 08:25:01 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Sat, 22 Feb 2020 08:25:01 -0600 Subject: [petsc-users] Using PETSc with GPU supported SuperLU_Dist In-Reply-To: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> References: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> Message-ID: On Sat, 22 Feb 2020, Abhyankar, Shrirang G via petsc-users wrote: > Hi, > I want to install PETSc with GPU supported SuperLU_Dist. What are the configure options I should be using? Shri, >>>>> if self.framework.argDB['download-superlu_dist-gpu']: self.cuda = framework.require('config.packages.cuda',self) self.openmp = framework.require('config.packages.openmp',self) self.deps = [self.mpi,self.blasLapack,self.cuda,self.openmp] <<<<< So try: --with-cuda=1 --download-superlu_dist=1 --download-superlu_dist-gpu=1 --with-openmp=1 [and usual MPI, blaslapack] Satish From shrirang.abhyankar at pnnl.gov Sat Feb 22 11:41:46 2020 From: shrirang.abhyankar at pnnl.gov (Abhyankar, Shrirang G) Date: Sat, 22 Feb 2020 17:41:46 +0000 Subject: [petsc-users] Using PETSc with GPU supported SuperLU_Dist In-Reply-To: References: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> Message-ID: <4BDB7C51-7452-45CC-A118-4D3F4F5D03D1@pnnl.gov> Thanks, Satish. Configure and make go through fine. Getting an undefined reference error for VecGetArrayWrite_SeqCUDA. Shri From: Satish Balay Reply-To: petsc-users Date: Saturday, February 22, 2020 at 8:25 AM To: "Abhyankar, Shrirang G" Cc: "petsc-users at mcs.anl.gov" Subject: Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist On Sat, 22 Feb 2020, Abhyankar, Shrirang G via petsc-users wrote: Hi, I want to install PETSc with GPU supported SuperLU_Dist. What are the configure options I should be using? Shri, if self.framework.argDB['download-superlu_dist-gpu']: self.cuda = framework.require('config.packages.cuda',self) self.openmp = framework.require('config.packages.openmp',self) self.deps = [self.mpi,self.blasLapack,self.cuda,self.openmp] <<<<< So try: --with-cuda=1 --download-superlu_dist=1 --download-superlu_dist-gpu=1 --with-openmp=1 [and usual MPI, blaslapack] Satish -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
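Collecting Satish's options above into one command line, a sketch of a configure invocation for a CUDA-enabled SuperLU_DIST build could look like the following. The compiler wrappers and the extra --download packages are taken from elsewhere in this thread, not required verbatim; adapt them to the local toolchain.

    ./configure --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 \
        --with-cuda=1 --with-openmp=1 \
        --download-superlu_dist=1 --download-superlu_dist-gpu=1 \
        --download-metis --download-parmetis --download-fblaslapack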
Name: test.log Type: application/octet-stream Size: 4994 bytes Desc: test.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log Type: application/octet-stream Size: 106830 bytes Desc: make.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 2546435 bytes Desc: configure.log URL: From balay at mcs.anl.gov Sat Feb 22 12:27:48 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Sat, 22 Feb 2020 12:27:48 -0600 Subject: [petsc-users] Using PETSc with GPU supported SuperLU_Dist In-Reply-To: <4BDB7C51-7452-45CC-A118-4D3F4F5D03D1@pnnl.gov> References: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> <4BDB7C51-7452-45CC-A118-4D3F4F5D03D1@pnnl.gov> Message-ID: Looks like a bug in petsc that needs fixing. However - you shouldn't need options '--with-cxx-dialect=C++11 --with-clanguage=c++' Satish On Sat, 22 Feb 2020, Abhyankar, Shrirang G via petsc-users wrote: > Thanks, Satish. Configure and make go through fine. Getting an undefined reference error for VecGetArrayWrite_SeqCUDA. > > Shri > From: Satish Balay > Reply-To: petsc-users > Date: Saturday, February 22, 2020 at 8:25 AM > To: "Abhyankar, Shrirang G" > Cc: "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist > > On Sat, 22 Feb 2020, Abhyankar, Shrirang G via petsc-users wrote: > > Hi, > I want to install PETSc with GPU supported SuperLU_Dist. What are the configure options I should be using? > > > Shri, > > > if self.framework.argDB['download-superlu_dist-gpu']: > self.cuda = framework.require('config.packages.cuda',self) > self.openmp = framework.require('config.packages.openmp',self) > self.deps = [self.mpi,self.blasLapack,self.cuda,self.openmp] > <<<<< > > So try: > > --with-cuda=1 --download-superlu_dist=1 --download-superlu_dist-gpu=1 --with-openmp=1 [and usual MPI, blaslapack] > > Satish > > > From jczhang at mcs.anl.gov Sat Feb 22 20:53:46 2020 From: jczhang at mcs.anl.gov (Junchao Zhang) Date: Sat, 22 Feb 2020 20:53:46 -0600 Subject: [petsc-users] Using PETSc with GPU supported SuperLU_Dist In-Reply-To: <4BDB7C51-7452-45CC-A118-4D3F4F5D03D1@pnnl.gov> References: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> <4BDB7C51-7452-45CC-A118-4D3F4F5D03D1@pnnl.gov> Message-ID: We met the error before and knew why. Will fix it soon. --Junchao Zhang On Sat, Feb 22, 2020 at 11:43 AM Abhyankar, Shrirang G via petsc-users < petsc-users at mcs.anl.gov> wrote: > Thanks, Satish. Configure and make go through fine. Getting an undefined > reference error for VecGetArrayWrite_SeqCUDA. > > > > Shri > > *From: *Satish Balay > *Reply-To: *petsc-users > *Date: *Saturday, February 22, 2020 at 8:25 AM > *To: *"Abhyankar, Shrirang G" > *Cc: *"petsc-users at mcs.anl.gov" > *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist > > > > On Sat, 22 Feb 2020, Abhyankar, Shrirang G via petsc-users wrote: > > > > Hi, > > I want to install PETSc with GPU supported SuperLU_Dist. What are the > configure options I should be using? 
> > > > > > Shri, > > > > > > if self.framework.argDB['download-superlu_dist-gpu']: > > self.cuda = framework.require('config.packages.cuda',self) > > self.openmp = > framework.require('config.packages.openmp',self) > > self.deps = > [self.mpi,self.blasLapack,self.cuda,self.openmp] > > <<<<< > > > > So try: > > > > --with-cuda=1 --download-superlu_dist=1 --download-superlu_dist-gpu=1 > --with-openmp=1 [and usual MPI, blaslapack] > > > > Satish > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Sat Feb 22 20:59:14 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Sat, 22 Feb 2020 20:59:14 -0600 Subject: [petsc-users] Using PETSc with GPU supported SuperLU_Dist In-Reply-To: References: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> <4BDB7C51-7452-45CC-A118-4D3F4F5D03D1@pnnl.gov> Message-ID: The fix is now in both maint and master https://gitlab.com/petsc/petsc/-/merge_requests/2555 Satish On Sat, 22 Feb 2020, Junchao Zhang via petsc-users wrote: > We met the error before and knew why. Will fix it soon. > --Junchao Zhang > > > On Sat, Feb 22, 2020 at 11:43 AM Abhyankar, Shrirang G via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > Thanks, Satish. Configure and make go through fine. Getting an undefined > > reference error for VecGetArrayWrite_SeqCUDA. > > > > > > > > Shri > > > > *From: *Satish Balay > > *Reply-To: *petsc-users > > *Date: *Saturday, February 22, 2020 at 8:25 AM > > *To: *"Abhyankar, Shrirang G" > > *Cc: *"petsc-users at mcs.anl.gov" > > *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist > > > > > > > > On Sat, 22 Feb 2020, Abhyankar, Shrirang G via petsc-users wrote: > > > > > > > > Hi, > > > > I want to install PETSc with GPU supported SuperLU_Dist. What are the > > configure options I should be using? > > > > > > > > > > > > Shri, > > > > > > > > > > > > if self.framework.argDB['download-superlu_dist-gpu']: > > > > self.cuda = framework.require('config.packages.cuda',self) > > > > self.openmp = > > framework.require('config.packages.openmp',self) > > > > self.deps = > > [self.mpi,self.blasLapack,self.cuda,self.openmp] > > > > <<<<< > > > > > > > > So try: > > > > > > > > --with-cuda=1 --download-superlu_dist=1 --download-superlu_dist-gpu=1 > > --with-openmp=1 [and usual MPI, blaslapack] > > > > > > > > Satish > > > > > > > > > > > From jczhang at mcs.anl.gov Sat Feb 22 21:02:11 2020 From: jczhang at mcs.anl.gov (Junchao Zhang) Date: Sat, 22 Feb 2020 21:02:11 -0600 Subject: [petsc-users] Using PETSc with GPU supported SuperLU_Dist In-Reply-To: <5efd582f8f36424bab7e5604b33efcbf@BY5PR09MB5585.namprd09.prod.outlook.com> References: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> <4BDB7C51-7452-45CC-A118-4D3F4F5D03D1@pnnl.gov> <5efd582f8f36424bab7e5604b33efcbf@BY5PR09MB5585.namprd09.prod.outlook.com> Message-ID: Great. Thanks. On Sat, Feb 22, 2020 at 8:59 PM Balay, Satish wrote: > The fix is now in both maint and master > > https://gitlab.com/petsc/petsc/-/merge_requests/2555 > > Satish > > On Sat, 22 Feb 2020, Junchao Zhang via petsc-users wrote: > > > We met the error before and knew why. Will fix it soon. > > --Junchao Zhang > > > > > > On Sat, Feb 22, 2020 at 11:43 AM Abhyankar, Shrirang G via petsc-users < > > petsc-users at mcs.anl.gov> wrote: > > > > > Thanks, Satish. Configure and make go through fine. Getting an > undefined > > > reference error for VecGetArrayWrite_SeqCUDA. 
> > > > > > > > > > > > Shri > > > > > > *From: *Satish Balay > > > *Reply-To: *petsc-users > > > *Date: *Saturday, February 22, 2020 at 8:25 AM > > > *To: *"Abhyankar, Shrirang G" > > > *Cc: *"petsc-users at mcs.anl.gov" > > > *Subject: *Re: [petsc-users] Using PETSc with GPU supported > SuperLU_Dist > > > > > > > > > > > > On Sat, 22 Feb 2020, Abhyankar, Shrirang G via petsc-users wrote: > > > > > > > > > > > > Hi, > > > > > > I want to install PETSc with GPU supported SuperLU_Dist. What are > the > > > configure options I should be using? > > > > > > > > > > > > > > > > > > Shri, > > > > > > > > > > > > > > > > > > if self.framework.argDB['download-superlu_dist-gpu']: > > > > > > self.cuda = > framework.require('config.packages.cuda',self) > > > > > > self.openmp = > > > framework.require('config.packages.openmp',self) > > > > > > self.deps = > > > [self.mpi,self.blasLapack,self.cuda,self.openmp] > > > > > > <<<<< > > > > > > > > > > > > So try: > > > > > > > > > > > > --with-cuda=1 --download-superlu_dist=1 --download-superlu_dist-gpu=1 > > > --with-openmp=1 [and usual MPI, blaslapack] > > > > > > > > > > > > Satish > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barrydog505 at gmail.com Sun Feb 23 01:59:19 2020 From: barrydog505 at gmail.com (Tsung-Hsing Chen) Date: Sun, 23 Feb 2020 15:59:19 +0800 Subject: [petsc-users] Error - Out of memory. This could be due to allocating too large an object or bleeding by not properly destroying unneeded objects. Message-ID: Hi all, I have written a simple code to solve the FEM problem, and I want to use LU to solve the Ax=b. My problem(error) won't happen at the beginning until M & N in A_matrix is getting larger. (Can also be understood as mesh vertex increase.) All the error output seems to relate to LU, but I don't know what should be done. The followings are the code I wrote(section) and the error output. Here's the code (section) : /* code ... */ ierr = MatCreate(PETSC_COMM_WORLD, &A_matrix);CHKERRQ(ierr); ierr = MatSetSizes(A_matrix, PETSC_DECIDE, PETSC_DECIDE, M, N);CHKERRQ(ierr); ierr = MatSetType(A_matrix, MATSEQAIJ);CHKERRQ(ierr); // setting nnz ... ierr = MatSeqAIJSetPreallocation(A_matrix, 0, nnz);CHKERRQ(ierr); /* MatSetValues(); ... MatAssemblyBegin(); MatAssemblyEnd(); */ ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr); ierr = KSPSetOperators(ksp, A_matrix, A_matrix);CHKERRQ(ierr); ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr); ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr); ierr = PCSetType(pc, PCLU);CHKERRQ(ierr); ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); ierr = KSPSetUp(ksp);CHKERRQ(ierr); /* code ... 
*/ Here's the error (run with valgrind --tool=memcheck --leak-check=full) : ==6371== Warning: set address range perms: large range [0x7c84a040, 0xb4e9a598) (undefined) ==6371== Warning: set address range perms: large range [0xb4e9b040, 0x2b4e9aeac) (undefined) ==6371== Warning: set address range perms: large range [0x2b4e9b040, 0x4b4e9aeac) (undefined) ==6371== Argument 'size' of function memalign has a fishy (possibly negative) value: -5187484888 ==6371== at 0x4C320A6: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==6371== by 0x501B4B0: PetscMallocAlign (mal.c:49) ==6371== by 0x501CE37: PetscMallocA (mal.c:422) ==6371== by 0x5ACFF0C: MatLUFactorSymbolic_SeqAIJ (aijfact.c:366) ==6371== by 0x561D8B3: MatLUFactorSymbolic (matrix.c:3005) ==6371== by 0x644ED9C: PCSetUp_LU (lu.c:90) ==6371== by 0x65A2C32: PCSetUp (precon.c:894) ==6371== by 0x6707E71: KSPSetUp (itfunc.c:376) ==6371== by 0x13AB09: Calculate (taylor_hood.c:1780) ==6371== by 0x10CB85: main (taylor_hood.c:228) ==6371== [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Out of memory. This could be due to allocating [0]PETSC ERROR: too large an object or bleeding by not properly [0]PETSC ERROR: destroying unneeded objects. [0]PETSC ERROR: Memory allocated 0 Memory used by process 15258234880 [0]PETSC ERROR: Try running with -malloc_dump or -malloc_view for info. [0]PETSC ERROR: Memory requested 18446744068522065920 [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown [0]PETSC ERROR: ./taylor_hood on a arch-linux2-c-debug named e2-120 by barry Sun Feb 23 14:18:46 2020 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack --download-triangle [0]PETSC ERROR: #1 MatLUFactorSymbolic_SeqAIJ() line 366 in /home/barry/petsc/src/mat/impls/aij/seq/aijfact.c [0]PETSC ERROR: #2 PetscMallocA() line 422 in /home/barry/petsc/src/sys/memory/mal.c [0]PETSC ERROR: #3 MatLUFactorSymbolic_SeqAIJ() line 366 in /home/barry/petsc/src/mat/impls/aij/seq/aijfact.c [0]PETSC ERROR: #4 MatLUFactorSymbolic() line 3005 in /home/barry/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: #5 PCSetUp_LU() line 90 in /home/barry/petsc/src/ksp/pc/impls/factor/lu/lu.c [0]PETSC ERROR: #6 PCSetUp() line 894 in /home/barry/petsc/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #7 KSPSetUp() line 376 in /home/barry/petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #8 Calculate() line 1780 in /home/barry/brain/brain/3D/taylor_hood.c [0]PETSC ERROR: #9 main() line 230 in /home/barry/brain/brain/3D/taylor_hood.c [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -dm_view [0]PETSC ERROR: -f mesh/ellipsoid.msh [0]PETSC ERROR: -matload_block_size 1 [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- Is there any setting that should be done but I ignore? Thanks in advance, Tsung-Hsing Chen -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Sun Feb 23 02:33:26 2020 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Sun, 23 Feb 2020 11:33:26 +0300 Subject: [petsc-users] Error - Out of memory. This could be due to allocating too large an object or bleeding by not properly destroying unneeded objects. 
In-Reply-To: References: Message-ID: This seems integer overflow when computing the factors. How large is the matrix when you encounter the error? Note that LU is not memory optimal and you can easily encounter out-of-memory issues with large matrices. Assuming sparsity, the memory requirements for LU are N log(N) in 2D and N^4/3 in 3D. Il giorno dom 23 feb 2020 alle ore 11:01 Tsung-Hsing Chen < barrydog505 at gmail.com> ha scritto: > Hi all, > > I have written a simple code to solve the FEM problem, and I want to use > LU to solve the Ax=b. > My problem(error) won't happen at the beginning until M & N in A_matrix is > getting larger. (Can also be understood as mesh vertex increase.) > All the error output seems to relate to LU, but I don't know what should > be done. > The followings are the code I wrote(section) and the error output. > > Here's the code (section) : > /* > code ... > */ > ierr = MatCreate(PETSC_COMM_WORLD, &A_matrix);CHKERRQ(ierr); > ierr = MatSetSizes(A_matrix, PETSC_DECIDE, PETSC_DECIDE, M, > N);CHKERRQ(ierr); > ierr = MatSetType(A_matrix, MATSEQAIJ);CHKERRQ(ierr); > // setting nnz ... > ierr = MatSeqAIJSetPreallocation(A_matrix, 0, nnz);CHKERRQ(ierr); > /* > MatSetValues(); ... > MatAssemblyBegin(); > MatAssemblyEnd(); > */ > ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr); > ierr = KSPSetOperators(ksp, A_matrix, A_matrix);CHKERRQ(ierr); > ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr); > ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr); > ierr = PCSetType(pc, PCLU);CHKERRQ(ierr); > ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); > ierr = KSPSetUp(ksp);CHKERRQ(ierr); > /* > code ... > */ > > Here's the error (run with valgrind --tool=memcheck --leak-check=full) : > ==6371== Warning: set address range perms: large range [0x7c84a040, > 0xb4e9a598) (undefined) > ==6371== Warning: set address range perms: large range [0xb4e9b040, > 0x2b4e9aeac) (undefined) > ==6371== Warning: set address range perms: large range [0x2b4e9b040, > 0x4b4e9aeac) (undefined) > ==6371== Argument 'size' of function memalign has a fishy (possibly > negative) value: -5187484888 > ==6371== at 0x4C320A6: memalign (in > /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) > ==6371== by 0x501B4B0: PetscMallocAlign (mal.c:49) > ==6371== by 0x501CE37: PetscMallocA (mal.c:422) > ==6371== by 0x5ACFF0C: MatLUFactorSymbolic_SeqAIJ (aijfact.c:366) > ==6371== by 0x561D8B3: MatLUFactorSymbolic (matrix.c:3005) > ==6371== by 0x644ED9C: PCSetUp_LU (lu.c:90) > ==6371== by 0x65A2C32: PCSetUp (precon.c:894) > ==6371== by 0x6707E71: KSPSetUp (itfunc.c:376) > ==6371== by 0x13AB09: Calculate (taylor_hood.c:1780) > ==6371== by 0x10CB85: main (taylor_hood.c:228) > ==6371== > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Out of memory. This could be due to allocating > [0]PETSC ERROR: too large an object or bleeding by not properly > [0]PETSC ERROR: destroying unneeded objects. > [0]PETSC ERROR: Memory allocated 0 Memory used by process 15258234880 > [0]PETSC ERROR: Try running with -malloc_dump or -malloc_view for info. > [0]PETSC ERROR: Memory requested 18446744068522065920 > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown > [0]PETSC ERROR: ./taylor_hood on a arch-linux2-c-debug named e2-120 by > barry Sun Feb 23 14:18:46 2020 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ > --with-fc=gfortran --download-mpich --download-fblaslapack > --download-triangle > [0]PETSC ERROR: #1 MatLUFactorSymbolic_SeqAIJ() line 366 in > /home/barry/petsc/src/mat/impls/aij/seq/aijfact.c > [0]PETSC ERROR: #2 PetscMallocA() line 422 in > /home/barry/petsc/src/sys/memory/mal.c > [0]PETSC ERROR: #3 MatLUFactorSymbolic_SeqAIJ() line 366 in > /home/barry/petsc/src/mat/impls/aij/seq/aijfact.c > [0]PETSC ERROR: #4 MatLUFactorSymbolic() line 3005 in > /home/barry/petsc/src/mat/interface/matrix.c > [0]PETSC ERROR: #5 PCSetUp_LU() line 90 in > /home/barry/petsc/src/ksp/pc/impls/factor/lu/lu.c > [0]PETSC ERROR: #6 PCSetUp() line 894 in > /home/barry/petsc/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #7 KSPSetUp() line 376 in > /home/barry/petsc/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #8 Calculate() line 1780 in > /home/barry/brain/brain/3D/taylor_hood.c > [0]PETSC ERROR: #9 main() line 230 in > /home/barry/brain/brain/3D/taylor_hood.c > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -dm_view > [0]PETSC ERROR: -f mesh/ellipsoid.msh > [0]PETSC ERROR: -matload_block_size 1 > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > > Is there any setting that should be done but I ignore? > > Thanks in advance, > > Tsung-Hsing Chen > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From barrydog505 at gmail.com Sun Feb 23 02:52:44 2020 From: barrydog505 at gmail.com (Tsung-Hsing Chen) Date: Sun, 23 Feb 2020 16:52:44 +0800 Subject: [petsc-users] Error - Out of memory. This could be due to allocating too large an object or bleeding by not properly destroying unneeded objects. In-Reply-To: References: Message-ID: This error came with a matrix approximate 300,000*300,000, and I was solving a 3D model. " the memory requirements for LU are N log(N) in 2D and N^4/3 in 3D. " What unit is it? Byte? Stefano Zampini ? 2020?2?23? ?? ??4:33??? > This seems integer overflow when computing the factors. > > How large is the matrix when you encounter the error? > Note that LU is not memory optimal and you can easily encounter > out-of-memory issues with large matrices. > Assuming sparsity, the memory requirements for LU are N log(N) in 2D and > N^4/3 in 3D. > > > Il giorno dom 23 feb 2020 alle ore 11:01 Tsung-Hsing Chen < > barrydog505 at gmail.com> ha scritto: > >> Hi all, >> >> I have written a simple code to solve the FEM problem, and I want to use >> LU to solve the Ax=b. >> My problem(error) won't happen at the beginning until M & N in A_matrix >> is getting larger. (Can also be understood as mesh vertex increase.) >> All the error output seems to relate to LU, but I don't know what should >> be done. >> The followings are the code I wrote(section) and the error output. >> >> Here's the code (section) : >> /* >> code ... >> */ >> ierr = MatCreate(PETSC_COMM_WORLD, &A_matrix);CHKERRQ(ierr); >> ierr = MatSetSizes(A_matrix, PETSC_DECIDE, PETSC_DECIDE, M, >> N);CHKERRQ(ierr); >> ierr = MatSetType(A_matrix, MATSEQAIJ);CHKERRQ(ierr); >> // setting nnz ... >> ierr = MatSeqAIJSetPreallocation(A_matrix, 0, nnz);CHKERRQ(ierr); >> /* >> MatSetValues(); ... 
>> MatAssemblyBegin(); >> MatAssemblyEnd(); >> */ >> ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr); >> ierr = KSPSetOperators(ksp, A_matrix, A_matrix);CHKERRQ(ierr); >> ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr); >> ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr); >> ierr = PCSetType(pc, PCLU);CHKERRQ(ierr); >> ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); >> ierr = KSPSetUp(ksp);CHKERRQ(ierr); >> /* >> code ... >> */ >> >> Here's the error (run with valgrind --tool=memcheck --leak-check=full) : >> ==6371== Warning: set address range perms: large range [0x7c84a040, >> 0xb4e9a598) (undefined) >> ==6371== Warning: set address range perms: large range [0xb4e9b040, >> 0x2b4e9aeac) (undefined) >> ==6371== Warning: set address range perms: large range [0x2b4e9b040, >> 0x4b4e9aeac) (undefined) >> ==6371== Argument 'size' of function memalign has a fishy (possibly >> negative) value: -5187484888 >> ==6371== at 0x4C320A6: memalign (in >> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==6371== by 0x501B4B0: PetscMallocAlign (mal.c:49) >> ==6371== by 0x501CE37: PetscMallocA (mal.c:422) >> ==6371== by 0x5ACFF0C: MatLUFactorSymbolic_SeqAIJ (aijfact.c:366) >> ==6371== by 0x561D8B3: MatLUFactorSymbolic (matrix.c:3005) >> ==6371== by 0x644ED9C: PCSetUp_LU (lu.c:90) >> ==6371== by 0x65A2C32: PCSetUp (precon.c:894) >> ==6371== by 0x6707E71: KSPSetUp (itfunc.c:376) >> ==6371== by 0x13AB09: Calculate (taylor_hood.c:1780) >> ==6371== by 0x10CB85: main (taylor_hood.c:228) >> ==6371== >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: Out of memory. This could be due to allocating >> [0]PETSC ERROR: too large an object or bleeding by not properly >> [0]PETSC ERROR: destroying unneeded objects. >> [0]PETSC ERROR: Memory allocated 0 Memory used by process 15258234880 >> [0]PETSC ERROR: Try running with -malloc_dump or -malloc_view for info. >> [0]PETSC ERROR: Memory requested 18446744068522065920 >> [0]PETSC ERROR: See >> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >> shooting. 
>> [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown >> [0]PETSC ERROR: ./taylor_hood on a arch-linux2-c-debug named e2-120 by >> barry Sun Feb 23 14:18:46 2020 >> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ >> --with-fc=gfortran --download-mpich --download-fblaslapack >> --download-triangle >> [0]PETSC ERROR: #1 MatLUFactorSymbolic_SeqAIJ() line 366 in >> /home/barry/petsc/src/mat/impls/aij/seq/aijfact.c >> [0]PETSC ERROR: #2 PetscMallocA() line 422 in >> /home/barry/petsc/src/sys/memory/mal.c >> [0]PETSC ERROR: #3 MatLUFactorSymbolic_SeqAIJ() line 366 in >> /home/barry/petsc/src/mat/impls/aij/seq/aijfact.c >> [0]PETSC ERROR: #4 MatLUFactorSymbolic() line 3005 in >> /home/barry/petsc/src/mat/interface/matrix.c >> [0]PETSC ERROR: #5 PCSetUp_LU() line 90 in >> /home/barry/petsc/src/ksp/pc/impls/factor/lu/lu.c >> [0]PETSC ERROR: #6 PCSetUp() line 894 in >> /home/barry/petsc/src/ksp/pc/interface/precon.c >> [0]PETSC ERROR: #7 KSPSetUp() line 376 in >> /home/barry/petsc/src/ksp/ksp/interface/itfunc.c >> [0]PETSC ERROR: #8 Calculate() line 1780 in >> /home/barry/brain/brain/3D/taylor_hood.c >> [0]PETSC ERROR: #9 main() line 230 in >> /home/barry/brain/brain/3D/taylor_hood.c >> [0]PETSC ERROR: PETSc Option Table entries: >> [0]PETSC ERROR: -dm_view >> [0]PETSC ERROR: -f mesh/ellipsoid.msh >> [0]PETSC ERROR: -matload_block_size 1 >> [0]PETSC ERROR: ----------------End of Error Message -------send entire >> error message to petsc-maint at mcs.anl.gov---------- >> >> Is there any setting that should be done but I ignore? >> >> Thanks in advance, >> >> Tsung-Hsing Chen >> > > > -- > Stefano > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Sun Feb 23 03:35:09 2020 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Sun, 23 Feb 2020 12:35:09 +0300 Subject: [petsc-users] Error - Out of memory. This could be due to allocating too large an object or bleeding by not properly destroying unneeded objects. In-Reply-To: References: Message-ID: Il giorno dom 23 feb 2020 alle ore 11:53 Tsung-Hsing Chen < barrydog505 at gmail.com> ha scritto: > This error came with a matrix approximate 300,000*300,000, and I was > solving a 3D model. > " the memory requirements for LU are N log(N) in 2D and N^4/3 in 3D. " > What unit is it? Byte? > Number of floating-point entries. Assuming an optimal nested dissection ordering can be found (i.e. no "dense" rows), the largest front is asymptotically as large as N^(2/3) (N the size of the matrix) Storing it in memory requires (N^(2/3))^2 entries, thus N^4/3 entries. What problem are you solving? If you plan to use direct methods, you may want to experiment with parallel factorization packages like MUMPS or SUPERLU_DIST > > Stefano Zampini ? 2020?2?23? ?? ??4:33??? > >> This seems integer overflow when computing the factors. >> >> How large is the matrix when you encounter the error? >> Note that LU is not memory optimal and you can easily encounter >> out-of-memory issues with large matrices. >> Assuming sparsity, the memory requirements for LU are N log(N) in 2D and >> N^4/3 in 3D. >> >> >> Il giorno dom 23 feb 2020 alle ore 11:01 Tsung-Hsing Chen < >> barrydog505 at gmail.com> ha scritto: >> >>> Hi all, >>> >>> I have written a simple code to solve the FEM problem, and I want to use >>> LU to solve the Ax=b. >>> My problem(error) won't happen at the beginning until M & N in A_matrix >>> is getting larger. (Can also be understood as mesh vertex increase.) 
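Following up on the suggestion in Stefano's reply above to try a parallel factorization package: a minimal sketch of how MUMPS (or SuperLU_DIST) is typically selected for the LU step, assuming PETSc was configured with --download-mumps --download-scalapack (or --download-superlu_dist). This is illustrative only, not the poster's code.

    ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);
    ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
    /* hand the factorization to MUMPS instead of PETSc's built-in LU */
    ierr = PCFactorSetMatSolverType(pc, MATSOLVERMUMPS);CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);

The same choice can be made at run time with
-ksp_type preonly -pc_type lu -pc_factor_mat_solver_type mumps (or superlu_dist).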
>>> All the error output seems to relate to LU, but I don't know what should >>> be done. >>> The followings are the code I wrote(section) and the error output. >>> >>> Here's the code (section) : >>> /* >>> code ... >>> */ >>> ierr = MatCreate(PETSC_COMM_WORLD, &A_matrix);CHKERRQ(ierr); >>> ierr = MatSetSizes(A_matrix, PETSC_DECIDE, PETSC_DECIDE, M, >>> N);CHKERRQ(ierr); >>> ierr = MatSetType(A_matrix, MATSEQAIJ);CHKERRQ(ierr); >>> // setting nnz ... >>> ierr = MatSeqAIJSetPreallocation(A_matrix, 0, nnz);CHKERRQ(ierr); >>> /* >>> MatSetValues(); ... >>> MatAssemblyBegin(); >>> MatAssemblyEnd(); >>> */ >>> ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr); >>> ierr = KSPSetOperators(ksp, A_matrix, A_matrix);CHKERRQ(ierr); >>> ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr); >>> ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr); >>> ierr = PCSetType(pc, PCLU);CHKERRQ(ierr); >>> ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); >>> ierr = KSPSetUp(ksp);CHKERRQ(ierr); >>> /* >>> code ... >>> */ >>> >>> Here's the error (run with valgrind --tool=memcheck --leak-check=full) : >>> ==6371== Warning: set address range perms: large range [0x7c84a040, >>> 0xb4e9a598) (undefined) >>> ==6371== Warning: set address range perms: large range [0xb4e9b040, >>> 0x2b4e9aeac) (undefined) >>> ==6371== Warning: set address range perms: large range [0x2b4e9b040, >>> 0x4b4e9aeac) (undefined) >>> ==6371== Argument 'size' of function memalign has a fishy (possibly >>> negative) value: -5187484888 >>> ==6371== at 0x4C320A6: memalign (in >>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==6371== by 0x501B4B0: PetscMallocAlign (mal.c:49) >>> ==6371== by 0x501CE37: PetscMallocA (mal.c:422) >>> ==6371== by 0x5ACFF0C: MatLUFactorSymbolic_SeqAIJ (aijfact.c:366) >>> ==6371== by 0x561D8B3: MatLUFactorSymbolic (matrix.c:3005) >>> ==6371== by 0x644ED9C: PCSetUp_LU (lu.c:90) >>> ==6371== by 0x65A2C32: PCSetUp (precon.c:894) >>> ==6371== by 0x6707E71: KSPSetUp (itfunc.c:376) >>> ==6371== by 0x13AB09: Calculate (taylor_hood.c:1780) >>> ==6371== by 0x10CB85: main (taylor_hood.c:228) >>> ==6371== >>> [0]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> [0]PETSC ERROR: Out of memory. This could be due to allocating >>> [0]PETSC ERROR: too large an object or bleeding by not properly >>> [0]PETSC ERROR: destroying unneeded objects. >>> [0]PETSC ERROR: Memory allocated 0 Memory used by process 15258234880 >>> [0]PETSC ERROR: Try running with -malloc_dump or -malloc_view for info. >>> [0]PETSC ERROR: Memory requested 18446744068522065920 >>> [0]PETSC ERROR: See >>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>> shooting. 
>>> [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown >>> [0]PETSC ERROR: ./taylor_hood on a arch-linux2-c-debug named e2-120 by >>> barry Sun Feb 23 14:18:46 2020 >>> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ >>> --with-fc=gfortran --download-mpich --download-fblaslapack >>> --download-triangle >>> [0]PETSC ERROR: #1 MatLUFactorSymbolic_SeqAIJ() line 366 in >>> /home/barry/petsc/src/mat/impls/aij/seq/aijfact.c >>> [0]PETSC ERROR: #2 PetscMallocA() line 422 in >>> /home/barry/petsc/src/sys/memory/mal.c >>> [0]PETSC ERROR: #3 MatLUFactorSymbolic_SeqAIJ() line 366 in >>> /home/barry/petsc/src/mat/impls/aij/seq/aijfact.c >>> [0]PETSC ERROR: #4 MatLUFactorSymbolic() line 3005 in >>> /home/barry/petsc/src/mat/interface/matrix.c >>> [0]PETSC ERROR: #5 PCSetUp_LU() line 90 in >>> /home/barry/petsc/src/ksp/pc/impls/factor/lu/lu.c >>> [0]PETSC ERROR: #6 PCSetUp() line 894 in >>> /home/barry/petsc/src/ksp/pc/interface/precon.c >>> [0]PETSC ERROR: #7 KSPSetUp() line 376 in >>> /home/barry/petsc/src/ksp/ksp/interface/itfunc.c >>> [0]PETSC ERROR: #8 Calculate() line 1780 in >>> /home/barry/brain/brain/3D/taylor_hood.c >>> [0]PETSC ERROR: #9 main() line 230 in >>> /home/barry/brain/brain/3D/taylor_hood.c >>> [0]PETSC ERROR: PETSc Option Table entries: >>> [0]PETSC ERROR: -dm_view >>> [0]PETSC ERROR: -f mesh/ellipsoid.msh >>> [0]PETSC ERROR: -matload_block_size 1 >>> [0]PETSC ERROR: ----------------End of Error Message -------send >>> entire error message to petsc-maint at mcs.anl.gov---------- >>> >>> Is there any setting that should be done but I ignore? >>> >>> Thanks in advance, >>> >>> Tsung-Hsing Chen >>> >> >> >> -- >> Stefano >> > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From barrydog505 at gmail.com Sun Feb 23 04:22:44 2020 From: barrydog505 at gmail.com (Tsung-Hsing Chen) Date: Sun, 23 Feb 2020 18:22:44 +0800 Subject: [petsc-users] Error - Out of memory. This could be due to allocating too large an object or bleeding by not properly destroying unneeded objects. In-Reply-To: References: Message-ID: A problem related to elasticity. I think I will try those external packages. Thanks for your assistance. Stefano Zampini ? 2020?2?23? ?? ??5:35??? > > > Il giorno dom 23 feb 2020 alle ore 11:53 Tsung-Hsing Chen < > barrydog505 at gmail.com> ha scritto: > >> This error came with a matrix approximate 300,000*300,000, and I was >> solving a 3D model. >> " the memory requirements for LU are N log(N) in 2D and N^4/3 in 3D. " >> What unit is it? Byte? >> > > Number of floating-point entries. > > Assuming an optimal nested dissection ordering can be found (i.e. no > "dense" rows), the largest front is asymptotically as large as N^(2/3) (N > the size of the matrix) > Storing it in memory requires (N^(2/3))^2 entries, thus N^4/3 entries. > What problem are you solving? > If you plan to use direct methods, you may want to experiment with > parallel factorization packages like MUMPS or SUPERLU_DIST > > >> >> Stefano Zampini ? 2020?2?23? ?? ??4:33??? >> >>> This seems integer overflow when computing the factors. >>> >>> How large is the matrix when you encounter the error? >>> Note that LU is not memory optimal and you can easily encounter >>> out-of-memory issues with large matrices. >>> Assuming sparsity, the memory requirements for LU are N log(N) in 2D and >>> N^4/3 in 3D. 
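The estimate quoted just above, written out. It is an asymptotic count of stored factor entries (numbers, not bytes), with constants omitted: under nested dissection the top-level separator of a 3D mesh with N unknowns has O(N^{2/3}) vertices, and the dense front on that separator dominates the fill,

    \[ \mathrm{fill}_{3D}(N) \sim \bigl(N^{2/3}\bigr)^{2} = N^{4/3},
       \qquad \mathrm{fill}_{2D}(N) \sim N \log N . \]

Multiplying by 8 bytes per double (16 for complex) gives a rough idea of the memory the factors need.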
>>> >>> >>> Il giorno dom 23 feb 2020 alle ore 11:01 Tsung-Hsing Chen < >>> barrydog505 at gmail.com> ha scritto: >>> >>>> Hi all, >>>> >>>> I have written a simple code to solve the FEM problem, and I want to >>>> use LU to solve the Ax=b. >>>> My problem(error) won't happen at the beginning until M & N in A_matrix >>>> is getting larger. (Can also be understood as mesh vertex increase.) >>>> All the error output seems to relate to LU, but I don't know what >>>> should be done. >>>> The followings are the code I wrote(section) and the error output. >>>> >>>> Here's the code (section) : >>>> /* >>>> code ... >>>> */ >>>> ierr = MatCreate(PETSC_COMM_WORLD, &A_matrix);CHKERRQ(ierr); >>>> ierr = MatSetSizes(A_matrix, PETSC_DECIDE, PETSC_DECIDE, M, >>>> N);CHKERRQ(ierr); >>>> ierr = MatSetType(A_matrix, MATSEQAIJ);CHKERRQ(ierr); >>>> // setting nnz ... >>>> ierr = MatSeqAIJSetPreallocation(A_matrix, 0, nnz);CHKERRQ(ierr); >>>> /* >>>> MatSetValues(); ... >>>> MatAssemblyBegin(); >>>> MatAssemblyEnd(); >>>> */ >>>> ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr); >>>> ierr = KSPSetOperators(ksp, A_matrix, A_matrix);CHKERRQ(ierr); >>>> ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr); >>>> ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr); >>>> ierr = PCSetType(pc, PCLU);CHKERRQ(ierr); >>>> ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); >>>> ierr = KSPSetUp(ksp);CHKERRQ(ierr); >>>> /* >>>> code ... >>>> */ >>>> >>>> Here's the error (run with valgrind --tool=memcheck --leak-check=full) : >>>> ==6371== Warning: set address range perms: large range [0x7c84a040, >>>> 0xb4e9a598) (undefined) >>>> ==6371== Warning: set address range perms: large range [0xb4e9b040, >>>> 0x2b4e9aeac) (undefined) >>>> ==6371== Warning: set address range perms: large range [0x2b4e9b040, >>>> 0x4b4e9aeac) (undefined) >>>> ==6371== Argument 'size' of function memalign has a fishy (possibly >>>> negative) value: -5187484888 >>>> ==6371== at 0x4C320A6: memalign (in >>>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) >>>> ==6371== by 0x501B4B0: PetscMallocAlign (mal.c:49) >>>> ==6371== by 0x501CE37: PetscMallocA (mal.c:422) >>>> ==6371== by 0x5ACFF0C: MatLUFactorSymbolic_SeqAIJ (aijfact.c:366) >>>> ==6371== by 0x561D8B3: MatLUFactorSymbolic (matrix.c:3005) >>>> ==6371== by 0x644ED9C: PCSetUp_LU (lu.c:90) >>>> ==6371== by 0x65A2C32: PCSetUp (precon.c:894) >>>> ==6371== by 0x6707E71: KSPSetUp (itfunc.c:376) >>>> ==6371== by 0x13AB09: Calculate (taylor_hood.c:1780) >>>> ==6371== by 0x10CB85: main (taylor_hood.c:228) >>>> ==6371== >>>> [0]PETSC ERROR: --------------------- Error Message >>>> -------------------------------------------------------------- >>>> [0]PETSC ERROR: Out of memory. This could be due to allocating >>>> [0]PETSC ERROR: too large an object or bleeding by not properly >>>> [0]PETSC ERROR: destroying unneeded objects. >>>> [0]PETSC ERROR: Memory allocated 0 Memory used by process 15258234880 >>>> [0]PETSC ERROR: Try running with -malloc_dump or -malloc_view for >>>> info. >>>> [0]PETSC ERROR: Memory requested 18446744068522065920 >>>> [0]PETSC ERROR: See >>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>> shooting. 
>>>> [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown >>>> [0]PETSC ERROR: ./taylor_hood on a arch-linux2-c-debug named e2-120 >>>> by barry Sun Feb 23 14:18:46 2020 >>>> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ >>>> --with-fc=gfortran --download-mpich --download-fblaslapack >>>> --download-triangle >>>> [0]PETSC ERROR: #1 MatLUFactorSymbolic_SeqAIJ() line 366 in >>>> /home/barry/petsc/src/mat/impls/aij/seq/aijfact.c >>>> [0]PETSC ERROR: #2 PetscMallocA() line 422 in >>>> /home/barry/petsc/src/sys/memory/mal.c >>>> [0]PETSC ERROR: #3 MatLUFactorSymbolic_SeqAIJ() line 366 in >>>> /home/barry/petsc/src/mat/impls/aij/seq/aijfact.c >>>> [0]PETSC ERROR: #4 MatLUFactorSymbolic() line 3005 in >>>> /home/barry/petsc/src/mat/interface/matrix.c >>>> [0]PETSC ERROR: #5 PCSetUp_LU() line 90 in >>>> /home/barry/petsc/src/ksp/pc/impls/factor/lu/lu.c >>>> [0]PETSC ERROR: #6 PCSetUp() line 894 in >>>> /home/barry/petsc/src/ksp/pc/interface/precon.c >>>> [0]PETSC ERROR: #7 KSPSetUp() line 376 in >>>> /home/barry/petsc/src/ksp/ksp/interface/itfunc.c >>>> [0]PETSC ERROR: #8 Calculate() line 1780 in >>>> /home/barry/brain/brain/3D/taylor_hood.c >>>> [0]PETSC ERROR: #9 main() line 230 in >>>> /home/barry/brain/brain/3D/taylor_hood.c >>>> [0]PETSC ERROR: PETSc Option Table entries: >>>> [0]PETSC ERROR: -dm_view >>>> [0]PETSC ERROR: -f mesh/ellipsoid.msh >>>> [0]PETSC ERROR: -matload_block_size 1 >>>> [0]PETSC ERROR: ----------------End of Error Message -------send >>>> entire error message to petsc-maint at mcs.anl.gov---------- >>>> >>>> Is there any setting that should be done but I ignore? >>>> >>>> Thanks in advance, >>>> >>>> Tsung-Hsing Chen >>>> >>> >>> >>> -- >>> Stefano >>> >> > > -- > Stefano > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shrirang.abhyankar at pnnl.gov Sun Feb 23 15:08:54 2020 From: shrirang.abhyankar at pnnl.gov (Abhyankar, Shrirang G) Date: Sun, 23 Feb 2020 21:08:54 +0000 Subject: [petsc-users] Using PETSc with GPU supported SuperLU_Dist In-Reply-To: References: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> <4BDB7C51-7452-45CC-A118-4D3F4F5D03D1@pnnl.gov> Message-ID: <4C14C2B3-0CB1-4E5F-A414-D5FBC10F6F18@pnnl.gov> I am getting an error now for CUDA driver version. Any suggestions? petsc:maint$ make test Running test examples to verify correct installation Using PETSC_DIR=/people/abhy245/software/petsc and PETSC_ARCH=debug-mode-newell Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI process See http://www.mcs.anl.gov/petsc/documentation/faq.html [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Error in system call [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient for CUDA runtime version [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.12.4, unknown [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by abhy245 Sun Feb 23 12:49:55 2020 [0]PETSC ERROR: Configure options --download-fblaslapack --download-make --download-metis --download-parmetis --download-scalapack --download-suitesparse --download-superlu_dist-gpu=1 --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 PETSC_ARCH=debug-mode-newell [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in /qfs/people/abhy245/software/petsc/src/sys/objects/init.c [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in /qfs/people/abhy245/software/petsc/src/sys/objects/init.c [0]PETSC ERROR: #3 PetscInitialize() line 1010 in /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c -------------------------------------------------------------------------- Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpiexec detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was: Process name: [[46518,1],0] Exit code: 88 -------------------------------------------------------------------------- Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 MPI processes See http://www.mcs.anl.gov/petsc/documentation/faq.html [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: Error in system call [1]PETSC ERROR: [0]PETSC ERROR: Error in system call [0]PETSC ERROR: error in cudaGetDeviceCount CUDA driver version is insufficient for CUDA runtime version [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. error in cudaGetDeviceCount CUDA driver version is insufficient for CUDA runtime version [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[1]PETSC ERROR: [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by abhy245 Sun Feb 23 12:49:57 2020 [0]PETSC ERROR: Configure options --download-fblaslapack --download-make --download-metis --download-parmetis --download-scalapack --download-suitesparse --download-superlu_dist-gpu=1 --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 PETSC_ARCH=debug-mode-newell [0]PETSC ERROR: #1 PetscCUDAInitialize() line 254 in /qfs/people/abhy245/software/petsc/src/sys/objects/init.c [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in /qfs/people/abhy245/software/petsc/src/sys/objects/init.c [0]PETSC ERROR: #3 PetscInitialize() line 1010 in /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c Petsc Release Version 3.12.4, unknown [1]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by abhy245 Sun Feb 23 12:49:57 2020 [1]PETSC ERROR: Configure options --download-fblaslapack --download-make --download-metis --download-parmetis --download-scalapack --download-suitesparse --download-superlu_dist-gpu=1 --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 PETSC_ARCH=debug-mode-newell [1]PETSC ERROR: #1 PetscCUDAInitialize() line 254 in /qfs/people/abhy245/software/petsc/src/sys/objects/init.c [1]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in /qfs/people/abhy245/software/petsc/src/sys/objects/init.c [1]PETSC ERROR: #3 PetscInitialize() line 1010 in /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c -------------------------------------------------------------------------- Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpiexec detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was: Process name: [[46522,1],0] Exit code: 88 -------------------------------------------------------------------------- 1,2c1,21 < lid velocity = 0.0025, prandtl # = 1., grashof # = 1. < Number of SNES iterations = 2 --- > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Error in system call > [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient for CUDA runtime version > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown > [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by abhy245 Sun Feb 23 12:50:00 2020 > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make --download-metis --download-parmetis --download-scalapack --download-suitesparse --download-superlu_dist-gpu=1 --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 PETSC_ARCH=debug-mode-newell > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > [0]PETSC ERROR: #3 PetscInitialize() line 1010 in /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c > -------------------------------------------------------------------------- > Primary job terminated normally, but 1 process returned > a non-zero exit code. Per user-direction, the job has been aborted. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpiexec detected that one or more processes exited with non-zero status, thus causing > the job to be terminated. The first process to do so was: > > Process name: [[46545,1],0] > Exit code: 88 > -------------------------------------------------------------------------- /people/abhy245/software/petsc/src/snes/examples/tutorials Possible problem with ex19 running with superlu_dist, diffs above ========================================= Possible error running Fortran example src/snes/examples/tutorials/ex5f with 1 MPI process See http://www.mcs.anl.gov/petsc/documentation/faq.html [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Error in system call [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient for CUDA runtime version [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown [0]PETSC ERROR: ./ex5f on a debug-mode-newell named newell01.pnl.gov by abhy245 Sun Feb 23 12:50:04 2020 [0]PETSC ERROR: Configure options --download-fblaslapack --download-make --download-metis --download-parmetis --download-scalapack --download-suitesparse --download-superlu_dist-gpu=1 --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 PETSC_ARCH=debug-mode-newell [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in /qfs/people/abhy245/software/petsc/src/sys/objects/init.c [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in /qfs/people/abhy245/software/petsc/src/sys/objects/init.c [0]PETSC ERROR: PetscInitialize:Checking initial options Unable to initialize PETSc -------------------------------------------------------------------------- mpiexec has exited due to process rank 0 with PID 0 on node newell01 exiting improperly. There are three reasons this could occur: 1. this process did not call "init" before exiting, but others in the job did. This can cause a job to hang indefinitely while it waits for all processes to call "init". 
By rule, if one process calls "init", then ALL processes must call "init" prior to termination. 2. this process called "init", but exited without calling "finalize". By rule, all processes that call "init" MUST call "finalize" prior to exiting or it will be considered an "abnormal termination" 3. this process called "MPI_Abort" or "orte_abort" and the mca parameter orte_create_session_dirs is set to false. In this case, the run-time cannot detect that the abort call was an abnormal termination. Hence, the only error message you will receive is this one. This may have caused other processes in the application to be terminated by signals sent by mpiexec (as reported here). You can avoid this message by specifying -quiet on the mpiexec command line. -------------------------------------------------------------------------- Completed test examples From: Satish Balay Reply-To: petsc-users Date: Saturday, February 22, 2020 at 9:00 PM To: Junchao Zhang Cc: "Abhyankar, Shrirang G" , petsc-users Subject: Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist The fix is now in both maint and master https://gitlab.com/petsc/petsc/-/merge_requests/2555 Satish On Sat, 22 Feb 2020, Junchao Zhang via petsc-users wrote: We met the error before and knew why. Will fix it soon. --Junchao Zhang On Sat, Feb 22, 2020 at 11:43 AM Abhyankar, Shrirang G via petsc-users < petsc-users at mcs.anl.gov> wrote: > Thanks, Satish. Configure and make go through fine. Getting an undefined > reference error for VecGetArrayWrite_SeqCUDA. > > > > Shri > > *From: *Satish Balay > > *Reply-To: *petsc-users > > *Date: *Saturday, February 22, 2020 at 8:25 AM > *To: *"Abhyankar, Shrirang G" > > *Cc: *"petsc-users at mcs.anl.gov" > > *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist > > > > On Sat, 22 Feb 2020, Abhyankar, Shrirang G via petsc-users wrote: > > > > Hi, > > I want to install PETSc with GPU supported SuperLU_Dist. What are the > configure options I should be using? > > > > > > Shri, > > > > > > if self.framework.argDB['download-superlu_dist-gpu']: > > self.cuda = framework.require('config.packages.cuda',self) > > self.openmp = > framework.require('config.packages.openmp',self) > > self.deps = > [self.mpi,self.blasLapack,self.cuda,self.openmp] > > <<<<< > > > > So try: > > > > --with-cuda=1 --download-superlu_dist=1 --download-superlu_dist-gpu=1 > --with-openmp=1 [and usual MPI, blaslapack] > > > > Satish > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log Type: application/octet-stream Size: 106174 bytes Desc: make.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 2406311 bytes Desc: configure.log URL: From shrirang.abhyankar at pnnl.gov Sun Feb 23 15:33:11 2020 From: shrirang.abhyankar at pnnl.gov (Abhyankar, Shrirang G) Date: Sun, 23 Feb 2020 21:33:11 +0000 Subject: [petsc-users] Using PETSc with GPU supported SuperLU_Dist In-Reply-To: <4C14C2B3-0CB1-4E5F-A414-D5FBC10F6F18@pnnl.gov> References: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> <4BDB7C51-7452-45CC-A118-4D3F4F5D03D1@pnnl.gov> <4C14C2B3-0CB1-4E5F-A414-D5FBC10F6F18@pnnl.gov> Message-ID: I was using CUDA v10.2. Switching to 9.2 gives a clean make test. 
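For reference on the "CUDA driver version is insufficient for CUDA runtime version" failure above: the installed driver has to support the CUDA toolkit (runtime) that PETSc was configured against (here /share/apps/cuda/10.2). A quick way to check, assuming the NVIDIA tools are on the PATH:

    nvidia-smi       # reports the driver version; recent drivers also print the newest CUDA runtime they support
    nvcc --version   # reports the toolkit release of the nvcc on PATH; it should match the --with-cuda-dir given to configure

If the toolkit is newer than the driver supports, either update the driver or point --with-cuda-dir at an older toolkit, which is presumably what the switch from 10.2 to 9.2 above amounts to.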
Thanks, Shri
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From richard.beare at monash.edu  Sun Feb 23 17:24:02 2020
From: richard.beare at monash.edu (Richard Beare)
Date: Mon, 24 Feb 2020 10:24:02 +1100
Subject: [petsc-users] Correct approach for updating deprecated code
Message-ID:
Thanks PetscViewer viewer1; ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fileName.c_str (),FILE_MODE_WRITE,&viewer1);CHKERRQ(ierr); ierr = PetscViewerSetFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ (ierr); ierr = PetscObjectSetName((PetscObject)mX,"x");CHKERRQ(ierr); ierr = PetscObjectSetName((PetscObject)mB,"b");CHKERRQ(ierr); ierr = VecView(mX,viewer1);CHKERRQ(ierr); ierr = VecView(mB,viewer1);CHKERRQ(ierr); -- -- A/Prof Richard Beare Imaging and Bioinformatics, Peninsula Clinical School orcid.org/0000-0002-7530-5664 Richard.Beare at monash.edu +61 3 9788 1724 Geospatial Research: https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Feb 23 17:43:02 2020 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 23 Feb 2020 18:43:02 -0500 Subject: [petsc-users] Correct approach for updating deprecated code In-Reply-To: References: Message-ID: On Sun, Feb 23, 2020 at 6:25 PM Richard Beare via petsc-users < petsc-users at mcs.anl.gov> wrote: > > Hi, > The following code gives a deprecation warning. What is the correct way of > updating the use of ViewerSetFormat to ViewerPushFormat (which I presume is > the preferred replacement). My first attempt gave errors concerning > ordering. > You can't just change SetFormat to PushFormat here? Matt > Thanks > > PetscViewer viewer1; > ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fileName.c_str > (),FILE_MODE_WRITE,&viewer1);CHKERRQ(ierr); > ierr = PetscViewerSetFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ > (ierr); > > ierr = PetscObjectSetName((PetscObject)mX,"x");CHKERRQ(ierr); > ierr = PetscObjectSetName((PetscObject)mB,"b");CHKERRQ(ierr); > > ierr = VecView(mX,viewer1);CHKERRQ(ierr); > ierr = VecView(mB,viewer1);CHKERRQ(ierr); > > > -- > -- > A/Prof Richard Beare > Imaging and Bioinformatics, Peninsula Clinical School > orcid.org/0000-0002-7530-5664 > Richard.Beare at monash.edu > +61 3 9788 1724 > > > > Geospatial Research: > https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.beare at monash.edu Sun Feb 23 17:45:29 2020 From: richard.beare at monash.edu (Richard Beare) Date: Mon, 24 Feb 2020 10:45:29 +1100 Subject: [petsc-users] Correct approach for updating deprecated code In-Reply-To: References: Message-ID: That's what I did (see below), but I got ordering errors (unfortunately deleted those logs too soon). I'll rerun if no one recognises what I've done wrong. PetscViewer viewer1; ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fileName.c_str (),FILE_MODE_WRITE,&viewer1);CHKERRQ(ierr); //ierr = PetscViewerSetFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ(ierr); ierr = PetscViewerPushFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ (ierr); ierr = PetscObjectSetName((PetscObject)mX,"x");CHKERRQ(ierr); ierr = PetscObjectSetName((PetscObject)mB,"b");CHKERRQ(ierr); On Mon, 24 Feb 2020 at 10:43, Matthew Knepley wrote: > On Sun, Feb 23, 2020 at 6:25 PM Richard Beare via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> >> Hi, >> The following code gives a deprecation warning. 
What is the correct way >> of updating the use of ViewerSetFormat to ViewerPushFormat (which I presume >> is the preferred replacement). My first attempt gave errors concerning >> ordering. >> > > You can't just change SetFormat to PushFormat here? > > Matt > > >> Thanks >> >> PetscViewer viewer1; >> ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fileName.c_str >> (),FILE_MODE_WRITE,&viewer1);CHKERRQ(ierr); >> ierr = PetscViewerSetFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ >> (ierr); >> >> ierr = PetscObjectSetName((PetscObject)mX,"x");CHKERRQ(ierr); >> ierr = PetscObjectSetName((PetscObject)mB,"b");CHKERRQ(ierr); >> >> ierr = VecView(mX,viewer1);CHKERRQ(ierr); >> ierr = VecView(mB,viewer1);CHKERRQ(ierr); >> >> >> -- >> -- >> A/Prof Richard Beare >> Imaging and Bioinformatics, Peninsula Clinical School >> orcid.org/0000-0002-7530-5664 >> Richard.Beare at monash.edu >> +61 3 9788 1724 >> >> >> >> Geospatial Research: >> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- -- A/Prof Richard Beare Imaging and Bioinformatics, Peninsula Clinical School orcid.org/0000-0002-7530-5664 Richard.Beare at monash.edu +61 3 9788 1724 Geospatial Research: https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Feb 23 19:35:47 2020 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 23 Feb 2020 20:35:47 -0500 Subject: [petsc-users] Correct approach for updating deprecated code In-Reply-To: References: Message-ID: I think you are going to have to send the error logs. Thanks, MAtt On Sun, Feb 23, 2020 at 6:45 PM Richard Beare wrote: > That's what I did (see below), but I got ordering errors (unfortunately > deleted those logs too soon). I'll rerun if no one recognises what I've > done wrong. > > PetscViewer viewer1; > ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fileName.c_str > (),FILE_MODE_WRITE,&viewer1);CHKERRQ(ierr); > //ierr = > PetscViewerSetFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ(ierr); > ierr = PetscViewerPushFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ > (ierr); > > ierr = PetscObjectSetName((PetscObject)mX,"x");CHKERRQ(ierr); > ierr = PetscObjectSetName((PetscObject)mB,"b");CHKERRQ(ierr); > > On Mon, 24 Feb 2020 at 10:43, Matthew Knepley wrote: > >> On Sun, Feb 23, 2020 at 6:25 PM Richard Beare via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >>> >>> Hi, >>> The following code gives a deprecation warning. What is the correct way >>> of updating the use of ViewerSetFormat to ViewerPushFormat (which I presume >>> is the preferred replacement). My first attempt gave errors concerning >>> ordering. >>> >> >> You can't just change SetFormat to PushFormat here? 
>>
>> Matt
>>
>>> Thanks
>>>
>>> PetscViewer viewer1;
>>> ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fileName.c_str(),FILE_MODE_WRITE,&viewer1);CHKERRQ(ierr);
>>> ierr = PetscViewerSetFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ(ierr);
>>>
>>> ierr = PetscObjectSetName((PetscObject)mX,"x");CHKERRQ(ierr);
>>> ierr = PetscObjectSetName((PetscObject)mB,"b");CHKERRQ(ierr);
>>>
>>> ierr = VecView(mX,viewer1);CHKERRQ(ierr);
>>> ierr = VecView(mB,viewer1);CHKERRQ(ierr);
>>>
>>> --
>>> A/Prof Richard Beare
>>> Imaging and Bioinformatics, Peninsula Clinical School
>>> orcid.org/0000-0002-7530-5664
>>> Richard.Beare at monash.edu
>>> +61 3 9788 1724
>>>
>>> Geospatial Research:
>>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>
> --
> A/Prof Richard Beare
> Imaging and Bioinformatics, Peninsula Clinical School
> orcid.org/0000-0002-7530-5664
> Richard.Beare at monash.edu
> +61 3 9788 1724
>
> Geospatial Research:
> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
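For anyone hitting the same deprecation warning: the usual replacement is to pair PetscViewerPushFormat with a matching PetscViewerPopFormat once the viewer is no longer needed in that format. The sketch below is not the resolution reached in this thread (see Satish's pointer to the relevant PETSc commit further down); it is only a minimal illustration of the pairing, reusing the viewer1/mX/mB/fileName names from Richard's snippet:

  PetscViewer viewer1;
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fileName.c_str(),FILE_MODE_WRITE,&viewer1);CHKERRQ(ierr);
  /* was PetscViewerSetFormat(); push the format instead */
  ierr = PetscViewerPushFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ(ierr);

  ierr = PetscObjectSetName((PetscObject)mX,"x");CHKERRQ(ierr);
  ierr = PetscObjectSetName((PetscObject)mB,"b");CHKERRQ(ierr);

  ierr = VecView(mX,viewer1);CHKERRQ(ierr);
  ierr = VecView(mB,viewer1);CHKERRQ(ierr);

  /* undo the push before the viewer is destroyed */
  ierr = PetscViewerPopFormat(viewer1);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer1);CHKERRQ(ierr);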
From pierpaolo.minelli at cnr.it Mon Feb 24 04:30:24 2020
From: pierpaolo.minelli at cnr.it (Pierpaolo Minelli)
Date: Mon, 24 Feb 2020 11:30:24 +0100
Subject: [petsc-users] Using DMDAs with an assigned domain decomposition to Solve Poisson Equation
Message-ID: <37B03CED-D8A0-4736-9F5C-2994200BF2D5@cnr.it>

Hi,
I'm developing a 3D code in Fortran to study the space-time evolution of charged particles within a Cartesian domain.
The domain decomposition has been made by me taking into account symmetry and load balancing reasons related to my specific problem. In this first draft, it will remain constant throughout my simulation.

Is there a way, using DMDAs, to solve Poisson's equation, using the domain decomposition above, obtaining as a result the local solution including its ghost cells values?

As input data at each time-step I know the electric charge density in each local subdomain (RHS), including the ghost cells, even if I don't think they are useful for the calculation of the equation.
Matrix coefficients (LHS) and boundary conditions are constant during my simulation.

As an output I would need to know the local electrical potential in each local subdomain, including the values of the ghost cells in each dimension (X,Y,Z).

Is there an example that I can use in Fortran to solve this kind of problem?

Thanks in advance

Pierpaolo Minelli

From mfadams at lbl.gov Mon Feb 24 05:08:31 2020
From: mfadams at lbl.gov (Mark Adams)
Date: Mon, 24 Feb 2020 06:08:31 -0500
Subject: [petsc-users] Using DMDAs with an assigned domain decomposition to Solve Poisson Equation
In-Reply-To: <37B03CED-D8A0-4736-9F5C-2994200BF2D5@cnr.it>
References: <37B03CED-D8A0-4736-9F5C-2994200BF2D5@cnr.it>
Message-ID:

On Mon, Feb 24, 2020 at 5:30 AM Pierpaolo Minelli wrote:
> Hi,
> I'm developing a 3D code in Fortran to study the space-time evolution of
> charged particles within a Cartesian domain.
> The domain decomposition has been made by me taking into account symmetry
> and load balancing reasons related to my specific problem. In this first
> draft, it will remain constant throughout my simulation.
>
> Is there a way, using DMDAs, to solve Poisson's equation, using the domain
> decomposition above, obtaining as a result the local solution including its
> ghost cells values?
>

https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMGlobalToLocalBegin.html#DMGlobalToLocalBegin

> As input data at each time-step I know the electric charge density in each
> local subdomain (RHS), including the ghost cells, even if I don't think
> they are useful for the calculation of the equation.
> Matrix coefficients (LHS) and boundary conditions are constant during my
> simulation.
>
> As an output I would need to know the local electrical potential in each
> local subdomain, including the values of the ghost cells in each
> dimension (X,Y,Z).
>
> Is there an example that I can use in Fortran to solve this kind of
> problem?
>

I see one, but it is not hard to convert a C example:

https://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex14f.F90.html

> Thanks in advance
>
> Pierpaolo Minelli

From knepley at gmail.com Mon Feb 24 05:24:24 2020
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 24 Feb 2020 06:24:24 -0500
Subject: [petsc-users] Using DMDAs with an assigned domain decomposition to Solve Poisson Equation
In-Reply-To: <37B03CED-D8A0-4736-9F5C-2994200BF2D5@cnr.it>
References: <37B03CED-D8A0-4736-9F5C-2994200BF2D5@cnr.it>
Message-ID:

On Mon, Feb 24, 2020 at 5:30 AM Pierpaolo Minelli wrote:
> Hi,
> I'm developing a 3D code in Fortran to study the space-time evolution of
> charged particles within a Cartesian domain.
> The domain decomposition has been made by me taking into account symmetry
> and load balancing reasons related to my specific problem.

That may be a problem. DMDA can only decompose itself along straight lines
through the domain. Is that how your decomposition looks?

> In this first draft, it will remain constant throughout my simulation.
>
> Is there a way, using DMDAs, to solve Poisson's equation, using the domain
> decomposition above, obtaining as a result the local solution including its
> ghost cells values?
>

How do you discretize the Poisson equation?

  Thanks,

    Matt

> As input data at each time-step I know the electric charge density in each
> local subdomain (RHS), including the ghost cells, even if I don't think
> they are useful for the calculation of the equation.
> Matrix coefficients (LHS) and boundary conditions are constant during my
> simulation.
>
> As an output I would need to know the local electrical potential in each
> local subdomain, including the values of the ghost cells in each
> dimension (X,Y,Z).
>
> Is there an example that I can use in Fortran to solve this kind of
> problem?
>
> Thanks in advance
>
> Pierpaolo Minelli

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
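To make the suggestions above concrete, here is a minimal C sketch (the thread's code is Fortran, but the same calls exist in the F90 interface used by ex14f.F90/ex22f.F90) of handing an already-chosen process grid and ownership ranges to a DMDA, solving with KSP, and then pulling the ghosted local solution with DMGlobalToLocal, as Mark suggests. The grid sizes, the 4x6x1 process grid, the lx/ly/lz splits, the crude Dirichlet rows, and the placeholder RHS are illustrative assumptions only; a real code would fill them from its own decomposition, boundary conditions, and charge density. It also anticipates the 7-point stencil mentioned later in the thread, and must be run on exactly m*n*p ranks.

#include <petscdmda.h>
#include <petscksp.h>

/* Hypothetical callbacks in the spirit of ex22: placeholder RHS and a 7-point Laplacian */
static PetscErrorCode ComputeRHS(KSP ksp, Vec b, void *ctx)
{
  PetscErrorCode ierr;
  PetscFunctionBeginUser;
  ierr = VecSet(b, 1.0);CHKERRQ(ierr);   /* stand-in for the local charge density */
  PetscFunctionReturn(0);
}

static PetscErrorCode ComputeMatrix(KSP ksp, Mat J, Mat A, void *ctx)
{
  DM             da;
  PetscInt       i, j, k, mx, my, mz, xs, ys, zs, xm, ym, zm;
  PetscScalar    v[7];
  MatStencil     row, col[7];
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPGetDM(ksp, &da);CHKERRQ(ierr);
  ierr = DMDAGetInfo(da, NULL, &mx, &my, &mz, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL);CHKERRQ(ierr);
  ierr = DMDAGetCorners(da, &xs, &ys, &zs, &xm, &ym, &zm);CHKERRQ(ierr);
  for (k = zs; k < zs + zm; k++) {
    for (j = ys; j < ys + ym; j++) {
      for (i = xs; i < xs + xm; i++) {
        row.i = i; row.j = j; row.k = k;
        if (i == 0 || j == 0 || k == 0 || i == mx-1 || j == my-1 || k == mz-1) {
          v[0] = 6.0;   /* crude Dirichlet row on the physical boundary */
          ierr = MatSetValuesStencil(A, 1, &row, 1, &row, v, INSERT_VALUES);CHKERRQ(ierr);
        } else {
          /* interior 7-point stencil, unit grid spacing assumed */
          v[0] = -1.0; col[0].i = i;   col[0].j = j;   col[0].k = k-1;
          v[1] = -1.0; col[1].i = i;   col[1].j = j-1; col[1].k = k;
          v[2] = -1.0; col[2].i = i-1; col[2].j = j;   col[2].k = k;
          v[3] =  6.0; col[3].i = i;   col[3].j = j;   col[3].k = k;
          v[4] = -1.0; col[4].i = i+1; col[4].j = j;   col[4].k = k;
          v[5] = -1.0; col[5].i = i;   col[5].j = j+1; col[5].k = k;
          v[6] = -1.0; col[6].i = i;   col[6].j = j;   col[6].k = k+1;
          ierr = MatSetValuesStencil(A, 1, &row, 7, col, v, INSERT_VALUES);CHKERRQ(ierr);
        }
      }
    }
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

int main(int argc, char **argv)
{
  DM             da;
  KSP            ksp;
  Vec            phi, philocal;
  PetscInt       M = 251, N = 341, P = 161;  /* global grid, values quoted later in the thread */
  PetscInt       m = 4, n = 6, p = 1;        /* pre-existing 24-rank process grid              */
  PetscInt       lx[4], ly[6], lz[1];        /* cells already owned by each rank per direction */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  /* Illustrative splits; each array must sum to the corresponding global size
     and must reproduce the decomposition the rest of the code uses. */
  lx[0] = 63; lx[1] = 63; lx[2] = 63; lx[3] = 62;
  ly[0] = 57; ly[1] = 57; ly[2] = 57; ly[3] = 57; ly[4] = 57; ly[5] = 56;
  lz[0] = 161;

  ierr = DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                      DMDA_STENCIL_STAR, M, N, P, m, n, p, 1, 1, lx, ly, lz, &da);CHKERRQ(ierr);
  ierr = DMSetFromOptions(da);CHKERRQ(ierr);
  ierr = DMSetUp(da);CHKERRQ(ierr);

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetDM(ksp, da);CHKERRQ(ierr);
  ierr = KSPSetComputeRHS(ksp, ComputeRHS, NULL);CHKERRQ(ierr);
  ierr = KSPSetComputeOperators(ksp, ComputeMatrix, NULL);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);   /* e.g. -ksp_type cg -pc_type mg */
  ierr = KSPSolve(ksp, NULL, NULL);CHKERRQ(ierr);
  ierr = KSPGetSolution(ksp, &phi);CHKERRQ(ierr);

  /* Ghosted local view of the potential: owned cells plus ghost cells in X, Y, Z */
  ierr = DMCreateLocalVector(da, &philocal);CHKERRQ(ierr);
  ierr = DMGlobalToLocalBegin(da, phi, INSERT_VALUES, philocal);CHKERRQ(ierr);
  ierr = DMGlobalToLocalEnd(da, phi, INSERT_VALUES, philocal);CHKERRQ(ierr);
  /* DMDAVecGetArray(da, philocal, &array) would expose it with global (i,j,k) indexing */

  ierr = VecDestroy(&philocal);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = DMDestroy(&da);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}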
From pierpaolo.minelli at cnr.it Mon Feb 24 05:29:39 2020
From: pierpaolo.minelli at cnr.it (Pierpaolo Minelli)
Date: Mon, 24 Feb 2020 12:29:39 +0100
Subject: [petsc-users] Using DMDAs with an assigned domain decomposition to Solve Poisson Equation
In-Reply-To: References: <37B03CED-D8A0-4736-9F5C-2994200BF2D5@cnr.it>
Message-ID: <00B8AC11-E70D-41F5-BEA5-96788B59715D@cnr.it>

> Il giorno 24 feb 2020, alle ore 12:08, Mark Adams ha scritto:
>
> On Mon, Feb 24, 2020 at 5:30 AM Pierpaolo Minelli wrote:
> Hi,
> I'm developing a 3D code in Fortran to study the space-time evolution of charged particles within a Cartesian domain.
> The domain decomposition has been made by me taking into account symmetry and load balancing reasons related to my specific problem. In this first draft, it will remain constant throughout my simulation.
>
> Is there a way, using DMDAs, to solve Poisson's equation, using the domain decomposition above, obtaining as a result the local solution including its ghost cells values?
>
> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMGlobalToLocalBegin.html#DMGlobalToLocalBegin
>
> As input data at each time-step I know the electric charge density in each local subdomain (RHS), including the ghost cells, even if I don't think they are useful for the calculation of the equation.
> Matrix coefficients (LHS) and boundary conditions are constant during my simulation.
>
> As an output I would need to know the local electrical potential in each local subdomain, including the values of the ghost cells in each dimension (X,Y,Z).
>
> Is there an example that I can use in Fortran to solve this kind of problem?
>
> I see one, but it is not hard to convert a C example:
>
> https://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex14f.F90.html

Thanks, I will take a look at this example, and I will also check the C examples in that directory.

> Thanks in advance
>
> Pierpaolo Minelli

From pierpaolo.minelli at cnr.it Mon Feb 24 05:35:23 2020
From: pierpaolo.minelli at cnr.it (Pierpaolo Minelli)
Date: Mon, 24 Feb 2020 12:35:23 +0100
Subject: [petsc-users] Using DMDAs with an assigned domain decomposition to Solve Poisson Equation
In-Reply-To: References: <37B03CED-D8A0-4736-9F5C-2994200BF2D5@cnr.it>
Message-ID: <22A77097-9BFB-4FA5-A152-BF86D4D40D1F@cnr.it>

> Il giorno 24 feb 2020, alle ore 12:24, Matthew Knepley ha scritto:
>
> On Mon, Feb 24, 2020 at 5:30 AM Pierpaolo Minelli wrote:
> Hi,
> I'm developing a 3D code in Fortran to study the space-time evolution of charged particles within a Cartesian domain.
> The domain decomposition has been made by me taking into account symmetry and load balancing reasons related to my specific problem.
>
> That may be a problem. DMDA can only decompose itself along straight lines through the domain. Is that how your decomposition looks?

My decomposition at the moment is practically a 2D decomposition, because I have:

M = 251 (X)
N = 341 (Y)
P = 161 (Z)

and if I use 24 MPI processes, I divided my domain into a 3D Cartesian topology with:

m = 4
n = 6
p = 1

> In this first draft, it will remain constant throughout my simulation.
>
> Is there a way, using DMDAs, to solve Poisson's equation, using the domain decomposition above, obtaining as a result the local solution including its ghost cells values?
>
> How do you discretize the Poisson equation?
I intend to use a 7 point stencil like that in this example: https://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex22f.F90.html > > Thanks, > > Matt > > As input data at each time-step I know the electric charge density in each local subdomain (RHS), including the ghost cells, even if I don't think they are useful for the calculation of the equation. > Matrix coefficients (LHS) and boundary conditions are constant during my simulation. > > As an output I would need to know the local electrical potential in each local subdomain, including the values of the ghost cells in each dimension(X,Y,Z). > > Is there an example that I can use in Fortran to solve this kind of problem? > > Thanks in advance > > Pierpaolo Minelli > > Thanks Pierpaolo > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 24 05:58:43 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 24 Feb 2020 06:58:43 -0500 Subject: [petsc-users] Using DMDAs with an assigned domain decomposition to Solve Poisson Equation In-Reply-To: <22A77097-9BFB-4FA5-A152-BF86D4D40D1F@cnr.it> References: <37B03CED-D8A0-4736-9F5C-2994200BF2D5@cnr.it> <22A77097-9BFB-4FA5-A152-BF86D4D40D1F@cnr.it> Message-ID: On Mon, Feb 24, 2020 at 6:35 AM Pierpaolo Minelli wrote: > > > Il giorno 24 feb 2020, alle ore 12:24, Matthew Knepley > ha scritto: > > On Mon, Feb 24, 2020 at 5:30 AM Pierpaolo Minelli < > pierpaolo.minelli at cnr.it> wrote: > >> Hi, >> I'm developing a 3D code in Fortran to study the space-time evolution of >> charged particles within a Cartesian domain. >> The domain decomposition has been made by me taking into account symmetry >> and load balancing reasons related to my specific problem. > > > That may be a problem. DMDA can only decompose itself along straight lines > through the domain. Is that how your decomposition looks? > > > My decomposition at the moment is paractically a 2D decomposition because > i have: > > M = 251 (X) > N = 341 (Y) > P = 161 (Z) > > and if i use 24 MPI procs, i divided my domain in a 3D Cartesian Topology > with: > > m = 4 > n = 6 > p = 1 > > > > >> In this first draft, it will remain constant throughout my simulation. >> >> Is there a way, using DMDAs, to solve Poisson's equation, using the >> domain decomposition above, obtaining as a result the local solution >> including its ghost cells values? >> > > How do you discretize the Poisson equation? > > > I intend to use a 7 point stencil like that in this example: > > > https://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex22f.F90.html > Okay, then you can do exactly as Mark says and use that example. This will allow you to use geometric multigrid for the Poisson problem. I don't think it can be beaten speed-wise. Thanks, Matt > > Thanks, > > Matt > > >> As input data at each time-step I know the electric charge density in >> each local subdomain (RHS), including the ghost cells, even if I don't >> think they are useful for the calculation of the equation. >> Matrix coefficients (LHS) and boundary conditions are constant during my >> simulation. >> >> As an output I would need to know the local electrical potential in each >> local subdomain, including the values of the ghost cells in each >> dimension(X,Y,Z). 
>> >> Is there an example that I can use in Fortran to solve this kind of >> problem? >> >> Thanks in advance >> >> Pierpaolo Minelli >> >> > > > Thanks > Pierpaolo > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierpaolo.minelli at cnr.it Mon Feb 24 06:07:09 2020 From: pierpaolo.minelli at cnr.it (Pierpaolo Minelli) Date: Mon, 24 Feb 2020 13:07:09 +0100 Subject: [petsc-users] Using DMDAs with an assigned domain decomposition to Solve Poisson Equation In-Reply-To: References: <37B03CED-D8A0-4736-9F5C-2994200BF2D5@cnr.it> <22A77097-9BFB-4FA5-A152-BF86D4D40D1F@cnr.it> Message-ID: > Il giorno 24 feb 2020, alle ore 12:58, Matthew Knepley ha scritto: > > On Mon, Feb 24, 2020 at 6:35 AM Pierpaolo Minelli > wrote: > > >> Il giorno 24 feb 2020, alle ore 12:24, Matthew Knepley > ha scritto: >> >> On Mon, Feb 24, 2020 at 5:30 AM Pierpaolo Minelli > wrote: >> Hi, >> I'm developing a 3D code in Fortran to study the space-time evolution of charged particles within a Cartesian domain. >> The domain decomposition has been made by me taking into account symmetry and load balancing reasons related to my specific problem. >> >> That may be a problem. DMDA can only decompose itself along straight lines through the domain. Is that how your decomposition looks? > > My decomposition at the moment is paractically a 2D decomposition because i have: > > M = 251 (X) > N = 341 (Y) > P = 161 (Z) > > and if i use 24 MPI procs, i divided my domain in a 3D Cartesian Topology with: > > m = 4 > n = 6 > p = 1 > > >> >> In this first draft, it will remain constant throughout my simulation. >> >> Is there a way, using DMDAs, to solve Poisson's equation, using the domain decomposition above, obtaining as a result the local solution including its ghost cells values? >> >> How do you discretize the Poisson equation? > > I intend to use a 7 point stencil like that in this example: > > https://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex22f.F90.html > > Okay, then you can do exactly as Mark says and use that example. This will allow you to use geometric multigrid > for the Poisson problem. I don't think it can be beaten speed-wise. > > Thanks, > > Matt > Ok, i will try this approach and let you know. Thanks again Pierpaolo >> >> Thanks, >> >> Matt >> >> As input data at each time-step I know the electric charge density in each local subdomain (RHS), including the ghost cells, even if I don't think they are useful for the calculation of the equation. >> Matrix coefficients (LHS) and boundary conditions are constant during my simulation. >> >> As an output I would need to know the local electrical potential in each local subdomain, including the values of the ghost cells in each dimension(X,Y,Z). >> >> Is there an example that I can use in Fortran to solve this kind of problem? 
>> >> Thanks in advance >> >> Pierpaolo Minelli >> >> > > > Thanks > Pierpaolo > >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Mon Feb 24 07:20:38 2020 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 24 Feb 2020 08:20:38 -0500 Subject: [petsc-users] Using DMDAs with an assigned domain decomposition to Solve Poisson Equation In-Reply-To: <00B8AC11-E70D-41F5-BEA5-96788B59715D@cnr.it> References: <37B03CED-D8A0-4736-9F5C-2994200BF2D5@cnr.it> <00B8AC11-E70D-41F5-BEA5-96788B59715D@cnr.it> Message-ID: > > > > https://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex14f.F90.html > > > Thanks, i will try to give a look to this example and i will try to check > also C examples in that directory. > There are a lot of places with examples (this is actually not an obvious place). The web page for each method lists examples that use it and similar methods. You can find examples by following these links too. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jczhang at mcs.anl.gov Mon Feb 24 09:01:05 2020 From: jczhang at mcs.anl.gov (Junchao Zhang) Date: Mon, 24 Feb 2020 09:01:05 -0600 Subject: [petsc-users] Using PETSc with GPU supported SuperLU_Dist In-Reply-To: References: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> <4BDB7C51-7452-45CC-A118-4D3F4F5D03D1@pnnl.gov> <4C14C2B3-0CB1-4E5F-A414-D5FBC10F6F18@pnnl.gov> Message-ID: [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient for CUDA runtime version That means you need to update your cuda driver for CUDA 10.2. See minimum requirement in Table 1 at https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#major-components --Junchao Zhang On Sun, Feb 23, 2020 at 3:33 PM Abhyankar, Shrirang G < shrirang.abhyankar at pnnl.gov> wrote: > I was using CUDA v10.2. Switching to 9.2 gives a clean make test. > > > > Thanks, > > Shri > > > > > > *From: *petsc-users on behalf of > "Abhyankar, Shrirang G via petsc-users" > *Reply-To: *"Abhyankar, Shrirang G" > *Date: *Sunday, February 23, 2020 at 3:10 PM > *To: *petsc-users , Junchao Zhang < > jczhang at mcs.anl.gov> > *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist > > > > I am getting an error now for CUDA driver version. Any suggestions? > > > > petsc:maint$ make test > > Running test examples to verify correct installation > > Using PETSC_DIR=/people/abhy245/software/petsc and > PETSC_ARCH=debug-mode-newell > > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI > process > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: Error in system call > > [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient > for CUDA runtime version > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown > > [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by > abhy245 Sun Feb 23 12:49:55 2020 > > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make > --download-metis --download-parmetis --download-scalapack > --download-suitesparse --download-superlu_dist-gpu=1 > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 > PETSC_ARCH=debug-mode-newell > > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > [0]PETSC ERROR: #3 PetscInitialize() line 1010 in > /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c > > -------------------------------------------------------------------------- > > Primary job terminated normally, but 1 process returned > > a non-zero exit code. Per user-direction, the job has been aborted. > > -------------------------------------------------------------------------- > > -------------------------------------------------------------------------- > > mpiexec detected that one or more processes exited with non-zero status, > thus causing > > the job to be terminated. The first process to do so was: > > > > Process name: [[46518,1],0] > > Exit code: 88 > > -------------------------------------------------------------------------- > > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 MPI > processes > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [1]PETSC ERROR: Error in system call > > [1]PETSC ERROR: [0]PETSC ERROR: Error in system call > > [0]PETSC ERROR: error in cudaGetDeviceCount CUDA driver version is > insufficient for CUDA runtime version > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > error in cudaGetDeviceCount CUDA driver version is insufficient for CUDA > runtime version > > [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [1]PETSC ERROR: [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown > > [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by > abhy245 Sun Feb 23 12:49:57 2020 > > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make > --download-metis --download-parmetis --download-scalapack > --download-suitesparse --download-superlu_dist-gpu=1 > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 > PETSC_ARCH=debug-mode-newell > > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 254 in > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > [0]PETSC ERROR: #3 PetscInitialize() line 1010 in > /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c > > Petsc Release Version 3.12.4, unknown > > [1]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by > abhy245 Sun Feb 23 12:49:57 2020 > > [1]PETSC ERROR: Configure options --download-fblaslapack --download-make > --download-metis --download-parmetis --download-scalapack > --download-suitesparse --download-superlu_dist-gpu=1 > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 > PETSC_ARCH=debug-mode-newell > > [1]PETSC ERROR: #1 PetscCUDAInitialize() line 254 in > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > [1]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > [1]PETSC ERROR: #3 PetscInitialize() line 1010 in > /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c > > -------------------------------------------------------------------------- > > Primary job terminated normally, but 1 process returned > > a non-zero exit code. Per user-direction, the job has been aborted. > > -------------------------------------------------------------------------- > > -------------------------------------------------------------------------- > > mpiexec detected that one or more processes exited with non-zero status, > thus causing > > the job to be terminated. The first process to do so was: > > > > Process name: [[46522,1],0] > > Exit code: 88 > > -------------------------------------------------------------------------- > > 1,2c1,21 > > < lid velocity = 0.0025, prandtl # = 1., grashof # = 1. > > < Number of SNES iterations = 2 > > --- > > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > > [0]PETSC ERROR: Error in system call > > > [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is > insufficient for CUDA runtime version > > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > > [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown > > > [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by > abhy245 Sun Feb 23 12:50:00 2020 > > > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make > --download-metis --download-parmetis --download-scalapack > --download-suitesparse --download-superlu_dist-gpu=1 > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 > PETSC_ARCH=debug-mode-newell > > > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > [0]PETSC ERROR: #3 PetscInitialize() line 1010 in > /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c > > > > -------------------------------------------------------------------------- > > > Primary job terminated normally, but 1 process returned > > > a non-zero exit code. Per user-direction, the job has been aborted. > > > > -------------------------------------------------------------------------- > > > > -------------------------------------------------------------------------- > > > mpiexec detected that one or more processes exited with non-zero status, > thus causing > > > the job to be terminated. The first process to do so was: > > > > > > Process name: [[46545,1],0] > > > Exit code: 88 > > > > -------------------------------------------------------------------------- > > /people/abhy245/software/petsc/src/snes/examples/tutorials > > Possible problem with ex19 running with superlu_dist, diffs above > > ========================================= > > Possible error running Fortran example src/snes/examples/tutorials/ex5f > with 1 MPI process > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: Error in system call > > [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient > for CUDA runtime version > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown > > [0]PETSC ERROR: ./ex5f on a debug-mode-newell named newell01.pnl.gov by > abhy245 Sun Feb 23 12:50:04 2020 > > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make > --download-metis --download-parmetis --download-scalapack > --download-suitesparse --download-superlu_dist-gpu=1 > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 > PETSC_ARCH=debug-mode-newell > > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > [0]PETSC ERROR: PetscInitialize:Checking initial options > > Unable to initialize PETSc > > -------------------------------------------------------------------------- > > mpiexec has exited due to process rank 0 with PID 0 on > > node newell01 exiting improperly. There are three reasons this could occur: > > > > 1. 
this process did not call "init" before exiting, but others in > > the job did. This can cause a job to hang indefinitely while it waits > > for all processes to call "init". By rule, if one process calls "init", > > then ALL processes must call "init" prior to termination. > > > > 2. this process called "init", but exited without calling "finalize". > > By rule, all processes that call "init" MUST call "finalize" prior to > > exiting or it will be considered an "abnormal termination" > > > > 3. this process called "MPI_Abort" or "orte_abort" and the mca parameter > > orte_create_session_dirs is set to false. In this case, the run-time cannot > > detect that the abort call was an abnormal termination. Hence, the only > > error message you will receive is this one. > > > > This may have caused other processes in the application to be > > terminated by signals sent by mpiexec (as reported here). > > > > You can avoid this message by specifying -quiet on the mpiexec command > line. > > -------------------------------------------------------------------------- > > Completed test examples > > *From: *Satish Balay > *Reply-To: *petsc-users > *Date: *Saturday, February 22, 2020 at 9:00 PM > *To: *Junchao Zhang > *Cc: *"Abhyankar, Shrirang G" , petsc-users < > petsc-users at mcs.anl.gov> > *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist > > > > The fix is now in both maint and master > > > > https://gitlab.com/petsc/petsc/-/merge_requests/2555 > > > > Satish > > > > On Sat, 22 Feb 2020, Junchao Zhang via petsc-users wrote: > > > > We met the error before and knew why. Will fix it soon. > > --Junchao Zhang > > On Sat, Feb 22, 2020 at 11:43 AM Abhyankar, Shrirang G via petsc-users < > > petsc-users at mcs.anl.gov> wrote: > > > Thanks, Satish. Configure and make go through fine. Getting an undefined > > > reference error for VecGetArrayWrite_SeqCUDA. > > > > > > > > > > > > Shri > > > > > > *From: *Satish Balay > > > *Reply-To: *petsc-users > > > *Date: *Saturday, February 22, 2020 at 8:25 AM > > > *To: *"Abhyankar, Shrirang G" > > > *Cc: *"petsc-users at mcs.anl.gov" > > > *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist > > > > > > > > > > > > On Sat, 22 Feb 2020, Abhyankar, Shrirang G via petsc-users wrote: > > > > > > > > > > > > Hi, > > > > > > I want to install PETSc with GPU supported SuperLU_Dist. What are the > > > configure options I should be using? > > > > > > > > > > > > > > > > > > Shri, > > > > > > > > > > > > > > > > > > if self.framework.argDB['download-superlu_dist-gpu']: > > > > > > self.cuda = > framework.require('config.packages.cuda',self) > > > > > > self.openmp = > > > framework.require('config.packages.openmp',self) > > > > > > self.deps = > > > [self.mpi,self.blasLapack,self.cuda,self.openmp] > > > > > > <<<<< > > > > > > > > > > > > So try: > > > > > > > > > > > > --with-cuda=1 --download-superlu_dist=1 --download-superlu_dist-gpu=1 > > > --with-openmp=1 [and usual MPI, blaslapack] > > > > > > > > > > > > Satish > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
From balay at mcs.anl.gov Mon Feb 24 09:18:51 2020
From: balay at mcs.anl.gov (Satish Balay)
Date: Mon, 24 Feb 2020 09:18:51 -0600
Subject: Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist
In-Reply-To: References: <264462B2-AE1F-4922-948E-0C6FCCB9A429@pnnl.gov> <4BDB7C51-7452-45CC-A118-4D3F4F5D03D1@pnnl.gov> <4C14C2B3-0CB1-4E5F-A414-D5FBC10F6F18@pnnl.gov>
Message-ID:

nvidia-smi gives some relevant info. I'm not sure what exactly the CUDA version listed here refers to (is it the maximum CUDA version this driver is compatible with?).

Satish

-----
[balay at p1 ~]$ nvidia-smi
Mon Feb 24 09:15:26 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro T2000        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   45C    P8     4W /  N/A |    182MiB /  3911MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1372      G   /usr/libexec/Xorg                            180MiB |
+-----------------------------------------------------------------------------+
[balay at p1 ~]$

On Mon, 24 Feb 2020, Junchao Zhang via petsc-users wrote:

> [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient
> for CUDA runtime version
>
> That means you need to update your cuda driver for CUDA 10.2. See minimum
> requirement in Table 1 at
> https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#major-components
>
> --Junchao Zhang
>
> On Sun, Feb 23, 2020 at 3:33 PM Abhyankar, Shrirang G <
> shrirang.abhyankar at pnnl.gov> wrote:
>
> > I was using CUDA v10.2. Switching to 9.2 gives a clean make test.
> >
> > Thanks,
> >
> > Shri
> >
> > *From: *petsc-users on behalf of
> > "Abhyankar, Shrirang G via petsc-users"
> > *Reply-To: *"Abhyankar, Shrirang G"
> > *Date: *Sunday, February 23, 2020 at 3:10 PM
> > *To: *petsc-users , Junchao Zhang <
> > jczhang at mcs.anl.gov>
> > *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist
> >
> > I am getting an error now for CUDA driver version. Any suggestions?
> >
> > petsc:maint$ make test
> >
> > Running test examples to verify correct installation
> >
> > Using PETSC_DIR=/people/abhy245/software/petsc and
> > PETSC_ARCH=debug-mode-newell
> >
> > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI
> > process
> >
> > See http://www.mcs.anl.gov/petsc/documentation/faq.html
> >
> > [0]PETSC ERROR: --------------------- Error Message
> > --------------------------------------------------------------
> >
> > [0]PETSC ERROR: Error in system call
> >
> > [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient
> > for CUDA runtime version
> >
> > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
> > for trouble shooting.
> > > > [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown > > > > [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by > > abhy245 Sun Feb 23 12:49:55 2020 > > > > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make > > --download-metis --download-parmetis --download-scalapack > > --download-suitesparse --download-superlu_dist-gpu=1 > > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ > > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 > > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 > > PETSC_ARCH=debug-mode-newell > > > > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > > [0]PETSC ERROR: #3 PetscInitialize() line 1010 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c > > > > -------------------------------------------------------------------------- > > > > Primary job terminated normally, but 1 process returned > > > > a non-zero exit code. Per user-direction, the job has been aborted. > > > > -------------------------------------------------------------------------- > > > > -------------------------------------------------------------------------- > > > > mpiexec detected that one or more processes exited with non-zero status, > > thus causing > > > > the job to be terminated. The first process to do so was: > > > > > > > > Process name: [[46518,1],0] > > > > Exit code: 88 > > > > -------------------------------------------------------------------------- > > > > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 MPI > > processes > > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > > [0]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > > > [1]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > > > [1]PETSC ERROR: Error in system call > > > > [1]PETSC ERROR: [0]PETSC ERROR: Error in system call > > > > [0]PETSC ERROR: error in cudaGetDeviceCount CUDA driver version is > > insufficient for CUDA runtime version > > > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > > for trouble shooting. > > > > error in cudaGetDeviceCount CUDA driver version is insufficient for CUDA > > runtime version > > > > [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > > for trouble shooting. 
> > > > [1]PETSC ERROR: [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown > > > > [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by > > abhy245 Sun Feb 23 12:49:57 2020 > > > > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make > > --download-metis --download-parmetis --download-scalapack > > --download-suitesparse --download-superlu_dist-gpu=1 > > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ > > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 > > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 > > PETSC_ARCH=debug-mode-newell > > > > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 254 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > > [0]PETSC ERROR: #3 PetscInitialize() line 1010 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c > > > > Petsc Release Version 3.12.4, unknown > > > > [1]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by > > abhy245 Sun Feb 23 12:49:57 2020 > > > > [1]PETSC ERROR: Configure options --download-fblaslapack --download-make > > --download-metis --download-parmetis --download-scalapack > > --download-suitesparse --download-superlu_dist-gpu=1 > > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ > > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 > > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 > > PETSC_ARCH=debug-mode-newell > > > > [1]PETSC ERROR: #1 PetscCUDAInitialize() line 254 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > > [1]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > > [1]PETSC ERROR: #3 PetscInitialize() line 1010 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c > > > > -------------------------------------------------------------------------- > > > > Primary job terminated normally, but 1 process returned > > > > a non-zero exit code. Per user-direction, the job has been aborted. > > > > -------------------------------------------------------------------------- > > > > -------------------------------------------------------------------------- > > > > mpiexec detected that one or more processes exited with non-zero status, > > thus causing > > > > the job to be terminated. The first process to do so was: > > > > > > > > Process name: [[46522,1],0] > > > > Exit code: 88 > > > > -------------------------------------------------------------------------- > > > > 1,2c1,21 > > > > < lid velocity = 0.0025, prandtl # = 1., grashof # = 1. > > > > < Number of SNES iterations = 2 > > > > --- > > > > > [0]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > > > > [0]PETSC ERROR: Error in system call > > > > > [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is > > insufficient for CUDA runtime version > > > > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > > for trouble shooting. 
> > > > > [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown > > > > > [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by > > abhy245 Sun Feb 23 12:50:00 2020 > > > > > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make > > --download-metis --download-parmetis --download-scalapack > > --download-suitesparse --download-superlu_dist-gpu=1 > > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ > > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 > > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 > > PETSC_ARCH=debug-mode-newell > > > > > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > > > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > > > [0]PETSC ERROR: #3 PetscInitialize() line 1010 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c > > > > > > > -------------------------------------------------------------------------- > > > > > Primary job terminated normally, but 1 process returned > > > > > a non-zero exit code. Per user-direction, the job has been aborted. > > > > > > > -------------------------------------------------------------------------- > > > > > > > -------------------------------------------------------------------------- > > > > > mpiexec detected that one or more processes exited with non-zero status, > > thus causing > > > > > the job to be terminated. The first process to do so was: > > > > > > > > > > Process name: [[46545,1],0] > > > > > Exit code: 88 > > > > > > > -------------------------------------------------------------------------- > > > > /people/abhy245/software/petsc/src/snes/examples/tutorials > > > > Possible problem with ex19 running with superlu_dist, diffs above > > > > ========================================= > > > > Possible error running Fortran example src/snes/examples/tutorials/ex5f > > with 1 MPI process > > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > > [0]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > > > [0]PETSC ERROR: Error in system call > > > > [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient > > for CUDA runtime version > > > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > > for trouble shooting. 
> > > > [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown > > > > [0]PETSC ERROR: ./ex5f on a debug-mode-newell named newell01.pnl.gov by > > abhy245 Sun Feb 23 12:50:04 2020 > > > > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make > > --download-metis --download-parmetis --download-scalapack > > --download-suitesparse --download-superlu_dist-gpu=1 > > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++ > > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 > > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 > > PETSC_ARCH=debug-mode-newell > > > > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in > > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c > > > > [0]PETSC ERROR: PetscInitialize:Checking initial options > > > > Unable to initialize PETSc > > > > -------------------------------------------------------------------------- > > > > mpiexec has exited due to process rank 0 with PID 0 on > > > > node newell01 exiting improperly. There are three reasons this could occur: > > > > > > > > 1. this process did not call "init" before exiting, but others in > > > > the job did. This can cause a job to hang indefinitely while it waits > > > > for all processes to call "init". By rule, if one process calls "init", > > > > then ALL processes must call "init" prior to termination. > > > > > > > > 2. this process called "init", but exited without calling "finalize". > > > > By rule, all processes that call "init" MUST call "finalize" prior to > > > > exiting or it will be considered an "abnormal termination" > > > > > > > > 3. this process called "MPI_Abort" or "orte_abort" and the mca parameter > > > > orte_create_session_dirs is set to false. In this case, the run-time cannot > > > > detect that the abort call was an abnormal termination. Hence, the only > > > > error message you will receive is this one. > > > > > > > > This may have caused other processes in the application to be > > > > terminated by signals sent by mpiexec (as reported here). > > > > > > > > You can avoid this message by specifying -quiet on the mpiexec command > > line. > > > > -------------------------------------------------------------------------- > > > > Completed test examples > > > > *From: *Satish Balay > > *Reply-To: *petsc-users > > *Date: *Saturday, February 22, 2020 at 9:00 PM > > *To: *Junchao Zhang > > *Cc: *"Abhyankar, Shrirang G" , petsc-users < > > petsc-users at mcs.anl.gov> > > *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist > > > > > > > > The fix is now in both maint and master > > > > > > > > https://gitlab.com/petsc/petsc/-/merge_requests/2555 > > > > > > > > Satish > > > > > > > > On Sat, 22 Feb 2020, Junchao Zhang via petsc-users wrote: > > > > > > > > We met the error before and knew why. Will fix it soon. > > > > --Junchao Zhang > > > > On Sat, Feb 22, 2020 at 11:43 AM Abhyankar, Shrirang G via petsc-users < > > > > petsc-users at mcs.anl.gov> wrote: > > > > > Thanks, Satish. Configure and make go through fine. Getting an undefined > > > > > reference error for VecGetArrayWrite_SeqCUDA. 
> > > > > > > > > > > > > > > > > > > > Shri > > > > > > > > > > *From: *Satish Balay > > > > > *Reply-To: *petsc-users > > > > > *Date: *Saturday, February 22, 2020 at 8:25 AM > > > > > *To: *"Abhyankar, Shrirang G" > > > > > *Cc: *"petsc-users at mcs.anl.gov" > > > > > *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist > > > > > > > > > > > > > > > > > > > > On Sat, 22 Feb 2020, Abhyankar, Shrirang G via petsc-users wrote: > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > I want to install PETSc with GPU supported SuperLU_Dist. What are the > > > > > configure options I should be using? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Shri, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > if self.framework.argDB['download-superlu_dist-gpu']: > > > > > > > > > > self.cuda = > > framework.require('config.packages.cuda',self) > > > > > > > > > > self.openmp = > > > > > framework.require('config.packages.openmp',self) > > > > > > > > > > self.deps = > > > > > [self.mpi,self.blasLapack,self.cuda,self.openmp] > > > > > > > > > > <<<<< > > > > > > > > > > > > > > > > > > > > So try: > > > > > > > > > > > > > > > > > > > > --with-cuda=1 --download-superlu_dist=1 --download-superlu_dist-gpu=1 > > > > > --with-openmp=1 [and usual MPI, blaslapack] > > > > > > > > > > > > > > > > > > > > Satish > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From balay at mcs.anl.gov Mon Feb 24 09:35:36 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 24 Feb 2020 09:35:36 -0600 Subject: [petsc-users] Correct approach for updating deprecated code In-Reply-To: References: Message-ID: Perhaps this is helpful. https://gitlab.com/petsc/petsc/commits/6a9046bcf1dc7e213a87d3843bfa02f323786ad4 Satish On Sun, 23 Feb 2020, Matthew Knepley wrote: > I think you are going to have to send the error logs. > > Thanks, > > MAtt > > On Sun, Feb 23, 2020 at 6:45 PM Richard Beare > wrote: > > > That's what I did (see below), but I got ordering errors (unfortunately > > deleted those logs too soon). I'll rerun if no one recognises what I've > > done wrong. > > > > PetscViewer viewer1; > > ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fileName.c_str > > (),FILE_MODE_WRITE,&viewer1);CHKERRQ(ierr); > > //ierr = > > PetscViewerSetFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ(ierr); > > ierr = PetscViewerPushFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ > > (ierr); > > > > ierr = PetscObjectSetName((PetscObject)mX,"x");CHKERRQ(ierr); > > ierr = PetscObjectSetName((PetscObject)mB,"b");CHKERRQ(ierr); > > > > On Mon, 24 Feb 2020 at 10:43, Matthew Knepley wrote: > > > >> On Sun, Feb 23, 2020 at 6:25 PM Richard Beare via petsc-users < > >> petsc-users at mcs.anl.gov> wrote: > >> > >>> > >>> Hi, > >>> The following code gives a deprecation warning. What is the correct way > >>> of updating the use of ViewerSetFormat to ViewerPushFormat (which I presume > >>> is the preferred replacement). My first attempt gave errors concerning > >>> ordering. > >>> > >> > >> You can't just change SetFormat to PushFormat here? 
> >> > >> Matt > >> > >> > >>> Thanks > >>> > >>> PetscViewer viewer1; > >>> ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fileName.c_str > >>> (),FILE_MODE_WRITE,&viewer1);CHKERRQ(ierr); > >>> ierr = PetscViewerSetFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ > >>> (ierr); > >>> > >>> ierr = PetscObjectSetName((PetscObject)mX,"x");CHKERRQ(ierr); > >>> ierr = PetscObjectSetName((PetscObject)mB,"b");CHKERRQ(ierr); > >>> > >>> ierr = VecView(mX,viewer1);CHKERRQ(ierr); > >>> ierr = VecView(mB,viewer1);CHKERRQ(ierr); > >>> > >>> > >>> -- > >>> -- > >>> A/Prof Richard Beare > >>> Imaging and Bioinformatics, Peninsula Clinical School > >>> orcid.org/0000-0002-7530-5664 > >>> Richard.Beare at monash.edu > >>> +61 3 9788 1724 > >>> > >>> > >>> > >>> Geospatial Research: > >>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis > >>> > >> > >> > >> -- > >> What most experimenters take for granted before they begin their > >> experiments is infinitely more interesting than any results to which their > >> experiments lead. > >> -- Norbert Wiener > >> > >> https://www.cse.buffalo.edu/~knepley/ > >> > >> > > > > > > -- > > -- > > A/Prof Richard Beare > > Imaging and Bioinformatics, Peninsula Clinical School > > orcid.org/0000-0002-7530-5664 > > Richard.Beare at monash.edu > > +61 3 9788 1724 > > > > > > > > Geospatial Research: > > https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis > > > > > From knepley at gmail.com Mon Feb 24 10:04:52 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 24 Feb 2020 11:04:52 -0500 Subject: [petsc-users] Correct approach for updating deprecated code In-Reply-To: References: Message-ID: On Sun, Feb 23, 2020 at 6:45 PM Richard Beare wrote: > That's what I did (see below), but I got ordering errors (unfortunately > deleted those logs too soon). I'll rerun if no one recognises what I've > done wrong. > > PetscViewer viewer1; > ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fileName.c_str > (),FILE_MODE_WRITE,&viewer1);CHKERRQ(ierr); > //ierr = > PetscViewerSetFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ(ierr); > ierr = PetscViewerPushFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ > (ierr); > This should not cause problems. However, is it possible that somewhere you are pushing a format again and again without popping? This could exceed the stack size. Thanks, Matt > ierr = PetscObjectSetName((PetscObject)mX,"x");CHKERRQ(ierr); > ierr = PetscObjectSetName((PetscObject)mB,"b");CHKERRQ(ierr); > > On Mon, 24 Feb 2020 at 10:43, Matthew Knepley wrote: > >> On Sun, Feb 23, 2020 at 6:25 PM Richard Beare via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >>> >>> Hi, >>> The following code gives a deprecation warning. What is the correct way >>> of updating the use of ViewerSetFormat to ViewerPushFormat (which I presume >>> is the preferred replacement). My first attempt gave errors concerning >>> ordering. >>> >> >> You can't just change SetFormat to PushFormat here? 
>> >> Matt >> >> >>> Thanks >>> >>> PetscViewer viewer1; >>> ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fileName.c_str >>> (),FILE_MODE_WRITE,&viewer1);CHKERRQ(ierr); >>> ierr = PetscViewerSetFormat(viewer1,PETSC_VIEWER_BINARY_MATLAB);CHKERRQ >>> (ierr); >>> >>> ierr = PetscObjectSetName((PetscObject)mX,"x");CHKERRQ(ierr); >>> ierr = PetscObjectSetName((PetscObject)mB,"b");CHKERRQ(ierr); >>> >>> ierr = VecView(mX,viewer1);CHKERRQ(ierr); >>> ierr = VecView(mB,viewer1);CHKERRQ(ierr); >>> >>> >>> -- >>> -- >>> A/Prof Richard Beare >>> Imaging and Bioinformatics, Peninsula Clinical School >>> orcid.org/0000-0002-7530-5664 >>> Richard.Beare at monash.edu >>> +61 3 9788 1724 >>> >>> >>> >>> Geospatial Research: >>> https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > -- > A/Prof Richard Beare > Imaging and Bioinformatics, Peninsula Clinical School > orcid.org/0000-0002-7530-5664 > Richard.Beare at monash.edu > +61 3 9788 1724 > > > > Geospatial Research: > https://www.monash.edu/medicine/scs/medicine/research/geospatial-analysis > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From juaneah at gmail.com Mon Feb 24 16:50:53 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Mon, 24 Feb 2020 16:50:53 -0600 Subject: [petsc-users] Behavior of PetscViewerVTKOpen Message-ID: Hi everyone, I think VTK is not the best option to save files, but I just want a quick way to visualize a structured grid generated with DMDACreate3d. I visualize the vts file with ParaView. The situation is, If I change the number global dimension in each direction of the array (*M,N,P *parameters for DMDACreate3d) it is supposed that the size of the mesh won't changes, ONLY the number of division in the affected direction, but it does NOT happen ( I used DMDASetUniformCoordinates to have an uniform grid ). When I save the file, and then check it with ParaView, the size changes. (I checked the coordinates with DMGetCoordinates everything is OK, size and division), the problem is in the visualization using VTK. I use the next set of functions to save the file. ierr = PetscViewerVTKOpen(PETSC_COMM_WORLD,"nodes_coord.vts",FILE_MODE_WRITE,&view); CHKERRQ(ierr); ierr = DMGetCoordinates(da_nodes,&v1); CHKERRQ(ierr); // borrowed reference ierr = VecView(coord,view);CHKERRQ(ierr); Then, I realize that if I create a vector with DMCreateGlobalVector and then copy in it the coordinate from DMGetCoordinates, the size remains unchanged (in the visualization) and just the number of elements along the direction change (when I modify M,N,P). ierr = PetscViewerVTKOpen(PETSC_COMM_WORLD,"nodes_coord.vts",FILE_MODE_WRITE,&view); CHKERRQ(ierr); ierr = DMGetCoordinates(da_nodes,&v1); CHKERRQ(ierr); // borrowed reference ierr = DMCreateGlobalVector(da_nodes,&coord); CHKERRQ(ierr); ierr = VecCopy(v1,coord); CHKERRQ(ierr); ierr = VecView(coord,view);CHKERRQ(ierr); *There is something wrong with PetscViewerVTKOpen or it's just the approach that I used?* Kind regards. 
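For reference, a minimal sketch of the pattern suggested in the reply below: viewing a field created on the DMDA itself (so that the coordinates are attached automatically) rather than viewing the coordinate vector directly. The grid sizes, domain extent, field name, and output file name here are arbitrary placeholders, not anything prescribed by PETSc.

#include <petscdmda.h>

int main(int argc,char **argv)
{
  DM             da;
  Vec            u;
  PetscViewer    viewer;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
  /* 3D structured grid; M = N = P = 5 is arbitrary and only sets the resolution */
  ierr = DMDACreate3d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,
                      DMDA_STENCIL_BOX,5,5,5,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,
                      1,1,NULL,NULL,NULL,&da);CHKERRQ(ierr);
  ierr = DMSetFromOptions(da);CHKERRQ(ierr);
  ierr = DMSetUp(da);CHKERRQ(ierr);
  /* the physical extent stays fixed when M,N,P change; only the spacing changes */
  ierr = DMDASetUniformCoordinates(da,0.0,1.0,0.0,1.0,0.0,1.0);CHKERRQ(ierr);
  /* view a field defined on the DMDA; the VTK viewer picks up the DM coordinates itself */
  ierr = DMCreateGlobalVector(da,&u);CHKERRQ(ierr);
  ierr = PetscObjectSetName((PetscObject)u,"u");CHKERRQ(ierr);
  ierr = VecSet(u,1.0);CHKERRQ(ierr);
  ierr = PetscViewerVTKOpen(PETSC_COMM_WORLD,"field.vts",FILE_MODE_WRITE,&viewer);CHKERRQ(ierr);
  ierr = VecView(u,viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  ierr = VecDestroy(&u);CHKERRQ(ierr);
  ierr = DMDestroy(&da);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}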
-------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Feb 24 17:48:06 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 24 Feb 2020 16:48:06 -0700 Subject: [petsc-users] Behavior of PetscViewerVTKOpen In-Reply-To: References: Message-ID: <87pne3o8cp.fsf@jedbrown.org> Emmanuel Ayala writes: > Hi everyone, > > I think VTK is not the best option to save files, but I just want a quick > way to visualize a structured grid generated with DMDACreate3d. I visualize > the vts file with ParaView. > > The situation is, If I change the number global dimension in each direction > of the array (*M,N,P *parameters for DMDACreate3d) it is supposed that the > size of the mesh won't changes, ONLY the number of division in the affected > direction, but it does NOT happen ( I used DMDASetUniformCoordinates to > have an uniform grid ). When I save the file, and then check it with > ParaView, the size changes. (I checked the coordinates with > DMGetCoordinates everything is OK, size and division), the problem is in > the visualization using VTK. > > I use the next set of functions to save the file. Normally you view a field defined on your DM (which will be correctly mapped by your coordinates), not the coordinates (which reside on a coordinate DM, which does not itself have coordinates). Try something like this: DMCreateGlobalVector(dm, &U); VecView(U, viewer); > ierr = > PetscViewerVTKOpen(PETSC_COMM_WORLD,"nodes_coord.vts",FILE_MODE_WRITE,&view); > CHKERRQ(ierr); > ierr = DMGetCoordinates(da_nodes,&v1); CHKERRQ(ierr); // borrowed reference > ierr = VecView(coord,view);CHKERRQ(ierr); > > Then, I realize that if I create a vector with DMCreateGlobalVector and > then copy in it the coordinate from DMGetCoordinates, the size remains > unchanged (in the visualization) and just the number of elements along the > direction change (when I modify M,N,P). > > ierr = > PetscViewerVTKOpen(PETSC_COMM_WORLD,"nodes_coord.vts",FILE_MODE_WRITE,&view); > CHKERRQ(ierr); > ierr = DMGetCoordinates(da_nodes,&v1); CHKERRQ(ierr); // borrowed reference > ierr = DMCreateGlobalVector(da_nodes,&coord); CHKERRQ(ierr); > ierr = VecCopy(v1,coord); CHKERRQ(ierr); > ierr = VecView(coord,view);CHKERRQ(ierr); > > *There is something wrong with PetscViewerVTKOpen or it's just the approach > that I used?* > > Kind regards. From juaneah at gmail.com Mon Feb 24 18:11:42 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Mon, 24 Feb 2020 18:11:42 -0600 Subject: [petsc-users] Behavior of PetscViewerVTKOpen In-Reply-To: <87pne3o8cp.fsf@jedbrown.org> References: <87pne3o8cp.fsf@jedbrown.org> Message-ID: Ok, that's right, now i understand. Thanks! Kind regards. El lun., 24 de feb. de 2020 a la(s) 17:48, Jed Brown (jed at jedbrown.org) escribi?: > Emmanuel Ayala writes: > > > Hi everyone, > > > > I think VTK is not the best option to save files, but I just want a quick > > way to visualize a structured grid generated with DMDACreate3d. I > visualize > > the vts file with ParaView. > > > > The situation is, If I change the number global dimension in each > direction > > of the array (*M,N,P *parameters for DMDACreate3d) it is supposed that > the > > size of the mesh won't changes, ONLY the number of division in the > affected > > direction, but it does NOT happen ( I used DMDASetUniformCoordinates to > > have an uniform grid ). When I save the file, and then check it with > > ParaView, the size changes. 
(I checked the coordinates with > > DMGetCoordinates everything is OK, size and division), the problem is in > > the visualization using VTK. > > > > I use the next set of functions to save the file. > > Normally you view a field defined on your DM (which will be correctly > mapped by your coordinates), not the coordinates (which reside on a > coordinate DM, which does not itself have coordinates). Try something > like this: > > DMCreateGlobalVector(dm, &U); > VecView(U, viewer); > > > ierr = > > > PetscViewerVTKOpen(PETSC_COMM_WORLD,"nodes_coord.vts",FILE_MODE_WRITE,&view); > > CHKERRQ(ierr); > > ierr = DMGetCoordinates(da_nodes,&v1); CHKERRQ(ierr); // borrowed > reference > > ierr = VecView(coord,view);CHKERRQ(ierr); > > > > Then, I realize that if I create a vector with DMCreateGlobalVector and > > then copy in it the coordinate from DMGetCoordinates, the size remains > > unchanged (in the visualization) and just the number of elements along > the > > direction change (when I modify M,N,P). > > > > ierr = > > > PetscViewerVTKOpen(PETSC_COMM_WORLD,"nodes_coord.vts",FILE_MODE_WRITE,&view); > > CHKERRQ(ierr); > > ierr = DMGetCoordinates(da_nodes,&v1); CHKERRQ(ierr); // borrowed > reference > > ierr = DMCreateGlobalVector(da_nodes,&coord); CHKERRQ(ierr); > > ierr = VecCopy(v1,coord); CHKERRQ(ierr); > > ierr = VecView(coord,view);CHKERRQ(ierr); > > > > *There is something wrong with PetscViewerVTKOpen or it's just the > approach > > that I used?* > > > > Kind regards. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From claudio.tomasi at usi.ch Tue Feb 25 10:29:14 2020 From: claudio.tomasi at usi.ch (Claudio Tomasi) Date: Tue, 25 Feb 2020 16:29:14 +0000 Subject: [petsc-users] DMDAVecGetArrayDOF for read-only vectors Message-ID: Dear all, I would like to ask whether there's an analogous of the routine DMDAVecGetArrayDOF for read-only vectors. I'm using the SNES solver and in computing the SNESfunction I need to read and use the values of the solution vector x, which is created with DMCreateGlobalVector where the associated DM has two dofs per node. Best regards, Claudio Tomasi -------------- next part -------------- An HTML attachment was scrubbed... URL: From sajidsyed2021 at u.northwestern.edu Tue Feb 25 10:37:09 2020 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Tue, 25 Feb 2020 10:37:09 -0600 Subject: [petsc-users] Questions about TSAdjoint for time dependent parameters Message-ID: Hi PETSc-developers, Could the code used for section 5.1 of the recent paper "PETSc TSAdjoint: a discrete adjoint ODE solver for first-order and second-order sensitivity analysis" be shared ? Are there more examples that deal with time dependent parameters in the git repository ? Another question I have is regarding the equations used to introduce adjoints in section 7.1 of the manual where for the state of the solution vector is denoted by y and the parameters by p. [1] I'm unsure about what the partial derivative of y0 with respect to p means since I understand y0 to be the initial conditions used to solve the TS which would not depend on the parameters (since the parameters are related to the equations TS tries to solve for which should not dependent on the initialization used). Could someone clarify what this means ? [2] The manual described that a user has to set the correct initialization for the adjoint variables when calling TSSetCostGradients. The initialization for mu vector is whereby given to be d?i/dp at t=tF. 
If p is time dependent, does one evaluate this derivative with respect to p(t) at t=tF ? Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellen.price at cfa.harvard.edu Tue Feb 25 10:54:23 2020 From: ellen.price at cfa.harvard.edu (Ellen M. Price) Date: Tue, 25 Feb 2020 11:54:23 -0500 Subject: [petsc-users] Guidance on using Tao for unconstrained minimization Message-ID: <9480b1d3-1ed3-8108-15d8-a5914312c5f2@cfa.harvard.edu> Hello PETSc users! I am using Tao for an unconstrained minimization problem. I have found that CG works better than the other types for this application. After about 85 iterations, I get an error about line search failure. I'm not clear on what this means, or how I could mitigate the problem, and neither the manual nor FAQ give any guidance. Can anyone suggest things I could try to help the method converge? I have function and gradient info, but no Hessian. Thanks, Ellen Price From hongzhang at anl.gov Tue Feb 25 10:59:43 2020 From: hongzhang at anl.gov (Zhang, Hong) Date: Tue, 25 Feb 2020 16:59:43 +0000 Subject: [petsc-users] Questions about TSAdjoint for time dependent parameters In-Reply-To: References: Message-ID: On Feb 25, 2020, at 10:37 AM, Sajid Ali > wrote: Hi PETSc-developers, Could the code used for section 5.1 of the recent paper "PETSc TSAdjoint: a discrete adjoint ODE solver for first-order and second-order sensitivity analysis" be shared ? Are there more examples that deal with time dependent parameters in the git repository ? The code is in the master branch. See ts/examples/tutorials/optimal_control/ex1.c. This is the only example that deals with time-varying parameters. Another question I have is regarding the equations used to introduce adjoints in section 7.1 of the manual where for the state of the solution vector is denoted by y and the parameters by p. [1] I'm unsure about what the partial derivative of y0 with respect to p means since I understand y0 to be the initial conditions used to solve the TS which would not depend on the parameters (since the parameters are related to the equations TS tries to solve for which should not dependent on the initialization used). Could someone clarify what this means ? There exist applications that initial condition depends on the design parameters. [2] The manual described that a user has to set the correct initialization for the adjoint variables when calling TSSetCostGradients. The initialization for mu vector is whereby given to be d?i/dp at t=tF. If p is time dependent, does one evaluate this derivative with respect to p(t) at t=tF ? Yes The adjoint solvers are designed to handle as many cases as possible. In your case, you may have simpler dependencies than those supported. If the initial condition and the objective function do not depend on the parameters directly, their partial derivatives wrt to p will be zero and you can simply ignore them. Hong (Mr.) Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From sajidsyed2021 at u.northwestern.edu Tue Feb 25 11:21:42 2020 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Tue, 25 Feb 2020 11:21:42 -0600 Subject: [petsc-users] Questions about TSAdjoint for time dependent parameters In-Reply-To: References: Message-ID: Hi Hong, Thanks for the explanation! 
If I have a cost function consisting of an L2 norm of the difference of a TS-solution and some reference along with some constraints (say bounds, L1-sparsity, total variation etc), would I provide a routine for gradient evaluation of only the L2 norm (where TAO would take care of the constraints) or do I also have to take the constraints into account (since I'd also have to differentiate the regularizers) ? Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Feb 25 11:27:24 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 25 Feb 2020 12:27:24 -0500 Subject: [petsc-users] Questions about TSAdjoint for time dependent parameters In-Reply-To: References: Message-ID: On Tue, Feb 25, 2020 at 12:23 PM Sajid Ali wrote: > Hi Hong, > > Thanks for the explanation! > > If I have a cost function consisting of an L2 norm of the difference of a > TS-solution and some reference along with some constraints (say bounds, > L1-sparsity, total variation etc), would I provide a routine for gradient > evaluation of only the L2 norm (where TAO would take care of the > constraints) or do I also have to take the constraints into account (since > I'd also have to differentiate the regularizers) ? > We want to have a framework for this separable case. The ADMM implementation that was recently merged is a step in this direction. See Alp's talk from SIAM PP 2020. Thanks, Matt > Thank You, > Sajid Ali | PhD Candidate > Applied Physics > Northwestern University > s-sajid-ali.github.io > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Tue Feb 25 11:49:35 2020 From: hongzhang at anl.gov (Zhang, Hong) Date: Tue, 25 Feb 2020 17:49:35 +0000 Subject: [petsc-users] Questions about TSAdjoint for time dependent parameters In-Reply-To: References: Message-ID: <918151C4-5C6E-4AF2-A540-854AB6B1AC32@anl.gov> On Feb 25, 2020, at 11:21 AM, Sajid Ali > wrote: Hi Hong, Thanks for the explanation! If I have a cost function consisting of an L2 norm of the difference of a TS-solution and some reference along with some constraints (say bounds, L1-sparsity, total variation etc), would I provide a routine for gradient evaluation of only the L2 norm (where TAO would take care of the constraints) or do I also have to take the constraints into account (since I'd also have to differentiate the regularizers) ? This depends on how you would like to formulate and solve your optimization problem. If you wan to use the built-in regularizers in TAO, then you just need provide gradient evaluation of the L2 norm. But TAO provides interfaces for users to provide customized regularizers and the gradient of them, in this case, again, adjoint can be used for the gradient calculation in the same way you handle objective functions/gradients. Of course, it is also possible to include regularizers in your objective function. Hong Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jed at jedbrown.org Tue Feb 25 12:49:58 2020 From: jed at jedbrown.org (Jed Brown) Date: Tue, 25 Feb 2020 11:49:58 -0700 Subject: [petsc-users] DMDAVecGetArrayDOF for read-only vectors In-Reply-To: References: Message-ID: <87tv3emrhl.fsf@jedbrown.org> Claudio Tomasi writes: > Dear all, > I would like to ask whether there's an analogous of the routine DMDAVecGetArrayDOF for read-only vectors. https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMDA/DMDAVecGetArrayDOFRead.html I see it's missing links to/from related pages, so I've added those in this MR https://gitlab.com/petsc/petsc/-/merge_requests/2560 > I'm using the SNES solver and in computing the SNESfunction I need to read and use the values of the solution vector x, which is created with DMCreateGlobalVector where the associated DM has two dofs per node. > Best regards, > Claudio Tomasi From mfadams at lbl.gov Tue Feb 25 21:35:56 2020 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 25 Feb 2020 22:35:56 -0500 Subject: [petsc-users] [WARNING: UNSCANNABLE EXTRACTION FAILED]/usr/bin/ld: cannot find -lhwloc Message-ID: We are running on a KNL and getting /usr/bin/ld: cannot find -lhwloc This is v3.7.7. I see -lhwloc in the PETSc stuff. We are also missing libX11.a and I am configuring now with --with-x=0 to try to fix that. I've attached to full output and the logs. Any ideas? Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: output Type: application/octet-stream Size: 59425 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log.gz Type: application/x-gzip Size: 10750 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log.gz Type: application/x-gzip Size: 547697 bytes Desc: not available URL: From balay at mcs.anl.gov Tue Feb 25 21:36:57 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 25 Feb 2020 21:36:57 -0600 Subject: [petsc-users] /usr/bin/ld: cannot find -lhwloc In-Reply-To: References: Message-ID: Try --with-hwloc=0 Satish On Tue, 25 Feb 2020, Mark Adams wrote: > We are running on a KNL and getting /usr/bin/ld: cannot find -lhwloc > > This is v3.7.7. > > I see -lhwloc in the PETSc stuff. We are also missing libX11.a and I am > configuring now with > --with-x=0 to try to fix that. > > I've attached to full output and the logs. > > Any ideas? > Thanks, > Mark > From barrydog505 at gmail.com Wed Feb 26 02:15:57 2020 From: barrydog505 at gmail.com (Tsung-Hsing Chen) Date: Wed, 26 Feb 2020 16:15:57 +0800 Subject: [petsc-users] The question of the output from ksp/ex2.c Message-ID: Hi, I tried to run the example in ksp/examples/tutorials/ex2. I run the code with : mpiexec -n 2 ./ex2 -ksp_monitor_short -m 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always the output is : 0 KSP Residual norm 3.21109 1 KSP Residual norm 0.93268 2 KSP Residual norm 0.103515 3 KSP Residual norm 0.00787798 4 KSP Residual norm 0.000387275 Norm of error 0.000392701 iterations 4 0 KSP Residual norm 3.21109 1 KSP Residual norm 0.93268 2 KSP Residual norm 0.103515 3 KSP Residual norm 0.00787798 4 KSP Residual norm 0.000387275 Norm of error 0.000392701 iterations 4 My output(above) is twice as the ksp/examples/tutorials/output/ex2_4.out. Is this the right answer that should come out? 
Thanks in advance, Tsung-Hsing Chen -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Wed Feb 26 02:25:58 2020 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Wed, 26 Feb 2020 11:25:58 +0300 Subject: [petsc-users] The question of the output from ksp/ex2.c In-Reply-To: References: Message-ID: This is what I get [szampini at localhost tutorials]$ mpiexec -n 2 ./ex2 -ksp_monitor_short -m 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always 0 KSP Residual norm 2.73499 1 KSP Residual norm 0.795482 2 KSP Residual norm 0.261984 3 KSP Residual norm 0.0752998 4 KSP Residual norm 0.0230031 5 KSP Residual norm 0.00521255 6 KSP Residual norm 0.00145783 7 KSP Residual norm 0.000277319 Norm of error 0.000292349 iterations 7 When I sequentially, I get (same output as yours) [szampini at localhost tutorials]$ mpiexec -n 1 ./ex2 -ksp_monitor_short -m 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always 0 KSP Residual norm 3.21109 1 KSP Residual norm 0.93268 2 KSP Residual norm 0.103515 3 KSP Residual norm 0.00787798 4 KSP Residual norm 0.000387275 Norm of error 0.000392701 iterations 4 This means you are using the wrong mpiexec Il giorno mer 26 feb 2020 alle ore 11:17 Tsung-Hsing Chen < barrydog505 at gmail.com> ha scritto: > Hi, > > I tried to run the example in ksp/examples/tutorials/ex2. > I run the code with : mpiexec -n 2 ./ex2 -ksp_monitor_short -m 5 -n 5 > -ksp_gmres_cgs_refinement_type refine_always > > the output is : > 0 KSP Residual norm 3.21109 > 1 KSP Residual norm 0.93268 > 2 KSP Residual norm 0.103515 > 3 KSP Residual norm 0.00787798 > 4 KSP Residual norm 0.000387275 > Norm of error 0.000392701 iterations 4 > 0 KSP Residual norm 3.21109 > 1 KSP Residual norm 0.93268 > 2 KSP Residual norm 0.103515 > 3 KSP Residual norm 0.00787798 > 4 KSP Residual norm 0.000387275 > Norm of error 0.000392701 iterations 4 > > My output(above) is twice as the ksp/examples/tutorials/output/ex2_4.out. > Is this the right answer that should come out? > > Thanks in advance, > > Tsung-Hsing Chen > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From barrydog505 at gmail.com Wed Feb 26 02:37:56 2020 From: barrydog505 at gmail.com (Tsung-Hsing Chen) Date: Wed, 26 Feb 2020 16:37:56 +0800 Subject: [petsc-users] The question of the output from ksp/ex2.c In-Reply-To: References: Message-ID: So, What should I do to use the correct mpiexec? Am I configure petsc with the wrong way or something should be done? Stefano Zampini ? 2020?2?26? ?? ??4:26??? 
> This is what I get > > [szampini at localhost tutorials]$ mpiexec -n 2 ./ex2 -ksp_monitor_short -m > 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always > 0 KSP Residual norm 2.73499 > 1 KSP Residual norm 0.795482 > 2 KSP Residual norm 0.261984 > 3 KSP Residual norm 0.0752998 > 4 KSP Residual norm 0.0230031 > 5 KSP Residual norm 0.00521255 > 6 KSP Residual norm 0.00145783 > 7 KSP Residual norm 0.000277319 > Norm of error 0.000292349 iterations 7 > > When I sequentially, I get (same output as yours) > > [szampini at localhost tutorials]$ mpiexec -n 1 ./ex2 -ksp_monitor_short -m > 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always > 0 KSP Residual norm 3.21109 > 1 KSP Residual norm 0.93268 > 2 KSP Residual norm 0.103515 > 3 KSP Residual norm 0.00787798 > 4 KSP Residual norm 0.000387275 > Norm of error 0.000392701 iterations 4 > > This means you are using the wrong mpiexec > > Il giorno mer 26 feb 2020 alle ore 11:17 Tsung-Hsing Chen < > barrydog505 at gmail.com> ha scritto: > >> Hi, >> >> I tried to run the example in ksp/examples/tutorials/ex2. >> I run the code with : mpiexec -n 2 ./ex2 -ksp_monitor_short -m 5 -n 5 >> -ksp_gmres_cgs_refinement_type refine_always >> >> the output is : >> 0 KSP Residual norm 3.21109 >> 1 KSP Residual norm 0.93268 >> 2 KSP Residual norm 0.103515 >> 3 KSP Residual norm 0.00787798 >> 4 KSP Residual norm 0.000387275 >> Norm of error 0.000392701 iterations 4 >> 0 KSP Residual norm 3.21109 >> 1 KSP Residual norm 0.93268 >> 2 KSP Residual norm 0.103515 >> 3 KSP Residual norm 0.00787798 >> 4 KSP Residual norm 0.000387275 >> Norm of error 0.000392701 iterations 4 >> >> My output(above) is twice as the ksp/examples/tutorials/output/ex2_4.out. >> Is this the right answer that should come out? >> >> Thanks in advance, >> >> Tsung-Hsing Chen >> > > > -- > Stefano > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Wed Feb 26 02:50:32 2020 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Wed, 26 Feb 2020 11:50:32 +0300 Subject: [petsc-users] The question of the output from ksp/ex2.c In-Reply-To: References: Message-ID: First, make sure you compiled with support for MPI by running make check [szampini at localhost petsc]$ make check Running test examples to verify correct installation Using PETSC_DIR=/home/szampini/Devel/jointinversion/pkgs/petsc and PETSC_ARCH=arch-joint C/C++ example src/snes/examples/tutorials/ex19 run successfully with 1 MPI process C/C++ example src/snes/examples/tutorials/ex19 run successfully with 2 MPI processes C/C++ example src/snes/examples/tutorials/ex19 run successfully with hypre C/C++ example src/snes/examples/tutorials/ex19 run successfully with mumps Completed test examples if you have the "2 MPI processes" output, then type [szampini at localhost petsc]$ make -f gmakefile print VAR=MPIEXEC mpiexec For me, mpiexec is system-wide. Il giorno mer 26 feb 2020 alle ore 11:38 Tsung-Hsing Chen < barrydog505 at gmail.com> ha scritto: > So, What should I do to use the correct mpiexec? > Am I configure petsc with the wrong way or something should be done? > > Stefano Zampini ? 2020?2?26? ?? ??4:26??? 
> >> This is what I get >> >> [szampini at localhost tutorials]$ mpiexec -n 2 ./ex2 -ksp_monitor_short -m >> 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always >> 0 KSP Residual norm 2.73499 >> 1 KSP Residual norm 0.795482 >> 2 KSP Residual norm 0.261984 >> 3 KSP Residual norm 0.0752998 >> 4 KSP Residual norm 0.0230031 >> 5 KSP Residual norm 0.00521255 >> 6 KSP Residual norm 0.00145783 >> 7 KSP Residual norm 0.000277319 >> Norm of error 0.000292349 iterations 7 >> >> When I sequentially, I get (same output as yours) >> >> [szampini at localhost tutorials]$ mpiexec -n 1 ./ex2 -ksp_monitor_short -m >> 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always >> 0 KSP Residual norm 3.21109 >> 1 KSP Residual norm 0.93268 >> 2 KSP Residual norm 0.103515 >> 3 KSP Residual norm 0.00787798 >> 4 KSP Residual norm 0.000387275 >> Norm of error 0.000392701 iterations 4 >> >> This means you are using the wrong mpiexec >> >> Il giorno mer 26 feb 2020 alle ore 11:17 Tsung-Hsing Chen < >> barrydog505 at gmail.com> ha scritto: >> >>> Hi, >>> >>> I tried to run the example in ksp/examples/tutorials/ex2. >>> I run the code with : mpiexec -n 2 ./ex2 -ksp_monitor_short -m 5 -n 5 >>> -ksp_gmres_cgs_refinement_type refine_always >>> >>> the output is : >>> 0 KSP Residual norm 3.21109 >>> 1 KSP Residual norm 0.93268 >>> 2 KSP Residual norm 0.103515 >>> 3 KSP Residual norm 0.00787798 >>> 4 KSP Residual norm 0.000387275 >>> Norm of error 0.000392701 iterations 4 >>> 0 KSP Residual norm 3.21109 >>> 1 KSP Residual norm 0.93268 >>> 2 KSP Residual norm 0.103515 >>> 3 KSP Residual norm 0.00787798 >>> 4 KSP Residual norm 0.000387275 >>> Norm of error 0.000392701 iterations 4 >>> >>> My output(above) is twice as the ksp/examples/tutorials/output/ex2_4.out. >>> Is this the right answer that should come out? >>> >>> Thanks in advance, >>> >>> Tsung-Hsing Chen >>> >> >> >> -- >> Stefano >> > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From barrydog505 at gmail.com Wed Feb 26 04:21:24 2020 From: barrydog505 at gmail.com (Tsung-Hsing Chen) Date: Wed, 26 Feb 2020 18:21:24 +0800 Subject: [petsc-users] The question of the output from ksp/ex2.c In-Reply-To: References: Message-ID: Unfortunately, it still no work for me. what I do is first : ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack then make ......, and make check. the output has shown that "C/C++ example src/snes/examples/tutorials/ex19 run successfully with 2 MPI processes". last, I type "make -f gmakefile print VAR=MPIEXEC". And I went running ex2, the problem still exists. Is there needed to do anything else before I run ex2? By the way, should I move to petsc-maint at mcs.anl.gov for the upcoming question? Stefano Zampini ? 2020?2?26? ?? ??4:50??? 
> First, make sure you compiled with support for MPI by running make check > > [szampini at localhost petsc]$ make check > Running test examples to verify correct installation > Using PETSC_DIR=/home/szampini/Devel/jointinversion/pkgs/petsc and > PETSC_ARCH=arch-joint > C/C++ example src/snes/examples/tutorials/ex19 run successfully with 1 MPI > process > C/C++ example src/snes/examples/tutorials/ex19 run successfully with 2 MPI > processes > C/C++ example src/snes/examples/tutorials/ex19 run successfully with hypre > C/C++ example src/snes/examples/tutorials/ex19 run successfully with mumps > Completed test examples > > if you have the "2 MPI processes" output, then type > > [szampini at localhost petsc]$ make -f gmakefile print VAR=MPIEXEC > mpiexec > > For me, mpiexec is system-wide. > > Il giorno mer 26 feb 2020 alle ore 11:38 Tsung-Hsing Chen < > barrydog505 at gmail.com> ha scritto: > >> So, What should I do to use the correct mpiexec? >> Am I configure petsc with the wrong way or something should be done? >> >> Stefano Zampini ? 2020?2?26? ?? ??4:26??? >> >>> This is what I get >>> >>> [szampini at localhost tutorials]$ mpiexec -n 2 ./ex2 -ksp_monitor_short >>> -m 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always >>> 0 KSP Residual norm 2.73499 >>> 1 KSP Residual norm 0.795482 >>> 2 KSP Residual norm 0.261984 >>> 3 KSP Residual norm 0.0752998 >>> 4 KSP Residual norm 0.0230031 >>> 5 KSP Residual norm 0.00521255 >>> 6 KSP Residual norm 0.00145783 >>> 7 KSP Residual norm 0.000277319 >>> Norm of error 0.000292349 iterations 7 >>> >>> When I sequentially, I get (same output as yours) >>> >>> [szampini at localhost tutorials]$ mpiexec -n 1 ./ex2 -ksp_monitor_short >>> -m 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always >>> 0 KSP Residual norm 3.21109 >>> 1 KSP Residual norm 0.93268 >>> 2 KSP Residual norm 0.103515 >>> 3 KSP Residual norm 0.00787798 >>> 4 KSP Residual norm 0.000387275 >>> Norm of error 0.000392701 iterations 4 >>> >>> This means you are using the wrong mpiexec >>> >>> Il giorno mer 26 feb 2020 alle ore 11:17 Tsung-Hsing Chen < >>> barrydog505 at gmail.com> ha scritto: >>> >>>> Hi, >>>> >>>> I tried to run the example in ksp/examples/tutorials/ex2. >>>> I run the code with : mpiexec -n 2 ./ex2 -ksp_monitor_short -m 5 -n 5 >>>> -ksp_gmres_cgs_refinement_type refine_always >>>> >>>> the output is : >>>> 0 KSP Residual norm 3.21109 >>>> 1 KSP Residual norm 0.93268 >>>> 2 KSP Residual norm 0.103515 >>>> 3 KSP Residual norm 0.00787798 >>>> 4 KSP Residual norm 0.000387275 >>>> Norm of error 0.000392701 iterations 4 >>>> 0 KSP Residual norm 3.21109 >>>> 1 KSP Residual norm 0.93268 >>>> 2 KSP Residual norm 0.103515 >>>> 3 KSP Residual norm 0.00787798 >>>> 4 KSP Residual norm 0.000387275 >>>> Norm of error 0.000392701 iterations 4 >>>> >>>> My output(above) is twice as >>>> the ksp/examples/tutorials/output/ex2_4.out. >>>> Is this the right answer that should come out? >>>> >>>> Thanks in advance, >>>> >>>> Tsung-Hsing Chen >>>> >>> >>> >>> -- >>> Stefano >>> >> > > -- > Stefano > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barrydog505 at gmail.com Wed Feb 26 04:59:31 2020 From: barrydog505 at gmail.com (Tsung-Hsing Chen) Date: Wed, 26 Feb 2020 18:59:31 +0800 Subject: [petsc-users] The question of the output from ksp/ex2.c In-Reply-To: References: Message-ID: I think I just found out what happened. There is another mpi "openmpi" that already exists on my computer. After I remove it then all back to normal. 
Thanks for your assistance, Tsung-Hsing Chen Tsung-Hsing Chen ? 2020?2?26? ?? ??6:21??? > Unfortunately, it still no work for me. > what I do is > first : ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran > --download-mpich --download-fblaslapack > then make ......, and make check. > the output has shown that "C/C++ example src/snes/examples/tutorials/ex19 > run successfully with 2 MPI processes". > last, I type "make -f gmakefile print VAR=MPIEXEC". > > And I went running ex2, the problem still exists. > Is there needed to do anything else before I run ex2? > By the way, should I move to petsc-maint at mcs.anl.gov for the upcoming > question? > > > Stefano Zampini ? 2020?2?26? ?? ??4:50??? > >> First, make sure you compiled with support for MPI by running make check >> >> [szampini at localhost petsc]$ make check >> Running test examples to verify correct installation >> Using PETSC_DIR=/home/szampini/Devel/jointinversion/pkgs/petsc and >> PETSC_ARCH=arch-joint >> C/C++ example src/snes/examples/tutorials/ex19 run successfully with 1 >> MPI process >> C/C++ example src/snes/examples/tutorials/ex19 run successfully with 2 >> MPI processes >> C/C++ example src/snes/examples/tutorials/ex19 run successfully with hypre >> C/C++ example src/snes/examples/tutorials/ex19 run successfully with mumps >> Completed test examples >> >> if you have the "2 MPI processes" output, then type >> >> [szampini at localhost petsc]$ make -f gmakefile print VAR=MPIEXEC >> mpiexec >> >> For me, mpiexec is system-wide. >> >> Il giorno mer 26 feb 2020 alle ore 11:38 Tsung-Hsing Chen < >> barrydog505 at gmail.com> ha scritto: >> >>> So, What should I do to use the correct mpiexec? >>> Am I configure petsc with the wrong way or something should be done? >>> >>> Stefano Zampini ? 2020?2?26? ?? ??4:26??? >>> >>>> This is what I get >>>> >>>> [szampini at localhost tutorials]$ mpiexec -n 2 ./ex2 -ksp_monitor_short >>>> -m 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always >>>> 0 KSP Residual norm 2.73499 >>>> 1 KSP Residual norm 0.795482 >>>> 2 KSP Residual norm 0.261984 >>>> 3 KSP Residual norm 0.0752998 >>>> 4 KSP Residual norm 0.0230031 >>>> 5 KSP Residual norm 0.00521255 >>>> 6 KSP Residual norm 0.00145783 >>>> 7 KSP Residual norm 0.000277319 >>>> Norm of error 0.000292349 iterations 7 >>>> >>>> When I sequentially, I get (same output as yours) >>>> >>>> [szampini at localhost tutorials]$ mpiexec -n 1 ./ex2 -ksp_monitor_short >>>> -m 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always >>>> 0 KSP Residual norm 3.21109 >>>> 1 KSP Residual norm 0.93268 >>>> 2 KSP Residual norm 0.103515 >>>> 3 KSP Residual norm 0.00787798 >>>> 4 KSP Residual norm 0.000387275 >>>> Norm of error 0.000392701 iterations 4 >>>> >>>> This means you are using the wrong mpiexec >>>> >>>> Il giorno mer 26 feb 2020 alle ore 11:17 Tsung-Hsing Chen < >>>> barrydog505 at gmail.com> ha scritto: >>>> >>>>> Hi, >>>>> >>>>> I tried to run the example in ksp/examples/tutorials/ex2. 
>>>>> I run the code with : mpiexec -n 2 ./ex2 -ksp_monitor_short -m 5 -n 5 >>>>> -ksp_gmres_cgs_refinement_type refine_always >>>>> >>>>> the output is : >>>>> 0 KSP Residual norm 3.21109 >>>>> 1 KSP Residual norm 0.93268 >>>>> 2 KSP Residual norm 0.103515 >>>>> 3 KSP Residual norm 0.00787798 >>>>> 4 KSP Residual norm 0.000387275 >>>>> Norm of error 0.000392701 iterations 4 >>>>> 0 KSP Residual norm 3.21109 >>>>> 1 KSP Residual norm 0.93268 >>>>> 2 KSP Residual norm 0.103515 >>>>> 3 KSP Residual norm 0.00787798 >>>>> 4 KSP Residual norm 0.000387275 >>>>> Norm of error 0.000392701 iterations 4 >>>>> >>>>> My output(above) is twice as >>>>> the ksp/examples/tutorials/output/ex2_4.out. >>>>> Is this the right answer that should come out? >>>>> >>>>> Thanks in advance, >>>>> >>>>> Tsung-Hsing Chen >>>>> >>>> >>>> >>>> -- >>>> Stefano >>>> >>> >> >> -- >> Stefano >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Feb 26 11:44:03 2020 From: jed at jedbrown.org (Jed Brown) Date: Wed, 26 Feb 2020 10:44:03 -0700 Subject: [petsc-users] Guidance on using Tao for unconstrained minimization In-Reply-To: <9480b1d3-1ed3-8108-15d8-a5914312c5f2@cfa.harvard.edu> References: <9480b1d3-1ed3-8108-15d8-a5914312c5f2@cfa.harvard.edu> Message-ID: <87ftexjlb0.fsf@jedbrown.org> Could you share output for your current configuration with -tao_monitor -tao_ls_monitor -tao_view? "Ellen M. Price" writes: > Hello PETSc users! > > I am using Tao for an unconstrained minimization problem. I have found > that CG works better than the other types for this application. After > about 85 iterations, I get an error about line search failure. I'm not > clear on what this means, or how I could mitigate the problem, and > neither the manual nor FAQ give any guidance. Can anyone suggest things > I could try to help the method converge? I have function and gradient > info, but no Hessian. > > Thanks, > Ellen Price From adener at anl.gov Wed Feb 26 12:02:47 2020 From: adener at anl.gov (Dener, Alp) Date: Wed, 26 Feb 2020 18:02:47 +0000 Subject: [petsc-users] Guidance on using Tao for unconstrained minimization In-Reply-To: <87ftexjlb0.fsf@jedbrown.org> References: <9480b1d3-1ed3-8108-15d8-a5914312c5f2@cfa.harvard.edu> <87ftexjlb0.fsf@jedbrown.org> Message-ID: Hi Ellen, Per Jed?s suggestion, seeing the monitor and ls_monitor outputs would certainly be helpful. The line search for CG (and other Tao algorithms) have safeguard steps for failures. When the line search fails to determine a valid step length for the computed CG direction, the search direction falls back to gradient descent for a second line search. If the gradient descent step succeeds here, then the CG updates restart again from that point (discarding previously updated information completely). LS failure is reported to the user only if this safeguard also fails to produce a viable step length, which then suggests that the computed gradient at that point may be incorrect or have significant numerical errors. If you can afford a slow run for debugging, you can use ?-tao_test_gradient? to check your gradient against the FD approximation at every iteration throughout the run. If you?re confident that the gradient is accurate, I would recommend testing with ?-tao_bncg_type gd? for a pure gradient descent run, and also trying out ?-tao_type bqnls? for the quasi-Newton method (only requires the gradient, no Hessian). ? 
Alp Dener Postdoctoral Researcher Argonne National Laboratory https://www.anl.gov/profile/alp-dener On February 26, 2020 at 11:44:15 AM, Jed Brown (jed at jedbrown.org) wrote: Could you share output for your current configuration with -tao_monitor -tao_ls_monitor -tao_view? "Ellen M. Price" writes: > Hello PETSc users! > > I am using Tao for an unconstrained minimization problem. I have found > that CG works better than the other types for this application. After > about 85 iterations, I get an error about line search failure. I'm not > clear on what this means, or how I could mitigate the problem, and > neither the manual nor FAQ give any guidance. Can anyone suggest things > I could try to help the method converge? I have function and gradient > info, but no Hessian. > > Thanks, > Ellen Price -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellen.price at cfa.harvard.edu Wed Feb 26 17:59:27 2020 From: ellen.price at cfa.harvard.edu (Ellen Price) Date: Wed, 26 Feb 2020 18:59:27 -0500 Subject: [petsc-users] Guidance on using Tao for unconstrained minimization In-Reply-To: <87ftexjlb0.fsf@jedbrown.org> References: <9480b1d3-1ed3-8108-15d8-a5914312c5f2@cfa.harvard.edu> <87ftexjlb0.fsf@jedbrown.org> Message-ID: Hi Jed, Thanks for getting back to me! Here's the output for my CG config. Sorry it's kind of a lot. Ellen On Wed, Feb 26, 2020 at 12:43 PM Jed Brown wrote: > Could you share output for your current configuration with -tao_monitor > -tao_ls_monitor -tao_view? > > "Ellen M. Price" writes: > > > Hello PETSc users! > > > > I am using Tao for an unconstrained minimization problem. I have found > > that CG works better than the other types for this application. After > > about 85 iterations, I get an error about line search failure. I'm not > > clear on what this means, or how I could mitigate the problem, and > > neither the manual nor FAQ give any guidance. Can anyone suggest things > > I could try to help the method converge? I have function and gradient > > info, but no Hessian. > > > > Thanks, > > Ellen Price > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: tao_cg.out Type: application/octet-stream Size: 187171 bytes Desc: not available URL: From mfadams at lbl.gov Wed Feb 26 18:11:53 2020 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 26 Feb 2020 19:11:53 -0500 Subject: [petsc-users] /usr/bin/ld: cannot find -lhwloc In-Reply-To: References: Message-ID: Thanks, we are now getting /usr/bin/ld: cannot find -lgcc_s. Any ideas? Let me know if you want to logs. (it takes a bit of mucking around). Thanks again, On Tue, Feb 25, 2020 at 10:37 PM Satish Balay wrote: > Try > > --with-hwloc=0 > > Satish > > On Tue, 25 Feb 2020, Mark Adams wrote: > > > We are running on a KNL and getting /usr/bin/ld: cannot find -lhwloc > > > > This is v3.7.7. > > > > I see -lhwloc in the PETSc stuff. We are also missing libX11.a and I am > > configuring now with > > --with-x=0 to try to fix that. > > > > I've attached to full output and the logs. > > > > Any ideas? > > Thanks, > > Mark > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: output Type: application/octet-stream Size: 61875 bytes Desc: not available URL: From balay at mcs.anl.gov Wed Feb 26 18:34:55 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 26 Feb 2020 18:34:55 -0600 Subject: [petsc-users] /usr/bin/ld: cannot find -lhwloc In-Reply-To: References: Message-ID: Such problems usually occur when 'compiler enviornment' is different during petsc build - and this application build. So things are worked during petsc configure [and build] - but they are are not working during application build. So the issue is to figure out why the env is different in these 2 builds. And in the petsc build - I see: >>> Using C/C++ linker: cc Using C/C++ flags: -g -O3 Using Fortran linker: ftn Using Fortran flags: -g -O3 <<<< On the application build - I see: ftn -O2 -no-ipo -traceback Don't know if these different options are causing these issues. Its good to test with default petsc makefiles [which use the same targets as petsc sources] - if possible. Satish On Wed, 26 Feb 2020, Mark Adams wrote: > Thanks, > we are now getting > /usr/bin/ld: cannot find -lgcc_s. > Any ideas? > Let me know if you want to logs. (it takes a bit of mucking around). > Thanks again, > > > On Tue, Feb 25, 2020 at 10:37 PM Satish Balay wrote: > > > Try > > > > --with-hwloc=0 > > > > Satish > > > > On Tue, 25 Feb 2020, Mark Adams wrote: > > > > > We are running on a KNL and getting /usr/bin/ld: cannot find -lhwloc > > > > > > This is v3.7.7. > > > > > > I see -lhwloc in the PETSc stuff. We are also missing libX11.a and I am > > > configuring now with > > > --with-x=0 to try to fix that. > > > > > > I've attached to full output and the logs. > > > > > > Any ideas? > > > Thanks, > > > Mark > > > > > > > > From juaneah at gmail.com Wed Feb 26 23:38:20 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Wed, 26 Feb 2020 23:38:20 -0600 Subject: [petsc-users] MatMatMult Message-ID: Hi everyone, Recently I installed the PETSc version 3.12.4 in optimized mode: ./configure --with-debugging=0 COPTFLAGS='-O2 -march=native -mtune=native' CXXOPTFLAGS='-O2 -march=native -mtune=native' FOPTFLAGS='-O2 -march=native -mtune=native' --download-mpich=1 --download-fblaslapack=1 --with-cxx-dialect=C++11 When I perform MatMatMult(A,B,MAT_REUSE_MATRIX,PETSC_DEFAULT,&C), for sparse A and dense B matrices, then: In the special case where matrix B (and hence C) are dense you can create the correctly sized matrix C yourself and then call this routine with MAT_REUSE_MATRIX , rather than first having MatMatMult () create it for you. You can NEVER do this if the matrix C is sparse. So, for these new release 3.12.4, if you create C as a dense matrix it is necessary to apply a MatAssemblyBegin() and MatAssemblyEnd() after the matrix multiplication, other wise it's not possible to perform further operations. It does not happen in the 3.11.3 version, where MatAssembly after multiplication it's not necessary. Kind regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu Feb 27 09:33:29 2020 From: hzhang at mcs.anl.gov (hzhang at mcs.anl.gov) Date: Thu, 27 Feb 2020 09:33:29 -0600 Subject: [petsc-users] MatMatMult In-Reply-To: References: Message-ID: Emmanuel: You can create a dense C with the required parallel layout without calling MatAssemblyBegin() and MatAssemblyEnd(). Did you get error without calling these routines? We only updated the help manu, not internal implementation. 
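For concreteness, a minimal sketch of that reuse pattern, assuming A (AIJ) and B (dense) are already assembled with compatible layouts; the variable names are only illustrative:

  PetscErrorCode ierr;
  PetscInt       m,M,n,N;
  Mat            C;

  /* C must inherit the row layout of A and the column layout of B */
  ierr = MatGetLocalSize(A,&m,NULL);CHKERRQ(ierr);
  ierr = MatGetSize(A,&M,NULL);CHKERRQ(ierr);
  ierr = MatGetLocalSize(B,NULL,&n);CHKERRQ(ierr);
  ierr = MatGetSize(B,NULL,&N);CHKERRQ(ierr);
  ierr = MatCreateDense(PetscObjectComm((PetscObject)A),m,n,M,N,NULL,&C);CHKERRQ(ierr);
  /* reuse the user-created dense C; no explicit assembly calls on C should be required here */
  ierr = MatMatMult(A,B,MAT_REUSE_MATRIX,PETSC_DEFAULT,&C);CHKERRQ(ierr);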
In the next release, we'll introduce new set of API to consolidate the API of mat-mat-operations. Hong Hi everyone, > > Recently I installed the PETSc version 3.12.4 in optimized mode: > > ./configure --with-debugging=0 COPTFLAGS='-O2 -march=native -mtune=native' > CXXOPTFLAGS='-O2 -march=native -mtune=native' FOPTFLAGS='-O2 -march=native > -mtune=native' --download-mpich=1 --download-fblaslapack=1 > --with-cxx-dialect=C++11 > > When I perform MatMatMult(A,B,MAT_REUSE_MATRIX,PETSC_DEFAULT,&C), for > sparse A and dense B matrices, then: > > In the special case where matrix B (and hence C) are dense you can create > the correctly sized matrix C yourself and then call this routine with > MAT_REUSE_MATRIX > , > rather than first having MatMatMult > () > create it for you. You can NEVER do this if the matrix C is sparse. > > So, for these new release 3.12.4, if you create C as a dense matrix it is > necessary to apply a MatAssemblyBegin() and MatAssemblyEnd() after the > matrix multiplication, other wise it's not possible to perform further > operations. > > It does not happen in the 3.11.3 version, where MatAssembly after > multiplication it's not necessary. > > Kind regards. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From adener at anl.gov Thu Feb 27 09:55:18 2020 From: adener at anl.gov (Dener, Alp) Date: Thu, 27 Feb 2020 15:55:18 +0000 Subject: [petsc-users] Guidance on using Tao for unconstrained minimization In-Reply-To: References: <9480b1d3-1ed3-8108-15d8-a5914312c5f2@cfa.harvard.edu> <87ftexjlb0.fsf@jedbrown.org> Message-ID: Hi Ellen, It looks like you?re using the old unconstrained CG code. This will be deprecated in the near future in favor of the newer bound-constrained CG algorithm (TAOBNCG) that can also solve unconstrained problems when the user does not specify any bounds on the problem. The newer TAOBNCG algorithm implements a preconditioner that significantly improves the scaling of the search direction and helps the line search accept the unit step length most of the time. I would recommend making sure that you?re on PETSc version 3.11 or newer, and then switching to this with ?-tao_type bncg?. You will not need to change any of your code to do this. If you still fail to converge, please send a new log with the new algorithm and we can evaluate the next steps. ? Alp Dener Postdoctoral Researcher Argonne National Laboratory https://www.anl.gov/profile/alp-dener On February 26, 2020 at 6:01:34 PM, Ellen Price (ellen.price at cfa.harvard.edu) wrote: Hi Jed, Thanks for getting back to me! Here's the output for my CG config. Sorry it's kind of a lot. Ellen On Wed, Feb 26, 2020 at 12:43 PM Jed Brown > wrote: Could you share output for your current configuration with -tao_monitor -tao_ls_monitor -tao_view? "Ellen M. Price" > writes: > Hello PETSc users! > > I am using Tao for an unconstrained minimization problem. I have found > that CG works better than the other types for this application. After > about 85 iterations, I get an error about line search failure. I'm not > clear on what this means, or how I could mitigate the problem, and > neither the manual nor FAQ give any guidance. Can anyone suggest things > I could try to help the method converge? I have function and gradient > info, but no Hessian. > > Thanks, > Ellen Price -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From adenchfi at hawk.iit.edu Thu Feb 27 10:40:16 2020 From: adenchfi at hawk.iit.edu (Adam Denchfield) Date: Thu, 27 Feb 2020 10:40:16 -0600 Subject: [petsc-users] Guidance on using Tao for unconstrained minimization In-Reply-To: References: <9480b1d3-1ed3-8108-15d8-a5914312c5f2@cfa.harvard.edu> <87ftexjlb0.fsf@jedbrown.org> Message-ID: Hi Ellen, It is as Alp said. To emphasize what he said, you don't need to worry about using a bounded CG method - the bounded CG methods can be used for unconstrained problems, and are much better than the old unconstrained CG code. On Thu, Feb 27, 2020, 9:55 AM Dener, Alp via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi Ellen, > > It looks like you?re using the old unconstrained CG code. This will be > deprecated in the near future in favor of the newer bound-constrained CG > algorithm (TAOBNCG) that can also solve unconstrained problems when the > user does not specify any bounds on the problem. > > The newer TAOBNCG algorithm implements a preconditioner that significantly > improves the scaling of the search direction and helps the line search > accept the unit step length most of the time. I would recommend making sure > that you?re on PETSc version 3.11 or newer, and then switching to this with > ?-tao_type bncg?. You will not need to change any of your code to do this. > If you still fail to converge, please send a new log with the new algorithm > and we can evaluate the next steps. > > ? > Alp Dener > Postdoctoral Researcher > Argonne National Laboratory > https://www.anl.gov/profile/alp-dener > > On February 26, 2020 at 6:01:34 PM, Ellen Price ( > ellen.price at cfa.harvard.edu) wrote: > > Hi Jed, > > Thanks for getting back to me! Here's the output for my CG config. Sorry > it's kind of a lot. > > Ellen > > On Wed, Feb 26, 2020 at 12:43 PM Jed Brown wrote: > >> Could you share output for your current configuration with -tao_monitor >> -tao_ls_monitor -tao_view? >> >> "Ellen M. Price" writes: >> >> > Hello PETSc users! >> > >> > I am using Tao for an unconstrained minimization problem. I have found >> > that CG works better than the other types for this application. After >> > about 85 iterations, I get an error about line search failure. I'm not >> > clear on what this means, or how I could mitigate the problem, and >> > neither the manual nor FAQ give any guidance. Can anyone suggest things >> > I could try to help the method converge? I have function and gradient >> > info, but no Hessian. >> > >> > Thanks, >> > Ellen Price >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellen.price at cfa.harvard.edu Thu Feb 27 13:47:13 2020 From: ellen.price at cfa.harvard.edu (Ellen Price) Date: Thu, 27 Feb 2020 14:47:13 -0500 Subject: [petsc-users] Guidance on using Tao for unconstrained minimization In-Reply-To: References: <9480b1d3-1ed3-8108-15d8-a5914312c5f2@cfa.harvard.edu> <87ftexjlb0.fsf@jedbrown.org> Message-ID: I tried what you suggested and used the bounded CG method. It gets a lot farther than the unconstrained CG method and finds a lower residual, but it still experiences a line search failure after a while. Any thoughts? I'm attaching the log output. In case it's helpful, I also spent a few more hours working on the code and now can compute the Hessian times an arbitrary vector (matrix-free using a MATSHELL); even matrix-free, however, the Hessian is much slower to compute than the gradient and objective. 
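(For reference, wiring a matrix-free Hessian like the one described here into Tao might look roughly like the sketch below. None of this is code from the thread: MyHessianMult, MyHessianUpdate, the AppCtx contents and the sizes n and N are placeholders for whatever the application actually needs.)

typedef struct { Vec x; /* current point, stashed so the mult routine can use it */ } AppCtx;

PetscErrorCode MyHessianMult(Mat H, Vec s, Vec Hs)
{
  AppCtx *user;
  PetscErrorCode ierr;
  ierr = MatShellGetContext(H, &user);CHKERRQ(ierr);
  /* apply Hessian(user->x) to s, e.g. via automatic differentiation, storing the result in Hs */
  return 0;
}

PetscErrorCode MyHessianUpdate(Tao tao, Vec x, Mat H, Mat Hpre, void *ctx)
{
  AppCtx *user = (AppCtx*)ctx;
  user->x = x;  /* record the point at which subsequent mat-vecs are taken */
  return 0;
}

/* setup; n and N are the local and global sizes of the optimization variable */
AppCtx user;
Mat H;
ierr = MatCreateShell(PETSC_COMM_WORLD, n, n, N, N, &user, &H);CHKERRQ(ierr);
ierr = MatShellSetOperation(H, MATOP_MULT, (void (*)(void))MyHessianMult);CHKERRQ(ierr);
ierr = TaoSetHessianRoutine(tao, H, H, MyHessianUpdate, &user);CHKERRQ(ierr);

With that in place, the Newton-type solvers mentioned further down (-tao_type bnls or -tao_type bntr) can use the shell matrix through their Krylov solves.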
To answer a previous question, I am as sure as I can be that the gradient is correct, since I'm using automatic differentiation and not relying on a hand-coded function. Thanks for your help, Ellen On Thu, Feb 27, 2020 at 11:40 AM Adam Denchfield wrote: > Hi Ellen, > > It is as Alp said. To emphasize what he said, you don't need to worry > about using a bounded CG method - the bounded CG methods can be used for > unconstrained problems, and are much better than the old unconstrained CG > code. > > > On Thu, Feb 27, 2020, 9:55 AM Dener, Alp via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Hi Ellen, >> >> It looks like you?re using the old unconstrained CG code. This will be >> deprecated in the near future in favor of the newer bound-constrained CG >> algorithm (TAOBNCG) that can also solve unconstrained problems when the >> user does not specify any bounds on the problem. >> >> The newer TAOBNCG algorithm implements a preconditioner that >> significantly improves the scaling of the search direction and helps the >> line search accept the unit step length most of the time. I would recommend >> making sure that you?re on PETSc version 3.11 or newer, and then switching >> to this with ?-tao_type bncg?. You will not need to change any of your code >> to do this. If you still fail to converge, please send a new log with the >> new algorithm and we can evaluate the next steps. >> >> ? >> Alp Dener >> Postdoctoral Researcher >> Argonne National Laboratory >> https://www.anl.gov/profile/alp-dener >> >> On February 26, 2020 at 6:01:34 PM, Ellen Price ( >> ellen.price at cfa.harvard.edu) wrote: >> >> Hi Jed, >> >> Thanks for getting back to me! Here's the output for my CG config. Sorry >> it's kind of a lot. >> >> Ellen >> >> On Wed, Feb 26, 2020 at 12:43 PM Jed Brown wrote: >> >>> Could you share output for your current configuration with -tao_monitor >>> -tao_ls_monitor -tao_view? >>> >>> "Ellen M. Price" writes: >>> >>> > Hello PETSc users! >>> > >>> > I am using Tao for an unconstrained minimization problem. I have found >>> > that CG works better than the other types for this application. After >>> > about 85 iterations, I get an error about line search failure. I'm not >>> > clear on what this means, or how I could mitigate the problem, and >>> > neither the manual nor FAQ give any guidance. Can anyone suggest things >>> > I could try to help the method converge? I have function and gradient >>> > info, but no Hessian. >>> > >>> > Thanks, >>> > Ellen Price >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: tao_bncg.out Type: application/octet-stream Size: 51783 bytes Desc: not available URL: From adener at anl.gov Thu Feb 27 15:58:44 2020 From: adener at anl.gov (Dener, Alp) Date: Thu, 27 Feb 2020 21:58:44 +0000 Subject: [petsc-users] Guidance on using Tao for unconstrained minimization In-Reply-To: References: <9480b1d3-1ed3-8108-15d8-a5914312c5f2@cfa.harvard.edu> <87ftexjlb0.fsf@jedbrown.org> Message-ID: The log shows that the LS keeps backtracking on the step length and still fails to find any point that reduces the function value even though the gradient passes the directional derivative test (i.e. it is a valid descent direction). If you?re confident that the gradient is accurate, then this behavior suggests that you?re stuck at a discontinuity in the function. Are you certain that this objective is at least C1 continuous? 
I also see that the LS is taking a step length of 5.0 in the 109th iteration right before it fails on the 110th. This might be too aggressive. You could restrict this with ?-tao_ls_stepmax 1.0? or switch to a backtracking LS with ?-tao_ls_type armijo? see if it changes anything at all. Note that the backtracking line search may behave a bit differently than restricting max step to 1.0, because backtracking completely ignores curvature information. Trying both would be helpful. You could additionally try to disable the line search entirely with ?-tao_ls_type unit? and accept 1.0 step lengths no matter what. This might cause an issue at the very beginning where I do see the LS doing some work. However, if it helps you reach the same point of failure as before, then it will keep going further without LS failures, and what happens with the function value and gradient norm here can help diagnose the problem. If there is indeed a discontinuous point like I suspect, then BNCG without line search might bounce back and forth around that point until it hits the iteration limit. TAO does have support for matrix-free Hessians in the Newton-type algorithms. You can switch to them with ?-tao_type bnls? for Newton Line Search or ?-tao_type bntr? for Newton Trust Region. They both use a Krylov method to iteratively invert the Hessian, and in matrix-free cases, use a BFGS approximation as the preconditioner. They're going to take more time per iteration, but should at least reach the point of failure in fewer iterations than BNCG. Whether or not that trade-off improves the overall time is problem dependent. The same LS modifications I mentioned above are also applicable to these. Ultimately though, if there really is a discontinuity, they're likely to get stuck in the same spot unless they manages to find a different local minimum that BNCG is not finding. ? Alp Dener Postdoctoral Researcher Argonne National Laboratory https://www.anl.gov/profile/alp-dener On February 27, 2020 at 1:48:44 PM, Ellen Price (ellen.price at cfa.harvard.edu) wrote: I tried what you suggested and used the bounded CG method. It gets a lot farther than the unconstrained CG method and finds a lower residual, but it still experiences a line search failure after a while. Any thoughts? I'm attaching the log output. In case it's helpful, I also spent a few more hours working on the code and now can compute the Hessian times an arbitrary vector (matrix-free using a MATSHELL); even matrix-free, however, the Hessian is much slower to compute than the gradient and objective. To answer a previous question, I am as sure as I can be that the gradient is correct, since I'm using automatic differentiation and not relying on a hand-coded function. Thanks for your help, Ellen On Thu, Feb 27, 2020 at 11:40 AM Adam Denchfield > wrote: Hi Ellen, It is as Alp said. To emphasize what he said, you don't need to worry about using a bounded CG method - the bounded CG methods can be used for unconstrained problems, and are much better than the old unconstrained CG code. On Thu, Feb 27, 2020, 9:55 AM Dener, Alp via petsc-users > wrote: Hi Ellen, It looks like you?re using the old unconstrained CG code. This will be deprecated in the near future in favor of the newer bound-constrained CG algorithm (TAOBNCG) that can also solve unconstrained problems when the user does not specify any bounds on the problem. 
The newer TAOBNCG algorithm implements a preconditioner that significantly improves the scaling of the search direction and helps the line search accept the unit step length most of the time. I would recommend making sure that you?re on PETSc version 3.11 or newer, and then switching to this with ?-tao_type bncg?. You will not need to change any of your code to do this. If you still fail to converge, please send a new log with the new algorithm and we can evaluate the next steps. ? Alp Dener Postdoctoral Researcher Argonne National Laboratory https://www.anl.gov/profile/alp-dener On February 26, 2020 at 6:01:34 PM, Ellen Price (ellen.price at cfa.harvard.edu) wrote: Hi Jed, Thanks for getting back to me! Here's the output for my CG config. Sorry it's kind of a lot. Ellen On Wed, Feb 26, 2020 at 12:43 PM Jed Brown > wrote: Could you share output for your current configuration with -tao_monitor -tao_ls_monitor -tao_view? "Ellen M. Price" > writes: > Hello PETSc users! > > I am using Tao for an unconstrained minimization problem. I have found > that CG works better than the other types for this application. After > about 85 iterations, I get an error about line search failure. I'm not > clear on what this means, or how I could mitigate the problem, and > neither the manual nor FAQ give any guidance. Can anyone suggest things > I could try to help the method converge? I have function and gradient > info, but no Hessian. > > Thanks, > Ellen Price -------------- next part -------------- An HTML attachment was scrubbed... URL: From juaneah at gmail.com Thu Feb 27 16:55:05 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Thu, 27 Feb 2020 16:55:05 -0600 Subject: [petsc-users] MatMatMult In-Reply-To: References: Message-ID: Thanks for the answer. Emmanuel: > You can create a dense C with the required parallel layout without > calling MatAssemblyBegin() and MatAssemblyEnd(). > Did you get error without calling these routines? > Yes, the output is (after create the C dense matrix and do not assembly it, run1 - see attached file -): [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Not for unassembled matrix [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.12.4, Feb, 04, 2020 [0]PETSC ERROR: ./comp on a arch-linux-c-opt-O2 named ayala by ayala Thu Feb 27 16:47:15 2020 [0]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich=1 --download-fblaslapack=1 --with-cxx-dialect=C++11 [0]PETSC ERROR: #1 MatNorm() line 5123 in /home/ayala/Documents/PETSc/petsc-3.12.4/src/mat/interface/matrix.c We only updated the help manu, not internal implementation. In the next > release, we'll introduce new set of API to consolidate the API of > mat-mat-operations. > Hong > I attach my test file, or maybe I'm doing something wrong. I tested this file on my laptop ubuntu 18 Kind regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: main.cc Type: text/x-c++src Size: 3157 bytes Desc: not available URL: From alexlindsay239 at gmail.com Thu Feb 27 19:59:49 2020 From: alexlindsay239 at gmail.com (Alexander Lindsay) Date: Thu, 27 Feb 2020 17:59:49 -0800 Subject: [petsc-users] Scraping MPI information from PETSc conf Message-ID: What's the cleanest way to determine the MPI install used to build PETSc? We are configuring a an MPI-based C++ library with autotools that will eventually be used by libMesh, and we'd like to make sure that this library (as well as libMesh) uses the same MPI that PETSc used or at worst detect our own and then error/warn the user if its an MPI that differs from the one used to build PETc. It seems like the only path info that shows up is in MPICXX_SHOW, PETSC_EXTERNAL_LIB_BASIC, and PETSC_WITH_EXTERNAL_LIB (I'm looking in petscvariables). I'm willing to learn the m4/portable shell built-ins necessary to parse those variables and come out with an mpi-dir, but before doing that figured I'd ask here and see if I'm missing something easier. Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Feb 27 20:08:40 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 27 Feb 2020 19:08:40 -0700 Subject: [petsc-users] Scraping MPI information from PETSc conf In-Reply-To: References: Message-ID: <87y2snsbtj.fsf@jedbrown.org> If determining mpicc is sufficient, this will work pkg-config --var=ccompiler PETSc We also define $ grep NUMVERSION mpich-optg/include/petscconf.h #define PETSC_HAVE_MPICH_NUMVERSION 30302300 or $ grep OMPI_ ompi-optg/include/petscconf.h #define PETSC_HAVE_OMPI_MAJOR_VERSION 4 #define PETSC_HAVE_OMPI_MINOR_VERSION 0 #define PETSC_HAVE_OMPI_RELEASE_VERSION 2 which PETSc uses to raise a compile-time error if it believes you're compiling PETSc code using an incompatible MPI. Note that some of this is hidden in the environment on Cray systems, for example, where CC=cc regardless of what compiler you're actually using. Alexander Lindsay writes: > What's the cleanest way to determine the MPI install used to build PETSc? > We are configuring a an MPI-based C++ library with autotools that will > eventually be used by libMesh, and we'd like to make sure that this library > (as well as libMesh) uses the same MPI that PETSc used or at worst detect > our own and then error/warn the user if its an MPI that differs from the > one used to build PETc. It seems like the only path info that shows up is > in MPICXX_SHOW, PETSC_EXTERNAL_LIB_BASIC, and PETSC_WITH_EXTERNAL_LIB (I'm > looking in petscvariables). I'm willing to learn the m4/portable shell > built-ins necessary to parse those variables and come out with an mpi-dir, > but before doing that figured I'd ask here and see if I'm missing something > easier. > > Alex From balay at mcs.anl.gov Thu Feb 27 21:15:19 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 27 Feb 2020 21:15:19 -0600 Subject: [petsc-users] Scraping MPI information from PETSc conf In-Reply-To: <87y2snsbtj.fsf@jedbrown.org> References: <87y2snsbtj.fsf@jedbrown.org> Message-ID: Not really useful for autotools - but we print the mpi.h used during build in make.log Using mpi.h: # 1 "/home/petsc/soft/mpich-3.3b1/include/mpi.h" 1 I guess the same code [using a petsc makefile] - can be scripted and parsed to get the PATH to compare in autotools. However the current version check [below] is likely the best way. 
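(As an illustration of the petscconf.h route Jed describes, a small configure-time test program can compare the version macros from the mpi.h being used against the ones PETSc recorded at build time. This is only a sketch: MPICH_NUMVERSION and the OMPI_*_VERSION macros come from MPICH's and Open MPI's own mpi.h, and how strictly to compare them is a policy choice, which is exactly the point raised next.)

#include <petscconf.h>
#include <mpi.h>
#if defined(PETSC_HAVE_MPICH_NUMVERSION)
#  if !defined(MPICH_NUMVERSION) || MPICH_NUMVERSION != PETSC_HAVE_MPICH_NUMVERSION
#    error "PETSc was configured with a different MPICH than this build is using"
#  endif
#elif defined(PETSC_HAVE_OMPI_MAJOR_VERSION)
#  if !defined(OMPI_MAJOR_VERSION) || OMPI_MAJOR_VERSION != PETSC_HAVE_OMPI_MAJOR_VERSION || OMPI_MINOR_VERSION != PETSC_HAVE_OMPI_MINOR_VERSION
#    error "PETSc was configured with a different Open MPI than this build is using"
#  endif
#endif
int main(void) { return 0; }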
Our prior check was deemed too strict - for ex: when linux distros updated MPI packages with a bug fixed version [without API change] - our prior check flagged this as incompatible - so we had to change it. Satish On Thu, 27 Feb 2020, Jed Brown wrote: > If determining mpicc is sufficient, this will work > > pkg-config --var=ccompiler PETSc > > We also define > > $ grep NUMVERSION mpich-optg/include/petscconf.h > #define PETSC_HAVE_MPICH_NUMVERSION 30302300 > > or > > $ grep OMPI_ ompi-optg/include/petscconf.h > #define PETSC_HAVE_OMPI_MAJOR_VERSION 4 > #define PETSC_HAVE_OMPI_MINOR_VERSION 0 > #define PETSC_HAVE_OMPI_RELEASE_VERSION 2 > > which PETSc uses to raise a compile-time error if it believes you're > compiling PETSc code using an incompatible MPI. > > Note that some of this is hidden in the environment on Cray systems, for > example, where CC=cc regardless of what compiler you're actually using. > > Alexander Lindsay writes: > > > What's the cleanest way to determine the MPI install used to build PETSc? > > We are configuring a an MPI-based C++ library with autotools that will > > eventually be used by libMesh, and we'd like to make sure that this library > > (as well as libMesh) uses the same MPI that PETSc used or at worst detect > > our own and then error/warn the user if its an MPI that differs from the > > one used to build PETc. It seems like the only path info that shows up is > > in MPICXX_SHOW, PETSC_EXTERNAL_LIB_BASIC, and PETSC_WITH_EXTERNAL_LIB (I'm > > looking in petscvariables). I'm willing to learn the m4/portable shell > > built-ins necessary to parse those variables and come out with an mpi-dir, > > but before doing that figured I'd ask here and see if I'm missing something > > easier. > > > > Alex > From jordic at cttc.upc.edu Fri Feb 28 04:29:44 2020 From: jordic at cttc.upc.edu (jordic) Date: Fri, 28 Feb 2020 11:29:44 +0100 Subject: [petsc-users] Memory leak at GPU when updating matrix of type mpiaijcusparse (CUDA) Message-ID: Dear all, the following simple program: ////////////////////////////////////////////////////////////////////////////////////// #include PetscInt ierr=0; int main(int argc,char **argv) { MPI_Comm comm; PetscMPIInt rank,size; PetscInitialize(&argc,&argv,NULL,help);if (ierr) return ierr; comm = PETSC_COMM_WORLD; MPI_Comm_rank(comm,&rank); MPI_Comm_size(comm,&size); Mat A; MatCreate(comm, &A); MatSetSizes(A, 1, 1, PETSC_DETERMINE, PETSC_DETERMINE); MatSetFromOptions(A); PetscInt dnz=1, onz=0; MatMPIAIJSetPreallocation(A, 0, &dnz, 0, &onz); MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE); MatSetOption(A, MAT_IGNORE_ZERO_ENTRIES, PETSC_TRUE); PetscInt igid=rank, jgid=rank; PetscScalar value=rank+1.0; // for(int i=0; i<10; ++i) for(;;) //infinite loop { MatSetValue(A, igid, jgid, value, INSERT_VALUES); MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); } MatDestroy(&A); PetscFinalize(); return ierr; } ////////////////////////////////////////////////////////////////////////////////////// creates a simple diagonal matrix with one value per mpi-core. If the type of the matrix is "mpiaij" (-mat_type mpiaij) there is no problem but with "mpiaijcusparse" (-mat_type mpiaijcusparse) the memory usage at the GPU grows with every iteration of the infinite loop. The only solution that I found is to destroy and create the matrix every time that it needs to be updated. Is there a better way to avoid this problem? 
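(For reference, the destroy-and-recreate workaround described above amounts to something like the following inside the loop, reusing the names from the program; it keeps the GPU footprint flat but pays the full creation, preallocation and assembly cost on every update.)

for(;;)
{
  MatDestroy(&A);
  MatCreate(comm, &A);
  MatSetSizes(A, 1, 1, PETSC_DETERMINE, PETSC_DETERMINE);
  MatSetFromOptions(A);  /* picks up -mat_type mpiaijcusparse */
  MatMPIAIJSetPreallocation(A, 0, &dnz, 0, &onz);
  /* re-apply the MatSetOption calls from the original program here if desired */
  MatSetValue(A, igid, jgid, value, INSERT_VALUES);
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
}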
I am using Petsc Release Version 3.12.2 with this configure options: Configure options --package-prefix-hash=/home_nobck/user/petsc-hash-pkgs --with-debugging=0 --with-fc=0 CC=gcc CXX=g++ --COPTFLAGS="-g -O3" --CXXOPTFLAGS="-g -O3" --CUDAOPTFLAGS="-D_FORCE_INLINES -g -O3" --with-mpi-include=/usr/lib/openmpi/include --with-mpi-lib="-L/usr/lib/openmpi/lib -lmpi_cxx -lmpi" --with-cuda=1 --with-precision=double --with-cuda-include=/usr/include --with-cuda-lib="-L/usr/lib/x86_64-linux-gnu -lcuda -lcudart -lcublas -lcufft -lcusparse -lcusolver" PETSC_ARCH=arch-ci-linux-opt-cxx-cuda-double Thanks for your help, Jorge -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Fri Feb 28 10:04:28 2020 From: hzhang at mcs.anl.gov (hzhang at mcs.anl.gov) Date: Fri, 28 Feb 2020 10:04:28 -0600 Subject: [petsc-users] MatMatMult In-Reply-To: References: Message-ID: Emmanuel: This is a bug in petsc. I've pushed a fix https://gitlab.com/petsc/petsc/-/commit/fd2a003f2c07165526de5c2fa5ca4f3c85618da7 You can edit it in your petsc library, or add MatAssemblyBegin/End in your application code until petsc-release is patched. Thanks for reporting it and sending us the test! Hong Thanks for the answer. > > Emmanuel: >> You can create a dense C with the required parallel layout without >> calling MatAssemblyBegin() and MatAssemblyEnd(). >> Did you get error without calling these routines? >> > > Yes, the output is (after create the C dense matrix and do not assembly > it, run1 - see attached file -): > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Not for unassembled matrix > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.12.4, Feb, 04, 2020 > [0]PETSC ERROR: ./comp on a arch-linux-c-opt-O2 named ayala by ayala Thu > Feb 27 16:47:15 2020 > [0]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 > -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" > FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich=1 > --download-fblaslapack=1 --with-cxx-dialect=C++11 > [0]PETSC ERROR: #1 MatNorm() line 5123 in > /home/ayala/Documents/PETSc/petsc-3.12.4/src/mat/interface/matrix.c > > We only updated the help manu, not internal implementation. In the next >> release, we'll introduce new set of API to consolidate the API of >> mat-mat-operations. >> Hong >> > > I attach my test file, or maybe I'm doing something wrong. I tested this > file on my laptop ubuntu 18 > > Kind regards. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From juaneah at gmail.com Fri Feb 28 10:18:47 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Fri, 28 Feb 2020 10:18:47 -0600 Subject: [petsc-users] MatMatMult In-Reply-To: References: Message-ID: Hi, Thanks. Kind regards. El vie., 28 de feb. de 2020 a la(s) 10:04, hzhang at mcs.anl.gov ( hzhang at mcs.anl.gov) escribi?: > Emmanuel: > This is a bug in petsc. I've pushed a fix > https://gitlab.com/petsc/petsc/-/commit/fd2a003f2c07165526de5c2fa5ca4f3c85618da7 > > You can edit it in your petsc library, or add MatAssemblyBegin/End in > your application code until petsc-release is patched. > Thanks for reporting it and sending us the test! > Hong > > Thanks for the answer. 
>> >> Emmanuel: >>> You can create a dense C with the required parallel layout without >>> calling MatAssemblyBegin() and MatAssemblyEnd(). >>> Did you get error without calling these routines? >>> >> >> Yes, the output is (after create the C dense matrix and do not assembly >> it, run1 - see attached file -): >> >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: Object is in wrong state >> [0]PETSC ERROR: Not for unassembled matrix >> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html >> for trouble shooting. >> [0]PETSC ERROR: Petsc Release Version 3.12.4, Feb, 04, 2020 >> [0]PETSC ERROR: ./comp on a arch-linux-c-opt-O2 named ayala by ayala Thu >> Feb 27 16:47:15 2020 >> [0]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 >> -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" >> FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich=1 >> --download-fblaslapack=1 --with-cxx-dialect=C++11 >> [0]PETSC ERROR: #1 MatNorm() line 5123 in >> /home/ayala/Documents/PETSc/petsc-3.12.4/src/mat/interface/matrix.c >> >> We only updated the help manu, not internal implementation. In the next >>> release, we'll introduce new set of API to consolidate the API of >>> mat-mat-operations. >>> Hong >>> >> >> I attach my test file, or maybe I'm doing something wrong. I tested this >> file on my laptop ubuntu 18 >> >> Kind regards. >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From D.J.P.Lahaye at tudelft.nl Fri Feb 28 10:32:50 2020 From: D.J.P.Lahaye at tudelft.nl (Domenico Lahaye - EWI) Date: Fri, 28 Feb 2020 16:32:50 +0000 Subject: [petsc-users] Master student exploring DMPLEX and TS/ex11.c In-Reply-To: <5b91e057fc0a4dc280b188525d921718@tudelft.nl> References: <5b91e057fc0a4dc280b188525d921718@tudelft.nl> Message-ID: <0ae7d639d923420e974792593ac6d90f@tudelft.nl> Dear all, I have a master student exploring DMPLEX and TS/ex11.c. He has build his own examples aiming at easing the learning curve. His examples are here GITHUB LINK -> https://github.com/mukkund1996/DMPLEX-advection Possibly this material is valuable to you. Any feedback is most welcome. Domenico Lahaye. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jczhang at mcs.anl.gov Fri Feb 28 12:13:12 2020 From: jczhang at mcs.anl.gov (Junchao Zhang) Date: Fri, 28 Feb 2020 12:13:12 -0600 Subject: [petsc-users] Memory leak at GPU when updating matrix of type mpiaijcusparse (CUDA) In-Reply-To: References: Message-ID: I will take a look at it and get back to you. Thanks. 
On Fri, Feb 28, 2020, 7:29 AM jordic wrote: > Dear all, > > the following simple program: > > > ////////////////////////////////////////////////////////////////////////////////////// > > #include > > PetscInt ierr=0; > int main(int argc,char **argv) > { > MPI_Comm comm; > PetscMPIInt rank,size; > > PetscInitialize(&argc,&argv,NULL,help);if (ierr) return ierr; > comm = PETSC_COMM_WORLD; > MPI_Comm_rank(comm,&rank); > MPI_Comm_size(comm,&size); > > Mat A; > MatCreate(comm, &A); > MatSetSizes(A, 1, 1, PETSC_DETERMINE, PETSC_DETERMINE); > MatSetFromOptions(A); > PetscInt dnz=1, onz=0; > MatMPIAIJSetPreallocation(A, 0, &dnz, 0, &onz); > MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE); > MatSetOption(A, MAT_IGNORE_ZERO_ENTRIES, PETSC_TRUE); > PetscInt igid=rank, jgid=rank; > PetscScalar value=rank+1.0; > > // for(int i=0; i<10; ++i) > for(;;) //infinite loop > { > MatSetValue(A, igid, jgid, value, INSERT_VALUES); > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > } > MatDestroy(&A); > PetscFinalize(); > return ierr; > } > > > ////////////////////////////////////////////////////////////////////////////////////// > > creates a simple diagonal matrix with one value per mpi-core. If the type > of the matrix is "mpiaij" (-mat_type mpiaij) there is no problem but with > "mpiaijcusparse" (-mat_type mpiaijcusparse) the memory usage at the GPU > grows with every iteration of the infinite loop. The only solution that I > found is to destroy and create the matrix every time that it needs to be > updated. Is there a better way to avoid this problem? > > I am using Petsc Release Version 3.12.2 with this configure options: > > Configure options --package-prefix-hash=/home_nobck/user/petsc-hash-pkgs > --with-debugging=0 --with-fc=0 CC=gcc CXX=g++ --COPTFLAGS="-g -O3" > --CXXOPTFLAGS="-g -O3" --CUDAOPTFLAGS="-D_FORCE_INLINES -g -O3" > --with-mpi-include=/usr/lib/openmpi/include > --with-mpi-lib="-L/usr/lib/openmpi/lib -lmpi_cxx -lmpi" --with-cuda=1 > --with-precision=double --with-cuda-include=/usr/include > --with-cuda-lib="-L/usr/lib/x86_64-linux-gnu -lcuda -lcudart -lcublas > -lcufft -lcusparse -lcusolver" PETSC_ARCH=arch-ci-linux-opt-cxx-cuda-double > > Thanks for your help, > > Jorge > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Zane.Jakobs at colorado.edu Fri Feb 28 13:10:41 2020 From: Zane.Jakobs at colorado.edu (Zane Charles Jakobs) Date: Fri, 28 Feb 2020 12:10:41 -0700 Subject: [petsc-users] Correct use of VecGetArray with std::vector Message-ID: Hi PETSc devs, I'm writing some C++ code that calls PETSc, and I'd like to be able to place the result of VecGetArray into an std::vector and then later call VecRestoreArray on that data, or get the same effects. It seems like the correct way to do this would be something like: Vec x; std::vector vals, idx; int num_vals, global_offset; PetscErrorCode ierr; ... /* do some stuff to x and compute num_vals and global_offset*/ ... vals.resize(num_vals); idx.resize(num_vals); std::iota(idx.begin(), idx.end(), global_offset); ierr = VecGetValues(x, num_vals, idx.data(), vals.data());CHKERRQ(ierr); /* do stuff to vals */ ... ierr = VecSetValues(x, num_vals, idx.data(), vals.data(), [whatever insert mode]);CHKERRQ(ierr); idx.clear(); vals.clear(); Is that correct (in the sense that it does what you'd expect if you replaced the vectors with pointers to indices/data and used VecGet/RestoreArray() instead of VecGet/SetValues, and it doesn't violate any of std::vector's invariants, e.g. 
by reallocating its memory)? If not, is there a "normal" way to do this? Thanks! -Zane Jakobs -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Feb 28 13:21:59 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 28 Feb 2020 14:21:59 -0500 Subject: [petsc-users] Correct use of VecGetArray with std::vector In-Reply-To: References: Message-ID: On Fri, Feb 28, 2020 at 2:12 PM Zane Charles Jakobs < Zane.Jakobs at colorado.edu> wrote: > Hi PETSc devs, > > I'm writing some C++ code that calls PETSc, and I'd like to be able to > place the result of VecGetArray into an std::vector and then later call > VecRestoreArray on that data, or get the same effects. It seems like the > correct way to do this would be something like: > Why are you calling Get/SetValues() instead Get/SetArray()? Shouldn't you just get the pointer using GetArray() and stick it into your std::vector? Thanks, Matt > Vec x; > std::vector vals, idx; > int num_vals, global_offset; > PetscErrorCode ierr; > ... > /* do some stuff to x and compute num_vals and global_offset*/ > ... > vals.resize(num_vals); > idx.resize(num_vals); > std::iota(idx.begin(), idx.end(), global_offset); > ierr = VecGetValues(x, num_vals, idx.data(), vals.data());CHKERRQ(ierr); > /* do stuff to vals */ > ... > ierr = VecSetValues(x, num_vals, idx.data(), vals.data(), [whatever insert > mode]);CHKERRQ(ierr); > idx.clear(); > vals.clear(); > > Is that correct (in the sense that it does what you'd expect if you > replaced the vectors with pointers to indices/data and used > VecGet/RestoreArray() instead of VecGet/SetValues, and it doesn't violate > any of std::vector's invariants, e.g. by reallocating its memory)? If not, > is there a "normal" way to do this? > > Thanks! > > -Zane Jakobs > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jczhang at mcs.anl.gov Fri Feb 28 14:00:54 2020 From: jczhang at mcs.anl.gov (Junchao Zhang) Date: Fri, 28 Feb 2020 14:00:54 -0600 Subject: [petsc-users] Correct use of VecGetArray with std::vector In-Reply-To: References: Message-ID: You can create the C++ vector vals and resize it to a proper size, get its data pointer, then pass it to PETSc, int n; Vec x; std::vector vals; vals.resize(n); /* You need to calculate n by other means */ ierr = VecCreateMPIWithArray(PETSC_COMM_WORLD,bs,n,PETSC_DECIDE,vals.data(),&v);CHKERRQ(ierr); // Code using v ierr = VecDestroy(&v);CHKERRQ(ierr); // Code using vals; --Junchao Zhang On Fri, Feb 28, 2020 at 1:12 PM Zane Charles Jakobs < Zane.Jakobs at colorado.edu> wrote: > Hi PETSc devs, > > I'm writing some C++ code that calls PETSc, and I'd like to be able to > place the result of VecGetArray into an std::vector and then later call > VecRestoreArray on that data, or get the same effects. It seems like the > correct way to do this would be something like: > > Vec x; > std::vector vals, idx; > int num_vals, global_offset; > PetscErrorCode ierr; > ... > /* do some stuff to x and compute num_vals and global_offset*/ > ... > vals.resize(num_vals); > idx.resize(num_vals); > std::iota(idx.begin(), idx.end(), global_offset); > ierr = VecGetValues(x, num_vals, idx.data(), vals.data());CHKERRQ(ierr); > /* do stuff to vals */ > ... 
> ierr = VecSetValues(x, num_vals, idx.data(), vals.data(), [whatever insert > mode]);CHKERRQ(ierr); > idx.clear(); > vals.clear(); > > Is that correct (in the sense that it does what you'd expect if you > replaced the vectors with pointers to indices/data and used > VecGet/RestoreArray() instead of VecGet/SetValues, and it doesn't violate > any of std::vector's invariants, e.g. by reallocating its memory)? If not, > is there a "normal" way to do this? > > Thanks! > > -Zane Jakobs > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jed.Brown at Colorado.EDU Fri Feb 28 14:14:03 2020 From: Jed.Brown at Colorado.EDU (Jed Brown) Date: Fri, 28 Feb 2020 13:14:03 -0700 Subject: [petsc-users] Correct use of VecGetArray with std::vector In-Reply-To: References: Message-ID: <87lfomqxkk.fsf@jedbrown.org> Matthew Knepley writes: > On Fri, Feb 28, 2020 at 2:12 PM Zane Charles Jakobs < > Zane.Jakobs at colorado.edu> wrote: > >> Hi PETSc devs, >> >> I'm writing some C++ code that calls PETSc, and I'd like to be able to >> place the result of VecGetArray into an std::vector and then later call >> VecRestoreArray on that data, or get the same effects. It seems like the >> correct way to do this would be something like: >> > > Why are you calling Get/SetValues() instead Get/SetArray()? Shouldn't you > just get the pointer using GetArray() and stick it into > your std::vector? Can't do this because std::vector can't be wrapped around existing memory. I would recommend not using std::vector. Dynamic resizing isn't a feature you want in this context, and since you'd like to use existing memory, you need to use a container that can accept existing memory. From eijkhout at tacc.utexas.edu Fri Feb 28 15:26:58 2020 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Fri, 28 Feb 2020 21:26:58 +0000 Subject: [petsc-users] Correct use of VecGetArray with std::vector In-Reply-To: <87lfomqxkk.fsf@jedbrown.org> References: <87lfomqxkk.fsf@jedbrown.org> Message-ID: <6E47824B-7BE9-43E2-BC98-C0BC39B28987@tacc.utexas.edu> On 2020Feb28, at 14:14, Jed Brown > wrote: Can't do this because std::vector can't be wrapped around existing memory. That's why I use gsl::span which "will be in c++20" Victor. -------------- next part -------------- An HTML attachment was scrubbed... URL:
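(To close the thread out with a concrete sketch of what Matt and Jed suggest: the copy-free variant simply works on the pointer returned by VecGetArray, and a non-owning view such as gsl::span, or std::span once C++20 is available, can be layered on top of that same pointer if an STL-style interface is wanted. Assumes x is a valid Vec; the loop body is a stand-in for whatever "do stuff to vals" means above.)

PetscScalar *a;
PetscInt n;
PetscErrorCode ierr;
ierr = VecGetLocalSize(x, &n);CHKERRQ(ierr);
ierr = VecGetArray(x, &a);CHKERRQ(ierr);
/* gsl::span<PetscScalar> vals(a, n);  // optional non-owning view over the same memory */
for (PetscInt i = 0; i < n; ++i) {
  a[i] *= 2.0;
}
ierr = VecRestoreArray(x, &a);CHKERRQ(ierr);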