[petsc-users] KSP changes for successive solver

Michele Rosso mrosso at uci.edu
Wed Jul 29 12:45:39 CDT 2015


Hi Barry,

I tried what you suggested:

1) 5 levels of MG + defaults at the coarse level (PCREDUNDANT)
2) 5 levels of MG + 2 levels of MG via DMDAREPART +  defaults at the
coarse level (PCREDUNDANT)
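
In option form, the main distinguishing options are roughly the following (a
sketch taken from the attached option tables; case 1 simply relies on the
PCMG coarse defaults):

Common to both cases:
-pc_type mg -pc_mg_levels 5 -pc_mg_galerkin -mg_levels_ksp_type richardson

Case 2 additionally repartitions the coarse problem onto 8192/64 = 128
processes and runs two more levels of MG there:
-mg_coarse_pc_type dmdarepart
-mg_coarse_pc_dmdarepart_factor 64
-mg_coarse_dmdarepart_pc_type mg
-mg_coarse_dmdarepart_pc_mg_levels 2
-mg_coarse_dmdarepart_pc_mg_galerkin
-mg_coarse_dmdarepart_mg_coarse_pc_type redundant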

I attached ksp_view and log_summary for both cases.
The use of PCREDUNDANT halves the time for case 1 (from ~20 sec per
solve to ~10 sec per solve), while it seems to have little effect on
case 2.
Any thoughts on this?

Thanks,
Michele


On Sat, 2015-07-25 at 22:18 -0500, Barry Smith wrote:

>   This dmdarepart business, which I am guessing is running PCMG on a smaller set of processes with a DMDA on that smaller set of processes for a coarse problem, is a fine idea, but you should keep in mind the rule of thumb that parallel iterative (and even more so direct) solvers don't do well when there are roughly 10,000 or fewer degrees of freedom per process.  So you should definitely not be using SuperLU_DIST in parallel to solve a problem with 1048 degrees of freedom on 128 processes; just use PCREDUNDANT and its default (sequential) LU. That should be faster.
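> 
>   In option form, this suggestion amounts to something like the following (a
> sketch using the dmdarepart prefixes shown in the attached ksp_view):
> 
>   -mg_coarse_dmdarepart_mg_coarse_pc_type redundant
> 
>   PCREDUNDANT gathers the small coarse matrix onto every process of the
> sub-communicator, where it is factored with PETSc's default sequential LU,
> rather than doing a parallel SuperLU_DIST factorization across all 128
> processes.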
> 
>   Barry
> 
> > On Jul 25, 2015, at 10:09 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > 
> > 
> >  Don't use 
> > 
> > -mg_coarse_pc_factor_mat_solver_package superlu_dist
> > -mg_coarse_pc_type lu
> > 
> >  with 8000+ processes and 1 degree of freedom per process; SuperLU_DIST will be terrible. Just leave the defaults for this and send the -log_summary.
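> > 
> >  (A sketch of what "the defaults" resolve to here: -mg_coarse_pc_type
> > redundant, which gathers the 8192x8192 coarse matrix onto each process and
> > factors it with PETSc's sequential LU; this is what the case 1 ksp_view
> > shows.)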
> > 
> >  Barry
> > 
> >> On Jul 24, 2015, at 2:44 PM, Michele Rosso <mrosso at uci.edu> wrote:
> >> 
> >> Barry,
> >> 
> >> I attached ksp_view and log_summary for two different setups:
> >> 
> >> 1) Plain MG on 5 levels + LU at the coarse level (files ending in mg5)
> >> 2) Plain MG on 5 levels + custom PC + LU at the coarse level (files ending in mg7)
> >> 
> >> The custom PC works on a subset of processes, thus allowing the use of two more levels of MG, for a total of 7.
> >> Case 1) is extremely slow ( ~ 20 sec per solve ) and converges in 21 iterations.
> >> Case 2) is way faster ( ~ 0.25 sec per solve ) and converges in 29 iterations.
> >> 
> >> Thanks for your help!
> >> 
> >> Michele
> >> 
> >> 
> >> On Fri, 2015-07-24 at 13:56 -0500, Barry Smith wrote:
> >>>  The coarse problem for the PCMG (geometric multigrid) is 
> >>> 
> >>> Mat Object:       8192 MPI processes
> >>>        type: mpiaij
> >>>        rows=8192, cols=8192
> >>> 
> >>> then it tries to solve it with algebraic multigrid on 8192 processes (which is completely insane). A lot of the time is spent in setting up the algebraic multigrid (not surprisingly).
> >>> 
> >>> 8192 unknowns is kind of small to parallelize.  Please run the same code but with the default coarse grid problem instead of PCGAMG and send us the -log_summary again.
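> >>> 
> >>> (Concretely that means dropping the coarse-level override, presumably
> >>> -mg_coarse_pc_type gamg and any gamg suboptions, while keeping the rest of
> >>> the options unchanged, so that PCMG falls back to its default coarse solve.)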
> >>> 
> >>>  Barry
> >>> 
> >>> 
> >>>> On Jul 24, 2015, at 1:35 PM, Michele Rosso <mrosso at uci.edu> wrote:
> >>>> 
> >>>> Hi Mark and Barry,
> >>>> 
> >>>> I am sorry for my late reply: it was a busy week!
> >>>> I ran a test case for a larger problem with as many levels of MG as I could (i.e. 5) and GAMG as the PC at the coarse level. I attached the output of -info (after grepping for "gamg"), ksp_view and log_summary.
> >>>> The solve takes about 2 seconds on 8192 cores, which is way too much. The number of iterations to convergence is 24.
> >>>> I hope there is a way to speed it up.
> >>>> 
> >>>> Thanks,
> >>>> Michele
> >>>> 
> >>>> 
> >>>> On Fri, 2015-07-17 at 09:38 -0400, Mark Adams wrote:
> >>>>> 
> >>>>> 
> >>>>> On Thu, Jul 16, 2015 at 8:18 PM, Michele Rosso <mrosso at uci.edu> wrote:
> >>>>> Barry,
> >>>>> 
> >>>>> thank you very much for the detailed answer.  I tried what you suggested and it works.
> >>>>> So far I have tried it on a small system, but the final goal is to use it for very large runs.  How does PCGAMG compare to PCMG as far as performance and scalability are concerned?
> >>>>> Also, could you help me tune the GAMG part (my current setup is in the attached ksp_view.txt file)?
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> I am going to add this to the document today but you can run with -info.  This is very noisy so you might want to do the next step at run time.  Then grep on GAMG.  This will be about 20 lines.  Send that to us and we can go from there.
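> >>>>> 
> >>>>> For example, something like this (a sketch; "your_app" and its options
> >>>>> stand in for your actual run):
> >>>>> 
> >>>>>   ./your_app <usual options> -info 2>&1 | grep GAMG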
> >>>>> 
> >>>>> 
> >>>>> Mark
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> I also tried to use superlu_dist for the LU decomposition on mg_coarse_mg_sub_
> >>>>> -mg_coarse_mg_coarse_sub_pc_type lu
> >>>>> -mg_coarse_mg_coarse_sub_pc_factor_mat_solver_package superlu_dist
> >>>>> 
> >>>>> but I got an error:
> >>>>> 
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2 
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>> 
> >>>>> 
> >>>>> Thank you,
> >>>>> Michele
> >>>>> 
> >>>>> 
> >>>>> On Thu, 2015-07-16 at 18:07 -0500, Barry Smith wrote:
> >>>>>> 
> >>>>>>> On Jul 16, 2015, at 5:42 PM, Michele Rosso <mrosso at uci.edu> wrote:
> >>>>>>> 
> >>>>>>> Barry,
> >>>>>>> 
> >>>>>>> thanks for your reply. So if I want it fixed, I will have to use the master branch, correct?
> >>>>>> 
> >>>>>> 
> >>>>>>  Yes, or edit mg.c and remove the offending lines of code (easy enough). 
> >>>>>> 
> >>>>>>> 
> >>>>>>> On a side note, what I am trying to achieve is to be able to use as many levels of MG as I want, despite the limitation imposed by the local number of grid nodes.
> >>>>>> 
> >>>>>> 
> >>>>>>   I assume you are talking about the case with DMDA? There is no generic limitation in PETSc's multigrid; the only restriction comes from the way the DMDA code figures out the interpolation.
> >>>>>> 
> >>>>>> 
> >>>>>>> So far I am using a borrowed code that implements a PC that creates a sub-communicator and performs MG on it.
> >>>>>>> While reading the documentation I found out that PCMGSetLevels takes in an optional array of communicators. How does this work?
> >>>>>> 
> >>>>>> 
> >>>>>>   It doesn't work. It was an idea that never got pursued.
> >>>>>> 
> >>>>>> 
> >>>>>>> Can I simply define my matrix and rhs on the fine grid as I would normally (I do not use kspsetoperators and kspsetrhs), and would KSP take care of it by using the correct communicator for each level?
> >>>>>> 
> >>>>>> 
> >>>>>>   No.
> >>>>>> 
> >>>>>>   You can use the PCMG geometric multigrid with DMDA for as many levels as it works and then use PCGAMG as the coarse grid solver. PCGAMG automatically uses fewer processes for the coarse level matrices and vectors. You could do this all from the command line without writing code. 
> >>>>>> 
> >>>>>>   For example if your code uses a DMDA and calls KSPSetDM() use for example -da_refine 3 -pc_type mg -pc_mg_galerkin -mg_coarse_pc_type gamg  -ksp_view 
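> >>>>>> 
> >>>>>>   A minimal sketch of the code side of that setup (error checking
> >>>>>> omitted; ComputeMatrix, ComputeRHS and the 9x9x9 coarse grid are
> >>>>>> placeholders for your own assembly callbacks and sizes):
> >>>>>> 
> >>>>>> #include <petscksp.h>
> >>>>>> #include <petscdmda.h>
> >>>>>> 
> >>>>>> /* placeholders: user-provided callbacks that fill the operator and rhs
> >>>>>>    for the DMDA level that KSP passes in */
> >>>>>> extern PetscErrorCode ComputeMatrix(KSP,Mat,Mat,void*);
> >>>>>> extern PetscErrorCode ComputeRHS(KSP,Vec,void*);
> >>>>>> 
> >>>>>> int main(int argc,char **argv)
> >>>>>> {
> >>>>>>   DM  da;
> >>>>>>   KSP ksp;
> >>>>>> 
> >>>>>>   PetscInitialize(&argc,&argv,NULL,NULL);
> >>>>>>   /* coarsest DMDA; -da_refine <n> (as in the options above) adds n finer levels */
> >>>>>>   DMDACreate3d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,
> >>>>>>                DMDA_STENCIL_STAR,9,9,9,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,
> >>>>>>                1,1,NULL,NULL,NULL,&da);
> >>>>>>   KSPCreate(PETSC_COMM_WORLD,&ksp);
> >>>>>>   KSPSetDM(ksp,da);                 /* PCMG gets its grid hierarchy from the DMDA */
> >>>>>>   KSPSetComputeOperators(ksp,ComputeMatrix,NULL);
> >>>>>>   KSPSetComputeRHS(ksp,ComputeRHS,NULL);
> >>>>>>   KSPSetFromOptions(ksp);           /* -pc_type mg -pc_mg_galerkin -mg_coarse_pc_type gamg ... */
> >>>>>>   KSPSolve(ksp,NULL,NULL);
> >>>>>>   KSPDestroy(&ksp);
> >>>>>>   DMDestroy(&da);
> >>>>>>   PetscFinalize();
> >>>>>>   return 0;
> >>>>>> }
> >>>>>> 
> >>>>>>   (This mirrors the structure of PETSc's DMDA tutorials, e.g.
> >>>>>> src/ksp/ksp/examples/tutorials/ex45.c.)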
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>>  Barry
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>>> 
> >>>>>>> Thanks,
> >>>>>>> Michele
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> On Thu, 2015-07-16 at 17:30 -0500, Barry Smith wrote:
> >>>>>>>>   Michel,
> >>>>>>>> 
> >>>>>>>>    This is a very annoying feature that has been fixed in master 
> >>>>>>>> http://www.mcs.anl.gov/petsc/developers/index.html
> >>>>>>>>  I would like to have changed it in maint but Jed would have a shit-fit :-) since it changes behavior.
> >>>>>>>> 
> >>>>>>>>  Barry
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>>> On Jul 16, 2015, at 4:53 PM, Michele Rosso <mrosso at uci.edu> wrote:
> >>>>>>>>> 
> >>>>>>>>> Hi,
> >>>>>>>>> 
> >>>>>>>>> I am performing a series of solves inside a loop. The matrix for each solve changes, but not enough to justify a rebuild of the PC at each solve.
> >>>>>>>>> Therefore I am using  KSPSetReusePreconditioner to avoid rebuilding unless necessary. The solver is CG + MG with a custom  PC at the coarse level.
> >>>>>>>>> If KSP is not updated each time, everything works as it is supposed to. 
> >>>>>>>>> When instead I allow the default PETSc behavior, i.e. updating the PC every time the matrix changes, the coarse-level KSP, initially set to PREONLY, is changed into GMRES
> >>>>>>>>> after the first solve. I am not sure where the problem lies (my PC or PETSc), so I would like to have your opinion on this.
> >>>>>>>>> I attached the ksp_view for the 2 successive solves and the options stack.
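> >>>>>>>>> 
> >>>>>>>>> (For reference, the reuse setting mentioned above is the call
> >>>>>>>>> 
> >>>>>>>>>     KSPSetReusePreconditioner(ksp, PETSC_TRUE);
> >>>>>>>>> 
> >>>>>>>>> with ksp being the solver in question; it keeps the existing PC even
> >>>>>>>>> when the operator changes, until it is set back to PETSC_FALSE.)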
> >>>>>>>>> 
> >>>>>>>>> Thanks for your help,
> >>>>>>>>> Michel
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> <ksp_view.txt><petsc_options.txt>
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>> 
> >>>> <info.txt><ksp_view.txt><log_gamg.txt>
> >>> 
> >>> 
> >>> 
> >> 
> >> <ksp_view_mg5.txt><ksp_view_mg7.txt><log_mg5.txt><log_mg7.txt>
> > 
> 


-------------- next part --------------
KSP Object: 8192 MPI processes
  type: cg
  maximum iterations=10000
  tolerances:  relative=1e-09, absolute=1e-50, divergence=10000
  left preconditioning
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8192 MPI processes
  type: mg
    MG: type is MULTIPLICATIVE, levels=5 cycles=v
      Cycles per PCApply=1
      Using Galerkin computed coarse grid matrices
  Coarse grid solver -- level -------------------------------
    KSP Object:    (mg_coarse_)     8192 MPI processes
      type: preonly
      maximum iterations=1, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (mg_coarse_)     8192 MPI processes
      type: dmdarepart
        DMDARepart: parent comm size reduction factor = 64
        DMDARepart: subcomm_size = 128
      KSP Object:      (mg_coarse_dmdarepart_)       128 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
        left preconditioning
        using NONE norm type for convergence test
      PC Object:      (mg_coarse_dmdarepart_)       128 MPI processes
        type: mg
          MG: type is MULTIPLICATIVE, levels=2 cycles=v
            Cycles per PCApply=1
            Using Galerkin computed coarse grid matrices
        Coarse grid solver -- level -------------------------------
          KSP Object:          (mg_coarse_dmdarepart_mg_coarse_)           128 MPI processes
            type: preonly
            maximum iterations=1, initial guess is zero
            tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
            left preconditioning
            using NONE norm type for convergence test
          PC Object:          (mg_coarse_dmdarepart_mg_coarse_)           128 MPI processes
            type: redundant
              Redundant preconditioner: First (color=0) of 128 PCs follows
            KSP Object:            (mg_coarse_dmdarepart_mg_coarse_redundant_)             1 MPI processes
              type: preonly
              maximum iterations=10000, initial guess is zero
              tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
              left preconditioning
              using NONE norm type for convergence test
            PC Object:            (mg_coarse_dmdarepart_mg_coarse_redundant_)             1 MPI processes
              type: lu
                LU: out-of-place factorization
                tolerance for zero pivot 2.22045e-14
                using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
                matrix ordering: nd
                factor fill ratio given 5, needed 9.76317
                  Factored matrix follows:
                    Mat Object:                     1 MPI processes
                      type: seqaij
                      rows=1024, cols=1024
                      package used to perform factorization: petsc
                      total: nonzeros=63734, allocated nonzeros=63734
                      total number of mallocs used during MatSetValues calls =0
                        not using I-node routines
              linear system matrix = precond matrix:
              Mat Object:               1 MPI processes
                type: seqaij
                rows=1024, cols=1024
                total: nonzeros=6528, allocated nonzeros=6528
                total number of mallocs used during MatSetValues calls =0
                  not using I-node routines
            linear system matrix = precond matrix:
            Mat Object:             128 MPI processes
              type: mpiaij
              rows=1024, cols=1024
              total: nonzeros=6528, allocated nonzeros=6528
              total number of mallocs used during MatSetValues calls =0
                not using I-node (on process 0) routines
        Down solver (pre-smoother) on level 1 -------------------------------
          KSP Object:          (mg_coarse_dmdarepart_mg_levels_1_)           128 MPI processes
            type: richardson
              Richardson: damping factor=1
            maximum iterations=2
            tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
            left preconditioning
            using nonzero initial guess
            using NONE norm type for convergence test
          PC Object:          (mg_coarse_dmdarepart_mg_levels_1_)           128 MPI processes
            type: sor
              SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
            linear system matrix = precond matrix:
            Mat Object:             128 MPI processes
              type: mpiaij
              rows=8192, cols=8192
              total: nonzeros=54784, allocated nonzeros=54784
              total number of mallocs used during MatSetValues calls =0
                not using I-node (on process 0) routines
        Up solver (post-smoother) same as down solver (pre-smoother)
        linear system matrix = precond matrix:
        Mat Object:         128 MPI processes
          type: mpiaij
          rows=8192, cols=8192
          total: nonzeros=54784, allocated nonzeros=54784
          total number of mallocs used during MatSetValues calls =0
            not using I-node (on process 0) routines
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=8192, cols=8192
        total: nonzeros=54784, allocated nonzeros=54784
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object:    (mg_levels_1_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_1_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=65536, cols=65536
        total: nonzeros=448512, allocated nonzeros=448512
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object:    (mg_levels_2_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_2_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=524288, cols=524288
        total: nonzeros=3.62906e+06, allocated nonzeros=3.62906e+06
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 3 -------------------------------
    KSP Object:    (mg_levels_3_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_3_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=4194304, cols=4194304
        total: nonzeros=2.91963e+07, allocated nonzeros=2.91963e+07
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 4 -------------------------------
    KSP Object:    (mg_levels_4_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_4_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=33554432, cols=33554432
        total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
        total number of mallocs used during MatSetValues calls =0
          has attached null space
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object:   8192 MPI processes
    type: mpiaij
    rows=33554432, cols=33554432
    total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
    total number of mallocs used during MatSetValues calls =0
      has attached null space

#PETSc Option Table entries:
-finput input.txt
-ksp_initial_guess_nonzero yes
-ksp_norm_type unpreconditioned
-ksp_rtol 1e-9
-ksp_type cg
-ksp_view
-log_summary log_mg7.txt
-mg_coarse_dmdarepart_ksp_constant_null_space
-mg_coarse_dmdarepart_ksp_type preonly
-mg_coarse_dmdarepart_mg_coarse_ksp_type preonly
-mg_coarse_dmdarepart_mg_coarse_pc_type redundant
-mg_coarse_dmdarepart_mg_levels_ksp_type richardson
-mg_coarse_dmdarepart_pc_mg_galerkin
-mg_coarse_dmdarepart_pc_mg_levels 2
-mg_coarse_dmdarepart_pc_type mg
-mg_coarse_ksp_type preonly
-mg_coarse_pc_dmdarepart_factor 64
-mg_coarse_pc_type dmdarepart
-mg_levels_ksp_type richardson
-options_left
-pc_dmdarepart_log
-pc_dmdarepart_monitor
-pc_mg_galerkin
-pc_mg_levels 5
-pc_type mg
#End of PETSc Option Table entries
There are 3 unused database options. They are:
Option left: name:-finput value: input.txt
Option left: name:-mg_coarse_dmdarepart_ksp_constant_null_space (no value)
Option left: name:-pc_dmdarepart_monitor (no value)
Application 25736695 resources: utime ~29149s, stime ~48455s, Rss ~64608, inblocks ~6174814, outblocks ~18104253
-------------- next part --------------
KSP Object: 8192 MPI processes
  type: cg
  maximum iterations=10000
  tolerances:  relative=1e-09, absolute=1e-50, divergence=10000
  left preconditioning
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8192 MPI processes
  type: mg
    MG: type is MULTIPLICATIVE, levels=5 cycles=v
      Cycles per PCApply=1
      Using Galerkin computed coarse grid matrices
  Coarse grid solver -- level -------------------------------
    KSP Object:    (mg_coarse_)     8192 MPI processes
      type: preonly
      maximum iterations=1, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (mg_coarse_)     8192 MPI processes
      type: redundant
        Redundant preconditioner: First (color=0) of 8192 PCs follows
      KSP Object:      (mg_coarse_redundant_)       1 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
        left preconditioning
        using NONE norm type for convergence test
      PC Object:      (mg_coarse_redundant_)       1 MPI processes
        type: lu
          LU: out-of-place factorization
          tolerance for zero pivot 2.22045e-14
          using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
          matrix ordering: nd
          factor fill ratio given 5, needed 23.9038
            Factored matrix follows:
              Mat Object:               1 MPI processes
                type: seqaij
                rows=8192, cols=8192
                package used to perform factorization: petsc
                total: nonzeros=1.30955e+06, allocated nonzeros=1.30955e+06
                total number of mallocs used during MatSetValues calls =0
                  not using I-node routines
        linear system matrix = precond matrix:
        Mat Object:         1 MPI processes
          type: seqaij
          rows=8192, cols=8192
          total: nonzeros=54784, allocated nonzeros=54784
          total number of mallocs used during MatSetValues calls =0
            not using I-node routines
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=8192, cols=8192
        total: nonzeros=54784, allocated nonzeros=54784
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object:    (mg_levels_1_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_1_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=65536, cols=65536
        total: nonzeros=448512, allocated nonzeros=448512
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object:    (mg_levels_2_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_2_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=524288, cols=524288
        total: nonzeros=3.62906e+06, allocated nonzeros=3.62906e+06
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 3 -------------------------------
    KSP Object:    (mg_levels_3_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_3_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=4194304, cols=4194304
        total: nonzeros=2.91963e+07, allocated nonzeros=2.91963e+07
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 4 -------------------------------
    KSP Object:    (mg_levels_4_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_4_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=33554432, cols=33554432
        total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
        total number of mallocs used during MatSetValues calls =0
          has attached null space
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object:   8192 MPI processes
    type: mpiaij
    rows=33554432, cols=33554432
    total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
    total number of mallocs used during MatSetValues calls =0
      has attached null space
-------------- next part --------------
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/u/sciteam/mrosso/mrosso-repo/build/bin/test_droplet_box.exe on a gnu-opt-32idx named p… with 8192 processors, by mrosso Tue Jul 28 16:20:21 2015
Using Petsc Development GIT revision: v3.6-233-g4936542  GIT Date: 2015-07-17 10:15:47 -0500

                         Max       Max/Min        Avg      Total 
Time (sec):           7.498e+00      1.01676   7.375e+00
Objects:              1.385e+03      1.30537   1.066e+03
Flops:                9.815e+07      1.30922   7.642e+07  6.260e+11
Flops/sec:            1.331e+07      1.30928   1.036e+07  8.488e+10
MPI Messages:         3.595e+04      5.80931   1.225e+04  1.003e+08
MPI Message Lengths:  9.104e+06      2.00024   7.063e+02  7.086e+10
MPI Reductions:       1.427e+03      1.09349

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 7.0526e+00  95.6%  6.2314e+11  99.5%  9.376e+07  93.5%  7.044e+02       99.7%  1.260e+03  88.3% 
 1: PCRprt_SetUpMat: 2.7279e-02   0.4%  6.5418e+05   0.0%  6.123e+05   0.6%  5.817e-02        0.0%  4.425e+01   3.1% 
 2:    PCRprt_Apply: 2.9504e-01   4.0%  2.8632e+09   0.5%  5.947e+06   5.9%  1.880e+00        0.3%  1.156e+00   0.1% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecTDot              232 1.0 3.9837e-02 2.6 1.90e+06 1.0 0.0e+00 0.0e+00 2.3e+02  0  2  0  0 16   0  2  0  0 18 390775
VecNorm              123 1.0 1.7174e-02 1.9 1.01e+06 1.0 0.0e+00 0.0e+00 1.2e+02  0  1  0  0  9   0  1  0  0 10 480626
VecScale            1048 1.0 1.5078e-0218.8 1.92e+05 1.8 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 99231
VecCopy              121 1.0 1.2872e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              1647 1.0 1.6298e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              696 1.0 6.7093e-03 1.4 5.70e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  7  0  0  0   0  7  0  0  0 6961607
VecAYPX              927 1.0 4.6690e-03 1.4 2.90e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0 5084883
VecAssemblyBegin       4 1.0 1.3000e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  1   0  0  0  0  1     0
VecAssemblyEnd         4 1.0 1.4210e-0429.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin     2907 1.0 2.7453e-02 2.1 0.00e+00 0.0 9.2e+07 7.6e+02 0.0e+00  0  0 92 99  0   0  0 98 99  0     0
VecScatterEnd       2907 1.0 1.8748e-01 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
MatMult              931 1.0 2.3768e-01 2.6 3.19e+07 1.0 4.3e+07 1.4e+03 0.0e+00  2 42 43 84  0   2 42 46 84  0 1096892
MatMultAdd           464 1.0 4.9362e-03 1.2 1.09e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 1801895
MatMultTranspose     468 1.0 1.6587e-02 2.6 1.10e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 540858
MatSOR              1160 1.0 1.8799e-01 1.6 3.03e+07 1.0 4.9e+07 2.2e+02 0.0e+00  2 40 48 15  0   2 40 52 15  0 1319153
MatResidual          464 1.0 7.4724e-02 2.5 7.60e+06 1.0 2.2e+07 6.8e+02 0.0e+00  1 10 22 21  0   1 10 23 21  0 830522
MatAssemblyBegin      26 1.0 3.0778e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.6e+01  0  0  0  0  3   0  0  0  0  3     0
MatAssemblyEnd        26 1.0 3.6265e-02 1.0 0.00e+00 0.0 4.8e+05 1.3e+02 8.0e+01  0  0  0  0  6   0  0  1  0  6     0
MatView               55 1.8 3.3602e-01 9.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+01  4  0  0  0  2   5  0  0  0  2     0
MatPtAP                8 1.0 4.7572e-02 1.0 2.06e+05 1.0 1.1e+06 3.5e+02 7.6e+01  1  0  1  1  5   1  0  1  1  6 35313
MatPtAPSymbolic        4 1.0 2.7729e-02 1.1 0.00e+00 0.0 5.6e+05 4.5e+02 2.8e+01  0  0  1  0  2   0  0  1  0  2     0
MatPtAPNumeric         8 1.0 2.1160e-02 1.1 2.06e+05 1.0 5.6e+05 2.6e+02 4.8e+01  0  0  1  0  3   0  0  1  0  4 79392
MatGetLocalMat         8 1.0 6.5184e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          8 1.0 1.9581e-03 2.4 0.00e+00 0.0 7.5e+05 5.1e+02 0.0e+00  0  0  1  1  0   0  0  1  1  0     0
MatGetSymTrans         8 1.0 1.2302e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp              14 1.0 6.8645e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01  0  0  0  0  1   0  0  0  0  1     0
KSPSolve               4 1.0 1.0214e+00 1.0 9.81e+07 1.3 1.0e+08 7.1e+02 1.2e+03 14100100100 86  14100107100 97 612784
PCSetUp                4 1.0 1.7279e-01 1.0 2.76e+05 1.0 2.2e+06 1.9e+02 2.8e+02  2  0  2  1 20   2  0  2  1 22 13054
PCApply              116 1.0 7.6665e-01 1.0 8.58e+07 1.4 9.2e+07 6.4e+02 4.7e+02 10 84 92 83 33  11 84 99 83 37 684611

--- Event Stage 1: PCRprt_SetUpMat

VecSet                 3 1.5 1.3113e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin      10 1.2 5.4898e-03 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.1e+00  0  0  0  0  0  11  0  0  0  9     0
MatAssemblyEnd        10 1.2 9.6285e-03 1.1 0.00e+00 0.0 1.9e+05 4.2e+00 1.6e+01  0  0  0  0  1  33  0 31 13 36     0
MatGetRow            192 0.0 4.2677e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrice       2 1.0 1.0698e-02 2.3 0.00e+00 0.0 8.1e+04 2.3e+01 6.0e+00  0  0  0  0  0  22  0 13 32 14     0
MatZeroEntries         1 0.0 3.0994e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatPtAP                2 1.0 2.0634e-02 1.0 8.40e+01 2.6 5.3e+05 7.4e+00 3.4e+01  0  0  1  0  2  75100 87 67 77    32
MatPtAPSymbolic        2 1.0 8.6851e-03 1.1 0.00e+00 0.0 3.3e+05 7.0e+00 1.4e+01  0  0  0  0  1  31  0 54 40 32     0
MatPtAPNumeric         2 1.0 1.2376e-02 1.0 8.40e+01 2.6 2.0e+05 7.9e+00 2.0e+01  0  0  0  0  1  44100 33 28 45    53
MatGetLocalMat         2 1.0 6.1274e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          2 1.0 4.8995e-04 3.7 0.00e+00 0.0 2.8e+05 5.3e+00 0.0e+00  0  0  0  0  0   1  0 46 26  0     0
MatGetSymTrans         4 1.0 2.0742e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 2: PCRprt_Apply

VecScale             348 0.0 2.3985e-04 0.0 3.34e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 14114
VecSet              1167 3.4 5.2118e-04 9.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAYPX              116 0.0 7.3195e-05 0.0 7.42e+03 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 12983
VecScatterBegin     1393 3.0 3.2119e-02112.6 0.00e+00 0.0 5.9e+06 3.2e+01 0.0e+00  0  0  6  0  0   0  0 99 99  0     0
VecScatterEnd       1393 3.0 3.2946e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0  99  0  0  0  0     0
MatMult              232 2.0 4.5841e-02336.1 9.67e+04834.0 1.0e+06 1.6e+01 0.0e+00  0  0  1  0  0   1  0 17  9  0   298
MatMultAdd           116 0.0 2.9373e-04 0.0 1.48e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  6470
MatMultTranspose     233 2.0 3.0067e-0290.1 1.52e+0465.6 9.4e+05 8.0e+00 0.0e+00  0  0  1  0  0   1  0 16  4  0   127
MatSolve             116 0.0 2.3469e-02 0.0 1.47e+07 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0 66  0  0  0 79995
MatSOR               232 0.0 4.8394e-02 0.0 5.50e+05 0.0 2.1e+05 1.3e+02 0.0e+00  0  0  0  0  0   0  2  4 14  0  1398
MatLUFactorSym         1 0.0 2.5880e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         2 0.0 1.0722e-02 0.0 7.01e+06 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0 31  0  0  0 83692
MatCopy                1 0.0 3.0041e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatConvert             1 0.0 7.4148e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatResidual          116 0.0 4.5305e-02 0.0 1.04e+05 0.0 7.1e+04 1.3e+02 0.0e+00  0  0  0  0  0   0  0  1  5  0   281
MatAssemblyBegin       6 0.0 4.5967e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 9.4e-02  0  0  0  0  0   0  0  0  0  8     0
MatAssemblyEnd         6 0.0 9.6583e-04 0.0 0.00e+00 0.0 1.2e+03 1.0e+01 2.5e-01  0  0  0  0  0   0  0  0  0 22     0
MatGetRowIJ            1 0.0 9.5844e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrice       2 0.0 2.3339e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 9.4e-02  0  0  0  0  0   0  0  0  0  8     0
MatGetOrdering         1 0.0 8.8000e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatPtAP                2 0.0 1.5650e-03 0.0 2.82e+03 0.0 3.6e+03 6.7e+01 3.0e-01  0  0  0  0  0   0  0  0  0 26   217
MatPtAPSymbolic        1 0.0 6.5613e-04 0.0 0.00e+00 0.0 1.8e+03 8.5e+01 1.1e-01  0  0  0  0  0   0  0  0  0  9     0
MatPtAPNumeric         2 0.0 9.1791e-04 0.0 2.82e+03 0.0 1.8e+03 4.9e+01 1.9e-01  0  0  0  0  0   0  0  0  0 16   370
MatRedundantMat        2 0.0 2.4142e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 9.4e-02  0  0  0  0  0   0  0  0  0  8     0
MatGetLocalMat         2 0.0 3.7909e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          2 0.0 2.0623e-04 0.0 0.00e+00 0.0 2.4e+03 9.6e+01 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSymTrans         2 0.0 4.0531e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               8 0.0 1.2207e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.1e-02  0  0  0  0  0   0  0  0  0  3     0
KSPSolve             116 0.0 2.6315e-01 0.0 2.24e+07 0.0 2.2e+06 7.2e+01 1.2e+00  0  0  2  0  0   1100 37 84100 10866
PCSetUp                2 0.0 4.0980e-02 0.0 7.01e+06 0.0 3.8e+04 5.0e+01 1.2e+00  0  0  0  0  0   0 31  1  1100 21909
PCApply              116 0.0 2.2205e-01 0.0 1.54e+07 0.0 2.2e+06 7.2e+01 0.0e+00  0  0  2  0  0   1 69 36 83  0  8834
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector   778            791      2774488     0
      Vector Scatter    18             23        29872     0
              Matrix    38             52      1988092     0
   Matrix Null Space     1              1          584     0
    Distributed Mesh     7              7        34664     0
Star Forest Bipartite Graph    14             14        11760     0
     Discrete System     7              7         5880     0
           Index Set    36             41        67040     0
   IS L to G Mapping     7              7         8480     0
       Krylov Solver    11             11        13376     0
     DMKSP interface     4              5         3200     0
      Preconditioner    11             11        10864     0
              Viewer    13             11         8272     0

--- Event Stage 1: PCRprt_SetUpMat

              Vector     6              5         7840     0
      Vector Scatter     3              2         2128     0
              Matrix    15             12        43656     0
           Index Set    10             10         7896     0

--- Event Stage 2: PCRprt_Apply

              Vector   369            357       686800     0
      Vector Scatter     5              0            0     0
              Matrix    11              0            0     0
    Distributed Mesh     1              0            0     0
Star Forest Bipartite Graph     2              0            0     0
     Discrete System     1              0            0     0
           Index Set    15             10        16000     0
   IS L to G Mapping     1              0            0     0
     DMKSP interface     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
Average time for MPI_Barrier(): 5.19753e-05
Average time for zero size MPI_Send(): 2.16846e-05
#PETSc Option Table entries:
-finput input.txt
-ksp_initial_guess_nonzero yes
-ksp_norm_type unpreconditioned
-ksp_rtol 1e-9
-ksp_type cg
-ksp_view
-log_summary log_mg7.txt
-mg_coarse_dmdarepart_ksp_constant_null_space
-mg_coarse_dmdarepart_ksp_type preonly
-mg_coarse_dmdarepart_mg_coarse_ksp_type preonly
-mg_coarse_dmdarepart_mg_coarse_pc_type redundant
-mg_coarse_dmdarepart_mg_levels_ksp_type richardson
-mg_coarse_dmdarepart_pc_mg_galerkin
-mg_coarse_dmdarepart_pc_mg_levels 2
-mg_coarse_dmdarepart_pc_type mg
-mg_coarse_ksp_type preonly
-mg_coarse_pc_dmdarepart_factor 64
-mg_coarse_pc_type dmdarepart
-mg_levels_ksp_type richardson
-options_left
-pc_dmdarepart_log
-pc_dmdarepart_monitor
-pc_mg_galerkin
-pc_mg_levels 5
-pc_type mg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 " --known-mpi-shared-libraries=0 --known-memcmp-ok  --with-blas-lapack-lib=/opt/acml/5.3.1/gfortran64/lib/libacml.a --COPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native -mtune=native" --FOPTFLAGS="-Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 -march=native -mtune=native" --CXXOPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native -mtune=native" --with-x="0 " --with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries="0 " --with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " --download-hypre=1 --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " PETSC_ARCH=gnu-opt-32idx
-----------------------------------------
Libraries compiled on Sat Jul 18 19:48:51 2015 on h2ologin1 
Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /u/sciteam/mrosso/LIBS/petsc
Using PETSc arch: gnu-opt-32idx
-----------------------------------------

Using C compiler: cc  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native -mtune=native  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn  -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 -march=native -mtune=native   ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include -I/u/sciteam/mrosso/LIBS/petsc/include -I/u/sciteam/mrosso/LIBS/petsc/include -I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include
-----------------------------------------

Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lpetsc -Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lsuperlu_dist_4.0 -lHYPRE -lscalapack -Wl,-rpath,/opt/acml/5.3.1/gfortran64/lib -L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lssl -lcrypto -ldl 
-----------------------------------------

-------------- next part --------------
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/u/sciteam/mrosso/mrosso-repo/build/bin/test_droplet_box.exe on a gnu-opt-32idx named p… with 8192 processors, by mrosso Tue Jul 28 15:28:29 2015
Using Petsc Development GIT revision: v3.6-233-g4936542  GIT Date: 2015-07-17 10:15:47 -0500

                         Max       Max/Min        Avg      Total 
Time (sec):           5.098e+02      1.00007   5.098e+02
Objects:              7.400e+02      1.00000   7.400e+02
Flops:                5.499e+08      1.00167   5.498e+08  4.504e+12
Flops/sec:            1.079e+06      1.00174   1.078e+06  8.834e+09
MPI Messages:         7.381e+05      1.00619   7.376e+05  6.043e+09
MPI Message Lengths:  1.267e+07      1.36946   1.669e+01  1.008e+11
MPI Reductions:       1.009e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 5.0982e+02 100.0%  4.5037e+12 100.0%  6.043e+09 100.0%  1.669e+01      100.0%  1.008e+03  99.9% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecTDot              174 1.0 1.5646e-01 1.5 1.43e+06 1.0 0.0e+00 0.0e+00 1.7e+02  0  0  0  0 17   0  0  0  0 17 74621
VecNorm               94 1.0 5.5188e-02 2.5 7.70e+05 1.0 0.0e+00 0.0e+00 9.4e+01  0  0  0  0  9   0  0  0  0  9 114305
VecScale             787 1.0 1.4017e-03 1.9 1.48e+05 1.8 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 824521
VecCopy               92 1.0 1.0190e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              1329 1.0 3.7305e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              522 1.0 5.5845e-03 1.3 4.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 6272892
VecAYPX              695 1.0 3.0615e-02 9.2 2.17e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 580237
VecAssemblyBegin       4 1.0 1.3102e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  1   0  0  0  0  1     0
VecAssemblyEnd         4 1.0 1.8620e-0432.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin     2356 1.0 1.6390e+01 4.7 0.00e+00 0.0 5.9e+09 1.7e+01 0.0e+00  2  0 98 99  0   2  0 98 99  0     0
VecScatterEnd       2356 1.0 4.1647e+02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 69  0  0  0  0  69  0  0  0  0     0
MatMult              699 1.0 5.2895e+01643.0 2.40e+07 1.0 3.3e+07 1.4e+03 0.0e+00  1  4  1 44  0   1  4  1 44  0  3703
MatMultAdd           348 1.0 5.8870e-03 1.5 8.14e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 1133153
MatMultTranspose     352 1.0 6.3620e-03 1.3 8.24e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 1060614
MatSolve              87 1.0 3.9927e-01 1.3 2.27e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0 41  0  0  0   0 41  0  0  0 4660544
MatSOR               870 1.0 1.1567e+02523.3 2.27e+07 1.0 3.6e+07 2.2e+02 0.0e+00  7  4  1  8  0   7  4  1  8  0  1608
MatLUFactorSym         1 1.0 5.9881e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         1 1.0 5.9217e-01 1.1 2.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0 48  0  0  0   0 48  0  0  0 3673552
MatConvert             1 1.0 1.0331e-03 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatResidual          348 1.0 3.3047e-0113.8 5.70e+06 1.0 1.6e+07 6.8e+02 0.0e+00  0  1  0 11  0   0  1  0 11  0 140845
MatAssemblyBegin      22 1.0 2.4983e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.6e+01  0  0  0  0  3   0  0  0  0  3     0
MatAssemblyEnd        22 1.0 3.3268e-02 1.1 0.00e+00 0.0 4.7e+05 1.4e+02 7.2e+01  0  0  0  0  7   0  0  0  0  7     0
MatGetRowIJ            1 1.0 5.8293e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrice       1 1.0 2.2252e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 9.7980e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView               40 1.3 3.3014e-01 3.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+01  0  0  0  0  3   0  0  0  0  3     0
MatPtAP                4 1.0 4.4705e-02 1.0 1.03e+05 1.0 9.3e+05 2.9e+02 6.8e+01  0  0  0  0  7   0  0  0  0  7 18789
MatPtAPSymbolic        4 1.0 2.9025e-02 1.0 0.00e+00 0.0 5.6e+05 4.5e+02 2.8e+01  0  0  0  0  3   0  0  0  0  3     0
MatPtAPNumeric         4 1.0 1.6840e-02 1.1 1.03e+05 1.0 3.7e+05 4.4e+01 4.0e+01  0  0  0  0  4   0  0  0  0  4 49879
MatRedundantMat        1 1.0 2.3107e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetLocalMat         4 1.0 6.1631e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          4 1.0 1.4648e-03 2.8 0.00e+00 0.0 5.6e+05 4.5e+02 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSymTrans         8 1.0 1.4162e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp              10 1.0 4.6747e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01  0  0  0  0  1   0  0  0  0  1     0
KSPSolve               4 1.0 5.0087e+02 1.0 5.50e+08 1.0 6.0e+09 1.7e+01 9.2e+02 98100100100 91  98100100100 92  8992
PCSetUp                4 1.0 6.8538e+01 1.0 2.66e+08 1.0 1.4e+08 1.0e+01 2.1e+02 13 48  2  1 21  13 48  2  1 21 31760
PCApply               87 1.0 4.3206e+02 1.0 2.75e+08 1.0 5.9e+09 1.5e+01 3.5e+02 85 50 98 90 34  85 50 98 90 35  5213
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector   597            597      2364880     0
      Vector Scatter    16             15        20656     0
              Matrix    38             38     18267636     0
   Matrix Null Space     1              1          584     0
    Distributed Mesh     5              4        19808     0
Star Forest Bipartite Graph    10              8         6720     0
     Discrete System     5              4         3360     0
           Index Set    37             37       186396     0
   IS L to G Mapping     5              4         6020     0
       Krylov Solver     7              7         8608     0
     DMKSP interface     4              4         2560     0
      Preconditioner     7              7         6792     0
              Viewer     8              6         4512     0
========================================================================================================================
Average time to get PetscTime(): 2.14577e-07
Average time for MPI_Barrier(): 7.26223e-05
Average time for zero size MPI_Send(): 1.60854e-06
#PETSc Option Table entries:
-finput input.txt
-ksp_initial_guess_nonzero yes
-ksp_norm_type unpreconditioned
-ksp_rtol 1e-9
-ksp_type cg
-ksp_view
-log_summary log_mg_defaults.txt
-mg_coarse_ksp_type preonly
-mg_levels_ksp_type richardson
-pc_dmdarepart_log
-pc_mg_galerkin
-pc_mg_levels 5
-pc_type mg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 " --known-mpi-shared-libraries=0 --known-memcmp-ok  --with-blas-lapack-lib=/opt/acml/5.3.1/gfortran64/lib/libacml.a --COPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native -mtune=native" --FOPTFLAGS="-Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 -march=native -mtune=native" --CXXOPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native -mtune=native" --with-x="0 " --with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries="0 " --with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " --download-hypre=1 --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " PETSC_ARCH=gnu-opt-32idx
-----------------------------------------
Libraries compiled on Sat Jul 18 19:48:51 2015 on h2ologin1 
Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /u/sciteam/mrosso/LIBS/petsc
Using PETSc arch: gnu-opt-32idx
-----------------------------------------

Using C compiler: cc  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native -mtune=native  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn  -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 -march=native -mtune=native   ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include -I/u/sciteam/mrosso/LIBS/petsc/include -I/u/sciteam/mrosso/LIBS/petsc/include -I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include
-----------------------------------------

Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lpetsc -Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lsuperlu_dist_4.0 -lHYPRE -lscalapack -Wl,-rpath,/opt/acml/5.3.1/gfortran64/lib -L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lssl -lcrypto -ldl 
-----------------------------------------


