From davydden at gmail.com Sun Nov 1 03:49:12 2015
From: davydden at gmail.com (Denis Davydov)
Date: Sun, 1 Nov 2015 10:49:12 +0100
Subject: [petsc-users] soname seems to be absent in OS-X
In-Reply-To: <5D62CF7C-2798-422E-856F-663904012549@mcs.anl.gov>
References: <6C83934B-C769-4C05-9235-96EE7C22944D@gmail.com> <5D62CF7C-2798-422E-856F-663904012549@mcs.anl.gov>
Message-ID: <165EDFE5-3E41-4F97-95B9-EE0F3372D25D@gmail.com>

Hi Barry,

I think you use it already. After configure, /lib/petsc/conf/petscvariables contains:

SL_LINKER_FUNCTION = -dynamiclib -install_name $(call SONAME_FUNCTION,$(1),$(2)) -compatibility_version $(2) -current_version $(3) -single_module -multiply_defined suppress -undefined dynamic_lookup
SONAME_FUNCTION = $(1).$(2).dylib

At the linking stage, the Homebrew logs show that the exact linking line contains -install_name:

-Wl,-multiply_defined,suppress -Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs -Wl,-search_paths_first -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O -dynamiclib -install_name /private/tmp/petsc20151027-97392-7khduu/petsc-3.6.2/real/lib/libpetsc.3.6.dylib -compatibility_version 3.6 -current_version 3.6.2 -single_module -multiply_defined suppress -undefined dynamic_lookup

If I configure manually (bare-bones PETSc) and compile it, the install name seems to be correct (note that it's libpetsc.3.6.dylib instead of libpetsc.3.6.2.dylib):

$ otool -D arch-darwin-c-debug/lib/libpetsc.3.6.2.dylib
arch-darwin-c-debug/lib/libpetsc.3.6.2.dylib:
/Users/davydden/Downloads/petsc-3.6.2/arch-darwin-c-debug/lib/libpetsc.3.6.dylib

An executable linked against it ends up using the correct ABI version:

$ otool -L test | grep petsc
/Users/davydden/Downloads/petsc-3.6.2/arch-darwin-c-debug/lib/libpetsc.3.6.dylib (compatibility version 3.6.0, current version 3.6.2)

However, after installation to --prefix=/Users/davydden/Downloads/petsc-3.6.2/real the install name ends up being wrong:

$ otool -D real/lib/libpetsc.3.6.2.dylib
real/lib/libpetsc.3.6.2.dylib:
/Users/davydden/Downloads/petsc-3.6.2/real/lib/libpetsc.3.6.2.dylib

An executable linked against it ends up using libpetsc.3.6.2.dylib instead of the ABI version:

$ otool -L test | grep petsc
/Users/davydden/Downloads/petsc-3.6.2/real/lib/libpetsc.3.6.2.dylib (compatibility version 3.6.0, current version 3.6.2)

My guess would be that something goes wrong in `make install`, perhaps when using install_name_tool with the -id flag to change the library's install name. As a workaround I will fix the install name manually (see the sketch at the end of this message), but you may want to investigate this issue further.

P.S. An excerpt from http://cocoadev.com/ApplicationLinking :

Unlike many OSes, OS X does not have a search path for the dynamic linker**. This means that you can't simply put a dynamic library in some "standard" location and have dyld find it, because there is no standard location. Instead, OS X embeds an "install name" inside each dynamic library. This install name is the path to where the library can be found when dyld needs to load it. When you build an application that links against a dynamic library, this install name is copied into the application binary. When the application runs, the copied install name is then used to locate the library or framework.

** Technically, dyld does have a search path, defined in the DYLD_FRAMEWORK_PATH and DYLD_LIBRARY_PATH variables. However, these are empty on OS X by default, so they rarely matter.
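To be concrete, the manual workaround I have in mind is roughly the following. This is an untested sketch, not what `make install` actually runs: the paths are the ones from the --prefix install above, `test` is the executable from the otool -L examples, and it assumes a libpetsc.3.6.dylib symlink exists in real/lib for dyld to resolve.

# reset the install name (-id) of the installed library back to the ABI-versioned name
$ install_name_tool -id /Users/davydden/Downloads/petsc-3.6.2/real/lib/libpetsc.3.6.dylib \
    /Users/davydden/Downloads/petsc-3.6.2/real/lib/libpetsc.3.6.2.dylib

# verify the recorded install name
$ otool -D /Users/davydden/Downloads/petsc-3.6.2/real/lib/libpetsc.3.6.2.dylib

# an executable that was already linked against the fully-versioned name can be patched too
$ install_name_tool -change /Users/davydden/Downloads/petsc-3.6.2/real/lib/libpetsc.3.6.2.dylib \
    /Users/davydden/Downloads/petsc-3.6.2/real/lib/libpetsc.3.6.dylib test

Anything linked after the -id fix should pick up the ABI-versioned install name automatically.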
Kind regards, Denis > On 29 Oct 2015, at 22:01, Barry Smith wrote: > > > Denis, > > We don't understand what purpose a soname serves on Apple or how to add it. If you need it let us know how to install PETSc so that it is set and we will do it. > > Barry -------------- next part -------------- An HTML attachment was scrubbed... URL: From zonexo at gmail.com Sun Nov 1 07:30:50 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Sun, 1 Nov 2015 21:30:50 +0800 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> Message-ID: <5636140A.3040506@gmail.com> On 1/11/2015 10:00 AM, Barry Smith wrote: >> On Oct 31, 2015, at 8:43 PM, TAY wee-beng wrote: >> >> >> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng wrote: >>> Hi, >>> >>> I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. >>> Its specs are: >>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node) >>> >>> 8 cores / processor >>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>> Each cabinet contains 96 computing nodes, >>> One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data >>> There are 2 ways to give performance: >>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed >>> problem. >>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a >>> fixed problem size per processor. >>> I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling. >>> Cluster specs: >>> CPU: AMD 6234 2.4GHz >>> 8 cores / processor (CPU) >>> 6 CPU / node >>> So 48 Cores / CPU >>> Not sure abt the memory / node >>> >>> The parallel efficiency ?En? for a given degree of parallelism ?n? indicates how much the program is >>> efficiently accelerated by parallel processing. ?En? is given by the following formulae. Although their >>> derivation processes are different depending on strong and weak scaling, derived formulae are the >>> same. >>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. >>> So is my results acceptable? >>> For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%. >>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function >>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a >>> model of this dependence. >>> >>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current >>> cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific >>> applications, but neither does requiring a certain parallel efficiency. >> Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. 
So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case. >> >> However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so? > What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors? > > Barry Hi, I have attached the output 48 cores: log48 96 cores: log96 There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG. Problem size doubled from 158x266x150 to 158x266x300. > >> So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved? >> >> Thanks. >>> Thanks, >>> >>> Matt >>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores? >>> Btw, I do not have access to the system. >>> >>> >>> >>> Sent using CloudMagic Email >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener -------------- next part -------------- 0.000000000000000E+000 0.353000000000000 0.000000000000000E+000 90.0000000000000 0.000000000000000E+000 0.000000000000000E+000 1.00000000000000 0.400000000000000 0 -400000 AB,AA,BB -2.47900002275128 2.50750002410496 3.46600006963126 3.40250006661518 size_x,size_y,size_z 158 266 150 body_cg_ini 0.523700833348298 0.778648765134454 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 1.800000000000000E-002 maximum ngh_surfaces and ngh_vertics are 45 22 minimum ngh_surfaces and ngh_vertics are 28 10 body_cg_ini 0.896813342835977 -0.976707581163755 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 1.800000000000000E-002 maximum ngh_surfaces and ngh_vertics are 45 22 minimum ngh_surfaces and ngh_vertics are 28 10 min IIB_cell_no 0 max IIB_cell_no 429 final initial IIB_cell_no 2145 min I_cell_no 0 max I_cell_no 460 final initial I_cell_no 2300 size(IIB_cell_u),size(I_cell_u),size(IIB_equal_cell_u),size(I_equal_cell_u) 2145 2300 2145 2300 IIB_I_cell_no_uvw_total1 3090 3094 3078 3080 3074 3073 IIB_I_cell_no_uvw_total2 3102 3108 3089 3077 3060 3086 KSP Object:(poisson_) 48 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE 
BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines 1 0.00150000 0.26453420 0.26150786 1.18591549 -0.76698473E+03 -0.32601079E+02 0.62972429E+07 KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 
HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines 2 0.00150000 0.32176572 0.32263580 1.26788415 -0.60299667E+03 0.32647398E+02 0.62967217E+07 body 1 implicit forces and moment 1 -4.26275429376489 3.25233748148304 4.24465168045839 2.64197203573494 -4.33941463724848 6.02314872588369 body 2 implicit forces and moment 2 -2.73305586295329 -4.58125536984606 4.09749596192292 -0.631365801453371 -4.97859179904208 -6.05233907360457 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./a.out on a petsc-3.6.2_shared_rel named n12-09 with 48 processors, by wtay Sun Nov 1 14:22:24 2015 Using Petsc Release Version 3.6.2, Oct, 02, 2015 Max Max/Min Avg Total Time (sec): 5.216e+01 1.00078 5.213e+01 Objects: 5.700e+01 1.00000 5.700e+01 Flops: 1.187e+08 1.51264 9.189e+07 4.411e+09 Flops/sec: 2.278e+06 1.51282 1.763e+06 8.461e+07 MPI Messages: 6.500e+01 3.42105 3.721e+01 1.786e+03 MPI Message Lengths: 1.338e+07 2.00000 3.521e+05 6.288e+08 MPI Reductions: 8.900e+01 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 5.2132e+01 100.0% 4.4107e+09 100.0% 1.786e+03 100.0% 3.521e+05 100.0% 8.800e+01 98.9% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 14 1.0 3.3337e-01 1.6 3.91e+07 1.5 1.3e+03 4.3e+05 0.0e+00 0 33 74 90 0 0 33 74 90 0 4381 MatSolve 3 1.0 6.6930e-02 1.3 1.79e+07 1.9 0.0e+00 0.0e+00 0.0e+00 0 15 0 0 0 0 15 0 0 0 9759 MatLUFactorNum 1 1.0 9.9076e-02 1.9 9.58e+06 2.0 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 3458 MatILUFactorSym 1 1.0 8.4120e-02 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatConvert 1 1.0 5.4024e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 2 1.0 2.2867e-0141.2 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 4 0 0 0 0 5 0 MatAssemblyEnd 2 1.0 3.0022e-01 1.0 0.00e+00 0.0 3.8e+02 1.7e+05 1.6e+01 1 0 21 10 18 1 0 21 10 18 0 MatGetRowIJ 3 1.0 1.0967e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.0795e-02 3.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 5 1.7 6.9060e-03 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 3 0 0 0 0 3 0 KSPGMRESOrthog 12 1.0 6.7062e-0110.1 2.82e+07 1.3 0.0e+00 0.0e+00 1.2e+01 0 24 0 0 13 0 24 0 0 14 1579 KSPSetUp 3 1.0 4.4972e-02 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 3 1.0 4.0408e+01 1.0 1.19e+08 1.5 1.3e+03 4.3e+05 3.5e+01 78100 74 90 39 78100 74 90 40 109 VecDot 2 1.0 2.5249e-02 9.1 2.02e+06 1.3 0.0e+00 0.0e+00 2.0e+00 0 2 0 0 2 0 2 0 0 2 2996 VecDotNorm2 1 1.0 1.9680e-0211.9 2.02e+06 1.3 0.0e+00 0.0e+00 1.0e+00 0 2 0 0 1 0 2 0 0 1 3844 VecMDot 12 1.0 6.4181e-0119.8 1.41e+07 1.3 0.0e+00 0.0e+00 1.2e+01 0 12 0 0 13 0 12 0 0 14 825 VecNorm 16 1.0 1.9977e-0115.8 6.72e+06 1.3 0.0e+00 0.0e+00 1.6e+01 0 6 0 0 18 0 6 0 0 18 1262 VecScale 14 1.0 4.8060e-03 1.6 2.35e+06 1.3 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 18364 VecCopy 4 1.0 1.2854e-02 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 26 1.0 3.7126e-02 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 2 1.0 2.4660e-03 3.0 6.72e+05 1.3 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 10226 VecAXPBYCZ 2 1.0 1.3788e-02 1.5 4.03e+06 1.3 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 10973 VecWAXPY 2 1.0 1.3360e-02 1.4 2.02e+06 1.3 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 5662 VecMAXPY 14 1.0 4.5768e-02 1.3 1.82e+07 1.3 0.0e+00 0.0e+00 0.0e+00 0 15 0 0 0 0 15 0 0 0 14876 VecAssemblyBegin 6 1.0 2.8086e-02 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 20 0 0 0 0 20 0 VecAssemblyEnd 6 1.0 3.3140e-05 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 14 1.0 1.6444e-02 3.0 0.00e+00 0.0 1.3e+03 4.3e+05 0.0e+00 0 0 74 90 0 0 0 74 90 0 0 VecScatterEnd 14 1.0 1.1537e-01 3.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 14 1.0 1.1996e-0111.2 7.06e+06 1.3 0.0e+00 0.0e+00 1.4e+01 0 6 0 0 16 0 6 0 0 16 2207 PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 PCSetUpOnBlocks 1 
1.0 1.9210e-01 1.7 9.58e+06 2.0 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 1783 PCApply 17 1.0 7.5148e+00 1.1 1.79e+07 1.9 0.0e+00 0.0e+00 0.0e+00 14 15 0 0 0 14 15 0 0 0 87 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Matrix 7 7 182147036 0 Krylov Solver 3 3 20664 0 Vector 33 33 59213792 0 Vector Scatter 2 2 2176 0 Index Set 7 7 4705612 0 Preconditioner 3 3 3208 0 Viewer 2 1 760 0 ======================================================================================================================== Average time to get PetscTime(): 1.90735e-07 Average time for MPI_Barrier(): 1.12057e-05 Average time for zero size MPI_Send(): 2.15024e-05 #PETSc Option Table entries: -log_summary -momentum_ksp_view -poisson_ksp_view #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-mpi-dir=/opt/ud/openmpi-1.8.8/ --with-blas-lapack-dir=/opt/ud/intel_xe_2013sp1/mkl/lib/intel64/ --with-debugging=0 --download-hypre=1 --prefix=/home/wtay/Lib/petsc-3.6.2_shared_rel --known-mpi-shared=1 --with-shared-libraries --with-fortran-interfaces=1 ----------------------------------------- Libraries compiled on Sun Oct 18 17:34:07 2015 on hpc12 Machine characteristics: Linux-3.10.0-123.20.1.el7.x86_64-x86_64-with-centos-7.1.1503-Core Using PETSc directory: /home/wtay/Codes/petsc-3.6.2 Using PETSc arch: petsc-3.6.2_shared_rel ----------------------------------------- Using C compiler: /opt/ud/openmpi-1.8.8/bin/mpicc -fPIC -wd1572 -O3 ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /opt/ud/openmpi-1.8.8/bin/mpif90 -fPIC -O3 ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Lib/petsc-3.6.2_shared_rel/include -I/opt/ud/openmpi-1.8.8/include ----------------------------------------- Using C linker: /opt/ud/openmpi-1.8.8/bin/mpicc Using Fortran linker: /opt/ud/openmpi-1.8.8/bin/mpif90 Using libraries: -Wl,-rpath,/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -L/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -lpetsc -Wl,-rpath,/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -L/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -lHYPRE -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -lmpi_cxx -Wl,-rpath,/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -L/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lX11 -lhwloc -lssl -lcrypto -lmpi_usempi -lmpi_mpifh -lifport -lifcore -lm -lmpi_cxx -ldl -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -lmpi -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 
-L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -limf -lsvml -lirng -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lpthread -lirc_s -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -ldl ----------------------------------------- -------------- next part -------------- 0.000000000000000E+000 0.353000000000000 0.000000000000000E+000 90.0000000000000 0.000000000000000E+000 0.000000000000000E+000 1.00000000000000 0.400000000000000 0 -400000 AB,AA,BB -3.41000006697141 3.44100006844383 3.46600006963126 3.40250006661518 size_x,size_y,size_z 158 266 301 body_cg_ini 0.523700833348298 0.778648765134454 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 1.800000000000000E-002 maximum ngh_surfaces and ngh_vertics are 41 20 minimum ngh_surfaces and ngh_vertics are 28 10 body_cg_ini 0.896813342835977 -0.976707581163755 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 1.800000000000000E-002 maximum ngh_surfaces and ngh_vertics are 41 20 minimum ngh_surfaces and ngh_vertics are 28 10 min IIB_cell_no 0 max IIB_cell_no 415 final initial IIB_cell_no 2075 min I_cell_no 0 max I_cell_no 468 final initial I_cell_no 2340 size(IIB_cell_u),size(I_cell_u),size(IIB_equal_cell_u),size(I_equal_cell_u) 2075 2340 2075 2340 IIB_I_cell_no_uvw_total1 7635 7644 7643 8279 8271 8297 IIB_I_cell_no_uvw_total2 7647 7646 7643 8271 8274 8266 KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node 
(on process 0) routines 1 0.00150000 0.35826998 0.36414728 1.27156134 -0.24352631E+04 -0.99308685E+02 0.12633660E+08 KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines 2 0.00150000 0.49306841 0.48961181 1.45614703 -0.20361132E+04 0.77916035E+02 0.12632159E+08 body 1 implicit forces and moment 1 -4.19326176380900 3.26285229643405 4.91657786206150 3.33023211607813 
-4.66288821809535 5.95105697339790 body 2 implicit forces and moment 2 -2.71360610740664 -4.53650746988691 4.76048497342022 -1.13560954211517 -5.55259427154780 -5.98958778241742 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./a.out on a petsc-3.6.2_shared_rel named n12-05 with 96 processors, by wtay Sun Nov 1 14:26:28 2015 Using Petsc Release Version 3.6.2, Oct, 02, 2015 Max Max/Min Avg Total Time (sec): 5.132e+02 1.00006 5.132e+02 Objects: 4.400e+01 1.00000 4.400e+01 Flops: 5.257e+07 1.75932 4.049e+07 3.887e+09 Flops/sec: 1.024e+05 1.75933 7.890e+04 7.574e+06 MPI Messages: 1.010e+02 14.42857 1.385e+01 1.330e+03 MPI Message Lengths: 5.310e+06 2.00000 3.793e+05 5.045e+08 MPI Reductions: 6.300e+01 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 5.1317e+02 100.0% 3.8869e+09 100.0% 1.330e+03 100.0% 3.793e+05 100.0% 6.200e+01 98.4% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 2 1.0 1.1735e-01 1.6 1.30e+07 1.9 3.8e+02 9.9e+05 0.0e+00 0 25 29 75 0 0 25 29 75 0 8276 MatSolve 3 1.0 7.3929e-02 1.7 1.79e+07 1.9 0.0e+00 0.0e+00 0.0e+00 0 34 0 0 0 0 34 0 0 0 17787 MatLUFactorNum 1 1.0 1.0028e-01 1.9 9.58e+06 2.0 0.0e+00 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 6881 MatILUFactorSym 1 1.0 8.3889e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatConvert 1 1.0 6.0471e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 2 1.0 4.2017e-0132.8 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 6 0 0 0 0 6 0 MatAssemblyEnd 2 1.0 3.2434e-01 1.0 0.00e+00 0.0 7.6e+02 1.7e+05 1.6e+01 0 0 57 25 25 0 0 57 25 26 0 MatGetRowIJ 3 1.0 1.3113e-0513.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.6168e-02 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 5 1.7 1.0472e-02 3.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 5 0 0 0 0 5 0 KSPSetUp 3 1.0 5.0210e-02 7.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 3 1.0 4.9085e+02 1.0 5.26e+07 1.8 3.8e+02 9.9e+05 9.0e+00 96100 29 75 14 96100 29 75 15 8 VecDot 2 1.0 3.2738e-0210.5 2.02e+06 1.3 0.0e+00 0.0e+00 2.0e+00 0 4 0 0 3 0 4 0 0 3 4637 VecDotNorm2 1 1.0 3.0590e-0215.2 2.02e+06 1.3 0.0e+00 0.0e+00 1.0e+00 0 4 0 0 2 0 4 0 0 2 4963 VecNorm 2 1.0 1.2529e-0120.6 2.02e+06 1.3 0.0e+00 0.0e+00 2.0e+00 0 4 0 0 3 0 4 0 0 3 1212 VecCopy 2 1.0 5.5149e-03 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 10 1.0 2.0353e-02 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPBYCZ 2 1.0 1.8868e-02 2.1 4.03e+06 1.3 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 16091 VecWAXPY 2 1.0 1.4920e-02 1.6 2.02e+06 1.3 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 10174 VecAssemblyBegin 6 1.0 2.2800e-0118.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 29 0 0 0 0 29 0 VecAssemblyEnd 6 1.0 5.3167e-05 6.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 2 1.0 1.0023e-02 4.1 0.00e+00 0.0 3.8e+02 9.9e+05 0.0e+00 0 0 29 75 0 0 0 29 75 0 0 VecScatterEnd 2 1.0 4.0989e-02 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 PCSetUpOnBlocks 1 1.0 1.9143e-01 1.9 9.58e+06 2.0 0.0e+00 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 3605 PCApply 3 1.0 7.6561e-02 1.6 1.79e+07 1.9 0.0e+00 0.0e+00 0.0e+00 0 34 0 0 0 0 34 0 0 0 17175 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 7 7 182147036 0 Krylov Solver 3 3 3464 0 Vector 20 20 41709448 0 Vector Scatter 2 2 2176 0 Index Set 7 7 4705612 0 Preconditioner 3 3 3208 0 Viewer 2 1 760 0 ======================================================================================================================== Average time to get PetscTime(): 1.90735e-07 Average time for MPI_Barrier(): 0.0004426 Average time for zero size MPI_Send(): 1.45609e-05 #PETSc Option Table entries: -log_summary -momentum_ksp_view -poisson_ksp_view #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-mpi-dir=/opt/ud/openmpi-1.8.8/ --with-blas-lapack-dir=/opt/ud/intel_xe_2013sp1/mkl/lib/intel64/ --with-debugging=0 --download-hypre=1 --prefix=/home/wtay/Lib/petsc-3.6.2_shared_rel --known-mpi-shared=1 --with-shared-libraries --with-fortran-interfaces=1 ----------------------------------------- Libraries compiled on Sun Oct 18 17:34:07 2015 on hpc12 Machine characteristics: Linux-3.10.0-123.20.1.el7.x86_64-x86_64-with-centos-7.1.1503-Core Using PETSc directory: /home/wtay/Codes/petsc-3.6.2 Using PETSc arch: petsc-3.6.2_shared_rel ----------------------------------------- Using C compiler: /opt/ud/openmpi-1.8.8/bin/mpicc -fPIC -wd1572 -O3 ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /opt/ud/openmpi-1.8.8/bin/mpif90 -fPIC -O3 ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Lib/petsc-3.6.2_shared_rel/include -I/opt/ud/openmpi-1.8.8/include ----------------------------------------- Using C linker: /opt/ud/openmpi-1.8.8/bin/mpicc Using Fortran linker: /opt/ud/openmpi-1.8.8/bin/mpif90 Using libraries: -Wl,-rpath,/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -L/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -lpetsc -Wl,-rpath,/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -L/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -lHYPRE -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -lmpi_cxx -Wl,-rpath,/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -L/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lX11 -lhwloc -lssl -lcrypto -lmpi_usempi -lmpi_mpifh -lifport -lifcore -lm -lmpi_cxx -ldl -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -lmpi -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -limf -lsvml -lirng -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lpthread -lirc_s -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 
-L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -ldl
-----------------------------------------

From snakexf at gmail.com Sun Nov 1 07:39:00 2015
From: snakexf at gmail.com (Feng Xing)
Date: Sun, 1 Nov 2015 14:39:00 +0100
Subject: [petsc-users] Create Hypre ILU(0) PC Segmentation error
Message-ID: <24F4493F-FA9F-4C4C-9562-8B7A50BC313B@gmail.com>

Hello everyone,

I would like to ask for help with a small problem. I am trying to create a Hypre ILU(0) preconditioner in Fortran with the following code, where the matrix A_mpi has already been created.

call PCCreate(MPI_COMM_WORLD, pcilu0, Ierr)
CHKERRQ(Ierr)
call PCSetOperators(pcilu0, A_mpi, A_mpi, Ierr)
CHKERRQ(Ierr)
call PCSetType(pcilu0, PCHYPRE, Ierr)
CHKERRQ(Ierr)
call PCHYPRESetType(pcilu0, 'euclid', Ierr)
CHKERRQ(Ierr)
call PetscOptionsSetValue(pcilu0, '-pc_hypre_euclid_levels', '0', Ierr)
CHKERRQ(Ierr)

But I got some segmentation errors. I tried to use valgrind, but it doesn't report any errors.

[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger

Thank you very much!

Kind regards,
Feng Xing
Postdoc in France

From hzhang at mcs.anl.gov Sun Nov 1 09:19:17 2015
From: hzhang at mcs.anl.gov (Hong)
Date: Sun, 1 Nov 2015 09:19:17 -0600
Subject: [petsc-users] PetscOptionsGetString Not Finding Option
In-Reply-To: <56356233.3060207@gmail.com>
References: <56356233.3060207@gmail.com>
Message-ID:

Jared:
Either call KSPSetPCSide() or change const char name[] = "-ksp_pc_side" to a non-petsc option name, e.g., "-my_ksp_pc_side".
Hong

Hello,
> I am trying to use PetscOptionsGetString to retrieve the value of an
> option in the options database, but the value returned in the last argument
> indicates the option was not found. In the attached code (a modified
> version of ksp example 2), the string "-ksp_pc_side" is passed in as the
> argument name. If I run the code as
>
> ./jc2 -pc_type ilu -ksp_pc_side right
>
> I get the output:
>
> option -ksp_pc_side was found
>
> from line 71 of the file. Petsc does not complain of unused options
> when the program finishes. Am I using this function incorrectly?
>
> Jared Crean
>

From bsmith at mcs.anl.gov Sun Nov 1 10:11:47 2015
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Sun, 1 Nov 2015 10:11:47 -0600
Subject: [petsc-users] summary of the bandwidth received with different number of MPI processes
In-Reply-To: <56359485.7000004@gmail.com>
References: <5634EF2F.3090208@gmail.com> <56359485.7000004@gmail.com>
Message-ID:

Just plot the bandwidth yourself using gnuplot or Matlab or something. Also you might benefit from using process binding http://www.mcs.anl.gov/petsc/documentation/faq.html#computers

> On Oct 31, 2015, at 11:26 PM, TAY wee-beng wrote:
>
>
> On 1/11/2015 1:17 AM, Barry Smith wrote:
>> Yes, just put the output from running with 1 2 etc processes in order into the file
> Hi,
>
> I just did but I got some errors.
> > The scaling.log file is: > > Number of MPI processes 3 Processor names n12-06 n12-06 n12-06 > Triad: 27031.0419 Rate (MB/s) > Number of MPI processes 6 Processor names n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 > Triad: 53517.8980 Rate (MB/s) > Number of MPI processes 12 Processor names n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 > Triad: 53162.5346 Rate (MB/s) > Number of MPI processes 24 Processor names n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 > Triad: 101455.6581 Rate (MB/s) > Number of MPI processes 48 Processor names n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 > Triad: 115575.8960 Rate (MB/s) > Number of MPI processes 96 Processor names n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 > Triad: 223742.1796 Rate (MB/s) > Number of MPI processes 192 Processor names n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-07 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-09 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 n12-10 > Triad: 436940.9859 Rate (MB/s) > > When I tried to run "./process.py createfile ; process.py", I got > > np speedup > Traceback (most recent call last): > File "./process.py", line 110, in > process(len(sys.argv)-1) > File "./process.py", line 34, in process > speedups[sizes] = triads[sizes]/triads[1] > KeyError: 1 > Traceback (most recent 
call last): > File "./process.py", line 110, in > process(len(sys.argv)-1) > File "./process.py", line 34, in process > speedups[sizes] = triads[sizes]/triads[1] > KeyError: 1 > > How can I solve it? Thanks. > >>> On Oct 31, 2015, at 11:41 AM, TAY wee-beng wrote: >>> >>> Hi, >>> >>> It's mentioned that for a batch sys, I have to: >>> >>> 1. cd src/benchmarks/steams >>> 2. make MPIVersion >>> 3. submit MPIVersion to the batch system a number of times with 1, 2, 3, etc MPI processes collecting all of the output from the runs into the single file scaling.log. >>> 4. copy scaling.log into the src/benchmarks/steams directory >>> 5. ./process.py createfile ; process.py >>> >>> So for 3, how do I collect all of the output from the runs into the single file scaling.log. >>> >>> Should scaling.log look for this: >>> >>> Number of MPI processes 3 Processor names n12-06 n12-06 n12-06 >>> Triad: 27031.0419 Rate (MB/s) >>> Number of MPI processes 6 Processor names n12-06 n12-06 n12-06 n12-06 n12-06 n12-06 >>> Triad: 53517.8980 Rate (MB/s) >>> >>> ... >>> >>> >>> >>> -- >>> Thank you. >>> >>> Yours sincerely, >>> >>> TAY wee-beng >>> > From bsmith at mcs.anl.gov Sun Nov 1 10:18:33 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 1 Nov 2015 10:18:33 -0600 Subject: [petsc-users] Create Hypre ILU(0) PC Segmentation error In-Reply-To: <24F4493F-FA9F-4C4C-9562-8B7A50BC313B@gmail.com> References: <24F4493F-FA9F-4C4C-9562-8B7A50BC313B@gmail.com> Message-ID: <9166D4B5-46D8-48E5-A64D-3639F49F1FC4@mcs.anl.gov> You need to learn how to use the debugger to debug this type of crash. In this case it is very simple just run the code in the debugger and when it crashes type "where" and "up" and "list" to see where it crashed. Make sure to use the debug version of the code. > On Nov 1, 2015, at 7:39 AM, Feng Xing wrote: > > Hello everyone, > > I would like to look for help for a small problem. I am trying to create a Hypre ilu(0) preconditioned in Fortran with the following code, where the matrix A_mpi has been created. > > call PCCreate(MPI_COMM_WORLD, pcilu0, Ierr) > CHKERRQ(Ierr) > call PCSetOperators(pcilu0, A_mpi, A_mpi, Ierr) > CHKERRQ(Ierr) > call PCSetType(pcilu0, PCHYPRE, Ierr) > CHKERRQ(Ierr) > call PCHYPRESetType(pcilu0, 'euclid', Ierr) > CHKERRQ(Ierr) > call PetscOptionsSetValue(pcilu0, '-pc_hypre_euclid_levels', '0', Ierr) This is likely the problem since you are using totally the wrong first argument for this function. Note also you would use KSPGetPC() and then set the PC and not use a PCCreate() > CHKERRQ(Ierr) > > But, I got some segmentation errors. I tried to use valgrind, but it doesn?t report any errors. > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > > Thank you very much! > > Kind regards, > Feng Xing > Postdoc in France > From bsmith at mcs.anl.gov Sun Nov 1 10:30:52 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 1 Nov 2015 10:30:52 -0600 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <5636140A.3040506@gmail.com> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> Message-ID: <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> You used gmres with 48 processes but richardson with 96. 
You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds. PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large? You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine. Barry > On Nov 1, 2015, at 7:30 AM, TAY wee-beng wrote: > > > On 1/11/2015 10:00 AM, Barry Smith wrote: >>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng wrote: >>> >>> >>> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng wrote: >>>> Hi, >>>> >>>> I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. >>>> Its specs are: >>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node) >>>> >>>> 8 cores / processor >>>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>>> Each cabinet contains 96 computing nodes, >>>> One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data >>>> There are 2 ways to give performance: >>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed >>>> problem. >>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a >>>> fixed problem size per processor. >>>> I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling. >>>> Cluster specs: >>>> CPU: AMD 6234 2.4GHz >>>> 8 cores / processor (CPU) >>>> 6 CPU / node >>>> So 48 Cores / CPU >>>> Not sure abt the memory / node >>>> >>>> The parallel efficiency ?En? for a given degree of parallelism ?n? indicates how much the program is >>>> efficiently accelerated by parallel processing. ?En? is given by the following formulae. Although their >>>> derivation processes are different depending on strong and weak scaling, derived formulae are the >>>> same. >>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. >>>> So is my results acceptable? >>>> For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%. >>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function >>>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a >>>> model of this dependence. >>>> >>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current >>>> cluster, and extrapolate to the big machine. 
I realize that this does not make sense for many scientific >>>> applications, but neither does requiring a certain parallel efficiency. >>> Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case. >>> >>> However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so? >> What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors? >> >> Barry > Hi, > > I have attached the output > > 48 cores: log48 > 96 cores: log96 > > There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG. > > Problem size doubled from 158x266x150 to 158x266x300. >> >>> So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved? >>> >>> Thanks. >>>> Thanks, >>>> >>>> Matt >>>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores? >>>> Btw, I do not have access to the system. >>>> >>>> >>>> >>>> Sent using CloudMagic Email >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener > > From snakexf at gmail.com Sun Nov 1 12:50:39 2015 From: snakexf at gmail.com (Feng Xing) Date: Sun, 1 Nov 2015 19:50:39 +0100 Subject: [petsc-users] Create Hypre ILU(0) PC Segmentation error In-Reply-To: <9166D4B5-46D8-48E5-A64D-3639F49F1FC4@mcs.anl.gov> References: <24F4493F-FA9F-4C4C-9562-8B7A50BC313B@gmail.com> <9166D4B5-46D8-48E5-A64D-3639F49F1FC4@mcs.anl.gov> Message-ID: Thanks very much for the stupid error and the advices. There should not have pcilu0 in call "PetscOptionsSetValue?. Since I would like to create a shell pc (multicative and ...), I use PCCreate(). Kind reagards, > On 01 Nov 2015, at 17:18, Barry Smith wrote: > > > You need to learn how to use the debugger to debug this type of crash. In this case it is very simple just run the code in the debugger and when it crashes type "where" and "up" and "list" to see where it crashed. Make sure to use the debug version of the code. > >> On Nov 1, 2015, at 7:39 AM, Feng Xing wrote: >> >> Hello everyone, >> >> I would like to look for help for a small problem. I am trying to create a Hypre ilu(0) preconditioned in Fortran with the following code, where the matrix A_mpi has been created. >> >> call PCCreate(MPI_COMM_WORLD, pcilu0, Ierr) >> CHKERRQ(Ierr) >> call PCSetOperators(pcilu0, A_mpi, A_mpi, Ierr) >> CHKERRQ(Ierr) >> call PCSetType(pcilu0, PCHYPRE, Ierr) >> CHKERRQ(Ierr) >> call PCHYPRESetType(pcilu0, 'euclid', Ierr) >> CHKERRQ(Ierr) >> call PetscOptionsSetValue(pcilu0, '-pc_hypre_euclid_levels', '0', Ierr) > > This is likely the problem since you are using totally the wrong first argument for this function. 
> > Note also you would use KSPGetPC() and then set the PC and not use a PCCreate() > >> CHKERRQ(Ierr) >> >> But, I got some segmentation errors. I tried to use valgrind, but it doesn?t report any errors. >> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >> >> Thank you very much! >> >> Kind regards, >> Feng Xing >> Postdoc in France -------------- next part -------------- An HTML attachment was scrubbed... URL: From zonexo at gmail.com Sun Nov 1 19:35:47 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Mon, 2 Nov 2015 09:35:47 +0800 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> Message-ID: <5636BDF3.8000109@gmail.com> Hi, Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96). Why does the number of processes increase so much? Is there something wrong with my coding? Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep? Also, what about momentum eqn? Is it working well? I will try the gamg later too. Thank you Yours sincerely, TAY wee-beng On 2/11/2015 12:30 AM, Barry Smith wrote: > You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results > > Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds. > > PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 > > PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 > > Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large? > > You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine. > > Barry > > > >> On Nov 1, 2015, at 7:30 AM, TAY wee-beng wrote: >> >> >> On 1/11/2015 10:00 AM, Barry Smith wrote: >>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng wrote: >>>> >>>> >>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng wrote: >>>>> Hi, >>>>> >>>>> I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. 
>>>>> Its specs are: >>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node) >>>>> >>>>> 8 cores / processor >>>>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>>>> Each cabinet contains 96 computing nodes, >>>>> One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data >>>>> There are 2 ways to give performance: >>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed >>>>> problem. >>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a >>>>> fixed problem size per processor. >>>>> I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling. >>>>> Cluster specs: >>>>> CPU: AMD 6234 2.4GHz >>>>> 8 cores / processor (CPU) >>>>> 6 CPU / node >>>>> So 48 Cores / CPU >>>>> Not sure abt the memory / node >>>>> >>>>> The parallel efficiency ?En? for a given degree of parallelism ?n? indicates how much the program is >>>>> efficiently accelerated by parallel processing. ?En? is given by the following formulae. Although their >>>>> derivation processes are different depending on strong and weak scaling, derived formulae are the >>>>> same. >>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. >>>>> So is my results acceptable? >>>>> For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%. >>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function >>>>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a >>>>> model of this dependence. >>>>> >>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current >>>>> cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific >>>>> applications, but neither does requiring a certain parallel efficiency. >>>> Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case. >>>> >>>> However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so? >>> What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors? >>> >>> Barry >> Hi, >> >> I have attached the output >> >> 48 cores: log48 >> 96 cores: log96 >> >> There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG. >> >> Problem size doubled from 158x266x150 to 158x266x300. 
>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved? >>>> >>>> Thanks. >>>>> Thanks, >>>>> >>>>> Matt >>>>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores? >>>>> Btw, I do not have access to the system. >>>>> >>>>> >>>>> >>>>> Sent using CloudMagic Email >>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >> -------------- next part -------------- 0.000000000000000E+000 0.353000000000000 0.000000000000000E+000 90.0000000000000 0.000000000000000E+000 0.000000000000000E+000 1.00000000000000 0.400000000000000 0 -400000 AB,AA,BB -2.47900002275128 2.50750002410496 3.46600006963126 3.40250006661518 size_x,size_y,size_z 158 266 150 body_cg_ini 0.523700833348298 0.778648765134454 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 1.800000000000000E-002 maximum ngh_surfaces and ngh_vertics are 45 22 minimum ngh_surfaces and ngh_vertics are 28 10 body_cg_ini 0.896813342835977 -0.976707581163755 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 1.800000000000000E-002 maximum ngh_surfaces and ngh_vertics are 45 22 minimum ngh_surfaces and ngh_vertics are 28 10 min IIB_cell_no 0 max IIB_cell_no 429 final initial IIB_cell_no 2145 min I_cell_no 0 max I_cell_no 460 final initial I_cell_no 2300 size(IIB_cell_u),size(I_cell_u),size(IIB_equal_cell_u),size(I_equal_cell_u) 2145 2300 2145 2300 IIB_I_cell_no_uvw_total1 3090 3094 3078 3080 3074 3073 IIB_I_cell_no_uvw_total2 3102 3108 3089 3077 3060 3086 KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines 1 0.00150000 0.26454057 
0.26151125 1.18591342 -0.76697866E+03 -0.32601415E+02 0.62972429E+07 KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines 2 0.00150000 0.32176840 0.32263677 1.26788535 -0.60296986E+03 0.32645061E+02 0.62967216E+07 KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, 
divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines 3 0.00150000 0.36158843 0.37649782 1.31962547 -0.40206982E+03 0.10005980E+03 0.62965570E+07 KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the 
following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines 4 0.00150000 0.38435320 0.41322368 1.35717436 -0.21463805E+03 0.16271834E+03 0.62964387E+07 KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using 
NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines 5 0.00150000 0.39753585 0.43993066 1.39058201 -0.42701340E+02 0.22029669E+03 0.62963392E+07 KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, 
needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines 6 0.00150000 0.41909332 0.46009046 1.41762692 0.11498677E+03 0.27310502E+03 0.62962522E+07 KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of 
mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines 7 0.00150000 0.43914280 0.47568685 1.43956921 0.25987995E+03 0.32149970E+03 0.62961747E+07 KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated 
nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines 8 0.00150000 0.45521621 0.48796822 1.45767552 0.39328978E+03 0.36583669E+03 0.62961048E+07 KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: 
nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines 9 0.00150000 0.46814054 0.49777707 1.47492488 0.51635507E+03 0.40645276E+03 0.62960413E+07 KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping 
factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines 10 0.00150000 0.47856863 0.50571050 1.49141099 0.63006093E+03 0.44366044E+03 0.62959832E+07 body 1 implicit forces and moment 1 -2.47548682245920 1.64962238999444 0.511583428314605 -0.312588669622222 -1.55898599939365 3.29919937188979 body 2 implicit forces and moment 2 -1.50915464344919 -2.39346816361116 0.496546039715906 0.854598810463335 -1.23770041331909 -3.16285577016750 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./a.out on a petsc-3.6.2_shared_rel named n12-04 with 48 processors, by wtay Mon Nov 2 02:22:24 2015 Using Petsc Release Version 3.6.2, Oct, 02, 2015 Max Max/Min Avg Total Time (sec): 1.003e+02 1.00020 1.003e+02 Objects: 5.200e+01 1.00000 5.200e+01 Flops: 4.731e+08 1.75932 3.622e+08 1.739e+10 Flops/sec: 4.718e+06 1.75942 3.612e+06 1.734e+08 MPI Messages: 4.450e+02 14.35484 6.071e+01 2.914e+03 MPI Message Lengths: 3.714e+07 2.00000 5.991e+05 1.746e+09 MPI Reductions: 2.310e+02 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 1.0029e+02 100.0% 1.7387e+10 100.0% 2.914e+03 100.0% 5.991e+05 100.0% 2.300e+02 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 18 1.0 1.0074e+00 1.6 1.17e+08 1.9 1.7e+03 9.9e+05 0.0e+00 1 25 58 96 0 1 25 58 96 0 4308 MatSolve 27 1.0 6.7583e-01 1.5 1.61e+08 1.9 0.0e+00 0.0e+00 0.0e+00 1 34 0 0 0 1 34 0 0 0 8698 MatLUFactorNum 9 1.0 9.4694e-01 2.1 8.62e+07 2.0 0.0e+00 0.0e+00 0.0e+00 1 18 0 0 0 1 18 0 0 0 3256 MatILUFactorSym 1 1.0 8.0723e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatConvert 1 1.0 5.6746e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 10 1.0 1.6173e+0020.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01 1 0 0 0 9 1 0 0 0 9 0 MatAssemblyEnd 10 1.0 6.4045e-01 1.4 0.00e+00 0.0 3.8e+02 1.7e+05 1.6e+01 1 0 13 4 7 1 0 13 4 7 0 MatGetRowIJ 3 1.0 1.5974e-0516.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.1506e-02 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 37 1.9 3.7605e-02 3.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.9e+01 0 0 0 0 8 0 0 0 0 8 0 KSPSetUp 19 1.0 4.1611e-02 6.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 19 1.0 7.8608e+01 1.0 4.73e+08 1.8 1.7e+03 9.9e+05 4.9e+01 78100 58 96 21 78100 58 96 21 221 VecDot 18 1.0 3.0766e-01 4.8 1.82e+07 1.3 0.0e+00 0.0e+00 1.8e+01 0 4 0 0 8 0 4 0 0 8 2213 VecDotNorm2 9 1.0 2.5026e-0110.8 1.82e+07 1.3 0.0e+00 0.0e+00 9.0e+00 0 4 0 0 4 0 4 0 0 4 2721 VecNorm 18 1.0 6.8599e-01 8.3 1.82e+07 1.3 0.0e+00 0.0e+00 1.8e+01 0 4 0 0 8 0 4 0 0 8 993 VecCopy 18 1.0 5.5553e-02 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 66 1.0 9.9062e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPBYCZ 18 1.0 1.5036e-01 1.8 3.63e+07 1.3 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 9056 VecWAXPY 18 1.0 1.2700e-01 1.5 1.82e+07 1.3 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 5361 VecAssemblyBegin 38 1.0 3.7598e-01 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+02 0 0 0 0 49 0 0 0 0 50 0 VecAssemblyEnd 38 1.0 1.7548e-04 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 18 1.0 6.0631e-02 2.9 0.00e+00 0.0 1.7e+03 9.9e+05 0.0e+00 0 0 58 96 0 0 0 58 96 0 0 VecScatterEnd 18 1.0 3.8051e-01 6.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCSetUp 19 1.0 3.2629e+01 1.0 8.62e+07 2.0 0.0e+00 0.0e+00 4.0e+00 32 18 0 0 2 32 18 0 0 2 94 PCSetUpOnBlocks 9 1.0 1.0313e+00 2.1 8.62e+07 2.0 0.0e+00 0.0e+00 0.0e+00 1 18 0 0 0 1 18 0 0 0 2990 PCApply 27 1.0 7.1712e-01 1.5 1.61e+08 1.9 0.0e+00 0.0e+00 0.0e+00 1 34 0 0 0 1 34 0 0 0 8197 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 7 7 182147036 0 Krylov Solver 3 3 3464 0 Vector 20 20 41709448 0 Vector Scatter 2 2 2176 0 Index Set 7 7 4705612 0 Preconditioner 3 3 3208 0 Viewer 10 9 6840 0 ======================================================================================================================== Average time to get PetscTime(): 1.90735e-07 Average time for MPI_Barrier(): 1.35899e-05 Average time for zero size MPI_Send(): 6.83467e-06 #PETSc Option Table entries: -log_summary -momentum_ksp_view -poisson_ksp_view #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-mpi-dir=/opt/ud/openmpi-1.8.8/ --with-blas-lapack-dir=/opt/ud/intel_xe_2013sp1/mkl/lib/intel64/ --with-debugging=0 --download-hypre=1 --prefix=/home/wtay/Lib/petsc-3.6.2_shared_rel --known-mpi-shared=1 --with-shared-libraries --with-fortran-interfaces=1 ----------------------------------------- Libraries compiled on Sun Oct 18 17:34:07 2015 on hpc12 Machine characteristics: Linux-3.10.0-123.20.1.el7.x86_64-x86_64-with-centos-7.1.1503-Core Using PETSc directory: /home/wtay/Codes/petsc-3.6.2 Using PETSc arch: petsc-3.6.2_shared_rel ----------------------------------------- Using C compiler: /opt/ud/openmpi-1.8.8/bin/mpicc -fPIC -wd1572 -O3 ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /opt/ud/openmpi-1.8.8/bin/mpif90 -fPIC -O3 ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Lib/petsc-3.6.2_shared_rel/include -I/opt/ud/openmpi-1.8.8/include ----------------------------------------- Using C linker: /opt/ud/openmpi-1.8.8/bin/mpicc Using Fortran linker: /opt/ud/openmpi-1.8.8/bin/mpif90 Using libraries: -Wl,-rpath,/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -L/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -lpetsc -Wl,-rpath,/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -L/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -lHYPRE -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -lmpi_cxx -Wl,-rpath,/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -L/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lX11 -lhwloc -lssl -lcrypto -lmpi_usempi -lmpi_mpifh -lifport -lifcore -lm -lmpi_cxx -ldl -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -lmpi -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -limf -lsvml -lirng -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lpthread -lirc_s -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 
-L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -ldl ----------------------------------------- -------------- next part -------------- 0.000000000000000E+000 0.353000000000000 0.000000000000000E+000 90.0000000000000 0.000000000000000E+000 0.000000000000000E+000 1.00000000000000 0.400000000000000 0 -400000 AB,AA,BB -2.47900002275128 2.50750002410496 3.46600006963126 3.40250006661518 size_x,size_y,size_z 158 266 150 body_cg_ini 0.523700833348298 0.778648765134454 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 1.800000000000000E-002 maximum ngh_surfaces and ngh_vertics are 45 22 minimum ngh_surfaces and ngh_vertics are 28 10 body_cg_ini 0.896813342835977 -0.976707581163755 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 1.800000000000000E-002 maximum ngh_surfaces and ngh_vertics are 45 22 minimum ngh_surfaces and ngh_vertics are 28 10 min IIB_cell_no 0 max IIB_cell_no 429 final initial IIB_cell_no 2145 min I_cell_no 0 max I_cell_no 460 final initial I_cell_no 2300 size(IIB_cell_u),size(I_cell_u),size(IIB_equal_cell_u),size(I_equal_cell_u) 2145 2300 2145 2300 IIB_I_cell_no_uvw_total1 3090 3094 3078 3080 3074 3073 IIB_I_cell_no_uvw_total2 3102 3108 3089 3077 3060 3086 KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines 1 0.00150000 0.26454057 0.26151125 1.18591342 -0.76697866E+03 -0.32601415E+02 0.62972429E+07 KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local 
solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines 2 0.00150000 0.32176840 0.32263677 1.26788535 -0.60296986E+03 0.32645061E+02 0.62967216E+07 body 1 implicit forces and moment 1 -4.26282587609784 3.25239287178069 4.24467550120379 2.64197101323915 -4.33946240378535 6.02325229135247 body 2 implicit forces and moment 2 -2.73310784825621 -4.58132758579707 4.09752056091207 -0.631346424947326 -4.97864096805248 -6.05243915873125 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./a.out on a petsc-3.6.2_shared_rel named n12-04 with 48 processors, by wtay Mon Nov 2 02:08:45 2015 Using Petsc Release Version 3.6.2, Oct, 02, 2015 Max Max/Min Avg Total Time (sec): 5.583e+01 1.00052 5.581e+01 Objects: 4.400e+01 1.00000 4.400e+01 Flops: 5.257e+07 1.75932 4.025e+07 1.932e+09 Flops/sec: 9.420e+05 1.75989 7.211e+05 3.461e+07 MPI Messages: 5.300e+01 7.57143 1.371e+01 6.580e+02 MPI Message Lengths: 5.310e+06 2.00000 3.793e+05 2.496e+08 MPI Reductions: 6.300e+01 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 5.5813e+01 100.0% 1.9319e+09 100.0% 6.580e+02 100.0% 3.793e+05 100.0% 6.200e+01 98.4% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 2 1.0 1.1111e-01 1.5 1.30e+07 1.9 1.9e+02 9.9e+05 0.0e+00 0 25 29 75 0 0 25 29 75 0 4340 MatSolve 3 1.0 6.9118e-02 1.3 1.79e+07 1.9 0.0e+00 0.0e+00 0.0e+00 0 34 0 0 0 0 34 0 0 0 9450 MatLUFactorNum 1 1.0 1.0166e-01 1.9 9.58e+06 2.0 0.0e+00 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 3370 MatILUFactorSym 1 1.0 7.7649e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatConvert 1 1.0 5.6372e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 2 1.0 3.5564e-01290.6 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 6 0 0 0 0 6 0 MatAssemblyEnd 2 1.0 2.9979e-01 1.0 0.00e+00 0.0 3.8e+02 1.7e+05 1.6e+01 1 0 57 25 25 1 0 57 25 26 0 MatGetRowIJ 3 1.0 2.2888e-0524.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.3555e-02 4.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 5 1.7 3.3672e-03 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 5 0 0 0 0 5 0 KSPSetUp 3 1.0 4.7309e-02 6.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 3 1.0 4.2701e+01 1.0 5.26e+07 1.8 1.9e+02 9.9e+05 9.0e+00 77100 29 75 14 77100 29 75 15 45 VecDot 2 1.0 2.6857e-0211.2 2.02e+06 1.3 0.0e+00 0.0e+00 2.0e+00 0 4 0 0 3 0 4 0 0 3 2817 VecDotNorm2 1 1.0 2.4464e-0215.0 2.02e+06 1.3 0.0e+00 0.0e+00 1.0e+00 0 4 0 0 2 0 4 0 0 2 3092 VecNorm 2 1.0 1.0654e-0118.5 2.02e+06 1.3 0.0e+00 0.0e+00 2.0e+00 0 4 0 0 3 0 4 0 0 3 710 VecCopy 2 1.0 4.1361e-03 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 10 1.0 2.0313e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPBYCZ 2 1.0 1.5906e-02 1.8 4.03e+06 1.3 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 9512 VecWAXPY 2 1.0 1.3226e-02 1.3 2.02e+06 1.3 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 5720 VecAssemblyBegin 6 1.0 2.8764e-02 9.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 29 0 0 0 0 29 0 VecAssemblyEnd 6 1.0 4.0054e-05 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 2 1.0 6.8028e-03 2.7 0.00e+00 0.0 1.9e+02 9.9e+05 0.0e+00 0 0 29 75 0 0 0 29 75 0 0 VecScatterEnd 2 1.0 3.6922e-02 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCSetUp 3 1.0 3.2228e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 58 18 0 0 6 58 18 0 0 6 11 PCSetUpOnBlocks 1 1.0 1.8228e-01 1.6 9.58e+06 2.0 0.0e+00 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 1880 PCApply 3 1.0 7.3298e-02 1.3 1.79e+07 1.9 0.0e+00 0.0e+00 0.0e+00 0 34 0 0 0 0 34 0 0 0 8911 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 7 7 182147036 0 Krylov Solver 3 3 3464 0 Vector 20 20 41709448 0 Vector Scatter 2 2 2176 0 Index Set 7 7 4705612 0 Preconditioner 3 3 3208 0 Viewer 2 1 760 0 ======================================================================================================================== Average time to get PetscTime(): 9.53674e-08 Average time for MPI_Barrier(): 9.77516e-06 Average time for zero size MPI_Send(): 6.91414e-06 #PETSc Option Table entries: -log_summary -momentum_ksp_view -poisson_ksp_view #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-mpi-dir=/opt/ud/openmpi-1.8.8/ --with-blas-lapack-dir=/opt/ud/intel_xe_2013sp1/mkl/lib/intel64/ --with-debugging=0 --download-hypre=1 --prefix=/home/wtay/Lib/petsc-3.6.2_shared_rel --known-mpi-shared=1 --with-shared-libraries --with-fortran-interfaces=1 ----------------------------------------- Libraries compiled on Sun Oct 18 17:34:07 2015 on hpc12 Machine characteristics: Linux-3.10.0-123.20.1.el7.x86_64-x86_64-with-centos-7.1.1503-Core Using PETSc directory: /home/wtay/Codes/petsc-3.6.2 Using PETSc arch: petsc-3.6.2_shared_rel ----------------------------------------- Using C compiler: /opt/ud/openmpi-1.8.8/bin/mpicc -fPIC -wd1572 -O3 ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /opt/ud/openmpi-1.8.8/bin/mpif90 -fPIC -O3 ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Lib/petsc-3.6.2_shared_rel/include -I/opt/ud/openmpi-1.8.8/include ----------------------------------------- Using C linker: /opt/ud/openmpi-1.8.8/bin/mpicc Using Fortran linker: /opt/ud/openmpi-1.8.8/bin/mpif90 Using libraries: -Wl,-rpath,/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -L/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -lpetsc -Wl,-rpath,/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -L/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -lHYPRE -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -lmpi_cxx -Wl,-rpath,/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -L/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lX11 -lhwloc -lssl -lcrypto -lmpi_usempi -lmpi_mpifh -lifport -lifcore -lm -lmpi_cxx -ldl -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -lmpi -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -limf -lsvml -lirng -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lpthread -lirc_s -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 
-L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -ldl ----------------------------------------- -------------- next part -------------- 0.000000000000000E+000 0.353000000000000 0.000000000000000E+000 90.0000000000000 0.000000000000000E+000 0.000000000000000E+000 1.00000000000000 0.400000000000000 0 -400000 AB,AA,BB -3.41000006697141 3.44100006844383 3.46600006963126 3.40250006661518 size_x,size_y,size_z 158 266 301 body_cg_ini 0.523700833348298 0.778648765134454 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 1.800000000000000E-002 maximum ngh_surfaces and ngh_vertics are 41 20 minimum ngh_surfaces and ngh_vertics are 28 10 body_cg_ini 0.896813342835977 -0.976707581163755 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 1.800000000000000E-002 maximum ngh_surfaces and ngh_vertics are 41 20 minimum ngh_surfaces and ngh_vertics are 28 10 min IIB_cell_no 0 max IIB_cell_no 415 final initial IIB_cell_no 2075 min I_cell_no 0 max I_cell_no 468 final initial I_cell_no 2340 size(IIB_cell_u),size(I_cell_u),size(IIB_equal_cell_u),size(I_equal_cell_u) 2075 2340 2075 2340 IIB_I_cell_no_uvw_total1 7635 7644 7643 8279 8271 8297 IIB_I_cell_no_uvw_total2 7647 7646 7643 8271 8274 8266 KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines 1 0.00150000 0.35826998 0.36414728 1.27156134 -0.24352631E+04 -0.99308685E+02 0.12633660E+08 KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 
Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines 2 0.00150000 0.49306841 0.48961181 1.45614703 -0.20361132E+04 0.77916035E+02 0.12632159E+08 body 1 implicit forces and moment 1 -4.19326176380900 3.26285229643405 4.91657786206150 3.33023211607813 -4.66288821809535 5.95105697339790 body 2 implicit forces and moment 2 -2.71360610740664 -4.53650746988691 4.76048497342022 -1.13560954211517 -5.55259427154780 -5.98958778241742 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./a.out on a petsc-3.6.2_shared_rel named n12-05 with 96 processors, by wtay Sun Nov 1 14:26:28 2015 Using Petsc Release Version 3.6.2, Oct, 02, 2015 Max Max/Min Avg Total Time (sec): 5.132e+02 1.00006 5.132e+02 Objects: 4.400e+01 1.00000 4.400e+01 Flops: 5.257e+07 1.75932 4.049e+07 3.887e+09 Flops/sec: 1.024e+05 1.75933 7.890e+04 7.574e+06 MPI Messages: 1.010e+02 14.42857 1.385e+01 1.330e+03 MPI Message Lengths: 5.310e+06 2.00000 3.793e+05 5.045e+08 MPI Reductions: 6.300e+01 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 5.1317e+02 100.0% 3.8869e+09 100.0% 1.330e+03 100.0% 3.793e+05 100.0% 6.200e+01 98.4% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 2 1.0 1.1735e-01 1.6 1.30e+07 1.9 3.8e+02 9.9e+05 0.0e+00 0 25 29 75 0 0 25 29 75 0 8276 MatSolve 3 1.0 7.3929e-02 1.7 1.79e+07 1.9 0.0e+00 0.0e+00 0.0e+00 0 34 0 0 0 0 34 0 0 0 17787 MatLUFactorNum 1 1.0 1.0028e-01 1.9 9.58e+06 2.0 0.0e+00 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 6881 MatILUFactorSym 1 1.0 8.3889e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatConvert 1 1.0 6.0471e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 2 1.0 4.2017e-0132.8 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 6 0 0 0 0 6 0 MatAssemblyEnd 2 1.0 3.2434e-01 1.0 0.00e+00 0.0 7.6e+02 1.7e+05 1.6e+01 0 0 57 25 25 0 0 57 25 26 0 MatGetRowIJ 3 1.0 1.3113e-0513.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.6168e-02 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 5 1.7 1.0472e-02 3.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 5 0 0 0 0 5 0 KSPSetUp 3 1.0 5.0210e-02 7.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 3 1.0 4.9085e+02 1.0 5.26e+07 1.8 3.8e+02 9.9e+05 9.0e+00 96100 29 75 14 96100 29 75 15 8 VecDot 2 1.0 3.2738e-0210.5 2.02e+06 1.3 0.0e+00 0.0e+00 2.0e+00 0 4 0 0 3 0 4 0 0 3 4637 VecDotNorm2 1 1.0 3.0590e-0215.2 2.02e+06 1.3 0.0e+00 0.0e+00 1.0e+00 0 4 0 0 2 0 4 0 0 2 4963 VecNorm 2 1.0 1.2529e-0120.6 2.02e+06 1.3 0.0e+00 0.0e+00 2.0e+00 0 4 0 0 3 0 4 0 0 3 1212 VecCopy 2 1.0 5.5149e-03 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 10 1.0 2.0353e-02 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPBYCZ 2 1.0 1.8868e-02 2.1 4.03e+06 1.3 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 16091 VecWAXPY 2 1.0 1.4920e-02 1.6 2.02e+06 1.3 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 10174 VecAssemblyBegin 6 1.0 2.2800e-0118.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 29 0 0 0 0 29 0 VecAssemblyEnd 6 1.0 5.3167e-05 6.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 2 1.0 1.0023e-02 4.1 0.00e+00 0.0 3.8e+02 9.9e+05 0.0e+00 0 0 29 75 0 0 0 29 75 0 0 VecScatterEnd 2 1.0 4.0989e-02 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 PCSetUpOnBlocks 1 1.0 1.9143e-01 1.9 9.58e+06 2.0 0.0e+00 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 3605 PCApply 3 1.0 7.6561e-02 1.6 1.79e+07 1.9 0.0e+00 0.0e+00 0.0e+00 0 34 0 0 0 0 34 0 0 0 17175 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 7 7 182147036 0 Krylov Solver 3 3 3464 0 Vector 20 20 41709448 0 Vector Scatter 2 2 2176 0 Index Set 7 7 4705612 0 Preconditioner 3 3 3208 0 Viewer 2 1 760 0 ======================================================================================================================== Average time to get PetscTime(): 1.90735e-07 Average time for MPI_Barrier(): 0.0004426 Average time for zero size MPI_Send(): 1.45609e-05 #PETSc Option Table entries: -log_summary -momentum_ksp_view -poisson_ksp_view #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-mpi-dir=/opt/ud/openmpi-1.8.8/ --with-blas-lapack-dir=/opt/ud/intel_xe_2013sp1/mkl/lib/intel64/ --with-debugging=0 --download-hypre=1 --prefix=/home/wtay/Lib/petsc-3.6.2_shared_rel --known-mpi-shared=1 --with-shared-libraries --with-fortran-interfaces=1 ----------------------------------------- Libraries compiled on Sun Oct 18 17:34:07 2015 on hpc12 Machine characteristics: Linux-3.10.0-123.20.1.el7.x86_64-x86_64-with-centos-7.1.1503-Core Using PETSc directory: /home/wtay/Codes/petsc-3.6.2 Using PETSc arch: petsc-3.6.2_shared_rel ----------------------------------------- Using C compiler: /opt/ud/openmpi-1.8.8/bin/mpicc -fPIC -wd1572 -O3 ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /opt/ud/openmpi-1.8.8/bin/mpif90 -fPIC -O3 ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Lib/petsc-3.6.2_shared_rel/include -I/opt/ud/openmpi-1.8.8/include ----------------------------------------- Using C linker: /opt/ud/openmpi-1.8.8/bin/mpicc Using Fortran linker: /opt/ud/openmpi-1.8.8/bin/mpif90 Using libraries: -Wl,-rpath,/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -L/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -lpetsc -Wl,-rpath,/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -L/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -lHYPRE -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -lmpi_cxx -Wl,-rpath,/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -L/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lX11 -lhwloc -lssl -lcrypto -lmpi_usempi -lmpi_mpifh -lifport -lifcore -lm -lmpi_cxx -ldl -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -lmpi -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -limf -lsvml -lirng -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lpthread -lirc_s -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 
-L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -ldl ----------------------------------------- From bsmith at mcs.anl.gov Sun Nov 1 19:49:36 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 1 Nov 2015 19:49:36 -0600 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <5636BDF3.8000109@gmail.com> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> Message-ID: If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps. Barry > On Nov 1, 2015, at 7:35 PM, TAY wee-beng wrote: > > Hi, > > Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96). > > Why does the number of processes increase so much? Is there something wrong with my coding? > > Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? > > Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep? > > Also, what about momentum eqn? Is it working well? > > I will try the gamg later too. > > Thank you > > Yours sincerely, > > TAY wee-beng > > On 2/11/2015 12:30 AM, Barry Smith wrote: >> You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results >> >> Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds. >> >> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 >> >> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 >> >> Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large? >> >> You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine. >> >> Barry >> >> >> >>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng wrote: >>> >>> >>> On 1/11/2015 10:00 AM, Barry Smith wrote: >>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng wrote: >>>>> >>>>> >>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng wrote: >>>>>> Hi, >>>>>> >>>>>> I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. 
>>>>>> Its specs are: >>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node) >>>>>> >>>>>> 8 cores / processor >>>>>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>>>>> Each cabinet contains 96 computing nodes, >>>>>> One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data >>>>>> There are 2 ways to give performance: >>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed >>>>>> problem. >>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a >>>>>> fixed problem size per processor. >>>>>> I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling. >>>>>> Cluster specs: >>>>>> CPU: AMD 6234 2.4GHz >>>>>> 8 cores / processor (CPU) >>>>>> 6 CPU / node >>>>>> So 48 Cores / CPU >>>>>> Not sure abt the memory / node >>>>>> >>>>>> The parallel efficiency ?En? for a given degree of parallelism ?n? indicates how much the program is >>>>>> efficiently accelerated by parallel processing. ?En? is given by the following formulae. Although their >>>>>> derivation processes are different depending on strong and weak scaling, derived formulae are the >>>>>> same. >>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. >>>>>> So is my results acceptable? >>>>>> For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%. >>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function >>>>>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a >>>>>> model of this dependence. >>>>>> >>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current >>>>>> cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific >>>>>> applications, but neither does requiring a certain parallel efficiency. >>>>> Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case. >>>>> >>>>> However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so? >>>> What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors? >>>> >>>> Barry >>> Hi, >>> >>> I have attached the output >>> >>> 48 cores: log48 >>> 96 cores: log96 >>> >>> There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG. >>> >>> Problem size doubled from 158x266x150 to 158x266x300. 
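For reference, one hedged way to read the strong-scaling numbers quoted above (140 minutes on 48 cores, 90 minutes on 96 cores) is the textbook Amdahl form with the 48-core run as the baseline. This is an assumption: the proposal's exact formula is not reproduced anywhere in this thread, so the fitted quantity s below (the fraction of the 48-core runtime that does not speed up) is only illustrative. In LaTeX:

    \frac{T_{96}}{T_{48}} = s + \frac{1-s}{2}
    \;\Rightarrow\; \frac{90}{140} \approx 0.643 = \frac{1+s}{2}
    \;\Rightarrow\; s \approx 0.29,
    \qquad
    E_{48\to 96} = \frac{T_{48}}{2\,T_{96}} = \frac{140}{180} \approx 78\%.

As the quoted reply above points out, this fitted serial fraction itself changes with problem size, so it cannot simply be carried over to the 2205-node data set.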
>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved? >>>>> >>>>> Thanks. >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores? >>>>>> Btw, I do not have access to the system. >>>>>> >>>>>> >>>>>> >>>>>> Sent using CloudMagic Email >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>> -- Norbert Wiener >>> > > From zonexo at gmail.com Sun Nov 1 22:02:33 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Mon, 2 Nov 2015 12:02:33 +0800 Subject: [petsc-users] Scaling with number of cores In-Reply-To: References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> Message-ID: <5636E059.2010107@gmail.com> Hi, I have attached the new run with 100 time steps for 48 and 96 cores. Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run. Thank you Yours sincerely, TAY wee-beng On 2/11/2015 9:49 AM, Barry Smith wrote: > If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps. > > Barry > > > >> On Nov 1, 2015, at 7:35 PM, TAY wee-beng wrote: >> >> Hi, >> >> Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96). >> >> Why does the number of processes increase so much? Is there something wrong with my coding? >> >> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >> >> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep? >> >> Also, what about momentum eqn? Is it working well? >> >> I will try the gamg later too. >> >> Thank you >> >> Yours sincerely, >> >> TAY wee-beng >> >> On 2/11/2015 12:30 AM, Barry Smith wrote: >>> You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results >>> >>> Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds. >>> >>> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 >>> >>> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 >>> >>> Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large? 
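On the reuse question (only the RHS of the Poisson equation changes, the LHS does not): below is a minimal sketch of the reuse pattern in PETSc's C API. It is not taken from this thread; the 1-D matrix, loop bounds and option names are illustrative only, and the same calls exist in the Fortran interface built here with --with-fortran-interfaces=1. If the LHS matrix is never re-assembled, the preconditioner is built once at the first KSPSolve and simply reused; if the code does re-assemble it each step (even with identical values), KSPSetReusePreconditioner keeps the existing BoomerAMG/GAMG setup instead of rebuilding it.

/* Sketch: build the operator once, solve many right-hand sides, keep the PC.
   Illustrative only; in the actual code the Poisson solver carries the
   "poisson_" prefix (presumably via KSPSetOptionsPrefix), so options would be
   -poisson_pc_type hypre, -poisson_pc_type gamg, etc. */
#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            x, b;
  KSP            ksp;
  PetscInt       i, n = 100, Istart, Iend, step;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* Stand-in for the Poisson LHS: a 1-D Laplacian, assembled once. */
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &Istart, &Iend);CHKERRQ(ierr);
  for (i = Istart; i < Iend; i++) {
    if (i > 0)     {ierr = MatSetValue(A, i, i-1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
    if (i < n - 1) {ierr = MatSetValue(A, i, i+1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatCreateVecs(A, &x, &b);CHKERRQ(ierr);

  /* Create the KSP once, outside the time loop. */
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  /* Keep the preconditioner from the first setup even if the operator is
     re-assembled later; harmless if the matrix truly never changes. */
  ierr = KSPSetReusePreconditioner(ksp, PETSC_TRUE);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); /* e.g. -pc_type hypre or -pc_type gamg */

  for (step = 0; step < 10; step++) {
    ierr = VecSet(b, (PetscScalar)(step + 1));CHKERRQ(ierr); /* only the RHS changes */
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);                /* PC built once, in step 0 */
  }

  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

One way to check whether the preconditioner is being rebuilt every time step is to watch the PCSetUp time in -log_summary: if the setup is reused it stays roughly constant as the number of time steps grows, whereas repeated rebuilds make it grow with the step count.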
>>> >>> You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine. >>> >>> Barry >>> >>> >>> >>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng wrote: >>>> >>>> >>>> On 1/11/2015 10:00 AM, Barry Smith wrote: >>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng wrote: >>>>>> >>>>>> >>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. >>>>>>> Its specs are: >>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node) >>>>>>> >>>>>>> 8 cores / processor >>>>>>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>>>>>> Each cabinet contains 96 computing nodes, >>>>>>> One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data >>>>>>> There are 2 ways to give performance: >>>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed >>>>>>> problem. >>>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a >>>>>>> fixed problem size per processor. >>>>>>> I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling. >>>>>>> Cluster specs: >>>>>>> CPU: AMD 6234 2.4GHz >>>>>>> 8 cores / processor (CPU) >>>>>>> 6 CPU / node >>>>>>> So 48 Cores / CPU >>>>>>> Not sure abt the memory / node >>>>>>> >>>>>>> The parallel efficiency ?En? for a given degree of parallelism ?n? indicates how much the program is >>>>>>> efficiently accelerated by parallel processing. ?En? is given by the following formulae. Although their >>>>>>> derivation processes are different depending on strong and weak scaling, derived formulae are the >>>>>>> same. >>>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. >>>>>>> So is my results acceptable? >>>>>>> For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%. >>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function >>>>>>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a >>>>>>> model of this dependence. >>>>>>> >>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current >>>>>>> cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific >>>>>>> applications, but neither does requiring a certain parallel efficiency. >>>>>> Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case. 
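For reference, the two -log_summary outputs attached earlier in this thread (48 cores on 158x266x150 and 96 cores on 158x266x300, i.e. doubled problem size and doubled cores) give a direct weak-scaling reading from the reported Max times. These are rough numbers only: the runs cover just two time steps, so the one-time BoomerAMG setup dominates, which is exactly why longer runs are needed before extrapolating. In LaTeX, with ideal weak scaling being roughly 100% (constant time):

    E_{\text{weak}}(48\to 96) \approx \frac{T_{48}}{T_{96}}:\qquad
    \text{total: } \frac{55.8}{513.2} \approx 11\%,\qquad
    \text{KSPSolve: } \frac{42.7}{490.9} \approx 8.7\%,\qquad
    \text{PCSetUp: } \frac{32.4}{436.0} \approx 7.4\%.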
>>>>>> >>>>>> However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so? >>>>> What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors? >>>>> >>>>> Barry >>>> Hi, >>>> >>>> I have attached the output >>>> >>>> 48 cores: log48 >>>> 96 cores: log96 >>>> >>>> There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG. >>>> >>>> Problem size doubled from 158x266x150 to 158x266x300. >>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved? >>>>>> >>>>>> Thanks. >>>>>>> Thanks, >>>>>>> >>>>>>> Matt >>>>>>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores? >>>>>>> Btw, I do not have access to the system. >>>>>>> >>>>>>> >>>>>>> >>>>>>> Sent using CloudMagic Email >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>> -- Norbert Wiener >>>> >> -------------- next part -------------- 0.000000000000000E+000 0.353000000000000 0.000000000000000E+000 90.0000000000000 0.000000000000000E+000 0.000000000000000E+000 1.00000000000000 0.400000000000000 0 -400000 AB,AA,BB -3.41000006697141 3.44100006844383 3.46600006963126 3.40250006661518 size_x,size_y,size_z 158 266 301 body_cg_ini 0.523700833348298 0.778648765134454 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 1.800000000000000E-002 maximum ngh_surfaces and ngh_vertics are 41 20 minimum ngh_surfaces and ngh_vertics are 28 10 body_cg_ini 0.896813342835977 -0.976707581163755 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 1.800000000000000E-002 maximum ngh_surfaces and ngh_vertics are 41 20 minimum ngh_surfaces and ngh_vertics are 28 10 min IIB_cell_no 0 max IIB_cell_no 415 final initial IIB_cell_no 2075 min I_cell_no 0 max I_cell_no 468 final initial I_cell_no 2340 size(IIB_cell_u),size(I_cell_u),size(IIB_equal_cell_u),size(I_equal_cell_u) 2075 2340 2075 2340 IIB_I_cell_no_uvw_total1 7635 7644 7643 8279 8271 8297 IIB_I_cell_no_uvw_total2 7647 7646 7643 8271 8274 8266 KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 
HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines 1 0.00150000 0.35826998 0.36414728 1.27156134 -0.24352631E+04 -0.99308685E+02 0.12633660E+08 KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on 
coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat 
Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess 
is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: 
KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using
PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero 
tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, 
needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = 
precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij 
rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is 
zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: 
Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive 
coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE 
BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = 
precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum 
iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following 
KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: 
out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated 
nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during 
MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines 
not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: 
Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 
maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: 
Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE 
BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: 
Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type 
classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI 
processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for 
all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI 
processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: 
nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number 
of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node 
(on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 
MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 96 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 96 MPI processes type: bjacobi block Jacobi: number of blocks = 96 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE 
matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=37951284, cols=37951284 total: nonzeros=2.61758e+08, allocated nonzeros=5.31318e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 96 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 96 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 96 MPI processes type: mpiaij rows=12650428, cols=12650428 total: nonzeros=8.82137e+07, allocated nonzeros=1.77106e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines escape_time reached, so abort body 1 implicit forces and moment 1 0.927442607223602 -0.562098081140987 0.170409685651173 0.483779468746378 0.422008389858664 -1.17504373525251 body 2 implicit forces and moment 2 0.569670444239399 0.795659947391087 0.159539659289149 -0.555930483541150 0.172727625010991 1.07040540515635 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
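The solver configuration reported above (Richardson with a hypre BoomerAMG preconditioner for the (poisson_) solve, BiCGStab with block Jacobi and ILU(0) subdomain solves for the (momentum_) solve) would typically be selected at run time with options along the following lines. This is only a sketch inferred from the option prefixes visible in the ksp_view output, not the actual command line used for these runs:

  -poisson_ksp_type richardson -poisson_pc_type hypre -poisson_pc_hypre_type boomeramg
  -momentum_ksp_type bcgs -momentum_pc_type bjacobi
  -momentum_sub_ksp_type preonly -momentum_sub_pc_type ilu
  -poisson_ksp_view -momentum_ksp_view -log_summary

The last three options are the ones listed in the PETSc option table of the performance summary below; the two view options are what produce the (repeated) KSP/PC blocks at every solve, and -log_summary produces the timing tables.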
************************************************************************************************************************
***         WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document               ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./a.out on a petsc-3.6.2_shared_rel named n12-04 with 96 processors, by wtay Mon Nov  2 04:32:27 2015
Using Petsc Release Version 3.6.2, Oct, 02, 2015

                         Max       Max/Min        Avg      Total
Time (sec):           2.577e+03      1.00002   2.577e+03
Objects:              1.420e+02      1.00000   1.420e+02
Flops:                5.204e+09      1.75932   4.008e+09  3.848e+11
Flops/sec:            2.020e+06      1.75932   1.556e+06  1.493e+08
MPI Messages:         9.607e+03     31.91694   5.957e+02  5.719e+04
MPI Message Lengths:  3.953e+08      2.00000   6.566e+05  3.755e+10
MPI Reductions:       2.121e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 2.5767e+03 100.0%  3.8481e+11 100.0%  5.719e+04 100.0%  6.566e+05      100.0%  2.120e+03 100.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M  %L %R  %T %F %M  %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult              198 1.0 1.0318e+01  1.4 1.28e+09 1.9 3.8e+04 9.9e+05 0.0e+00  0 25 66 100  0   0 25 66 100  0  9318
MatSolve             297 1.0 7.5894e+00  1.5 1.78e+09 1.9 0.0e+00 0.0e+00 0.0e+00  0 34  0   0  0   0 34  0   0  0 17153
MatLUFactorNum        99 1.0 9.6152e+00  2.0 9.48e+08 2.0 0.0e+00 0.0e+00 0.0e+00  0 18  0   0  0   0 18  0   0  0  7105
MatILUFactorSym        1 1.0 7.7054e-02  2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0   0  0   0  0  0   0  0     0
MatConvert             1 1.0 5.8075e-02  1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0   0  0   0  0  0   0  0     0
MatAssemblyBegin     100 1.0 1.6192e+01 20.8 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02  0  0  0   0  9   0  0  0   0  9     0
MatAssemblyEnd       100 1.0 3.0870e+00  1.4 0.00e+00 0.0 7.6e+02 1.7e+05 1.6e+01  0  0  1   0  1   0  0  1   0  1     0
MatGetRowIJ            3 1.0 8.8215e-06  0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0   0  0   0  0  0   0  0     0
MatGetOrdering         1 1.0 1.7460e-02  6.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0   0  0   0  0  0   0  0     0
MatView              397 2.0 9.6105e-01  4.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02  0  0  0   0  9   0  0  0   0  9     0
KSPSetUp             199 1.0 5.5183e-02  8.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0   0  0   0  0  0   0  0     0
KSPSolve             199 1.0 2.3298e+03  1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90 100 66 100 24  90 100 66 100 24   165
VecDot               198 1.0 3.5927e+00  2.9 2.00e+08 1.3 0.0e+00 0.0e+00 2.0e+02  0  4  0   0  9   0  4  0   0  9  4183
VecDotNorm2           99 1.0 2.7516e+00  4.2 2.00e+08 1.3 0.0e+00 0.0e+00 9.9e+01  0  4  0   0  5   0  4  0   0  5  5462
VecNorm              198 1.0 7.0039e+00  6.3 2.00e+08 1.3 0.0e+00 0.0e+00 2.0e+02  0  4  0   0  9   0  4  0   0  9  2146
VecCopy              198 1.0 4.7999e-01  1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0   0  0   0  0  0   0  0     0
VecSet               696 1.0 9.7896e-01  1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0   0  0   0  0  0   0  0     0
VecAXPBYCZ           198 1.0 1.4153e+00  1.5 3.99e+08 1.3 0.0e+00 0.0e+00 0.0e+00  0  8  0   0  0   0  8  0   0  0 21238
VecWAXPY             198 1.0 1.3749e+00  1.4 2.00e+08 1.3 0.0e+00 0.0e+00 0.0e+00  0  4  0   0  0   0  4  0   0  0 10931
VecAssemblyBegin     398 1.0 2.8590e+00  2.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+03  0  0  0   0 56   0  0  0   0 56     0
VecAssemblyEnd       398 1.0 1.3816e-03  2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0   0  0   0  0  0   0  0     0
VecScatterBegin      198 1.0 6.5493e-01  2.6 0.00e+00 0.0 3.8e+04 9.9e+05 0.0e+00  0  0 66 100  0   0  0 66 100  0     0
VecScatterEnd        198 1.0 4.3411e+00  6.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0   0  0   0  0  0   0  0     0
PCSetUp              199 1.0 4.3934e+02  1.0 9.48e+08 2.0 0.0e+00 0.0e+00 4.0e+00 17 18  0   0  0  17 18  0   0  0   155
PCSetUpOnBlocks       99 1.0 9.6988e+00  2.0 9.48e+08 2.0 0.0e+00 0.0e+00 0.0e+00  0 18  0   0  0   0 18  0   0  0  7044
PCApply              297 1.0 7.9751e+00  1.5 1.78e+09 1.9 0.0e+00 0.0e+00 0.0e+00  0 34  0   0  0   0 34  0   0  0 16323
------------------------------------------------------------------------------------------------------------------------
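A quick worked check on how to read the event table, using only numbers printed above (my arithmetic, not part of the log): the Mflop/s column is the flop total summed over all processes divided by the maximum time, in units of 10^6 flop/s. For KSPSolve, which accounts for 100% of the flops, that is roughly 3.848e+11 / 2.3298e+03 = 1.65e+08 flop/s, i.e. the 165 in the last column. The same row shows KSPSolve taking 90% of the 2.577e+03 s wall clock, and PCSetUp alone accounts for 4.3934e+02 s, about 17% of it (mostly the hypre BoomerAMG setup, since PCSetUpOnBlocks is only about 10 s). So essentially all of the run time is spent inside the two linear solves.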
Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     7              7    182147036     0
       Krylov Solver     3              3         3464     0
              Vector    20             20     41709448     0
      Vector Scatter     2              2         2176     0
           Index Set     7              7      4705612     0
      Preconditioner     3              3         3208     0
              Viewer   100             99        75240     0
========================================================================================================================
Average time to get PetscTime(): 1.90735e-07
Average time for MPI_Barrier(): 0.000446224
Average time for zero size MPI_Send(): 9.06487e-06
#PETSc Option Table entries:
-log_summary
-momentum_ksp_view
-poisson_ksp_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-mpi-dir=/opt/ud/openmpi-1.8.8/ --with-blas-lapack-dir=/opt/ud/intel_xe_2013sp1/mkl/lib/intel64/ --with-debugging=0 --download-hypre=1 --prefix=/home/wtay/Lib/petsc-3.6.2_shared_rel --known-mpi-shared=1 --with-shared-libraries --with-fortran-interfaces=1
-----------------------------------------
Libraries compiled on Sun Oct 18 17:34:07 2015 on hpc12
Machine characteristics: Linux-3.10.0-123.20.1.el7.x86_64-x86_64-with-centos-7.1.1503-Core
Using PETSc directory: /home/wtay/Codes/petsc-3.6.2
Using PETSc arch: petsc-3.6.2_shared_rel
-----------------------------------------
Using C compiler: /opt/ud/openmpi-1.8.8/bin/mpicc -fPIC -wd1572 -O3 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /opt/ud/openmpi-1.8.8/bin/mpif90 -fPIC -O3 ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Lib/petsc-3.6.2_shared_rel/include -I/opt/ud/openmpi-1.8.8/include
-----------------------------------------
Using C linker: /opt/ud/openmpi-1.8.8/bin/mpicc
Using Fortran linker: /opt/ud/openmpi-1.8.8/bin/mpif90
Using libraries: -Wl,-rpath,/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -L/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -lpetsc -Wl,-rpath,/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -L/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -lHYPRE -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -lmpi_cxx -Wl,-rpath,/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -L/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lX11 -lhwloc -lssl -lcrypto -lmpi_usempi -lmpi_mpifh -lifport -lifcore -lm -lmpi_cxx -ldl -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -lmpi -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -limf -lsvml -lirng -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lpthread -lirc_s -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -ldl
-----------------------------------------
-------------- next part --------------
--------------------------------------------------------------------------
[[58577,1],9]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: n12-02

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
 0.000000000000000E+000  0.353000000000000  0.000000000000000E+000  90.0000000000000  0.000000000000000E+000  0.000000000000000E+000  1.00000000000000  0.400000000000000  0  -400000
 AB,AA,BB  -2.47900002275128  2.50750002410496  3.46600006963126  3.40250006661518
 size_x,size_y,size_z  158  266  150
 body_cg_ini  0.523700833348298  0.778648765134454  7.03282656467989
 Warning - length difference between element and cell
 max_element_length,min_element_length,min_delta  0.000000000000000E+000  10000000000.0000  1.800000000000000E-002
 maximum ngh_surfaces and ngh_vertics are  45  22
 minimum ngh_surfaces and ngh_vertics are  28  10
 body_cg_ini  0.896813342835977  -0.976707581163755  7.03282656467989
 Warning - length difference between element and cell
 max_element_length,min_element_length,min_delta  0.000000000000000E+000  10000000000.0000  1.800000000000000E-002
 maximum ngh_surfaces and ngh_vertics are  45  22
 minimum ngh_surfaces and ngh_vertics are  28  10
 min IIB_cell_no  0
 max IIB_cell_no  429
 final initial IIB_cell_no  2145
 min I_cell_no  0
 max I_cell_no  460
 final initial I_cell_no  2300
 size(IIB_cell_u),size(I_cell_u),size(IIB_equal_cell_u),size(I_equal_cell_u)  2145  2300  2145  2300
 IIB_I_cell_no_uvw_total1  3090  3094  3078  3080  3074  3073
 IIB_I_cell_no_uvw_total2  3102  3108  3089  3077  3060  3086
KSP Object:(poisson_) 48 MPI processes
  type: richardson
    Richardson: damping factor=1
  maximum iterations=10000, initial guess is zero
  tolerances: relative=1e-05, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:(poisson_) 48 MPI processes
  type: hypre
    HYPRE BoomerAMG preconditioning
    (HYPRE BoomerAMG options identical to those shown for the 96-process run above)
  linear system matrix = precond matrix:
  Mat Object: 48 MPI processes
    type: mpiaij
    rows=6304200, cols=6304200
    total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07
    total number of mallocs used during MatSetValues calls =0
      not using I-node (on process 0) routines
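A side calculation from the matrix sizes alone, not stated anywhere in the logs: this 48-process pressure matrix has 6,304,200 rows, about 1.3e+05 rows per process, while the 96-process log above reports 12,650,428 rows, again about 1.3e+05 rows per process. The two runs therefore keep the per-process problem size roughly fixed, which is the setup a weak-scaling comparison assumes.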
 1  0.00150000  0.26454057  0.26151125  1.18591342  -0.76697866E+03  -0.32601415E+02  0.62972429E+07
KSP Object:(momentum_) 48 MPI processes
  type: bcgs
  maximum iterations=10000, initial guess is zero
  tolerances: relative=1e-05, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:(momentum_) 48 MPI processes
  type: bjacobi
    block Jacobi: number of blocks = 48
    Local solve is same for all blocks, in the following KSP and PC objects:
  KSP Object: (momentum_sub_) 1 MPI processes
    type: preonly
    maximum iterations=10000, initial guess is zero
    tolerances: relative=1e-05, absolute=1e-50, divergence=10000
    left preconditioning
    using NONE norm type for convergence test
  PC Object: (momentum_sub_) 1 MPI processes
    type: ilu
      ILU: out-of-place factorization
      0 levels of fill
      tolerance for zero pivot 2.22045e-14
      matrix ordering: natural
      factor fill ratio given 1, needed 1
        Factored matrix follows:
          Mat Object: 1 MPI processes
            type: seqaij
            rows=504336, cols=504336
            package used to perform factorization: petsc
            total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06
            total number of mallocs used during MatSetValues calls =0
              not using I-node routines
    linear system matrix = precond matrix:
    Mat Object: 1 MPI processes
      type: seqaij
      rows=504336, cols=504336
      total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06
      total number of mallocs used during MatSetValues calls =0
        not using I-node routines
  linear system matrix = precond matrix:
  Mat Object: 48 MPI processes
    type: mpiaij
    rows=18912600, cols=18912600
    total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08
    total number of mallocs used during MatSetValues calls =0
      not using I-node (on process 0) routines

[The (poisson_) and (momentum_) ksp_view blocks above are then repeated verbatim at every subsequent solve of the 48-process run; the remaining repetitions follow.]
I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 
MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum 
iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number 
of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of 
levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up 
symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical 
linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs 
maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the 
following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: 
out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated 
nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues 
calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP 
Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE 
BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation 
factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 
(The same (poisson_) and (momentum_) KSP and PC view block is printed identically for every subsequent solve in this run.)
Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 
0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on 
coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 
48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero 
tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: 
(momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of 
fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of 
mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines 
linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: 
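For reference, the combination shown above (Richardson with hypre BoomerAMG for the (poisson_) solve, and BiCGStab with block Jacobi/ILU(0) subdomain solves for the (momentum_) solve, on 48 processes) could be selected at runtime with options roughly like the following sketch. The option prefixes are taken from the output; the executable name and launcher are only placeholders for however the code is actually run, and the same choices could equally be hard-wired in the code via KSPSetOptionsPrefix / KSPSetType / PCSetType:

  mpiexec -n 48 ./your_solver \
    -poisson_ksp_type richardson -poisson_pc_type hypre -poisson_pc_hypre_type boomeramg \
    -momentum_ksp_type bcgs -momentum_pc_type bjacobi \
    -momentum_sub_ksp_type preonly -momentum_sub_pc_type ilu

Output like the above can then be requested with -poisson_ksp_view and -momentum_ksp_view, or with KSPView calls in the code.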
[The same (poisson_) and (momentum_) KSP/PC description is printed again, verbatim, for each subsequent solve; the remainder of the run's solver output follows.]

PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC
Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform 
factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated 
nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during 
MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for 
convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: 
Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: 
Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE 
BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 
total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for 
convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, 
absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix 
follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 
MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: 
nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, 
absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=6304200, cols=6304200 total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(momentum_) 48 MPI processes type: bcgs maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(momentum_) 48 MPI processes type: bjacobi block Jacobi: number of blocks = 48 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (momentum_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (momentum_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 package used to perform factorization: petsc total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=504336, cols=504336 total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 48 MPI processes type: mpiaij rows=18912600, cols=18912600 total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object:(poisson_) 48 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object:(poisson_) 48 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre 
call 1
    HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
    HYPRE BoomerAMG: Threshold for strong coupling 0.25
    HYPRE BoomerAMG: Interpolation truncation factor 0
    HYPRE BoomerAMG: Interpolation: max elements per row 0
    HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
    HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
    HYPRE BoomerAMG: Maximum row sums 0.9
    HYPRE BoomerAMG: Sweeps down 1
    HYPRE BoomerAMG: Sweeps up 1
    HYPRE BoomerAMG: Sweeps on coarse 1
    HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
    HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
    HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
    HYPRE BoomerAMG: Relax weight (all) 1
    HYPRE BoomerAMG: Outer relax weight (all) 1
    HYPRE BoomerAMG: Using CF-relaxation
    HYPRE BoomerAMG: Measure type local
    HYPRE BoomerAMG: Coarsen type Falgout
    HYPRE BoomerAMG: Interpolation type classical
  linear system matrix = precond matrix:
  Mat Object: 48 MPI processes
    type: mpiaij
    rows=6304200, cols=6304200
    total: nonzeros=4.39181e+07, allocated nonzeros=8.82588e+07
    total number of mallocs used during MatSetValues calls =0
      not using I-node (on process 0) routines
KSP Object:(momentum_) 48 MPI processes
  type: bcgs
  maximum iterations=10000, initial guess is zero
  tolerances: relative=1e-05, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:(momentum_) 48 MPI processes
  type: bjacobi
    block Jacobi: number of blocks = 48
    Local solve is same for all blocks, in the following KSP and PC objects:
  KSP Object: (momentum_sub_) 1 MPI processes
    type: preonly
    maximum iterations=10000, initial guess is zero
    tolerances: relative=1e-05, absolute=1e-50, divergence=10000
    left preconditioning
    using NONE norm type for convergence test
  PC Object: (momentum_sub_) 1 MPI processes
    type: ilu
      ILU: out-of-place factorization
      0 levels of fill
      tolerance for zero pivot 2.22045e-14
      matrix ordering: natural
      factor fill ratio given 1, needed 1
        Factored matrix follows:
          Mat Object: 1 MPI processes
            type: seqaij
            rows=504336, cols=504336
            package used to perform factorization: petsc
            total: nonzeros=3.24201e+06, allocated nonzeros=3.24201e+06
            total number of mallocs used during MatSetValues calls =0
              not using I-node routines
    linear system matrix = precond matrix:
    Mat Object: 1 MPI processes
      type: seqaij
      rows=504336, cols=504336
      total: nonzeros=3.24201e+06, allocated nonzeros=3.53035e+06
      total number of mallocs used during MatSetValues calls =0
        not using I-node routines
  linear system matrix = precond matrix:
  Mat Object: 48 MPI processes
    type: mpiaij
    rows=18912600, cols=18912600
    total: nonzeros=1.30008e+08, allocated nonzeros=2.64776e+08
    total number of mallocs used during MatSetValues calls =0
      not using I-node (on process 0) routines
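For reference, the solver configuration shown in the view above maps onto standard PETSc runtime options. The launch line below is a reconstruction for illustration only: the launcher, process count and executable name are taken from the performance summary further down, and it is not the poster's actual run command.

  mpirun -np 48 ./a.out \
      -poisson_ksp_type richardson \
      -poisson_pc_type hypre -poisson_pc_hypre_type boomeramg \
      -momentum_ksp_type bcgs \
      -momentum_pc_type bjacobi -momentum_sub_pc_type ilu \
      -poisson_ksp_view -momentum_ksp_view -log_summary

The same settings could equally be hard-coded with KSPSetType()/PCSetType() on the two solvers; the printed view does not distinguish the two routes.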
escape_time reached, so abort

body 1
implicit forces and moment 1
   0.862585008111159      -0.514909355150849       0.188664224674766
   0.478394001094961       0.368389427717324      -1.05426249343926
body 2
implicit forces and moment 2
   0.527315451670885       0.731524817665969       0.148469052731966
  -0.515183371217827       0.158120496614554       0.961546178988603

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./a.out on a petsc-3.6.2_shared_rel named n12-02 with 48 processors, by wtay Mon Nov 2 04:02:57 2015
Using Petsc Release Version 3.6.2, Oct, 02, 2015

                         Max       Max/Min        Avg      Total
Time (sec):           8.371e+02      1.00009   8.370e+02
Objects:              1.420e+02      1.00000   1.420e+02
Flops:                5.204e+09      1.75932   3.985e+09  1.913e+11
Flops/sec:            6.217e+06      1.75940   4.760e+06  2.285e+08
MPI Messages:         4.855e+03     16.12957   5.895e+02  2.829e+04
MPI Message Lengths:  3.953e+08      2.00000   6.566e+05  1.858e+10
MPI Reductions:       2.121e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 8.3704e+02 100.0%  1.9126e+11 100.0%  2.829e+04 100.0%  6.566e+05      100.0%  2.120e+03 100.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase           %F - percent flops in this phase
      %M - percent messages in this phase       %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)      Flops                              --- Global ---   --- Stage ---   Total
                   Max Ratio  Max      Ratio   Max   Ratio  Mess   Avg len Reduct  %T %F %M %L %R   %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult              198 1.0 2.4116e+01  3.0 1.28e+09 1.9 1.9e+04 9.9e+05 0.0e+00  2 25 66 100  0    2 25 66 100  0   1979
MatSolve             297 1.0 1.1487e+01  2.6 1.78e+09 1.9 0.0e+00 0.0e+00 0.0e+00  1 34  0   0  0    1 34  0   0  0   5629
MatLUFactorNum        99 1.0 1.7278e+01  2.5 9.48e+08 2.0 0.0e+00 0.0e+00 0.0e+00  1 18  0   0  0    1 18  0   0  0   1963
MatILUFactorSym        1 1.0 2.6259e-01  4.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0   0  0    0  0  0   0  0      0
MatConvert             1 1.0 8.8444e-02  1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0   0  0    0  0  0   0  0      0
MatAssemblyBegin     100 1.0 1.4298e+01 20.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02  1  0  0   0  9    1  0  0   0  9      0
MatAssemblyEnd       100 1.0 4.7111e+00  1.8 0.00e+00 0.0 3.8e+02 1.7e+05 1.6e+01  0  0  1   0  1    0  0  1   0  1      0
MatGetRowIJ            3 1.0 8.8215e-06  9.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0   0  0    0  0  0   0  0      0
MatGetOrdering         1 1.0 3.5094e-02  5.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0   0  0    0  0  0   0  0      0
MatView              397 2.0 2.7908e-01  3.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02  0  0  0   0  9    0  0  0   0  9      0
KSPSetUp             199 1.0 1.5142e-01  5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0   0  0    0  0  0   0  0      0
KSPSolve             199 1.0 6.4722e+02  1.0 5.20e+09 1.8 1.9e+04 9.9e+05 5.0e+02 77 100 66 100 24  77 100 66 100 24    296
VecDot               198 1.0 1.2704e+01 11.7 2.00e+08 1.3 0.0e+00 0.0e+00 2.0e+02  1  4  0   0  9    1  4  0   0  9    590
VecDotNorm2           99 1.0 1.1053e+01 28.6 2.00e+08 1.3 0.0e+00 0.0e+00 9.9e+01  1  4  0   0  5    1  4  0   0  5    678
VecNorm              198 1.0 1.3096e+01 22.4 2.00e+08 1.3 0.0e+00 0.0e+00 2.0e+02  1  4  0   0  9    1  4  0   0  9    572
VecCopy              198 1.0 1.4333e+00  4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0   0  0    0  0  0   0  0      0
VecSet               696 1.0 1.9516e+00  3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0   0  0    0  0  0   0  0      0
VecAXPBYCZ           198 1.0 4.0017e+00  4.2 3.99e+08 1.3 0.0e+00 0.0e+00 0.0e+00  0  8  0   0  0    0  8  0   0  0   3743
VecWAXPY             198 1.0 3.2934e+00  3.4 2.00e+08 1.3 0.0e+00 0.0e+00 0.0e+00  0  4  0   0  0    0  4  0   0  0   2274
VecAssemblyBegin     398 1.0 4.5900e+00 53.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+03  0  0  0   0 56    0  0  0   0 56      0
VecAssemblyEnd       398 1.0 1.7416e-03  1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0   0  0    0  0  0   0  0      0
VecScatterBegin      198 1.0 1.9184e+00  7.4 0.00e+00 0.0 1.9e+04 9.9e+05 0.0e+00  0  0 66 100  0    0  0 66 100  0      0
VecScatterEnd        198 1.0 1.7996e+01 11.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0   0  0    1  0  0   0  0      0
PCSetUp              199 1.0 5.8989e+01  1.2 9.48e+08 2.0 0.0e+00 0.0e+00 4.0e+00  6 18  0   0  0    6 18  0   0  0    575
PCSetUpOnBlocks       99 1.0 1.7492e+01  2.5 9.48e+08 2.0 0.0e+00 0.0e+00 0.0e+00  1 18  0   0  0    1 18  0   0  0   1939
PCApply              297 1.0 1.2338e+01  2.7 1.78e+09 1.9 0.0e+00 0.0e+00 0.0e+00  1 34  0   0  0    1 34  0   0  0   5241
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage

              Matrix     7              7    182147036     0
       Krylov Solver     3              3         3464     0
              Vector    20             20     41709448     0
      Vector Scatter     2              2         2176     0
           Index Set     7              7      4705612     0
      Preconditioner     3              3         3208     0
              Viewer   100             99        75240     0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 9.39369e-06
Average time for zero size MPI_Send(): 6.41743e-06
#PETSc Option Table entries:
-log_summary
-momentum_ksp_view
-poisson_ksp_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2  sizeof(int) 4  sizeof(long) 8  sizeof(void*) 8  sizeof(PetscScalar) 8  sizeof(PetscInt) 4
Configure options: --with-mpi-dir=/opt/ud/openmpi-1.8.8/ --with-blas-lapack-dir=/opt/ud/intel_xe_2013sp1/mkl/lib/intel64/ --with-debugging=0 --download-hypre=1 --prefix=/home/wtay/Lib/petsc-3.6.2_shared_rel --known-mpi-shared=1 --with-shared-libraries --with-fortran-interfaces=1
-----------------------------------------
Libraries compiled on Sun Oct 18 17:34:07 2015 on hpc12
Machine characteristics: Linux-3.10.0-123.20.1.el7.x86_64-x86_64-with-centos-7.1.1503-Core
Using PETSc directory: /home/wtay/Codes/petsc-3.6.2
Using PETSc arch: petsc-3.6.2_shared_rel
-----------------------------------------
Using C compiler: /opt/ud/openmpi-1.8.8/bin/mpicc -fPIC -wd1572 -O3 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /opt/ud/openmpi-1.8.8/bin/mpif90 -fPIC -O3 ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Lib/petsc-3.6.2_shared_rel/include -I/opt/ud/openmpi-1.8.8/include
-----------------------------------------
Using C linker: /opt/ud/openmpi-1.8.8/bin/mpicc
Using Fortran linker: /opt/ud/openmpi-1.8.8/bin/mpif90
Using libraries: -Wl,-rpath,/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -L/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -lpetsc -Wl,-rpath,/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -L/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -lHYPRE -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -lmpi_cxx -Wl,-rpath,/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -L/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lX11 -lhwloc -lssl -lcrypto -lmpi_usempi -lmpi_mpifh -lifport -lifcore -lm -lmpi_cxx -ldl -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -lmpi -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -limf -lsvml -lirng -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lpthread -lirc_s -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64
-L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -ldl ----------------------------------------- From bsmith at mcs.anl.gov Sun Nov 1 22:27:58 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 1 Nov 2015 22:27:58 -0600 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <5636E059.2010107@gmail.com> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> Message-ID: <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> Run without the -momentum_ksp_view -poisson_ksp_view and send the new results You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time. Barry Something makes no sense with the output: it gives KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165 90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO. > On Nov 1, 2015, at 10:02 PM, TAY wee-beng wrote: > > Hi, > > I have attached the new run with 100 time steps for 48 and 96 cores. > > Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? > > Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run. > > Thank you > > Yours sincerely, > > TAY wee-beng > > On 2/11/2015 9:49 AM, Barry Smith wrote: >> If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps. >> >> Barry >> >> >> >>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng wrote: >>> >>> Hi, >>> >>> Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96). >>> >>> Why does the number of processes increase so much? Is there something wrong with my coding? >>> >>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>> >>> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep? >>> >>> Also, what about momentum eqn? Is it working well? >>> >>> I will try the gamg later too. >>> >>> Thank you >>> >>> Yours sincerely, >>> >>> TAY wee-beng >>> >>> On 2/11/2015 12:30 AM, Barry Smith wrote: >>>> You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results >>>> >>>> Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds. 
>>>> >>>> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 >>>> >>>> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 >>>> >>>> Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large? >>>> >>>> You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine. >>>> >>>> Barry >>>> >>>> >>>> >>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng wrote: >>>>> >>>>> >>>>> On 1/11/2015 10:00 AM, Barry Smith wrote: >>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng wrote: >>>>>>> >>>>>>> >>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. >>>>>>>> Its specs are: >>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node) >>>>>>>> >>>>>>>> 8 cores / processor >>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>>>>>>> Each cabinet contains 96 computing nodes, >>>>>>>> One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data >>>>>>>> There are 2 ways to give performance: >>>>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed >>>>>>>> problem. >>>>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a >>>>>>>> fixed problem size per processor. >>>>>>>> I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling. >>>>>>>> Cluster specs: >>>>>>>> CPU: AMD 6234 2.4GHz >>>>>>>> 8 cores / processor (CPU) >>>>>>>> 6 CPU / node >>>>>>>> So 48 Cores / CPU >>>>>>>> Not sure abt the memory / node >>>>>>>> >>>>>>>> The parallel efficiency ?En? for a given degree of parallelism ?n? indicates how much the program is >>>>>>>> efficiently accelerated by parallel processing. ?En? is given by the following formulae. Although their >>>>>>>> derivation processes are different depending on strong and weak scaling, derived formulae are the >>>>>>>> same. >>>>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. >>>>>>>> So is my results acceptable? >>>>>>>> For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%. >>>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function >>>>>>>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a >>>>>>>> model of this dependence. >>>>>>>> >>>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current >>>>>>>> cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific >>>>>>>> applications, but neither does requiring a certain parallel efficiency. 
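On the question quoted above about reusing the preconditioner when only the RHS of the Poisson equation changes: the sketch below is a minimal C example against the PETSc 3.6 interface, not the poster's Fortran code. The 1-D Laplacian, the "poisson_" prefix and the ten-step loop are placeholders; the point is that the operator is set once and KSPSetReusePreconditioner() keeps whatever preconditioner (for example BoomerAMG) is built at the first KSPSolve() for every later solve.

#include <petscksp.h>

int main(int argc,char **argv)
{
  KSP            ksp;
  Mat            A;
  Vec            x,b;
  PetscInt       i,n = 100,Istart,Iend,step,its;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;

  /* Assemble the fixed operator once; a 1-D Laplacian stands in for the Poisson LHS */
  ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
  ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,n,n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A,&Istart,&Iend);CHKERRQ(ierr);
  for (i=Istart; i<Iend; i++) {
    if (i>0)   {ierr = MatSetValue(A,i,i-1,-1.0,INSERT_VALUES);CHKERRQ(ierr);}
    if (i<n-1) {ierr = MatSetValue(A,i,i+1,-1.0,INSERT_VALUES);CHKERRQ(ierr);}
    ierr = MatSetValue(A,i,i,2.0,INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatCreateVecs(A,&x,&b);CHKERRQ(ierr);

  ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
  ierr = KSPSetOptionsPrefix(ksp,"poisson_");CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
  /* Keep the preconditioner built at the first solve for all later solves */
  ierr = KSPSetReusePreconditioner(ksp,PETSC_TRUE);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);

  for (step=0; step<10; step++) {
    ierr = VecSet(b,1.0+step);CHKERRQ(ierr);             /* only the right-hand side changes per step */
    ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);              /* PCSetUp work happens only on the first pass */
    ierr = KSPGetIterationNumber(ksp,&its);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD,"step %D: %D iterations\n",step,its);CHKERRQ(ierr);
  }

  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

Run with, for example, -poisson_pc_type hypre -poisson_pc_hypre_type boomeramg -log_summary and PCSetUp should be counted once rather than once per step. If the matrix values are never re-assembled between steps, PETSc already avoids rebuilding the preconditioner, which is consistent with the smaller PCSetUp percentage noted in the reply above.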
>>>>>>> Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case. >>>>>>> >>>>>>> However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so? >>>>>> What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors? >>>>>> >>>>>> Barry >>>>> Hi, >>>>> >>>>> I have attached the output >>>>> >>>>> 48 cores: log48 >>>>> 96 cores: log96 >>>>> >>>>> There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG. >>>>> >>>>> Problem size doubled from 158x266x150 to 158x266x300. >>>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved? >>>>>>> >>>>>>> Thanks. >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores? >>>>>>>> Btw, I do not have access to the system. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Sent using CloudMagic Email >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>> -- Norbert Wiener >>>>> >>> > > From zonexo at gmail.com Mon Nov 2 00:19:49 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Mon, 2 Nov 2015 14:19:49 +0800 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> Message-ID: <56370085.7070502@gmail.com> Hi, I have attached the new results. Thank you Yours sincerely, TAY wee-beng On 2/11/2015 12:27 PM, Barry Smith wrote: > Run without the -momentum_ksp_view -poisson_ksp_view and send the new results > > > You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time. > > Barry > > Something makes no sense with the output: it gives > > KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165 > > 90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO. > > > >> On Nov 1, 2015, at 10:02 PM, TAY wee-beng wrote: >> >> Hi, >> >> I have attached the new run with 100 time steps for 48 and 96 cores. >> >> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >> >> Why does the number of processes increase so much? 
Is there something wrong with my coding? Seems to be so too for my new run. >> >> Thank you >> >> Yours sincerely, >> >> TAY wee-beng >> >> On 2/11/2015 9:49 AM, Barry Smith wrote: >>> If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps. >>> >>> Barry >>> >>> >>> >>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng wrote: >>>> >>>> Hi, >>>> >>>> Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96). >>>> >>>> Why does the number of processes increase so much? Is there something wrong with my coding? >>>> >>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>> >>>> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep? >>>> >>>> Also, what about momentum eqn? Is it working well? >>>> >>>> I will try the gamg later too. >>>> >>>> Thank you >>>> >>>> Yours sincerely, >>>> >>>> TAY wee-beng >>>> >>>> On 2/11/2015 12:30 AM, Barry Smith wrote: >>>>> You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results >>>>> >>>>> Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds. >>>>> >>>>> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 >>>>> >>>>> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 >>>>> >>>>> Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large? >>>>> >>>>> You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine. >>>>> >>>>> Barry >>>>> >>>>> >>>>> >>>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng wrote: >>>>>> >>>>>> >>>>>> On 1/11/2015 10:00 AM, Barry Smith wrote: >>>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. >>>>>>>>> Its specs are: >>>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node) >>>>>>>>> >>>>>>>>> 8 cores / processor >>>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>>>>>>>> Each cabinet contains 96 computing nodes, >>>>>>>>> One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data >>>>>>>>> There are 2 ways to give performance: >>>>>>>>> 1. 
Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed >>>>>>>>> problem. >>>>>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a >>>>>>>>> fixed problem size per processor. >>>>>>>>> I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling. >>>>>>>>> Cluster specs: >>>>>>>>> CPU: AMD 6234 2.4GHz >>>>>>>>> 8 cores / processor (CPU) >>>>>>>>> 6 CPU / node >>>>>>>>> So 48 Cores / CPU >>>>>>>>> Not sure abt the memory / node >>>>>>>>> >>>>>>>>> The parallel efficiency ?En? for a given degree of parallelism ?n? indicates how much the program is >>>>>>>>> efficiently accelerated by parallel processing. ?En? is given by the following formulae. Although their >>>>>>>>> derivation processes are different depending on strong and weak scaling, derived formulae are the >>>>>>>>> same. >>>>>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. >>>>>>>>> So is my results acceptable? >>>>>>>>> For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%. >>>>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function >>>>>>>>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a >>>>>>>>> model of this dependence. >>>>>>>>> >>>>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current >>>>>>>>> cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific >>>>>>>>> applications, but neither does requiring a certain parallel efficiency. >>>>>>>> Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case. >>>>>>>> >>>>>>>> However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so? >>>>>>> What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors? >>>>>>> >>>>>>> Barry >>>>>> Hi, >>>>>> >>>>>> I have attached the output >>>>>> >>>>>> 48 cores: log48 >>>>>> 96 cores: log96 >>>>>> >>>>>> There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG. >>>>>> >>>>>> Problem size doubled from 158x266x150 to 158x266x300. >>>>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved? >>>>>>>> >>>>>>>> Thanks. >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Matt >>>>>>>>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores? >>>>>>>>> Btw, I do not have access to the system. 
>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Sent using CloudMagic Email >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>>> -- Norbert Wiener >>>>>> >>>> >> -------------- next part -------------- 0.000000000000000E+000 0.353000000000000 0.000000000000000E+000 90.0000000000000 0.000000000000000E+000 0.000000000000000E+000 1.00000000000000 0.400000000000000 0 -400000 AB,AA,BB -3.41000006697141 3.44100006844383 3.46600006963126 3.40250006661518 size_x,size_y,size_z 158 266 301 body_cg_ini 0.523700833348298 0.778648765134454 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 1.800000000000000E-002 maximum ngh_surfaces and ngh_vertics are 41 20 minimum ngh_surfaces and ngh_vertics are 28 10 body_cg_ini 0.896813342835977 -0.976707581163755 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 1.800000000000000E-002 maximum ngh_surfaces and ngh_vertics are 41 20 minimum ngh_surfaces and ngh_vertics are 28 10 min IIB_cell_no 0 max IIB_cell_no 415 final initial IIB_cell_no 2075 min I_cell_no 0 max I_cell_no 468 final initial I_cell_no 2340 size(IIB_cell_u),size(I_cell_u),size(IIB_equal_cell_u),size(I_equal_cell_u) 2075 2340 2075 2340 IIB_I_cell_no_uvw_total1 7635 7644 7643 8279 8271 8297 IIB_I_cell_no_uvw_total2 7647 7646 7643 8271 8274 8266 1 0.00150000 0.35826998 0.36414728 1.27156134 -0.24352631E+04 -0.99308685E+02 0.12633660E+08 escape_time reached, so abort body 1 implicit forces and moment 1 0.927442607223602 -0.562098081140987 0.170409685651173 0.483779468746378 0.422008389858664 -1.17504373525251 body 2 implicit forces and moment 2 0.569670444239399 0.795659947391087 0.159539659289149 -0.555930483541150 0.172727625010991 1.07040540515635 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./a.out on a petsc-3.6.2_shared_rel named n12-03 with 96 processors, by wtay Mon Nov 2 06:33:26 2015 Using Petsc Release Version 3.6.2, Oct, 02, 2015 Max Max/Min Avg Total Time (sec): 2.616e+03 1.00000 2.616e+03 Objects: 4.300e+01 1.00000 4.300e+01 Flops: 5.204e+09 1.75932 4.008e+09 3.848e+11 Flops/sec: 1.989e+06 1.75932 1.532e+06 1.471e+08 MPI Messages: 4.040e+02 2.00000 3.998e+02 3.838e+04 MPI Message Lengths: 3.953e+08 2.00000 9.784e+05 3.755e+10 MPI Reductions: 1.922e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 2.6158e+03 100.0% 3.8481e+11 100.0% 3.838e+04 100.0% 9.784e+05 100.0% 1.921e+03 99.9% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 198 1.0 1.0111e+01 1.4 1.28e+09 1.9 3.8e+04 9.9e+05 0.0e+00 0 25 98100 0 0 25 98100 0 9509 MatSolve 297 1.0 7.3316e+00 1.5 1.78e+09 1.9 0.0e+00 0.0e+00 0.0e+00 0 34 0 0 0 0 34 0 0 0 17756 MatLUFactorNum 99 1.0 9.7915e+00 2.0 9.48e+08 2.0 0.0e+00 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 6977 MatILUFactorSym 1 1.0 8.0566e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatConvert 1 1.0 5.6834e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 100 1.0 1.5968e+0112.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02 0 0 0 0 10 0 0 0 0 10 0 MatAssemblyEnd 100 1.0 3.0815e+00 1.4 0.00e+00 0.0 7.6e+02 1.7e+05 1.6e+01 0 0 2 0 1 0 0 2 0 1 0 MatGetRowIJ 3 1.0 5.9605e-06 6.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.4124e-02 4.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 199 1.0 5.0311e-02 7.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 199 1.0 2.3644e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 98100 26 90100 98100 26 163 VecDot 198 1.0 3.5329e+00 2.7 2.00e+08 1.3 0.0e+00 0.0e+00 2.0e+02 0 4 0 0 10 0 4 0 0 10 4254 VecDotNorm2 99 1.0 2.8264e+00 4.4 2.00e+08 1.3 0.0e+00 0.0e+00 9.9e+01 0 4 0 0 5 0 4 0 0 5 5317 VecNorm 198 1.0 6.6515e+00 5.1 2.00e+08 1.3 0.0e+00 0.0e+00 2.0e+02 0 4 0 0 10 0 4 0 0 10 2259 VecCopy 198 1.0 5.1771e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 696 1.0 9.4293e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPBYCZ 198 1.0 1.4347e+00 1.5 3.99e+08 1.3 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 20951 VecWAXPY 198 1.0 1.3298e+00 1.3 2.00e+08 1.3 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 11302 VecAssemblyBegin 398 1.0 3.1136e+00 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+03 0 0 0 0 62 0 0 0 0 62 0 VecAssemblyEnd 398 1.0 1.3890e-03 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 198 1.0 6.5443e-01 2.3 0.00e+00 0.0 3.8e+04 9.9e+05 0.0e+00 0 0 98100 0 0 0 98100 0 0 VecScatterEnd 198 1.0 4.2735e+00 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCSetUp 199 1.0 4.4552e+02 1.0 9.48e+08 2.0 0.0e+00 0.0e+00 4.0e+00 17 18 0 0 0 17 18 0 0 0 153 PCSetUpOnBlocks 99 1.0 9.8690e+00 1.9 9.48e+08 2.0 0.0e+00 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 6922 PCApply 297 1.0 7.7572e+00 1.4 1.78e+09 1.9 0.0e+00 0.0e+00 0.0e+00 0 34 0 0 0 0 34 0 0 0 16782 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 7 7 182147036 0 Krylov Solver 3 3 3464 0 Vector 20 20 41709448 0 Vector Scatter 2 2 2176 0 Index Set 7 7 4705612 0 Preconditioner 3 3 3208 0 Viewer 1 0 0 0 ======================================================================================================================== Average time to get PetscTime(): 9.53674e-08 Average time for MPI_Barrier(): 0.000525999 Average time for zero size MPI_Send(): 8.90593e-06 #PETSc Option Table entries: -log_summary #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-mpi-dir=/opt/ud/openmpi-1.8.8/ --with-blas-lapack-dir=/opt/ud/intel_xe_2013sp1/mkl/lib/intel64/ --with-debugging=0 --download-hypre=1 --prefix=/home/wtay/Lib/petsc-3.6.2_shared_rel --known-mpi-shared=1 --with-shared-libraries --with-fortran-interfaces=1 ----------------------------------------- Libraries compiled on Sun Oct 18 17:34:07 2015 on hpc12 Machine characteristics: Linux-3.10.0-123.20.1.el7.x86_64-x86_64-with-centos-7.1.1503-Core Using PETSc directory: /home/wtay/Codes/petsc-3.6.2 Using PETSc arch: petsc-3.6.2_shared_rel ----------------------------------------- Using C compiler: /opt/ud/openmpi-1.8.8/bin/mpicc -fPIC -wd1572 -O3 ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /opt/ud/openmpi-1.8.8/bin/mpif90 -fPIC -O3 ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Lib/petsc-3.6.2_shared_rel/include -I/opt/ud/openmpi-1.8.8/include ----------------------------------------- Using C linker: /opt/ud/openmpi-1.8.8/bin/mpicc Using Fortran linker: /opt/ud/openmpi-1.8.8/bin/mpif90 Using libraries: -Wl,-rpath,/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -L/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -lpetsc -Wl,-rpath,/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -L/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -lHYPRE -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -lmpi_cxx -Wl,-rpath,/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -L/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lX11 -lhwloc -lssl -lcrypto -lmpi_usempi -lmpi_mpifh -lifport -lifcore -lm -lmpi_cxx -ldl -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -lmpi -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -limf -lsvml -lirng -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lpthread -lirc_s -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -ldl ----------------------------------------- -------------- next part -------------- -------------------------------------------------------------------------- [[61614,1],3]: A high-performance Open MPI point-to-point messaging module was unable to find any relevant network interfaces: Module: OpenFabrics (openib) Host: n12-02 Another transport will be used instead, although this may result in lower performance. -------------------------------------------------------------------------- 0.000000000000000E+000 0.353000000000000 0.000000000000000E+000 90.0000000000000 0.000000000000000E+000 0.000000000000000E+000 1.00000000000000 0.400000000000000 0 -400000 AB,AA,BB -2.47900002275128 2.50750002410496 3.46600006963126 3.40250006661518 size_x,size_y,size_z 158 266 150 body_cg_ini 0.523700833348298 0.778648765134454 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 1.800000000000000E-002 maximum ngh_surfaces and ngh_vertics are 45 22 minimum ngh_surfaces and ngh_vertics are 28 10 body_cg_ini 0.896813342835977 -0.976707581163755 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 1.800000000000000E-002 maximum ngh_surfaces and ngh_vertics are 45 22 minimum ngh_surfaces and ngh_vertics are 28 10 min IIB_cell_no 0 max IIB_cell_no 429 final initial IIB_cell_no 2145 min I_cell_no 0 max I_cell_no 460 final initial I_cell_no 2300 size(IIB_cell_u),size(I_cell_u),size(IIB_equal_cell_u),size(I_equal_cell_u) 2145 2300 2145 2300 IIB_I_cell_no_uvw_total1 3090 3094 3078 3080 3074 3073 IIB_I_cell_no_uvw_total2 3102 3108 3089 3077 3060 3086 1 0.00150000 0.26454057 0.26151125 1.18591342 -0.76697866E+03 -0.32601415E+02 0.62972429E+07 escape_time reached, so abort body 1 implicit forces and moment 1 0.862585008111159 -0.514909355150849 0.188664224674766 0.478394001094961 0.368389427717324 -1.05426249343926 body 2 implicit forces and moment 2 0.527315451670885 0.731524817665969 0.148469052731966 -0.515183371217827 0.158120496614554 0.961546178988603 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./a.out on a petsc-3.6.2_shared_rel named n12-02 with 48 processors, by wtay Mon Nov 2 06:04:49 2015 Using Petsc Release Version 3.6.2, Oct, 02, 2015 Max Max/Min Avg Total Time (sec): 8.683e+02 1.00000 8.683e+02 Objects: 4.300e+01 1.00000 4.300e+01 Flops: 5.204e+09 1.75932 3.985e+09 1.913e+11 Flops/sec: 5.993e+06 1.75932 4.589e+06 2.203e+08 MPI Messages: 4.040e+02 2.00000 3.956e+02 1.899e+04 MPI Message Lengths: 3.953e+08 2.00000 9.784e+05 1.858e+10 MPI Reductions: 1.922e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 8.6829e+02 100.0% 1.9126e+11 100.0% 1.899e+04 100.0% 9.784e+05 100.0% 1.921e+03 99.9% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 198 1.0 2.1698e+01 2.8 1.28e+09 1.9 1.9e+04 9.9e+05 0.0e+00 2 25 98100 0 2 25 98100 0 2200 MatSolve 297 1.0 1.1486e+01 2.8 1.78e+09 1.9 0.0e+00 0.0e+00 0.0e+00 1 34 0 0 0 1 34 0 0 0 5630 MatLUFactorNum 99 1.0 1.3933e+01 2.1 9.48e+08 2.0 0.0e+00 0.0e+00 0.0e+00 1 18 0 0 0 1 18 0 0 0 2434 MatILUFactorSym 1 1.0 2.7501e-01 4.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatConvert 1 1.0 8.8003e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 100 1.0 1.3273e+0154.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02 1 0 0 0 10 1 0 0 0 10 0 MatAssemblyEnd 100 1.0 4.6471e+00 1.9 0.00e+00 0.0 3.8e+02 1.7e+05 1.6e+01 0 0 2 0 1 0 0 2 0 1 0 MatGetRowIJ 3 1.0 6.1989e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 2.9773e-02 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 199 1.0 1.4844e-01 4.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 199 1.0 6.7551e+02 1.0 5.20e+09 1.8 1.9e+04 9.9e+05 5.0e+02 78100 98100 26 78100 98100 26 283 VecDot 198 1.0 1.1890e+01 9.8 2.00e+08 1.3 0.0e+00 0.0e+00 2.0e+02 1 4 0 0 10 1 4 0 0 10 630 VecDotNorm2 99 1.0 1.0095e+0111.7 2.00e+08 1.3 0.0e+00 0.0e+00 9.9e+01 1 4 0 0 5 1 4 0 0 5 742 VecNorm 198 1.0 1.2050e+0110.0 2.00e+08 1.3 0.0e+00 0.0e+00 2.0e+02 1 4 0 0 10 1 4 0 0 10 622 VecCopy 198 1.0 1.5117e+00 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 696 1.0 1.8900e+00 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPBYCZ 198 1.0 3.6260e+00 3.8 3.99e+08 1.3 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 4131 VecWAXPY 198 1.0 2.8821e+00 2.9 2.00e+08 1.3 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 2599 VecAssemblyBegin 398 1.0 3.3092e+00 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+03 0 0 0 0 62 0 0 0 0 62 0 VecAssemblyEnd 398 1.0 1.7860e-03 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 198 1.0 1.8810e+00 7.3 0.00e+00 0.0 1.9e+04 9.9e+05 0.0e+00 0 0 98100 0 0 0 98100 0 0 VecScatterEnd 198 1.0 1.6243e+0112.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 PCSetUp 199 1.0 5.5428e+01 1.2 9.48e+08 2.0 0.0e+00 0.0e+00 4.0e+00 6 18 0 0 0 6 18 0 0 0 612 PCSetUpOnBlocks 99 1.0 1.4139e+01 2.1 9.48e+08 2.0 0.0e+00 0.0e+00 0.0e+00 1 18 0 0 0 1 18 0 0 0 2399 PCApply 297 1.0 1.2171e+01 2.8 1.78e+09 1.9 0.0e+00 0.0e+00 0.0e+00 1 34 0 0 0 1 34 0 0 0 5313 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 7 7 182147036 0 Krylov Solver 3 3 3464 0 Vector 20 20 41709448 0 Vector Scatter 2 2 2176 0 Index Set 7 7 4705612 0 Preconditioner 3 3 3208 0 Viewer 1 0 0 0 ======================================================================================================================== Average time to get PetscTime(): 9.53674e-08 Average time for MPI_Barrier(): 9.39369e-06 Average time for zero size MPI_Send(): 5.21044e-06 #PETSc Option Table entries: -log_summary #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-mpi-dir=/opt/ud/openmpi-1.8.8/ --with-blas-lapack-dir=/opt/ud/intel_xe_2013sp1/mkl/lib/intel64/ --with-debugging=0 --download-hypre=1 --prefix=/home/wtay/Lib/petsc-3.6.2_shared_rel --known-mpi-shared=1 --with-shared-libraries --with-fortran-interfaces=1 ----------------------------------------- Libraries compiled on Sun Oct 18 17:34:07 2015 on hpc12 Machine characteristics: Linux-3.10.0-123.20.1.el7.x86_64-x86_64-with-centos-7.1.1503-Core Using PETSc directory: /home/wtay/Codes/petsc-3.6.2 Using PETSc arch: petsc-3.6.2_shared_rel ----------------------------------------- Using C compiler: /opt/ud/openmpi-1.8.8/bin/mpicc -fPIC -wd1572 -O3 ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /opt/ud/openmpi-1.8.8/bin/mpif90 -fPIC -O3 ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Lib/petsc-3.6.2_shared_rel/include -I/opt/ud/openmpi-1.8.8/include ----------------------------------------- Using C linker: /opt/ud/openmpi-1.8.8/bin/mpicc Using Fortran linker: /opt/ud/openmpi-1.8.8/bin/mpif90 Using libraries: -Wl,-rpath,/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -L/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -lpetsc -Wl,-rpath,/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -L/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -lHYPRE -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -lmpi_cxx -Wl,-rpath,/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -L/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lX11 -lhwloc -lssl -lcrypto -lmpi_usempi -lmpi_mpifh -lifport -lifcore -lm -lmpi_cxx -ldl -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -lmpi -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -limf -lsvml -lirng -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lpthread -lirc_s -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -ldl ----------------------------------------- From bsmith at mcs.anl.gov Mon Nov 2 00:55:35 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 2 Nov 2015 00:55:35 -0600 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <56370085.7070502@gmail.com> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> Message-ID: <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results Barry > On Nov 2, 2015, at 12:19 AM, TAY wee-beng wrote: > > Hi, > > I have attached the new results. > > Thank you > > Yours sincerely, > > TAY wee-beng > > On 2/11/2015 12:27 PM, Barry Smith wrote: >> Run without the -momentum_ksp_view -poisson_ksp_view and send the new results >> >> >> You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time. >> >> Barry >> >> Something makes no sense with the output: it gives >> >> KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165 >> >> 90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO. >> >> >> >>> On Nov 1, 2015, at 10:02 PM, TAY wee-beng wrote: >>> >>> Hi, >>> >>> I have attached the new run with 100 time steps for 48 and 96 cores. >>> >>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>> >>> Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run. >>> >>> Thank you >>> >>> Yours sincerely, >>> >>> TAY wee-beng >>> >>> On 2/11/2015 9:49 AM, Barry Smith wrote: >>>> If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps. >>>> >>>> Barry >>>> >>>> >>>> >>>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng wrote: >>>>> >>>>> Hi, >>>>> >>>>> Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96). >>>>> >>>>> Why does the number of processes increase so much? Is there something wrong with my coding? >>>>> >>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>> >>>>> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep? >>>>> >>>>> Also, what about momentum eqn? Is it working well? >>>>> >>>>> I will try the gamg later too. >>>>> >>>>> Thank you >>>>> >>>>> Yours sincerely, >>>>> >>>>> TAY wee-beng >>>>> >>>>> On 2/11/2015 12:30 AM, Barry Smith wrote: >>>>>> You used gmres with 48 processes but richardson with 96. 
You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results >>>>>> >>>>>> Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds. >>>>>> >>>>>> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 >>>>>> >>>>>> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 >>>>>> >>>>>> Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large? >>>>>> >>>>>> You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine. >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng wrote: >>>>>>> >>>>>>> >>>>>>> On 1/11/2015 10:00 AM, Barry Smith wrote: >>>>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>>>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. >>>>>>>>>> Its specs are: >>>>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node) >>>>>>>>>> >>>>>>>>>> 8 cores / processor >>>>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>>>>>>>>> Each cabinet contains 96 computing nodes, >>>>>>>>>> One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data >>>>>>>>>> There are 2 ways to give performance: >>>>>>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed >>>>>>>>>> problem. >>>>>>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a >>>>>>>>>> fixed problem size per processor. >>>>>>>>>> I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling. >>>>>>>>>> Cluster specs: >>>>>>>>>> CPU: AMD 6234 2.4GHz >>>>>>>>>> 8 cores / processor (CPU) >>>>>>>>>> 6 CPU / node >>>>>>>>>> So 48 Cores / CPU >>>>>>>>>> Not sure abt the memory / node >>>>>>>>>> >>>>>>>>>> The parallel efficiency ?En? for a given degree of parallelism ?n? indicates how much the program is >>>>>>>>>> efficiently accelerated by parallel processing. ?En? is given by the following formulae. Although their >>>>>>>>>> derivation processes are different depending on strong and weak scaling, derived formulae are the >>>>>>>>>> same. >>>>>>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. >>>>>>>>>> So is my results acceptable? >>>>>>>>>> For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%. 
>>>>>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function >>>>>>>>>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a >>>>>>>>>> model of this dependence. >>>>>>>>>> >>>>>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current >>>>>>>>>> cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific >>>>>>>>>> applications, but neither does requiring a certain parallel efficiency. >>>>>>>>> Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case. >>>>>>>>> >>>>>>>>> However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so? >>>>>>>> What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors? >>>>>>>> >>>>>>>> Barry >>>>>>> Hi, >>>>>>> >>>>>>> I have attached the output >>>>>>> >>>>>>> 48 cores: log48 >>>>>>> 96 cores: log96 >>>>>>> >>>>>>> There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG. >>>>>>> >>>>>>> Problem size doubled from 158x266x150 to 158x266x300. >>>>>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved? >>>>>>>>> >>>>>>>>> Thanks. >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Matt >>>>>>>>>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores? >>>>>>>>>> Btw, I do not have access to the system. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Sent using CloudMagic Email >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>>>> -- Norbert Wiener >>>>>>> >>>>> >>> > > From zonexo at gmail.com Mon Nov 2 03:17:06 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Mon, 2 Nov 2015 17:17:06 +0800 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> Message-ID: <56372A12.90900@gmail.com> Hi, I have attached the 2 files. 
Thank you Yours sincerely, TAY wee-beng On 2/11/2015 2:55 PM, Barry Smith wrote: > Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results > > Barry > > > > >> On Nov 2, 2015, at 12:19 AM, TAY wee-beng wrote: >> >> Hi, >> >> I have attached the new results. >> >> Thank you >> >> Yours sincerely, >> >> TAY wee-beng >> >> On 2/11/2015 12:27 PM, Barry Smith wrote: >>> Run without the -momentum_ksp_view -poisson_ksp_view and send the new results >>> >>> >>> You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time. >>> >>> Barry >>> >>> Something makes no sense with the output: it gives >>> >>> KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165 >>> >>> 90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO. >>> >>> >>> >>>> On Nov 1, 2015, at 10:02 PM, TAY wee-beng wrote: >>>> >>>> Hi, >>>> >>>> I have attached the new run with 100 time steps for 48 and 96 cores. >>>> >>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>> >>>> Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run. >>>> >>>> Thank you >>>> >>>> Yours sincerely, >>>> >>>> TAY wee-beng >>>> >>>> On 2/11/2015 9:49 AM, Barry Smith wrote: >>>>> If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps. >>>>> >>>>> Barry >>>>> >>>>> >>>>> >>>>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96). >>>>>> >>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? >>>>>> >>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>> >>>>>> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep? >>>>>> >>>>>> Also, what about momentum eqn? Is it working well? >>>>>> >>>>>> I will try the gamg later too. >>>>>> >>>>>> Thank you >>>>>> >>>>>> Yours sincerely, >>>>>> >>>>>> TAY wee-beng >>>>>> >>>>>> On 2/11/2015 12:30 AM, Barry Smith wrote: >>>>>>> You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results >>>>>>> >>>>>>> Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds. 
>>>>>>> >>>>>>> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 >>>>>>> >>>>>>> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 >>>>>>> >>>>>>> Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large? >>>>>>> >>>>>>> You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine. >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>> >>>>>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 1/11/2015 10:00 AM, Barry Smith wrote: >>>>>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>>>>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng wrote: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. >>>>>>>>>>> Its specs are: >>>>>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node) >>>>>>>>>>> >>>>>>>>>>> 8 cores / processor >>>>>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>>>>>>>>>> Each cabinet contains 96 computing nodes, >>>>>>>>>>> One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data >>>>>>>>>>> There are 2 ways to give performance: >>>>>>>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed >>>>>>>>>>> problem. >>>>>>>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a >>>>>>>>>>> fixed problem size per processor. >>>>>>>>>>> I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling. >>>>>>>>>>> Cluster specs: >>>>>>>>>>> CPU: AMD 6234 2.4GHz >>>>>>>>>>> 8 cores / processor (CPU) >>>>>>>>>>> 6 CPU / node >>>>>>>>>>> So 48 Cores / CPU >>>>>>>>>>> Not sure abt the memory / node >>>>>>>>>>> >>>>>>>>>>> The parallel efficiency ?En? for a given degree of parallelism ?n? indicates how much the program is >>>>>>>>>>> efficiently accelerated by parallel processing. ?En? is given by the following formulae. Although their >>>>>>>>>>> derivation processes are different depending on strong and weak scaling, derived formulae are the >>>>>>>>>>> same. >>>>>>>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. >>>>>>>>>>> So is my results acceptable? >>>>>>>>>>> For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%. >>>>>>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function >>>>>>>>>>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a >>>>>>>>>>> model of this dependence. 
>>>>>>>>>>> >>>>>>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current >>>>>>>>>>> cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific >>>>>>>>>>> applications, but neither does requiring a certain parallel efficiency. >>>>>>>>>> Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case. >>>>>>>>>> >>>>>>>>>> However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so? >>>>>>>>> What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors? >>>>>>>>> >>>>>>>>> Barry >>>>>>>> Hi, >>>>>>>> >>>>>>>> I have attached the output >>>>>>>> >>>>>>>> 48 cores: log48 >>>>>>>> 96 cores: log96 >>>>>>>> >>>>>>>> There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG. >>>>>>>> >>>>>>>> Problem size doubled from 158x266x150 to 158x266x300. >>>>>>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved? >>>>>>>>>> >>>>>>>>>> Thanks. >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Matt >>>>>>>>>>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores? >>>>>>>>>>> Btw, I do not have access to the system. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Sent using CloudMagic Email >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>> >>>> >> -------------- next part -------------- 0.000000000000000E+000 0.353000000000000 0.000000000000000E+000 90.0000000000000 0.000000000000000E+000 0.000000000000000E+000 1.00000000000000 0.400000000000000 0 -400000 z grid divid too small! myid,each procs z size 32 2 z grid divid too small! myid,each procs z size 50 2 z grid divid too small! myid,each procs z size 34 2 AB,AA,BB -2.47900002275128 2.50750002410496 3.46600006963126 3.40250006661518 size_x,size_y,size_z 158 266 150 z grid divid too small! myid,each procs z size 41 2 z grid divid too small! myid,each procs z size 52 2 z grid divid too small! myid,each procs z size 60 2 z grid divid too small! myid,each procs z size 27 2 z grid divid too small! myid,each procs z size 29 2 z grid divid too small! myid,each procs z size 39 2 z grid divid too small! myid,each procs z size 23 2 z grid divid too small! myid,each procs z size 26 2 z grid divid too small! myid,each procs z size 24 2 z grid divid too small! myid,each procs z size 25 2 z grid divid too small! myid,each procs z size 49 2 z grid divid too small! myid,each procs z size 57 2 z grid divid too small! myid,each procs z size 37 2 z grid divid too small! myid,each procs z size 61 2 z grid divid too small! 
myid,each procs z size 28 2 z grid divid too small! myid,each procs z size 31 2 z grid divid too small! myid,each procs z size 54 2 z grid divid too small! myid,each procs z size 35 2 z grid divid too small! myid,each procs z size 51 2 z grid divid too small! myid,each procs z size 53 2 z grid divid too small! myid,each procs z size 22 2 z grid divid too small! myid,each procs z size 33 2 z grid divid too small! myid,each procs z size 48 2 z grid divid too small! myid,each procs z size 44 2 z grid divid too small! myid,each procs z size 43 2 z grid divid too small! myid,each procs z size 30 2 z grid divid too small! myid,each procs z size 62 2 z grid divid too small! myid,each procs z size 45 2 z grid divid too small! myid,each procs z size 47 2 z grid divid too small! myid,each procs z size 40 2 z grid divid too small! myid,each procs z size 42 2 z grid divid too small! myid,each procs z size 59 2 z grid divid too small! myid,each procs z size 46 2 z grid divid too small! myid,each procs z size 55 2 z grid divid too small! myid,each procs z size 58 2 z grid divid too small! myid,each procs z size 36 2 z grid divid too small! myid,each procs z size 38 2 z grid divid too small! myid,each procs z size 56 2 z grid divid too small! myid,each procs z size 63 2 body_cg_ini 0.523700833348298 0.778648765134454 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 1.800000000000000E-002 maximum ngh_surfaces and ngh_vertics are 42 22 minimum ngh_surfaces and ngh_vertics are 28 10 body_cg_ini 0.896813342835977 -0.976707581163755 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 1.800000000000000E-002 maximum ngh_surfaces and ngh_vertics are 42 22 minimum ngh_surfaces and ngh_vertics are 28 10 min IIB_cell_no 0 max IIB_cell_no 429 final initial IIB_cell_no 2145 min I_cell_no 0 max I_cell_no 460 final initial I_cell_no 2300 size(IIB_cell_u),size(I_cell_u),size(IIB_equal_cell_u),size(I_equal_cell_u) 2145 2300 2145 2300 IIB_I_cell_no_uvw_total1 3090 3094 3078 3080 3074 3073 IIB_I_cell_no_uvw_total2 3102 3108 3089 3077 3060 3086 1 0.00150000 0.26454057 0.26151125 1.18591343 -0.76697946E+03 -0.32604327E+02 0.62972429E+07 escape_time reached, so abort body 1 implicit forces and moment 1 0.862588119656401 -0.514914325828415 0.188666046906171 0.478398501406518 0.368390136470159 -1.05426803582325 body 2 implicit forces and moment 2 0.527317340758098 0.731529687675724 0.148470913323249 -0.515187332360951 0.158119801327539 0.961551576757635 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./a.out on a petsc-3.6.2_shared_rel named n12-04 with 64 processors, by wtay Mon Nov 2 08:09:14 2015 Using Petsc Release Version 3.6.2, Oct, 02, 2015 Max Max/Min Avg Total Time (sec): 6.462e+02 1.00000 6.462e+02 Objects: 4.300e+01 1.00000 4.300e+01 Flops: 3.832e+09 2.41599 2.918e+09 1.867e+11 Flops/sec: 5.930e+06 2.41599 4.515e+06 2.889e+08 MPI Messages: 4.040e+02 2.00000 3.977e+02 2.545e+04 MPI Message Lengths: 3.953e+08 2.00000 9.784e+05 2.490e+10 MPI Reductions: 1.922e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 6.4623e+02 100.0% 1.8673e+11 100.0% 2.545e+04 100.0% 9.784e+05 100.0% 1.921e+03 99.9% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 198 1.0 6.0898e+00 1.5 9.62e+08 2.8 2.5e+04 9.9e+05 0.0e+00 1 26 98100 0 1 26 98100 0 7839 MatSolve 297 1.0 5.0697e+00 2.5 1.30e+09 2.9 0.0e+00 0.0e+00 0.0e+00 1 33 0 0 0 1 33 0 0 0 12289 MatLUFactorNum 99 1.0 6.1544e+00 2.6 6.77e+08 3.4 0.0e+00 0.0e+00 0.0e+00 1 17 0 0 0 1 17 0 0 0 5159 MatILUFactorSym 1 1.0 5.6852e-02 4.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatConvert 1 1.0 4.6493e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 100 1.0 1.5075e+0110.6 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02 1 0 0 0 10 1 0 0 0 10 0 MatAssemblyEnd 100 1.0 1.7887e+00 1.6 0.00e+00 0.0 5.0e+02 1.7e+05 1.6e+01 0 0 2 0 1 0 0 2 0 1 0 MatGetRowIJ 3 1.0 1.1921e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 7.5250e-03 4.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 199 1.0 3.5067e-02 9.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 199 1.0 4.9862e+02 1.0 3.83e+09 2.4 2.5e+04 9.9e+05 5.0e+02 77100 98100 26 77100 98100 26 374 VecDot 198 1.0 2.5283e+00 3.7 1.50e+08 1.5 0.0e+00 0.0e+00 2.0e+02 0 4 0 0 10 0 4 0 0 10 2962 VecDotNorm2 99 1.0 2.1805e+00 5.6 1.50e+08 1.5 0.0e+00 0.0e+00 9.9e+01 0 4 0 0 5 0 4 0 0 5 3435 VecNorm 198 1.0 5.7988e+00 8.6 1.50e+08 1.5 0.0e+00 0.0e+00 2.0e+02 0 4 0 0 10 0 4 0 0 10 1292 VecCopy 198 1.0 2.7041e-01 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 696 1.0 6.4512e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPBYCZ 198 1.0 7.5918e-01 2.2 3.00e+08 1.5 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 19730 VecWAXPY 198 1.0 7.3240e-01 2.1 1.50e+08 1.5 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 10226 VecAssemblyBegin 398 1.0 2.3003e+00 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+03 0 0 0 0 62 0 0 0 0 62 0 VecAssemblyEnd 398 1.0 1.9789e-03 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 198 1.0 5.3828e-01 3.6 0.00e+00 0.0 2.5e+04 9.9e+05 0.0e+00 0 0 98100 0 0 0 98100 0 0 VecScatterEnd 198 1.0 2.7654e+00 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCSetUp 199 1.0 9.0926e+01 1.0 6.77e+08 3.4 0.0e+00 0.0e+00 4.0e+00 14 17 0 0 0 14 17 0 0 0 349 PCSetUpOnBlocks 99 1.0 6.2079e+00 2.6 6.77e+08 3.4 0.0e+00 0.0e+00 0.0e+00 1 17 0 0 0 1 17 0 0 0 5115 PCApply 297 1.0 5.3325e+00 2.3 1.30e+09 2.9 0.0e+00 0.0e+00 0.0e+00 1 33 0 0 0 1 33 0 0 0 11683 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Matrix 7 7 136037064 0 Krylov Solver 3 3 3464 0 Vector 20 20 31622728 0 Vector Scatter 2 2 2176 0 Index Set 7 7 3696940 0 Preconditioner 3 3 3208 0 Viewer 1 0 0 0 ======================================================================================================================== Average time to get PetscTime(): 2.14577e-07 Average time for MPI_Barrier(): 0.000201178 Average time for zero size MPI_Send(): 3.05139e-05 #PETSc Option Table entries: -log_summary #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-mpi-dir=/opt/ud/openmpi-1.8.8/ --with-blas-lapack-dir=/opt/ud/intel_xe_2013sp1/mkl/lib/intel64/ --with-debugging=0 --download-hypre=1 --prefix=/home/wtay/Lib/petsc-3.6.2_shared_rel --known-mpi-shared=1 --with-shared-libraries --with-fortran-interfaces=1 ----------------------------------------- Libraries compiled on Sun Oct 18 17:34:07 2015 on hpc12 Machine characteristics: Linux-3.10.0-123.20.1.el7.x86_64-x86_64-with-centos-7.1.1503-Core Using PETSc directory: /home/wtay/Codes/petsc-3.6.2 Using PETSc arch: petsc-3.6.2_shared_rel ----------------------------------------- Using C compiler: /opt/ud/openmpi-1.8.8/bin/mpicc -fPIC -wd1572 -O3 ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /opt/ud/openmpi-1.8.8/bin/mpif90 -fPIC -O3 ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Lib/petsc-3.6.2_shared_rel/include -I/opt/ud/openmpi-1.8.8/include ----------------------------------------- Using C linker: /opt/ud/openmpi-1.8.8/bin/mpicc Using Fortran linker: /opt/ud/openmpi-1.8.8/bin/mpif90 Using libraries: -Wl,-rpath,/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -L/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -lpetsc -Wl,-rpath,/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -L/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -lHYPRE -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -lmpi_cxx -Wl,-rpath,/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -L/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lX11 -lhwloc -lssl -lcrypto -lmpi_usempi -lmpi_mpifh -lifport -lifcore -lm -lmpi_cxx -ldl -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -lmpi -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -limf -lsvml -lirng -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lpthread -lirc_s -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -ldl ----------------------------------------- -------------- next part -------------- 0.000000000000000E+000 0.353000000000000 0.000000000000000E+000 90.0000000000000 0.000000000000000E+000 0.000000000000000E+000 1.00000000000000 0.400000000000000 0 -400000 AB,AA,BB -2.00050000002375 2.00050000002375 2.61200002906844 2.53550002543489 size_x,size_y,size_z 79 133 75 body_cg_ini 0.523700833348298 0.778648765134454 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 4.300000000000000E-002 maximum ngh_surfaces and ngh_vertics are 149 68 minimum ngh_surfaces and ngh_vertics are 54 22 body_cg_ini 0.896813342835977 -0.976707581163755 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 4.300000000000000E-002 maximum ngh_surfaces and ngh_vertics are 149 68 minimum ngh_surfaces and ngh_vertics are 54 22 min IIB_cell_no 0 max IIB_cell_no 265 final initial IIB_cell_no 1325 min I_cell_no 0 max I_cell_no 94 final initial I_cell_no 470 size(IIB_cell_u),size(I_cell_u),size(IIB_equal_cell_u),size(I_equal_cell_u) 1325 470 1325 470 IIB_I_cell_no_uvw_total1 265 270 255 94 91 95 IIB_I_cell_no_uvw_total2 273 280 267 97 94 98 1 0.00150000 0.14647307 0.14738629 1.08799982 0.19042331E+02 0.17694812E+00 0.78750669E+06 escape_time reached, so abort body 1 implicit forces and moment 1 0.869079152284549 -0.476901507812372 8.158446867754350E-002 0.428147709668946 0.558124898859503 -0.928673788206044 body 2 implicit forces and moment 2 0.551071794231021 0.775546442990061 0.135476527830159 -0.634587379905926 0.290234735051080 0.936523173830761 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./a.out on a petsc-3.6.2_shared_rel named n12-04 with 8 processors, by wtay Mon Nov 2 08:08:29 2015 Using Petsc Release Version 3.6.2, Oct, 02, 2015 Max Max/Min Avg Total Time (sec): 1.687e+02 1.00000 1.687e+02 Objects: 4.300e+01 1.00000 4.300e+01 Flops: 3.326e+09 1.20038 3.085e+09 2.468e+10 Flops/sec: 1.971e+07 1.20038 1.828e+07 1.463e+08 MPI Messages: 4.040e+02 2.00000 3.535e+02 2.828e+03 MPI Message Lengths: 9.744e+07 2.00000 2.412e+05 6.821e+08 MPI Reductions: 1.922e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 1.6872e+02 100.0% 2.4679e+10 100.0% 2.828e+03 100.0% 2.412e+05 100.0% 1.921e+03 99.9% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. 
Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 198 1.0 3.7159e+00 1.3 7.91e+08 1.2 2.8e+03 2.5e+05 0.0e+00 2 24 98100 0 2 24 98100 0 1575 MatSolve 297 1.0 3.5614e+00 1.4 1.15e+09 1.2 0.0e+00 0.0e+00 0.0e+00 2 35 0 0 0 2 35 0 0 0 2393 MatLUFactorNum 99 1.0 6.4595e+00 1.4 6.34e+08 1.2 0.0e+00 0.0e+00 0.0e+00 3 19 0 0 0 3 19 0 0 0 726 MatILUFactorSym 1 1.0 3.9131e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatConvert 1 1.0 2.7447e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 100 1.0 6.4550e+0011.7 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02 2 0 0 0 10 2 0 0 0 10 0 MatAssemblyEnd 100 1.0 1.4923e+00 1.3 0.00e+00 0.0 5.6e+01 4.1e+04 1.6e+01 1 0 2 0 1 1 0 2 0 1 0 MatGetRowIJ 3 1.0 3.3379e-06 3.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 5.4829e-03 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 199 1.0 1.3117e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 199 1.0 1.2467e+02 1.0 3.33e+09 1.2 2.8e+03 2.5e+05 5.0e+02 74100 98100 26 74100 98100 26 198 VecDot 198 1.0 1.1893e+00 2.8 1.25e+08 1.1 0.0e+00 0.0e+00 2.0e+02 0 4 0 0 10 0 4 0 0 10 787 VecDotNorm2 99 1.0 1.0476e+00 3.6 1.25e+08 1.1 0.0e+00 0.0e+00 9.9e+01 0 4 0 0 5 0 4 0 0 5 894 VecNorm 198 1.0 2.6889e+00 5.4 1.25e+08 1.1 0.0e+00 0.0e+00 2.0e+02 1 4 0 0 10 1 4 0 0 10 348 VecCopy 198 1.0 1.6091e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 696 1.0 4.0666e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPBYCZ 198 1.0 4.8916e-01 1.5 2.50e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 3828 VecWAXPY 198 1.0 4.7945e-01 1.5 1.25e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 1953 VecAssemblyBegin 398 1.0 8.6470e-01 4.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+03 0 0 0 0 62 0 0 0 0 62 0 VecAssemblyEnd 398 1.0 1.1375e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 198 1.0 7.4661e-02 2.5 0.00e+00 0.0 2.8e+03 2.5e+05 0.0e+00 0 0 98100 0 0 0 98100 0 0 VecScatterEnd 198 1.0 8.5547e-01 5.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCSetUp 199 1.0 1.1193e+01 1.2 6.34e+08 1.2 0.0e+00 0.0e+00 4.0e+00 6 19 0 0 0 6 19 0 0 0 419 PCSetUpOnBlocks 99 1.0 6.5059e+00 1.4 6.34e+08 1.2 0.0e+00 0.0e+00 0.0e+00 3 19 0 0 0 3 19 0 0 0 721 PCApply 297 1.0 3.7292e+00 1.4 1.15e+09 1.2 0.0e+00 0.0e+00 0.0e+00 2 35 0 0 0 2 35 0 0 0 2285 
------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Matrix 7 7 114426392 0 Krylov Solver 3 3 3464 0 Vector 20 20 25577680 0 Vector Scatter 2 2 2176 0 Index Set 7 7 2691760 0 Preconditioner 3 3 3208 0 Viewer 1 0 0 0 ======================================================================================================================== Average time to get PetscTime(): 1.19209e-07 Average time for MPI_Barrier(): 3.8147e-06 Average time for zero size MPI_Send(): 2.77162e-06 #PETSc Option Table entries: -log_summary #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-mpi-dir=/opt/ud/openmpi-1.8.8/ --with-blas-lapack-dir=/opt/ud/intel_xe_2013sp1/mkl/lib/intel64/ --with-debugging=0 --download-hypre=1 --prefix=/home/wtay/Lib/petsc-3.6.2_shared_rel --known-mpi-shared=1 --with-shared-libraries --with-fortran-interfaces=1 ----------------------------------------- Libraries compiled on Sun Oct 18 17:34:07 2015 on hpc12 Machine characteristics: Linux-3.10.0-123.20.1.el7.x86_64-x86_64-with-centos-7.1.1503-Core Using PETSc directory: /home/wtay/Codes/petsc-3.6.2 Using PETSc arch: petsc-3.6.2_shared_rel ----------------------------------------- Using C compiler: /opt/ud/openmpi-1.8.8/bin/mpicc -fPIC -wd1572 -O3 ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /opt/ud/openmpi-1.8.8/bin/mpif90 -fPIC -O3 ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Lib/petsc-3.6.2_shared_rel/include -I/opt/ud/openmpi-1.8.8/include ----------------------------------------- Using C linker: /opt/ud/openmpi-1.8.8/bin/mpicc Using Fortran linker: /opt/ud/openmpi-1.8.8/bin/mpif90 Using libraries: -Wl,-rpath,/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -L/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -lpetsc -Wl,-rpath,/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -L/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -lHYPRE -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -lmpi_cxx -Wl,-rpath,/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -L/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lX11 -lhwloc -lssl -lcrypto -lmpi_usempi -lmpi_mpifh -lifport -lifcore -lm -lmpi_cxx -ldl -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -lmpi -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -limf -lsvml -lirng -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s 
-lirc -lpthread -lirc_s -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -ldl ----------------------------------------- From francesco.magaletti at uniroma1.it Mon Nov 2 09:58:51 2015 From: francesco.magaletti at uniroma1.it (Francesco Magaletti) Date: Mon, 2 Nov 2015 16:58:51 +0100 Subject: [petsc-users] SNESComputeJacobianDefaultColor troubles Message-ID: Hi everyone, I?m trying to solve a system of PDE?s with a full implicit time integration and a DMDA context to manage the cartesian structured grid. I don?t know the actual jacobian matrix (actually it is too cumbersome to be easily evaluated analytically) so I used SNESComputeJacobianDefaultColor to approximate it: DMSetMatType(da,MATAIJ); DMCreateMatrix(da,&J); TSGetSNES(ts, &snes); SNESSetJacobian(snes, J, J, SNESComputeJacobianDefaultColor,0); All it works perfectly but it happens that, when I increase the number of grid points, I go out of memory since the matrix becomes too big to be stored despite my estimate of memory consumption which is much lower than the actual one. Indeed a more careful analysis (with -mat_view) shows that there are a lot of zeros in the nonzero structure of the jacobian. I read in the manual that DMCreateMatrix preallocates the matrix by somewhat using the stencil width of the DMDA context, but in my case I don?t really need all those elements for every equation. I then tried with a poor preallocation, hoping petsC could manage it with some malloc on the fly: MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,size,size,3,0,0,0,&J); MatSetOption(J,MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE); MatSetOption(J,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_FALSE); MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE); TSGetSNES(ts, &snes); SNESSetJacobian(snes, J, J, SNESComputeJacobianDefaultColor,0); but now it gives me runtime errors: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix must be assembled by calls to MatAssemblyBegin/End(); [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 [0]PETSC ERROR: ./bubble_petsc on a arch-linux2-c-debug named ... 
Mon Nov 2 16:24:12 2015 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich [0]PETSC ERROR: #1 MatFDColoringCreate() line 448 in /home/magaletto/Scaricati/petsc-3.6.0/src/mat/matfd/fdmatrix.c [0]PETSC ERROR: #2 SNESComputeJacobianDefaultColor() line 67 in /home/magaletto/Scaricati/petsc-3.6.0/src/snes/interface/snesj2.c [0]PETSC ERROR: #3 SNESComputeJacobian() line 2223 in /home/magaletto/Scaricati/petsc-3.6.0/src/snes/interface/snes.c [0]PETSC ERROR: #4 SNESSolve_NEWTONLS() line 231 in /home/magaletto/Scaricati/petsc-3.6.0/src/snes/impls/ls/ls.c [0]PETSC ERROR: #5 SNESSolve() line 3894 in /home/magaletto/Scaricati/petsc-3.6.0/src/snes/interface/snes.c [0]PETSC ERROR: #6 TSStep_Theta() line 197 in /home/magaletto/Scaricati/petsc-3.6.0/src/ts/impls/implicit/theta/theta.c [0]PETSC ERROR: #7 TSStep() line 3098 in /home/magaletto/Scaricati/petsc-3.6.0/src/ts/interface/ts.c [0]PETSC ERROR: #8 TSSolve() line 3282 in /home/magaletto/Scaricati/petsc-3.6.0/src/ts/interface/ts.c If I use instead the same preallocation but with the FD calculation without coloring it runs without problems (it is only extremely slow) and it uses the correct number of nonzero values: MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,size,size,3,0,0,0,&J); MatSetOption(J,MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE); MatSetOption(J,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_FALSE); MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE); TSGetSNES(ts, &snes); SNESSetJacobian(snes, J, J, SNESComputeJacobianDefault,&appctx); What do you suggest to solve this problem? I also tried to set up a first step with fd without coloring in order to allocate the correct amount of memory and nonzero structure and successively to switch to the colored version: TSSetIFunction(ts,NULL,FormIFunction,&appctx); MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,size,size,3,0,0,0,&J); MatSetOption(J,MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE); MatSetOption(J,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_FALSE); MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE); TSGetSNES(ts, &snes); SNESSetJacobian(snes, J, J, SNESComputeJacobianDefault,&appctx); TSStep(ts); ISColoring iscoloring; MatColoring coloring; MatFDColoring matfdcoloring; MatColoringCreate(J,&coloring); MatColoringSetFromOptions(coloring); MatColoringApply(coloring,&iscoloring); MatFDColoringCreate(J,iscoloring,&matfdcoloring); MatFDColoringSetFunction(matfdcoloring,(PetscErrorCode (*)(void))FormIFunction,&appctx); MatFDColoringSetFromOptions(matfdcoloring); MatFDColoringSetUp(J,iscoloring,matfdcoloring); SNESSetJacobian(snes,J,J,SNESComputeJacobianDefaultColor,matfdcoloring); TSStep(ts); but again it gives me runtime errors: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Invalid argument [0]PETSC ERROR: Wrong type of object: Parameter # 1 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 [0]PETSC ERROR: ./bubble_petsc on a arch-linux2-c-debug named ... 
Mon Nov 2 16:37:24 2015
[0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich
[0]PETSC ERROR: #1 TSGetDM() line 4093 in /home/magaletto/Scaricati/petsc-3.6.0/src/ts/interface/ts.c
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Corrupt argument: http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: Invalid Pointer to Object: Parameter # 1
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015
[0]PETSC ERROR: ./bubble_petsc on a arch-linux2-c-debug named ... Mon Nov 2 16:37:24 2015
[0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich
[0]PETSC ERROR: #2 DMGetLocalVector() line 44 in /home/magaletto/Scaricati/petsc-3.6.0/src/dm/interface/dmget.c
=====================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
Where is my mistake?
Yours sincerely,
Francesco
From bsmith at mcs.anl.gov Mon Nov 2 10:54:00 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 2 Nov 2015 10:54:00 -0600 Subject: [petsc-users] SNESComputeJacobianDefaultColor troubles In-Reply-To: References: Message-ID: <6DA1EA23-67FB-47D4-B909-68F858B7A978@mcs.anl.gov>
Don't do any of that stuff you tried below. Just use DMDASetBlockFills() before DMCreateMatrix()
Barry
> On Nov 2, 2015, at 9:58 AM, Francesco Magaletti wrote:
>
> Hi everyone,
>
> I'm trying to solve a system of PDE's with a full implicit time integration and a DMDA context to manage the cartesian structured grid.
> I don't know the actual jacobian matrix (actually it is too cumbersome to be easily evaluated analytically) so I used SNESComputeJacobianDefaultColor to approximate it:
>
> DMSetMatType(da,MATAIJ);
> DMCreateMatrix(da,&J);
> TSGetSNES(ts, &snes);
> SNESSetJacobian(snes, J, J, SNESComputeJacobianDefaultColor,0);
>
> All it works perfectly but it happens that, when I increase the number of grid points, I go out of memory since the matrix becomes too big to be stored despite my estimate of memory consumption which is much lower than the actual one. Indeed a more careful analysis (with -mat_view) shows that there are a lot of zeros in the nonzero structure of the jacobian. I read in the manual that DMCreateMatrix preallocates the matrix by somewhat using the stencil width of the DMDA context, but in my case I don't really need all those elements for every equation.
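A minimal sketch of the DMDASetBlockFills() usage suggested above, reusing the da, J, ts and snes from the code in this thread. The 3-component layout and the particular fill pattern are made up for illustration: dfill marks which components couple to each other within a grid point, ofill which components couple to the neighbouring stencil points, and only the entries set to 1 are preallocated by the subsequent DMCreateMatrix().

PetscInt dfill[9] = {1, 1, 0,      /* coupling inside a grid point (dof = 3) */
                     1, 1, 1,
                     0, 1, 1};
PetscInt ofill[9] = {1, 0, 0,      /* coupling to each stencil neighbour     */
                     0, 1, 0,
                     0, 0, 1};

DMDASetBlockFills(da, dfill, ofill);   /* must be called before DMCreateMatrix() */
DMSetMatType(da, MATAIJ);
DMCreateMatrix(da, &J);                /* preallocates only the marked couplings */
TSGetSNES(ts, &snes);
SNESSetJacobian(snes, J, J, SNESComputeJacobianDefaultColor, 0);

Without such a fill pattern the DMDA preallocation assumes every component couples to every component at every point of the stencil, which is the over-allocation that -mat_view revealed above.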
> I then tried with a poor preallocation, hoping petsC could manage it with some malloc on the fly: > > MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,size,size,3,0,0,0,&J); > MatSetOption(J,MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE); > MatSetOption(J,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_FALSE); > MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE); > TSGetSNES(ts, &snes); > SNESSetJacobian(snes, J, J, SNESComputeJacobianDefaultColor,0); > > but now it gives me runtime errors: > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix must be assembled by calls to MatAssemblyBegin/End(); > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 > [0]PETSC ERROR: ./bubble_petsc on a arch-linux2-c-debug named ... Mon Nov 2 16:24:12 2015 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich > [0]PETSC ERROR: #1 MatFDColoringCreate() line 448 in /home/magaletto/Scaricati/petsc-3.6.0/src/mat/matfd/fdmatrix.c > [0]PETSC ERROR: #2 SNESComputeJacobianDefaultColor() line 67 in /home/magaletto/Scaricati/petsc-3.6.0/src/snes/interface/snesj2.c > [0]PETSC ERROR: #3 SNESComputeJacobian() line 2223 in /home/magaletto/Scaricati/petsc-3.6.0/src/snes/interface/snes.c > [0]PETSC ERROR: #4 SNESSolve_NEWTONLS() line 231 in /home/magaletto/Scaricati/petsc-3.6.0/src/snes/impls/ls/ls.c > [0]PETSC ERROR: #5 SNESSolve() line 3894 in /home/magaletto/Scaricati/petsc-3.6.0/src/snes/interface/snes.c > [0]PETSC ERROR: #6 TSStep_Theta() line 197 in /home/magaletto/Scaricati/petsc-3.6.0/src/ts/impls/implicit/theta/theta.c > [0]PETSC ERROR: #7 TSStep() line 3098 in /home/magaletto/Scaricati/petsc-3.6.0/src/ts/interface/ts.c > [0]PETSC ERROR: #8 TSSolve() line 3282 in /home/magaletto/Scaricati/petsc-3.6.0/src/ts/interface/ts.c > > If I use instead the same preallocation but with the FD calculation without coloring it runs without problems (it is only extremely slow) and it uses the correct number of nonzero values: > > MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,size,size,3,0,0,0,&J); > MatSetOption(J,MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE); > MatSetOption(J,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_FALSE); > MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE); > TSGetSNES(ts, &snes); > SNESSetJacobian(snes, J, J, SNESComputeJacobianDefault,&appctx); > > What do you suggest to solve this problem? 
> I also tried to set up a first step with fd without coloring in order to allocate the correct amount of memory and nonzero structure and successively to switch to the colored version: > > TSSetIFunction(ts,NULL,FormIFunction,&appctx); > > MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,size,size,3,0,0,0,&J); > MatSetOption(J,MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE); > MatSetOption(J,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_FALSE); > MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE); > TSGetSNES(ts, &snes); > SNESSetJacobian(snes, J, J, SNESComputeJacobianDefault,&appctx); > TSStep(ts); > > ISColoring iscoloring; > MatColoring coloring; > MatFDColoring matfdcoloring; > MatColoringCreate(J,&coloring); > MatColoringSetFromOptions(coloring); > MatColoringApply(coloring,&iscoloring); > MatFDColoringCreate(J,iscoloring,&matfdcoloring); > MatFDColoringSetFunction(matfdcoloring,(PetscErrorCode (*)(void))FormIFunction,&appctx); > MatFDColoringSetFromOptions(matfdcoloring); > MatFDColoringSetUp(J,iscoloring,matfdcoloring); > SNESSetJacobian(snes,J,J,SNESComputeJacobianDefaultColor,matfdcoloring); > > TSStep(ts); > > but again it gives me runtime errors: > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Invalid argument > [0]PETSC ERROR: Wrong type of object: Parameter # 1 > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 > [0]PETSC ERROR: ./bubble_petsc on a arch-linux2-c-debug named ... Mon Nov 2 16:37:24 2015 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich > [0]PETSC ERROR: #1 TSGetDM() line 4093 in /home/magaletto/Scaricati/petsc-3.6.0/src/ts/interface/ts.c > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Corrupt argument: http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [0]PETSC ERROR: Invalid Pointer to Object: Parameter # 1 > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 > [0]PETSC ERROR: ./bubble_petsc on a arch-linux2-c-debug named ... Mon Nov 2 16:37:24 2015 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich > [0]PETSC ERROR: #2 DMGetLocalVector() line 44 in /home/magaletto/Scaricati/petsc-3.6.0/src/dm/interface/dmget.c > > ===================================================================================== > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > = EXIT CODE: 11 > = CLEANING UP REMAINING PROCESSES > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > ===================================================================================== > APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11) > > where is my mistake? 
> > Your sincerely, > Francesco From bsmith at mcs.anl.gov Mon Nov 2 13:18:37 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 2 Nov 2015 13:18:37 -0600 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <56372A12.90900@gmail.com> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> <56372A12.90900@gmail.com> Message-ID: hypre is just not scaling well here. I do not know why. Since hypre is a block box for us there is no way to determine why the poor scaling. If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly. Barry > On Nov 2, 2015, at 3:17 AM, TAY wee-beng wrote: > > Hi, > > I have attached the 2 files. > > Thank you > > Yours sincerely, > > TAY wee-beng > > On 2/11/2015 2:55 PM, Barry Smith wrote: >> Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results >> >> Barry >> >> >> >>> On Nov 2, 2015, at 12:19 AM, TAY wee-beng wrote: >>> >>> Hi, >>> >>> I have attached the new results. >>> >>> Thank you >>> >>> Yours sincerely, >>> >>> TAY wee-beng >>> >>> On 2/11/2015 12:27 PM, Barry Smith wrote: >>>> Run without the -momentum_ksp_view -poisson_ksp_view and send the new results >>>> >>>> >>>> You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time. >>>> >>>> Barry >>>> >>>> Something makes no sense with the output: it gives >>>> >>>> KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165 >>>> >>>> 90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO. >>>> >>>> >>>> >>>>> On Nov 1, 2015, at 10:02 PM, TAY wee-beng wrote: >>>>> >>>>> Hi, >>>>> >>>>> I have attached the new run with 100 time steps for 48 and 96 cores. >>>>> >>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>> >>>>> Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run. >>>>> >>>>> Thank you >>>>> >>>>> Yours sincerely, >>>>> >>>>> TAY wee-beng >>>>> >>>>> On 2/11/2015 9:49 AM, Barry Smith wrote: >>>>>> If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps. >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>>>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96). >>>>>>> >>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? >>>>>>> >>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? 
Or what must I not do? >>>>>>> >>>>>>> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep? >>>>>>> >>>>>>> Also, what about momentum eqn? Is it working well? >>>>>>> >>>>>>> I will try the gamg later too. >>>>>>> >>>>>>> Thank you >>>>>>> >>>>>>> Yours sincerely, >>>>>>> >>>>>>> TAY wee-beng >>>>>>> >>>>>>> On 2/11/2015 12:30 AM, Barry Smith wrote: >>>>>>>> You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results >>>>>>>> >>>>>>>> Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds. >>>>>>>> >>>>>>>> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 >>>>>>>> >>>>>>>> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 >>>>>>>> >>>>>>>> Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large? >>>>>>>> >>>>>>>> You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine. >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> On 1/11/2015 10:00 AM, Barry Smith wrote: >>>>>>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>>>>>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng wrote: >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. >>>>>>>>>>>> Its specs are: >>>>>>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node) >>>>>>>>>>>> >>>>>>>>>>>> 8 cores / processor >>>>>>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>>>>>>>>>>> Each cabinet contains 96 computing nodes, >>>>>>>>>>>> One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data >>>>>>>>>>>> There are 2 ways to give performance: >>>>>>>>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed >>>>>>>>>>>> problem. >>>>>>>>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a >>>>>>>>>>>> fixed problem size per processor. >>>>>>>>>>>> I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling. >>>>>>>>>>>> Cluster specs: >>>>>>>>>>>> CPU: AMD 6234 2.4GHz >>>>>>>>>>>> 8 cores / processor (CPU) >>>>>>>>>>>> 6 CPU / node >>>>>>>>>>>> So 48 Cores / CPU >>>>>>>>>>>> Not sure abt the memory / node >>>>>>>>>>>> >>>>>>>>>>>> The parallel efficiency ?En? for a given degree of parallelism ?n? indicates how much the program is >>>>>>>>>>>> efficiently accelerated by parallel processing. 
?En? is given by the following formulae. Although their >>>>>>>>>>>> derivation processes are different depending on strong and weak scaling, derived formulae are the >>>>>>>>>>>> same. >>>>>>>>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. >>>>>>>>>>>> So is my results acceptable? >>>>>>>>>>>> For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%. >>>>>>>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function >>>>>>>>>>>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a >>>>>>>>>>>> model of this dependence. >>>>>>>>>>>> >>>>>>>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current >>>>>>>>>>>> cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific >>>>>>>>>>>> applications, but neither does requiring a certain parallel efficiency. >>>>>>>>>>> Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case. >>>>>>>>>>> >>>>>>>>>>> However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so? >>>>>>>>>> What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors? >>>>>>>>>> >>>>>>>>>> Barry >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I have attached the output >>>>>>>>> >>>>>>>>> 48 cores: log48 >>>>>>>>> 96 cores: log96 >>>>>>>>> >>>>>>>>> There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG. >>>>>>>>> >>>>>>>>> Problem size doubled from 158x266x150 to 158x266x300. >>>>>>>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved? >>>>>>>>>>> >>>>>>>>>>> Thanks. >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Matt >>>>>>>>>>>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores? >>>>>>>>>>>> Btw, I do not have access to the system. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Sent using CloudMagic Email >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>>>>>> -- Norbert Wiener >>>>>>>>> >>>>>>> >>>>> >>> > > From jychang48 at gmail.com Mon Nov 2 14:30:27 2015 From: jychang48 at gmail.com (Justin Chang) Date: Mon, 2 Nov 2015 13:30:27 -0700 Subject: [petsc-users] Setting/creating Mats for SNES use Message-ID: Hi all, In my DMPlex program, I have these lines: Mat A,J; ... 
ierr = DMSetMatType(dm, MATAIJ); CHKERRQ(ierr); ierr = DMCreateMatrix(dm, &J); CHKERRQ(ierr); A = J; ierr = DMSNESSetFunctionLocal(dm, ...); CHKERRQ(ierr); ierr = DMSNESSetJacobianLocal(dm, ...); CHKERRQ(ierr); ierr = SNESSetJacobian(snes, A, J, NULL, NULL); CHKERRQ(ierr); ierr = SNESSetFromOptions(snes); CHKERRQ(ierr); ... ierr = SNESSolve(snes, NULL, x); CHKERRQ(ierr); ... ierr = MatDestroy(&J); CHKERRQ(ierr); For the line "A = J;", what exactly is the difference, if any, between that and "ierr = MatDuplicate(...)" or "ierr = MatCopy(...)"? Do these different options somehow affect memory usage/performance? Say I am solving a standard poisson equation using either GAMG and/or HYPRE. Thanks, Justin -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Nov 2 14:39:09 2015 From: jed at jedbrown.org (Jed Brown) Date: Mon, 02 Nov 2015 13:39:09 -0700 Subject: [petsc-users] Setting/creating Mats for SNES use In-Reply-To: References: Message-ID: <87wpu0kufm.fsf@jedbrown.org> Justin Chang writes: > Hi all, > > In my DMPlex program, I have these lines: > > Mat A,J; > > ... > > ierr = DMSetMatType(dm, MATAIJ); CHKERRQ(ierr); > ierr = DMCreateMatrix(dm, &J); CHKERRQ(ierr); > A = J; > > ierr = DMSNESSetFunctionLocal(dm, ...); CHKERRQ(ierr); > ierr = DMSNESSetJacobianLocal(dm, ...); CHKERRQ(ierr); > ierr = SNESSetJacobian(snes, A, J, NULL, NULL); CHKERRQ(ierr); > ierr = SNESSetFromOptions(snes); CHKERRQ(ierr); > > ... > > ierr = SNESSolve(snes, NULL, x); CHKERRQ(ierr); > > ... > ierr = MatDestroy(&J); CHKERRQ(ierr); > > > For the line "A = J;", This means you have two handles referring to the same object. > what exactly is the difference, if any, between that and "ierr = > MatDuplicate(...)" This creates a new object. > or "ierr = MatCopy(...)"? The second argument needs to be a valid Mat to call this function. > Do these different options somehow affect memory usage/performance? Yes. > Say I am solving a standard poisson equation using either GAMG and/or > HYPRE. > > Thanks, > Justin -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From jychang48 at gmail.com Mon Nov 2 14:49:18 2015 From: jychang48 at gmail.com (Justin Chang) Date: Mon, 2 Nov 2015 13:49:18 -0700 Subject: [petsc-users] Setting/creating Mats for SNES use In-Reply-To: <87wpu0kufm.fsf@jedbrown.org> References: <87wpu0kufm.fsf@jedbrown.org> Message-ID: So when would I use one over the other? - If I wanted to solve a problem using a direct solver or an iterative solver without a preconditioner, I would use A = J? - The documentation for SNESSetJacobian() says that AMat and PMat are usually the same, but if I used something like GAMG would I need to create two different objects/Mats? Thanks, Justin On Mon, Nov 2, 2015 at 1:39 PM, Jed Brown wrote: > Justin Chang writes: > > > Hi all, > > > > In my DMPlex program, I have these lines: > > > > Mat A,J; > > > > ... > > > > ierr = DMSetMatType(dm, MATAIJ); CHKERRQ(ierr); > > ierr = DMCreateMatrix(dm, &J); CHKERRQ(ierr); > > A = J; > > > > ierr = DMSNESSetFunctionLocal(dm, ...); CHKERRQ(ierr); > > ierr = DMSNESSetJacobianLocal(dm, ...); CHKERRQ(ierr); > > ierr = SNESSetJacobian(snes, A, J, NULL, NULL); CHKERRQ(ierr); > > ierr = SNESSetFromOptions(snes); CHKERRQ(ierr); > > > > ... > > > > ierr = SNESSolve(snes, NULL, x); CHKERRQ(ierr); > > > > ... 
> > ierr = MatDestroy(&J); CHKERRQ(ierr); > > > > > > For the line "A = J;", > > This means you have two handles referring to the same object. > > > what exactly is the difference, if any, between that and "ierr = > > MatDuplicate(...)" > > This creates a new object. > > > or "ierr = MatCopy(...)"? > > The second argument needs to be a valid Mat to call this function. > > > Do these different options somehow affect memory usage/performance? > > Yes. > > > Say I am solving a standard poisson equation using either GAMG and/or > > HYPRE. > > > > Thanks, > > Justin > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Nov 2 15:02:08 2015 From: jed at jedbrown.org (Jed Brown) Date: Mon, 02 Nov 2015 14:02:08 -0700 Subject: [petsc-users] Setting/creating Mats for SNES use In-Reply-To: References: <87wpu0kufm.fsf@jedbrown.org> Message-ID: <87oafcktdb.fsf@jedbrown.org> Justin Chang writes: > So when would I use one over the other? > > - If I wanted to solve a problem using a direct solver or an iterative > solver without a preconditioner, I would use A = J? > > - The documentation for SNESSetJacobian() says that AMat and PMat are > usually the same, but if I used something like GAMG would I need to create > two different objects/Mats? It is a semantic distinction that has nothing to do with the preconditioning algorithm. If you define the operator that you want to solve with using a matrix-free implementation (could be MFFD, could be some fast evaluation that doesn't store matrix entries), but want to use an assembled operator for preconditioning, then you pass a different matrix. If you use a different discretization to define the operator versus the preconditioner, then you would pass different matrices. This happens, for example, when preconditioning using a low-order discretization and/or dropping some coupling terms that are deemed unimportant for preconditioning. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From dave.mayhem23 at gmail.com Mon Nov 2 15:19:48 2015 From: dave.mayhem23 at gmail.com (Dave May) Date: Mon, 2 Nov 2015 22:19:48 +0100 Subject: [petsc-users] Setting/creating Mats for SNES use In-Reply-To: References: <87wpu0kufm.fsf@jedbrown.org> Message-ID: On 2 November 2015 at 21:49, Justin Chang wrote: > So when would I use one over the other? > > - If I wanted to solve a problem using a direct solver or an iterative > solver without a preconditioner, I would use A = J? > Yes. > > - The documentation for SNESSetJacobian() says that AMat and PMat are > usually the same, but if I used something like GAMG would I need to create > two different objects/Mats? > I would say "maybe". If the Jacobian (here A) was defined via a matrix-free finite difference approximation, but you wish to use a non-trivial preconditioner, you might wish to assemble J. J might be the Picard linearized operator (for example). Another use case where A != J might arise is if you define A with a high order spatial discretization (probably matrix free) and you use a low order discretization to define the preconditioner which will ultimately be passed to GAMG. > > Thanks, > Justin > > On Mon, Nov 2, 2015 at 1:39 PM, Jed Brown wrote: > >> Justin Chang writes: >> >> > Hi all, >> > >> > In my DMPlex program, I have these lines: >> > >> > Mat A,J; >> > >> > ... 
>> > >> > ierr = DMSetMatType(dm, MATAIJ); CHKERRQ(ierr); >> > ierr = DMCreateMatrix(dm, &J); CHKERRQ(ierr); >> > A = J; >> > >> > ierr = DMSNESSetFunctionLocal(dm, ...); CHKERRQ(ierr); >> > ierr = DMSNESSetJacobianLocal(dm, ...); CHKERRQ(ierr); >> > ierr = SNESSetJacobian(snes, A, J, NULL, NULL); CHKERRQ(ierr); >> > ierr = SNESSetFromOptions(snes); CHKERRQ(ierr); >> > >> > ... >> > >> > ierr = SNESSolve(snes, NULL, x); CHKERRQ(ierr); >> > >> > ... >> > ierr = MatDestroy(&J); CHKERRQ(ierr); >> > >> > >> > For the line "A = J;", >> >> This means you have two handles referring to the same object. >> >> > what exactly is the difference, if any, between that and "ierr = >> > MatDuplicate(...)" >> >> This creates a new object. >> >> > or "ierr = MatCopy(...)"? >> >> The second argument needs to be a valid Mat to call this function. >> >> > Do these different options somehow affect memory usage/performance? >> >> Yes. >> >> > Say I am solving a standard poisson equation using either GAMG and/or >> > HYPRE. >> > >> > Thanks, >> > Justin >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Mon Nov 2 15:21:22 2015 From: dave.mayhem23 at gmail.com (Dave May) Date: Mon, 2 Nov 2015 22:21:22 +0100 Subject: [petsc-users] Setting/creating Mats for SNES use In-Reply-To: References: <87wpu0kufm.fsf@jedbrown.org> Message-ID: Damn - Jed scooped me :D On 2 November 2015 at 22:19, Dave May wrote: > > > On 2 November 2015 at 21:49, Justin Chang wrote: > >> So when would I use one over the other? >> >> - If I wanted to solve a problem using a direct solver or an iterative >> solver without a preconditioner, I would use A = J? >> > > Yes. > > >> >> - The documentation for SNESSetJacobian() says that AMat and PMat are >> usually the same, but if I used something like GAMG would I need to create >> two different objects/Mats? >> > > I would say "maybe". > > If the Jacobian (here A) was defined via a matrix-free finite difference > approximation, but you wish to use a non-trivial preconditioner, you might > wish to assemble J. J might be the Picard linearized operator (for example). > > Another use case where A != J might arise is if you define A with a high > order spatial discretization (probably matrix free) and you use a low order > discretization to define the preconditioner which will ultimately be passed > to GAMG. > > > >> >> Thanks, >> Justin >> >> On Mon, Nov 2, 2015 at 1:39 PM, Jed Brown wrote: >> >>> Justin Chang writes: >>> >>> > Hi all, >>> > >>> > In my DMPlex program, I have these lines: >>> > >>> > Mat A,J; >>> > >>> > ... >>> > >>> > ierr = DMSetMatType(dm, MATAIJ); CHKERRQ(ierr); >>> > ierr = DMCreateMatrix(dm, &J); CHKERRQ(ierr); >>> > A = J; >>> > >>> > ierr = DMSNESSetFunctionLocal(dm, ...); CHKERRQ(ierr); >>> > ierr = DMSNESSetJacobianLocal(dm, ...); CHKERRQ(ierr); >>> > ierr = SNESSetJacobian(snes, A, J, NULL, NULL); CHKERRQ(ierr); >>> > ierr = SNESSetFromOptions(snes); CHKERRQ(ierr); >>> > >>> > ... >>> > >>> > ierr = SNESSolve(snes, NULL, x); CHKERRQ(ierr); >>> > >>> > ... >>> > ierr = MatDestroy(&J); CHKERRQ(ierr); >>> > >>> > >>> > For the line "A = J;", >>> >>> This means you have two handles referring to the same object. >>> >>> > what exactly is the difference, if any, between that and "ierr = >>> > MatDuplicate(...)" >>> >>> This creates a new object. >>> >>> > or "ierr = MatCopy(...)"? >>> >>> The second argument needs to be a valid Mat to call this function. 
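To make the A != J case described in this thread concrete, here is a rough sketch of a matrix-free Amat combined with an assembled Pmat; the snes and dm are the ones from the code above, while FormJacobian and user are hypothetical names, not taken from the thread. The preconditioner (GAMG, hypre, ...) is then built from J, while the Krylov method applies the operator through A.

Mat A, J;

MatCreateSNESMF(snes, &A);      /* Amat: matrix-free, applies the Jacobian by differencing the residual */
DMSetMatType(dm, MATAIJ);
DMCreateMatrix(dm, &J);         /* Pmat: assembled, used only to build the preconditioner */
SNESSetJacobian(snes, A, J, FormJacobian, &user);   /* FormJacobian assembles J only */

Depending on the setup, the matrix-free Amat may also need a MatAssemblyBegin()/MatAssemblyEnd() call inside FormJacobian to refresh its base vector; the runtime option -snes_mf_operator gives the same operator/preconditioner split without any extra code.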
>>> >>> > Do these different options somehow affect memory usage/performance? >>> >>> Yes. >>> >>> > Say I am solving a standard poisson equation using either GAMG and/or >>> > HYPRE. >>> > >>> > Thanks, >>> > Justin >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From francesco.magaletti at uniroma1.it Mon Nov 2 16:06:11 2015 From: francesco.magaletti at uniroma1.it (Francesco Magaletti) Date: Mon, 2 Nov 2015 23:06:11 +0100 Subject: [petsc-users] SNESComputeJacobianDefaultColor troubles In-Reply-To: <6DA1EA23-67FB-47D4-B909-68F858B7A978@mcs.anl.gov> References: <6DA1EA23-67FB-47D4-B909-68F858B7A978@mcs.anl.gov> Message-ID: Thanks Barry!! I tried and it worked nice, but it will be improved if I could manage the off-diagonal elements block by block. I have a stencil with 5 points (in 1D) and the coupling with the i+1 is different with respect to the i+2 elements. I found on PetsC site a reference to DMDASetGetMatrix function, but there are no examples. Can you give me an example with a simple matrix? Or there is a simpler way to treat the different off-diagonal blocks separately? Thank you again Francesco > Il giorno 02/nov/2015, alle ore 17:54, Barry Smith ha scritto: > > > Don't do any of that stuff you tried below. Just use DMDASetFillBlocks() before DMCreateMatrix() > > Barry > > >> On Nov 2, 2015, at 9:58 AM, Francesco Magaletti wrote: >> >> Hi everyone, >> >> I?m trying to solve a system of PDE?s with a full implicit time integration and a DMDA context to manage the cartesian structured grid. >> I don?t know the actual jacobian matrix (actually it is too cumbersome to be easily evaluated analytically) so I used SNESComputeJacobianDefaultColor to approximate it: >> >> DMSetMatType(da,MATAIJ); >> DMCreateMatrix(da,&J); >> TSGetSNES(ts, &snes); >> SNESSetJacobian(snes, J, J, SNESComputeJacobianDefaultColor,0); >> >> All it works perfectly but it happens that, when I increase the number of grid points, I go out of memory since the matrix becomes too big to be stored despite my estimate of memory consumption which is much lower than the actual one. Indeed a more careful analysis (with -mat_view) shows that there are a lot of zeros in the nonzero structure of the jacobian. I read in the manual that DMCreateMatrix preallocates the matrix by somewhat using the stencil width of the DMDA context, but in my case I don?t really need all those elements for every equation. >> I then tried with a poor preallocation, hoping petsC could manage it with some malloc on the fly: >> >> MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,size,size,3,0,0,0,&J); >> MatSetOption(J,MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE); >> MatSetOption(J,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_FALSE); >> MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE); >> TSGetSNES(ts, &snes); >> SNESSetJacobian(snes, J, J, SNESComputeJacobianDefaultColor,0); >> >> but now it gives me runtime errors: >> >> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [0]PETSC ERROR: Object is in wrong state >> [0]PETSC ERROR: Matrix must be assembled by calls to MatAssemblyBegin/End(); >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >> [0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 >> [0]PETSC ERROR: ./bubble_petsc on a arch-linux2-c-debug named ... 
Mon Nov 2 16:24:12 2015 >> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich >> [0]PETSC ERROR: #1 MatFDColoringCreate() line 448 in /home/magaletto/Scaricati/petsc-3.6.0/src/mat/matfd/fdmatrix.c >> [0]PETSC ERROR: #2 SNESComputeJacobianDefaultColor() line 67 in /home/magaletto/Scaricati/petsc-3.6.0/src/snes/interface/snesj2.c >> [0]PETSC ERROR: #3 SNESComputeJacobian() line 2223 in /home/magaletto/Scaricati/petsc-3.6.0/src/snes/interface/snes.c >> [0]PETSC ERROR: #4 SNESSolve_NEWTONLS() line 231 in /home/magaletto/Scaricati/petsc-3.6.0/src/snes/impls/ls/ls.c >> [0]PETSC ERROR: #5 SNESSolve() line 3894 in /home/magaletto/Scaricati/petsc-3.6.0/src/snes/interface/snes.c >> [0]PETSC ERROR: #6 TSStep_Theta() line 197 in /home/magaletto/Scaricati/petsc-3.6.0/src/ts/impls/implicit/theta/theta.c >> [0]PETSC ERROR: #7 TSStep() line 3098 in /home/magaletto/Scaricati/petsc-3.6.0/src/ts/interface/ts.c >> [0]PETSC ERROR: #8 TSSolve() line 3282 in /home/magaletto/Scaricati/petsc-3.6.0/src/ts/interface/ts.c >> >> If I use instead the same preallocation but with the FD calculation without coloring it runs without problems (it is only extremely slow) and it uses the correct number of nonzero values: >> >> MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,size,size,3,0,0,0,&J); >> MatSetOption(J,MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE); >> MatSetOption(J,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_FALSE); >> MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE); >> TSGetSNES(ts, &snes); >> SNESSetJacobian(snes, J, J, SNESComputeJacobianDefault,&appctx); >> >> What do you suggest to solve this problem? >> I also tried to set up a first step with fd without coloring in order to allocate the correct amount of memory and nonzero structure and successively to switch to the colored version: >> >> TSSetIFunction(ts,NULL,FormIFunction,&appctx); >> >> MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,size,size,3,0,0,0,&J); >> MatSetOption(J,MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE); >> MatSetOption(J,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_FALSE); >> MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE); >> TSGetSNES(ts, &snes); >> SNESSetJacobian(snes, J, J, SNESComputeJacobianDefault,&appctx); >> TSStep(ts); >> >> ISColoring iscoloring; >> MatColoring coloring; >> MatFDColoring matfdcoloring; >> MatColoringCreate(J,&coloring); >> MatColoringSetFromOptions(coloring); >> MatColoringApply(coloring,&iscoloring); >> MatFDColoringCreate(J,iscoloring,&matfdcoloring); >> MatFDColoringSetFunction(matfdcoloring,(PetscErrorCode (*)(void))FormIFunction,&appctx); >> MatFDColoringSetFromOptions(matfdcoloring); >> MatFDColoringSetUp(J,iscoloring,matfdcoloring); >> SNESSetJacobian(snes,J,J,SNESComputeJacobianDefaultColor,matfdcoloring); >> >> TSStep(ts); >> >> but again it gives me runtime errors: >> >> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [0]PETSC ERROR: Invalid argument >> [0]PETSC ERROR: Wrong type of object: Parameter # 1 >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >> [0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 >> [0]PETSC ERROR: ./bubble_petsc on a arch-linux2-c-debug named ... 
Mon Nov 2 16:37:24 2015 >> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich >> [0]PETSC ERROR: #1 TSGetDM() line 4093 in /home/magaletto/Scaricati/petsc-3.6.0/src/ts/interface/ts.c >> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [0]PETSC ERROR: Corrupt argument: http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >> [0]PETSC ERROR: Invalid Pointer to Object: Parameter # 1 >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >> [0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 >> [0]PETSC ERROR: ./bubble_petsc on a arch-linux2-c-debug named ... Mon Nov 2 16:37:24 2015 >> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich >> [0]PETSC ERROR: #2 DMGetLocalVector() line 44 in /home/magaletto/Scaricati/petsc-3.6.0/src/dm/interface/dmget.c >> >> ===================================================================================== >> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >> = EXIT CODE: 11 >> = CLEANING UP REMAINING PROCESSES >> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES >> ===================================================================================== >> APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11) >> >> where is my mistake? >> >> Your sincerely, >> Francesco > From bsmith at mcs.anl.gov Mon Nov 2 16:26:46 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 2 Nov 2015 16:26:46 -0600 Subject: [petsc-users] SNESComputeJacobianDefaultColor troubles In-Reply-To: References: <6DA1EA23-67FB-47D4-B909-68F858B7A978@mcs.anl.gov> Message-ID: <70EDED27-5F1F-43D2-A7A9-00B033EF451E@mcs.anl.gov> > On Nov 2, 2015, at 4:06 PM, Francesco Magaletti wrote: > > Thanks Barry!! > I tried and it worked nice, but it will be improved if I could manage the off-diagonal elements block by block. > I have a stencil with 5 points (in 1D) and the coupling with the i+1 is different with respect to the i+2 elements. I found on PetsC site a reference to DMDASetGetMatrix function, but there are no examples. Can you give me an example with a simple matrix? Or there is a simpler way to treat the different off-diagonal blocks separately? You will need to dig directly into the PETSc code that generates the matrix preallocation and nonzero structure for the DMDA and make a copy of the routine and modify it as needed. See for example DMCreateMatrix_DA_2d_MPIAIJ_Fill in src/dm/impls/da/fdda.c > > Thank you again > Francesco > >> Il giorno 02/nov/2015, alle ore 17:54, Barry Smith ha scritto: >> >> >> Don't do any of that stuff you tried below. Just use DMDASetFillBlocks() before DMCreateMatrix() >> >> Barry >> >> >>> On Nov 2, 2015, at 9:58 AM, Francesco Magaletti wrote: >>> >>> Hi everyone, >>> >>> I?m trying to solve a system of PDE?s with a full implicit time integration and a DMDA context to manage the cartesian structured grid. 
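Since DMDASetGetMatrix() was mentioned above without an example, here is a rough, hypothetical sketch of how a hand-written creation routine can be hooked into DMCreateMatrix(). The routine name and the preallocation counts are invented; a real routine for the 5-point stencil with different i+1 and i+2 coupling would compute exact per-row counts, for instance by adapting DMCreateMatrix_DA_2d_MPIAIJ_Fill as suggested, and would normally also attach the DMDA's local-to-global mapping so MatSetValuesStencil() keeps working.

PetscErrorCode MyDACreateMatrix(DM da, Mat *J)
{
  PetscInt M, dof, xm;

  DMDAGetInfo(da, NULL, &M, NULL, NULL, NULL, NULL, NULL, &dof,
              NULL, NULL, NULL, NULL, NULL);          /* global size and components per node */
  DMDAGetCorners(da, NULL, NULL, NULL, &xm, NULL, NULL);   /* local number of grid points */

  /* made-up estimates: at most 5 stencil points, each coupling only part of the dof block */
  MatCreateAIJ(PetscObjectComm((PetscObject)da), xm*dof, xm*dof, M*dof, M*dof,
               3*dof, NULL, 2*dof, NULL, J);
  return 0;
}

/* registered once, before the Jacobian is first needed: */
DMDASetGetMatrix(da, MyDACreateMatrix);
DMCreateMatrix(da, &J);        /* now calls MyDACreateMatrix() instead of the default */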
>>> I don?t know the actual jacobian matrix (actually it is too cumbersome to be easily evaluated analytically) so I used SNESComputeJacobianDefaultColor to approximate it: >>> >>> DMSetMatType(da,MATAIJ); >>> DMCreateMatrix(da,&J); >>> TSGetSNES(ts, &snes); >>> SNESSetJacobian(snes, J, J, SNESComputeJacobianDefaultColor,0); >>> >>> All it works perfectly but it happens that, when I increase the number of grid points, I go out of memory since the matrix becomes too big to be stored despite my estimate of memory consumption which is much lower than the actual one. Indeed a more careful analysis (with -mat_view) shows that there are a lot of zeros in the nonzero structure of the jacobian. I read in the manual that DMCreateMatrix preallocates the matrix by somewhat using the stencil width of the DMDA context, but in my case I don?t really need all those elements for every equation. >>> I then tried with a poor preallocation, hoping petsC could manage it with some malloc on the fly: >>> >>> MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,size,size,3,0,0,0,&J); >>> MatSetOption(J,MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE); >>> MatSetOption(J,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_FALSE); >>> MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE); >>> TSGetSNES(ts, &snes); >>> SNESSetJacobian(snes, J, J, SNESComputeJacobianDefaultColor,0); >>> >>> but now it gives me runtime errors: >>> >>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>> [0]PETSC ERROR: Object is in wrong state >>> [0]PETSC ERROR: Matrix must be assembled by calls to MatAssemblyBegin/End(); >>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>> [0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 >>> [0]PETSC ERROR: ./bubble_petsc on a arch-linux2-c-debug named ... 
Mon Nov 2 16:24:12 2015 >>> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich >>> [0]PETSC ERROR: #1 MatFDColoringCreate() line 448 in /home/magaletto/Scaricati/petsc-3.6.0/src/mat/matfd/fdmatrix.c >>> [0]PETSC ERROR: #2 SNESComputeJacobianDefaultColor() line 67 in /home/magaletto/Scaricati/petsc-3.6.0/src/snes/interface/snesj2.c >>> [0]PETSC ERROR: #3 SNESComputeJacobian() line 2223 in /home/magaletto/Scaricati/petsc-3.6.0/src/snes/interface/snes.c >>> [0]PETSC ERROR: #4 SNESSolve_NEWTONLS() line 231 in /home/magaletto/Scaricati/petsc-3.6.0/src/snes/impls/ls/ls.c >>> [0]PETSC ERROR: #5 SNESSolve() line 3894 in /home/magaletto/Scaricati/petsc-3.6.0/src/snes/interface/snes.c >>> [0]PETSC ERROR: #6 TSStep_Theta() line 197 in /home/magaletto/Scaricati/petsc-3.6.0/src/ts/impls/implicit/theta/theta.c >>> [0]PETSC ERROR: #7 TSStep() line 3098 in /home/magaletto/Scaricati/petsc-3.6.0/src/ts/interface/ts.c >>> [0]PETSC ERROR: #8 TSSolve() line 3282 in /home/magaletto/Scaricati/petsc-3.6.0/src/ts/interface/ts.c >>> >>> If I use instead the same preallocation but with the FD calculation without coloring it runs without problems (it is only extremely slow) and it uses the correct number of nonzero values: >>> >>> MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,size,size,3,0,0,0,&J); >>> MatSetOption(J,MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE); >>> MatSetOption(J,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_FALSE); >>> MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE); >>> TSGetSNES(ts, &snes); >>> SNESSetJacobian(snes, J, J, SNESComputeJacobianDefault,&appctx); >>> >>> What do you suggest to solve this problem? >>> I also tried to set up a first step with fd without coloring in order to allocate the correct amount of memory and nonzero structure and successively to switch to the colored version: >>> >>> TSSetIFunction(ts,NULL,FormIFunction,&appctx); >>> >>> MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,size,size,3,0,0,0,&J); >>> MatSetOption(J,MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE); >>> MatSetOption(J,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_FALSE); >>> MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE); >>> TSGetSNES(ts, &snes); >>> SNESSetJacobian(snes, J, J, SNESComputeJacobianDefault,&appctx); >>> TSStep(ts); >>> >>> ISColoring iscoloring; >>> MatColoring coloring; >>> MatFDColoring matfdcoloring; >>> MatColoringCreate(J,&coloring); >>> MatColoringSetFromOptions(coloring); >>> MatColoringApply(coloring,&iscoloring); >>> MatFDColoringCreate(J,iscoloring,&matfdcoloring); >>> MatFDColoringSetFunction(matfdcoloring,(PetscErrorCode (*)(void))FormIFunction,&appctx); >>> MatFDColoringSetFromOptions(matfdcoloring); >>> MatFDColoringSetUp(J,iscoloring,matfdcoloring); >>> SNESSetJacobian(snes,J,J,SNESComputeJacobianDefaultColor,matfdcoloring); >>> >>> TSStep(ts); >>> >>> but again it gives me runtime errors: >>> >>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>> [0]PETSC ERROR: Invalid argument >>> [0]PETSC ERROR: Wrong type of object: Parameter # 1 >>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>> [0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 >>> [0]PETSC ERROR: ./bubble_petsc on a arch-linux2-c-debug named ... 
Mon Nov 2 16:37:24 2015 >>> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich >>> [0]PETSC ERROR: #1 TSGetDM() line 4093 in /home/magaletto/Scaricati/petsc-3.6.0/src/ts/interface/ts.c >>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>> [0]PETSC ERROR: Corrupt argument: http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >>> [0]PETSC ERROR: Invalid Pointer to Object: Parameter # 1 >>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>> [0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 >>> [0]PETSC ERROR: ./bubble_petsc on a arch-linux2-c-debug named ... Mon Nov 2 16:37:24 2015 >>> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich >>> [0]PETSC ERROR: #2 DMGetLocalVector() line 44 in /home/magaletto/Scaricati/petsc-3.6.0/src/dm/interface/dmget.c >>> >>> ===================================================================================== >>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >>> = EXIT CODE: 11 >>> = CLEANING UP REMAINING PROCESSES >>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES >>> ===================================================================================== >>> APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11) >>> >>> where is my mistake? >>> >>> Your sincerely, >>> Francesco >> > From bsmith at mcs.anl.gov Mon Nov 2 19:29:23 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 2 Nov 2015 19:29:23 -0600 Subject: [petsc-users] How do I know it is steady state? In-Reply-To: References: Message-ID: <30E80E96-BC25-489D-8110-D81D9123ED19@mcs.anl.gov> > On Oct 30, 2015, at 12:23 PM, Zou (Non-US), Ling wrote: > > Hi All, > > From physics point of view, I know my simulation converges if nothing changes any more. > > I wonder how normally you do to detect if your simulation reaches steady state from numerical point of view. > Is it a good practice to use SNES convergence as a criterion, i.e., > SNES converges and it takes 0 iteration(s) Depends on the time integrator and SNES tolerance you are using. If you use a -snes_rtol 1.e-5 it will always try to squeeze 5 MORE digits out of the residual so won't take 0 iterations even if there is only a small change in the solution. > > Thanks, > > Ling From bsmith at mcs.anl.gov Mon Nov 2 21:55:20 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 2 Nov 2015 21:55:20 -0600 Subject: [petsc-users] soname seems to be absent in OS-X In-Reply-To: <165EDFE5-3E41-4F97-95B9-EE0F3372D25D@gmail.com> References: <6C83934B-C769-4C05-9235-96EE7C22944D@gmail.com> <5D62CF7C-2798-422E-856F-663904012549@mcs.anl.gov> <165EDFE5-3E41-4F97-95B9-EE0F3372D25D@gmail.com> Message-ID: Denis, Thanks for your careful explanation; I had no clue. Please find attached a patch for 3.6 that seems to give the behavior you desire. It is also in the maint, master and next branches in the repository Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: fix-soname-mac-prefix-install.patch Type: application/octet-stream Size: 829 bytes Desc: not available URL: -------------- next part -------------- > On Nov 1, 2015, at 3:49 AM, Denis Davydov wrote: > > Hi Barry, > > I think you use it already. 
After configure /lib/petsc/conf/petscvariables : > > SL_LINKER_FUNCTION = -dynamiclib -install_name $(call SONAME_FUNCTION,$(1),$(2)) -compatibility_version $(2) -current_version $(3) -single_module -multiply_defined suppress -undefined dynamic_lookup > SONAME_FUNCTION = $(1).$(2).dylib > > on the linking stage in the homebrew logs I see that the exact linking line contains -install_name : > > -Wl,-multiply_defined,suppress -Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs -Wl,-search_paths_first -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O -dynamiclib -install_name /private/tmp/petsc20151027-97392-7khduu/petsc-3.6.2/real/lib/libpetsc.3.6.dylib -compatibility_version 3.6 -current_version 3.6.2 -single_module -multiply_defined suppress -undefined dynamic_lookup > > > If I do configure manually (bare-bones PETSc) and compile it, the install_name seems to be correct (note that it?s libpetsc.3.6.dylib instead of libpetsc.3.6.2.dylib) : > ????? > $ otool -D arch-darwin-c-debug/lib/libpetsc.3.6.2.dylib > arch-darwin-c-debug/lib/libpetsc.3.6.2.dylib: > /Users/davydden/Downloads/petsc-3.6.2/arch-darwin-c-debug/lib/libpetsc.3.6.dylib > > executable linked against end up using correct ABI version: > $ otool -L test | grep petsc > /Users/davydden/Downloads/petsc-3.6.2/arch-darwin-c-debug/lib/libpetsc.3.6.dylib (compatibility version 3.6.0, current version 3.6.2) > ????? > > > however after installation to ?prefix=/Users/davydden/Downloads/petsc-3.6.2/real the install name ends up being wrong: > ????? > $ otool -D real/lib/libpetsc.3.6.2.dylib > real/lib/libpetsc.3.6.2.dylib: > /Users/davydden/Downloads/petsc-3.6.2/real/lib/libpetsc.3.6.2.dylib > > executable linked against end up using libpetsc.3.6.2.dylib instead of ABI version: > otool -L test | grep petsc > /Users/davydden/Downloads/petsc-3.6.2/real/lib/libpetsc.3.6.2.dylib (compatibility version 3.6.0, current version 3.6.2) > ????? > > My guess would be there is something wrong happening in `make install`. > Perhaps when using ?install_name_tool? with "-id? flag to change a library?s install name. > As a workaround i will fix install name manually, but you may consider investigating this issue further. > > > p.s. an excerpt from http://cocoadev.com/ApplicationLinking : > > Unlike many OSes, OS X does not have a search path for the dynamic linker**. This means that you can't simply put a dynamic library in some "standard" location and have dyld find it, because there is no standard location. Instead, OS X embeds an "install name" inside each dynamic library. This install name is the path to where the library can be found when dyld needs to load it. When you build an application that links against a dynamic library, this install name is copied into the application binary. When the application runs, the copied install name is then used to locate the library or framework. > > ** Technically, dyld does have a search path, defined in the DYLD_FRAMEWORK_PATH and DYLD_LIBRARY_PATH variables. However, these are empty on OS X by default, so they rarely matter. > > Kind regards, > Denis > >> On 29 Oct 2015, at 22:01, Barry Smith wrote: >> >> >> Denis, >> >> We don't understand what purpose a soname serves on Apple or how to add it. If you need it let us know how to install PETSc so that it is set and we will do it. 
>> >> Barry > From zonexo at gmail.com Mon Nov 2 22:37:12 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Tue, 3 Nov 2015 12:37:12 +0800 Subject: [petsc-users] Scaling with number of cores In-Reply-To: References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> <56372A12.90900@gmail.com> Message-ID: <563839F8.9080000@gmail.com> Hi, I tried : 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg 2. -poisson_pc_type gamg Both options give: 1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN M Diverged but why?, time = 2 reason = -9 How can I check what's wrong? Thank you Yours sincerely, TAY wee-beng On 3/11/2015 3:18 AM, Barry Smith wrote: > hypre is just not scaling well here. I do not know why. Since hypre is a block box for us there is no way to determine why the poor scaling. > > If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly. > > Barry > > > >> On Nov 2, 2015, at 3:17 AM, TAY wee-beng wrote: >> >> Hi, >> >> I have attached the 2 files. >> >> Thank you >> >> Yours sincerely, >> >> TAY wee-beng >> >> On 2/11/2015 2:55 PM, Barry Smith wrote: >>> Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results >>> >>> Barry >>> >>> >>> >>>> On Nov 2, 2015, at 12:19 AM, TAY wee-beng wrote: >>>> >>>> Hi, >>>> >>>> I have attached the new results. >>>> >>>> Thank you >>>> >>>> Yours sincerely, >>>> >>>> TAY wee-beng >>>> >>>> On 2/11/2015 12:27 PM, Barry Smith wrote: >>>>> Run without the -momentum_ksp_view -poisson_ksp_view and send the new results >>>>> >>>>> >>>>> You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time. >>>>> >>>>> Barry >>>>> >>>>> Something makes no sense with the output: it gives >>>>> >>>>> KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165 >>>>> >>>>> 90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO. >>>>> >>>>> >>>>> >>>>>> On Nov 1, 2015, at 10:02 PM, TAY wee-beng wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I have attached the new run with 100 time steps for 48 and 96 cores. >>>>>> >>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>> >>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run. >>>>>> >>>>>> Thank you >>>>>> >>>>>> Yours sincerely, >>>>>> >>>>>> TAY wee-beng >>>>>> >>>>>> On 2/11/2015 9:49 AM, Barry Smith wrote: >>>>>>> If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps. 
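A minimal sketch of the preconditioner-reuse pattern being asked about in this thread, assuming the Poisson matrix A genuinely never changes between time steps; the names (ksp, A, rhs, phi, nsteps) are placeholders, not the variables in TAY's code.

#include <petscksp.h>

PetscErrorCode SolvePoissonEachStep(KSP ksp, Mat A, Vec rhs, Vec phi, PetscInt nsteps)
{
  PetscErrorCode ierr;
  PetscInt       step;

  PetscFunctionBeginUser;
  /* Set the operator once; as long as A is never modified and KSPSetOperators()
     is not called again, the expensive PCSetUp() (hypre/gamg setup) runs only
     on the first solve. */
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);     /* e.g. -poisson_pc_type gamg */
  for (step = 0; step < nsteps; step++) {
    /* ... assemble the new right-hand side for this time step ... */
    ierr = KSPSolve(ksp, rhs, phi);CHKERRQ(ierr);  /* reuses the existing preconditioner */
  }
  /* If the matrix values did change but the old preconditioner is still
     acceptable, KSPSetReusePreconditioner(ksp, PETSC_TRUE) keeps it. */
  PetscFunctionReturn(0);
}

The point is simply that KSPSolve() does not rerun the preconditioner setup unless the operator handed to KSPSetOperators() has been modified since the previous solve.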
>>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>> >>>>>>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96). >>>>>>>> >>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? >>>>>>>> >>>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>>> >>>>>>>> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep? >>>>>>>> >>>>>>>> Also, what about momentum eqn? Is it working well? >>>>>>>> >>>>>>>> I will try the gamg later too. >>>>>>>> >>>>>>>> Thank you >>>>>>>> >>>>>>>> Yours sincerely, >>>>>>>> >>>>>>>> TAY wee-beng >>>>>>>> >>>>>>>> On 2/11/2015 12:30 AM, Barry Smith wrote: >>>>>>>>> You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results >>>>>>>>> >>>>>>>>> Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds. >>>>>>>>> >>>>>>>>> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 >>>>>>>>> >>>>>>>>> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 >>>>>>>>> >>>>>>>>> Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large? >>>>>>>>> >>>>>>>>> You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine. >>>>>>>>> >>>>>>>>> Barry >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 1/11/2015 10:00 AM, Barry Smith wrote: >>>>>>>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>>>>>>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng wrote: >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. >>>>>>>>>>>>> Its specs are: >>>>>>>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node) >>>>>>>>>>>>> >>>>>>>>>>>>> 8 cores / processor >>>>>>>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>>>>>>>>>>>> Each cabinet contains 96 computing nodes, >>>>>>>>>>>>> One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data >>>>>>>>>>>>> There are 2 ways to give performance: >>>>>>>>>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed >>>>>>>>>>>>> problem. >>>>>>>>>>>>> 2. 
Weak scaling, which is defined as how the elapsed time varies with the number of processors for a >>>>>>>>>>>>> fixed problem size per processor. >>>>>>>>>>>>> I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling. >>>>>>>>>>>>> Cluster specs: >>>>>>>>>>>>> CPU: AMD 6234 2.4GHz >>>>>>>>>>>>> 8 cores / processor (CPU) >>>>>>>>>>>>> 6 CPU / node >>>>>>>>>>>>> So 48 Cores / CPU >>>>>>>>>>>>> Not sure abt the memory / node >>>>>>>>>>>>> >>>>>>>>>>>>> The parallel efficiency ?En? for a given degree of parallelism ?n? indicates how much the program is >>>>>>>>>>>>> efficiently accelerated by parallel processing. ?En? is given by the following formulae. Although their >>>>>>>>>>>>> derivation processes are different depending on strong and weak scaling, derived formulae are the >>>>>>>>>>>>> same. >>>>>>>>>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. >>>>>>>>>>>>> So is my results acceptable? >>>>>>>>>>>>> For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%. >>>>>>>>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function >>>>>>>>>>>>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a >>>>>>>>>>>>> model of this dependence. >>>>>>>>>>>>> >>>>>>>>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current >>>>>>>>>>>>> cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific >>>>>>>>>>>>> applications, but neither does requiring a certain parallel efficiency. >>>>>>>>>>>> Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case. >>>>>>>>>>>> >>>>>>>>>>>> However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so? >>>>>>>>>>> What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors? >>>>>>>>>>> >>>>>>>>>>> Barry >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I have attached the output >>>>>>>>>> >>>>>>>>>> 48 cores: log48 >>>>>>>>>> 96 cores: log96 >>>>>>>>>> >>>>>>>>>> There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG. >>>>>>>>>> >>>>>>>>>> Problem size doubled from 158x266x150 to 158x266x300. >>>>>>>>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved? >>>>>>>>>>>> >>>>>>>>>>>> Thanks. >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> >>>>>>>>>>>>> Matt >>>>>>>>>>>>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores? >>>>>>>>>>>>> Btw, I do not have access to the system. 
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Sent using CloudMagic Email >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>> >>>>>>>> >>>>>> >>>> >> From bsmith at mcs.anl.gov Mon Nov 2 22:45:03 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 2 Nov 2015 22:45:03 -0600 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <563839F8.9080000@gmail.com> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> <56372A12.90900@gmail.com> <563839F8.9080000@gmail.com> Message-ID: <589BE4CF-18B5-4F04-A442-7FB0F2F3AFC1@mcs.anl.gov> > On Nov 2, 2015, at 10:37 PM, TAY wee-beng wrote: > > Hi, > > I tried : > > 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg > > 2. -poisson_pc_type gamg Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason Does your poisson have Neumann boundary conditions? Do you have any zeros on the diagonal for the matrix (you shouldn't). There may be something wrong with your poisson discretization that was also messing up hypre > > Both options give: > > 1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN > M Diverged but why?, time = 2 > reason = -9 > > How can I check what's wrong? > > Thank you > > Yours sincerely, > > TAY wee-beng > > On 3/11/2015 3:18 AM, Barry Smith wrote: >> hypre is just not scaling well here. I do not know why. Since hypre is a block box for us there is no way to determine why the poor scaling. >> >> If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly. >> >> Barry >> >> >> >>> On Nov 2, 2015, at 3:17 AM, TAY wee-beng wrote: >>> >>> Hi, >>> >>> I have attached the 2 files. >>> >>> Thank you >>> >>> Yours sincerely, >>> >>> TAY wee-beng >>> >>> On 2/11/2015 2:55 PM, Barry Smith wrote: >>>> Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results >>>> >>>> Barry >>>> >>>> >>>>> On Nov 2, 2015, at 12:19 AM, TAY wee-beng wrote: >>>>> >>>>> Hi, >>>>> >>>>> I have attached the new results. >>>>> >>>>> Thank you >>>>> >>>>> Yours sincerely, >>>>> >>>>> TAY wee-beng >>>>> >>>>> On 2/11/2015 12:27 PM, Barry Smith wrote: >>>>>> Run without the -momentum_ksp_view -poisson_ksp_view and send the new results >>>>>> >>>>>> >>>>>> You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time. >>>>>> >>>>>> Barry >>>>>> >>>>>> Something makes no sense with the output: it gives >>>>>> >>>>>> KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165 >>>>>> >>>>>> 90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO. 
>>>>>> >>>>>> >>>>>> >>>>>>> On Nov 1, 2015, at 10:02 PM, TAY wee-beng wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I have attached the new run with 100 time steps for 48 and 96 cores. >>>>>>> >>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>> >>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run. >>>>>>> >>>>>>> Thank you >>>>>>> >>>>>>> Yours sincerely, >>>>>>> >>>>>>> TAY wee-beng >>>>>>> >>>>>>> On 2/11/2015 9:49 AM, Barry Smith wrote: >>>>>>>> If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps. >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng wrote: >>>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96). >>>>>>>>> >>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? >>>>>>>>> >>>>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>>>> >>>>>>>>> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep? >>>>>>>>> >>>>>>>>> Also, what about momentum eqn? Is it working well? >>>>>>>>> >>>>>>>>> I will try the gamg later too. >>>>>>>>> >>>>>>>>> Thank you >>>>>>>>> >>>>>>>>> Yours sincerely, >>>>>>>>> >>>>>>>>> TAY wee-beng >>>>>>>>> >>>>>>>>> On 2/11/2015 12:30 AM, Barry Smith wrote: >>>>>>>>>> You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results >>>>>>>>>> >>>>>>>>>> Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds. >>>>>>>>>> >>>>>>>>>> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 >>>>>>>>>> >>>>>>>>>> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 >>>>>>>>>> >>>>>>>>>> Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large? >>>>>>>>>> >>>>>>>>>> You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine. 
>>>>>>>>>> >>>>>>>>>> Barry >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 1/11/2015 10:00 AM, Barry Smith wrote: >>>>>>>>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>>>>>>>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng wrote: >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. >>>>>>>>>>>>>> Its specs are: >>>>>>>>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node) >>>>>>>>>>>>>> >>>>>>>>>>>>>> 8 cores / processor >>>>>>>>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>>>>>>>>>>>>> Each cabinet contains 96 computing nodes, >>>>>>>>>>>>>> One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data >>>>>>>>>>>>>> There are 2 ways to give performance: >>>>>>>>>>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed >>>>>>>>>>>>>> problem. >>>>>>>>>>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a >>>>>>>>>>>>>> fixed problem size per processor. >>>>>>>>>>>>>> I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling. >>>>>>>>>>>>>> Cluster specs: >>>>>>>>>>>>>> CPU: AMD 6234 2.4GHz >>>>>>>>>>>>>> 8 cores / processor (CPU) >>>>>>>>>>>>>> 6 CPU / node >>>>>>>>>>>>>> So 48 Cores / CPU >>>>>>>>>>>>>> Not sure abt the memory / node >>>>>>>>>>>>>> >>>>>>>>>>>>>> The parallel efficiency ?En? for a given degree of parallelism ?n? indicates how much the program is >>>>>>>>>>>>>> efficiently accelerated by parallel processing. ?En? is given by the following formulae. Although their >>>>>>>>>>>>>> derivation processes are different depending on strong and weak scaling, derived formulae are the >>>>>>>>>>>>>> same. >>>>>>>>>>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. >>>>>>>>>>>>>> So is my results acceptable? >>>>>>>>>>>>>> For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%. >>>>>>>>>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function >>>>>>>>>>>>>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a >>>>>>>>>>>>>> model of this dependence. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current >>>>>>>>>>>>>> cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific >>>>>>>>>>>>>> applications, but neither does requiring a certain parallel efficiency. >>>>>>>>>>>>> Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. 
So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case. >>>>>>>>>>>>> >>>>>>>>>>>>> However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so? >>>>>>>>>>>> What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors? >>>>>>>>>>>> >>>>>>>>>>>> Barry >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I have attached the output >>>>>>>>>>> >>>>>>>>>>> 48 cores: log48 >>>>>>>>>>> 96 cores: log96 >>>>>>>>>>> >>>>>>>>>>> There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG. >>>>>>>>>>> >>>>>>>>>>> Problem size doubled from 158x266x150 to 158x266x300. >>>>>>>>>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved? >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Matt >>>>>>>>>>>>>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores? >>>>>>>>>>>>>> Btw, I do not have access to the system. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sent using CloudMagic Email >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>> > From davydden at gmail.com Tue Nov 3 02:07:15 2015 From: davydden at gmail.com (Denis Davydov) Date: Tue, 3 Nov 2015 09:07:15 +0100 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs Message-ID: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> Dear all, I experience strange convergence problems in SLEPc for GHEP with Krylov-Schur and CG + GAMG. The issue appears to be contingent on the number of MPI cores used. Say for 8 cores there is no issue and for 4 cores there is an issue. When I substitute GAMG with Jacobi for the problematic number of cores -- all works. To be more specific, I solve Ax=\lambda Bx for a sequence of A?s where A is a function of eigenvectors. On each iteration step currently the eigensolver EPS is initialised from scratch. And thus should the underlying ST, KSP, PC objects: -st_ksp_type cg -st_pc_type gamg -st_ksp_rtol 1e-12. For these particular matrices the issue appears on the 4th iteration, even though the matrix to be inverted (mass/overlap matrix) is the same, does not change!!! 
From my debuging info the A matrix has the same norm for CG + GAMG vs CG + Jacobi cases: DEAL:: frobenius_norm = 365.7 DEAL:: linfty_norm = 19.87 DEAL:: l1_norm = 19.87 Just to be sure that there are no bugs on my side which would result in different mass matrices i check that it has the same norm for CG + GAMG vs CG + Jacobi BEFORE i start iteration: DEAL:: frobenius_norm = 166.4 DEAL:: linfty_norm = 8.342 DEAL:: l1_norm = 8.342 All the dependent scalar quantities I calculate on each iteration are identical for the two cases, which makes me believe that the solution path is the same up to the certain tolerance. The only output which is slightly different are the number iterations for convergence in EPS (e.g. 113 vs 108) and the resulting maxing EPSComputeResidualNorm : 4.1524e-07 vs 2.9639e-08. Any ideas what could be an issue, especially given the fact that it does work for some numbers of cores and does not for other? Perhaps some default settings in GAMG preconditioner? Although that does not explain why it works for the first 3 iterations and does not on 4th as the mass matrix is unchanged... Lastly, i suppose ideally i should keep the eigensolver context between the iterations and just update the matrices by EPSSetOperators. Is it correct to assume that since B matrix does not change between iterations and I use the default shift transformation with zero shift (operator is B^{-1)A ), the GAMG preconditioner will not be re-initialised and thus I should save some time? p.s. the relevant error message is below. I have the same issues on CentOS cluster, so it is not related to OS-X. Kind regards, Denis === [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [0]PETSC ERROR: INSTEAD the line number of the start of the function [0]PETSC ERROR: is given. 
[0]PETSC ERROR: [0] KSPSolve line 510 /private/tmp/petsc20151102-50378-1t7b3in/petsc-3.6.2/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: [0] STMatSolve line 148 /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/sys/classes/st/interface/stsles.c [0]PETSC ERROR: [0] STApply_Shift line 33 /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/sys/classes/st/impls/shift/shift.c [0]PETSC ERROR: [0] STApply line 50 /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/sys/classes/st/interface/stsolve.c [0]PETSC ERROR: [0] EPSGetStartVector line 726 /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/eps/interface/epssolve.c [0]PETSC ERROR: [0] EPSSolve_KrylovSchur_Symm line 41 /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/eps/impls/krylov/krylovschur/ks-symm.c [0]PETSC ERROR: [0] EPSSolve line 83 /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/eps/interface/epssolve.c [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Signal received [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.6.2, Oct, 02, 2015 [0]PETSC ERROR: /Users/davydden/Desktop/work/C++/deal.ii-dft/build_debug~/dft on a real named MBP-Denis.fritz.box by davydden Tue Nov 3 07:02:47 2015 [0]PETSC ERROR: Configure options CC=/usr/local/bin/mpicc CXX=/usr/local/bin/mpicxx F77=/usr/local/bin/mpif77 FC=/usr/local/bin/mpif90 --with-shared-libraries=1 --with-pthread=0 --with-openmp=0 --with-debugging=1 --with-ssl=0 --with-superlu_dist-include=/usr/local/opt/superlu_dist/include/superlu_dist --with-superlu_dist-lib="-L/usr/local/opt/superlu_dist/lib -lsuperlu_dist" --with-superlu-include=/usr/local/Cellar/superlu43/4.3/include/superlu --with-superlu-lib="-L/usr/local/Cellar/superlu43/4.3/lib -lsuperlu" --with-fftw-dir=/usr/local/opt/fftw --with-netcdf-dir=/usr/local/opt/netcdf --with-suitesparse-dir=/usr/local/opt/suite-sparse --with-hdf5-dir=/usr/local/opt/hdf5 --with-metis-dir=/usr/local/opt/metis --with-parmetis-dir=/usr/local/opt/parmetis --with-scalapack-dir=/usr/local/opt/scalapack --with-mumps-dir=/usr/local/opt/mumps --with-x=0 --prefix=/usr/local/Cellar/petsc/3.6.2/real --with-scalar-type=real --with-hypre-dir=/usr/local/opt/hypre --with-sundials-dir=/usr/local/opt/sundials --with-hwloc-dir=/usr/local/opt/hwloc [0]PETSC ERROR: #8 User provided function() line 0 in unknown file -------------------------------------------------------------------------- mpirun noticed that process rank 3 with PID 96754 on node MBP-Denis exited on signal 6 (Abort trap: 6). -------------------------------------------------------------------------- From jroman at dsic.upv.es Tue Nov 3 05:20:43 2015 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 3 Nov 2015 12:20:43 +0100 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> Message-ID: <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> I am answering the SLEPc-related questions: - Having different number of iterations when changing the number of processes is normal. - Yes, if you do not destroy the EPS solver, then the preconditioner would be reused. Regarding the segmentation fault, I have no clue. Not sure if this is related to GAMG or not. Maybe running under valgrind could provide more information. 
Jose On 3/11/2015, Denis Davydov wrote: > > > Dear all, > > I experience strange convergence problems in SLEPc for GHEP with Krylov-Schur and CG + GAMG. > The issue appears to be contingent on the number of MPI cores used. > Say for 8 cores there is no issue and for 4 cores there is an issue. > When I substitute GAMG with Jacobi for the problematic number of cores -- all works. > > To be more specific, I solve Ax=\lambda Bx for a sequence of A?s where A is a function of eigenvectors. > On each iteration step currently the eigensolver EPS is initialised from scratch. And thus should the underlying > ST, KSP, PC objects: -st_ksp_type cg -st_pc_type gamg -st_ksp_rtol 1e-12. > For these particular matrices the issue appears on the 4th iteration, even though the matrix to be inverted (mass/overlap matrix) > is the same, does not change!!! > From my debuging info the A matrix has the same norm for CG + GAMG vs CG + Jacobi cases: > DEAL:: frobenius_norm = 365.7 > DEAL:: linfty_norm = 19.87 > DEAL:: l1_norm = 19.87 > Just to be sure that there are no bugs on my side which would result in different mass matrices i check that > it has the same norm for CG + GAMG vs CG + Jacobi BEFORE i start iteration: > DEAL:: frobenius_norm = 166.4 > DEAL:: linfty_norm = 8.342 > DEAL:: l1_norm = 8.342 > All the dependent scalar quantities I calculate on each iteration are identical for the two cases, which makes me believe that > the solution path is the same up to the certain tolerance. > The only output which is slightly different are the number iterations for convergence in EPS (e.g. 113 vs 108) and the > resulting maxing EPSComputeResidualNorm : 4.1524e-07 vs 2.9639e-08. > > > Any ideas what could be an issue, especially given the fact that it does work for some numbers of cores and does not for other? > Perhaps some default settings in GAMG preconditioner? Although that does not explain why it works for the first 3 iterations > and does not on 4th as the mass matrix is unchanged... > > Lastly, i suppose ideally i should keep the eigensolver context between the iterations and just update the matrices by EPSSetOperators. > Is it correct to assume that since B matrix does not change between iterations and I use the default shift transformation with zero shift > (operator is B^{-1)A ), the GAMG preconditioner will not be re-initialised and thus I should save some time? > > p.s. the relevant error message is below. I have the same issues on CentOS cluster, so it is not related to OS-X. > > Kind regards, > Denis > > > === > [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > [0]PETSC ERROR: INSTEAD the line number of the start of the function > [0]PETSC ERROR: is given. 
> [0]PETSC ERROR: [0] KSPSolve line 510 /private/tmp/petsc20151102-50378-1t7b3in/petsc-3.6.2/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: [0] STMatSolve line 148 /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/sys/classes/st/interface/stsles.c > [0]PETSC ERROR: [0] STApply_Shift line 33 /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/sys/classes/st/impls/shift/shift.c > [0]PETSC ERROR: [0] STApply line 50 /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/sys/classes/st/interface/stsolve.c > [0]PETSC ERROR: [0] EPSGetStartVector line 726 /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/eps/interface/epssolve.c > [0]PETSC ERROR: [0] EPSSolve_KrylovSchur_Symm line 41 /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/eps/impls/krylov/krylovschur/ks-symm.c > [0]PETSC ERROR: [0] EPSSolve line 83 /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/eps/interface/epssolve.c > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Signal received > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.6.2, Oct, 02, 2015 > [0]PETSC ERROR: /Users/davydden/Desktop/work/C++/deal.ii-dft/build_debug~/dft on a real named MBP-Denis.fritz.box by davydden Tue Nov 3 07:02:47 2015 > [0]PETSC ERROR: Configure options CC=/usr/local/bin/mpicc CXX=/usr/local/bin/mpicxx F77=/usr/local/bin/mpif77 FC=/usr/local/bin/mpif90 --with-shared-libraries=1 --with-pthread=0 --with-openmp=0 --with-debugging=1 --with-ssl=0 --with-superlu_dist-include=/usr/local/opt/superlu_dist/include/superlu_dist --with-superlu_dist-lib="-L/usr/local/opt/superlu_dist/lib -lsuperlu_dist" --with-superlu-include=/usr/local/Cellar/superlu43/4.3/include/superlu --with-superlu-lib="-L/usr/local/Cellar/superlu43/4.3/lib -lsuperlu" --with-fftw-dir=/usr/local/opt/fftw --with-netcdf-dir=/usr/local/opt/netcdf --with-suitesparse-dir=/usr/local/opt/suite-sparse --with-hdf5-dir=/usr/local/opt/hdf5 --with-metis-dir=/usr/local/opt/metis --with-parmetis-dir=/usr/local/opt/parmetis --with-scalapack-dir=/usr/local/opt/scalapack --with-mumps-dir=/usr/local/opt/mumps --with-x=0 --prefix=/usr/local/Cellar/petsc/3.6.2/real --with-scalar-type=real --with-hypre-dir=/usr/local/opt/hypre --with-sundials-dir=/usr/local/opt/sundials --with-hwloc-dir=/usr/local/opt/hwloc > [0]PETSC ERROR: #8 User provided function() line 0 in unknown file > -------------------------------------------------------------------------- > mpirun noticed that process rank 3 with PID 96754 on node MBP-Denis exited on signal 6 (Abort trap: 6). > -------------------------------------------------------------------------- > From knepley at gmail.com Tue Nov 3 06:25:25 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 3 Nov 2015 06:25:25 -0600 Subject: [petsc-users] How do I know it is steady state? In-Reply-To: <30E80E96-BC25-489D-8110-D81D9123ED19@mcs.anl.gov> References: <30E80E96-BC25-489D-8110-D81D9123ED19@mcs.anl.gov> Message-ID: On Mon, Nov 2, 2015 at 7:29 PM, Barry Smith wrote: > > > On Oct 30, 2015, at 12:23 PM, Zou (Non-US), Ling > wrote: > > > > Hi All, > > > > From physics point of view, I know my simulation converges if nothing > changes any more. > > > > I wonder how normally you do to detect if your simulation reaches steady > state from numerical point of view. 
> > Is it a good practice to use SNES convergence as a criterion, i.e., > > SNES converges and it takes 0 iteration(s) > > Depends on the time integrator and SNES tolerance you are using. If you > use a -snes_rtol 1.e-5 it will always try to squeeze 5 MORE digits out of > the residual so won't take 0 iterations even if there is only a small > change in the solution. > There are two different situations here: 1) Solving for a mathematical steady state. You remove the time derivative and solve the algebraic system with SNES. Then the SNES tolerance is a good measure. 2) Use timestepping to advance until nothing looks like it is changing. This is a "physical" steady state. You can use 1) with a timestepping preconditioner TSPSEUDO, which is what I would recommend if you want a true steady state. Thanks, Matt > > > > Thanks, > > > > Ling > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Nov 3 06:31:46 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 3 Nov 2015 06:31:46 -0600 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> Message-ID: On Tue, Nov 3, 2015 at 2:07 AM, Denis Davydov wrote: > Dear all, > > I experience strange convergence problems in SLEPc for GHEP with > Krylov-Schur and CG + GAMG. > The issue appears to be contingent on the number of MPI cores used. > Say for 8 cores there is no issue and for 4 cores there is an issue. > When I substitute GAMG with Jacobi for the problematic number of cores -- > all works. > > To be more specific, I solve Ax=\lambda Bx for a sequence of A?s where A > is a function of eigenvectors. > On each iteration step currently the eigensolver EPS is initialised from > scratch. And thus should the underlying > ST, KSP, PC objects: -st_ksp_type cg -st_pc_type gamg -st_ksp_rtol 1e-12. > For these particular matrices the issue appears on the 4th iteration, even > though the matrix to be inverted (mass/overlap matrix) > is the same, does not change!!!' > I assume the issue is the SEGV below? I agree with Jose that you need to run valgrind. An SEGV can result from memory corruption in a distant part of the code. This seems very likely to me since it is the same matrix coming in. Thanks, Matt > From my debuging info the A matrix has the same norm for CG + GAMG vs CG + > Jacobi cases: > DEAL:: frobenius_norm = 365.7 > DEAL:: linfty_norm = 19.87 > DEAL:: l1_norm = 19.87 > Just to be sure that there are no bugs on my side which would result in > different mass matrices i check that > it has the same norm for CG + GAMG vs CG + Jacobi BEFORE i start iteration: > DEAL:: frobenius_norm = 166.4 > DEAL:: linfty_norm = 8.342 > DEAL:: l1_norm = 8.342 > All the dependent scalar quantities I calculate on each iteration are > identical for the two cases, which makes me believe that > the solution path is the same up to the certain tolerance. > The only output which is slightly different are the number iterations for > convergence in EPS (e.g. 113 vs 108) and the > resulting maxing EPSComputeResidualNorm : 4.1524e-07 vs 2.9639e-08. > > > Any ideas what could be an issue, especially given the fact that it does > work for some numbers of cores and does not for other? 
> Perhaps some default settings in GAMG preconditioner? Although that does > not explain why it works for the first 3 iterations > and does not on 4th as the mass matrix is unchanged... > > Lastly, i suppose ideally i should keep the eigensolver context between > the iterations and just update the matrices by EPSSetOperators. > Is it correct to assume that since B matrix does not change between > iterations and I use the default shift transformation with zero shift > (operator is B^{-1)A ), the GAMG preconditioner will not be re-initialised > and thus I should save some time? > > p.s. the relevant error message is below. I have the same issues on CentOS > cluster, so it is not related to OS-X. > > Kind regards, > Denis > > > === > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the > batch system) has told this process to end > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [0]PETSC ERROR: INSTEAD the line number of the start of the function > [0]PETSC ERROR: is given. > [0]PETSC ERROR: [0] KSPSolve line 510 > /private/tmp/petsc20151102-50378-1t7b3in/petsc-3.6.2/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: [0] STMatSolve line 148 > /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/sys/classes/st/interface/stsles.c > [0]PETSC ERROR: [0] STApply_Shift line 33 > /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/sys/classes/st/impls/shift/shift.c > [0]PETSC ERROR: [0] STApply line 50 > /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/sys/classes/st/interface/stsolve.c > [0]PETSC ERROR: [0] EPSGetStartVector line 726 > /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/eps/interface/epssolve.c > [0]PETSC ERROR: [0] EPSSolve_KrylovSchur_Symm line 41 > /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/eps/impls/krylov/krylovschur/ks-symm.c > [0]PETSC ERROR: [0] EPSSolve line 83 > /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/eps/interface/epssolve.c > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Signal received > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.6.2, Oct, 02, 2015 > [0]PETSC ERROR: > /Users/davydden/Desktop/work/C++/deal.ii-dft/build_debug~/dft on a real > named MBP-Denis.fritz.box by davydden Tue Nov 3 07:02:47 2015 > [0]PETSC ERROR: Configure options CC=/usr/local/bin/mpicc > CXX=/usr/local/bin/mpicxx F77=/usr/local/bin/mpif77 > FC=/usr/local/bin/mpif90 --with-shared-libraries=1 --with-pthread=0 > --with-openmp=0 --with-debugging=1 --with-ssl=0 > --with-superlu_dist-include=/usr/local/opt/superlu_dist/include/superlu_dist > --with-superlu_dist-lib="-L/usr/local/opt/superlu_dist/lib -lsuperlu_dist" > --with-superlu-include=/usr/local/Cellar/superlu43/4.3/include/superlu > --with-superlu-lib="-L/usr/local/Cellar/superlu43/4.3/lib -lsuperlu" > --with-fftw-dir=/usr/local/opt/fftw --with-netcdf-dir=/usr/local/opt/netcdf > --with-suitesparse-dir=/usr/local/opt/suite-sparse > --with-hdf5-dir=/usr/local/opt/hdf5 --with-metis-dir=/usr/local/opt/metis > --with-parmetis-dir=/usr/local/opt/parmetis > --with-scalapack-dir=/usr/local/opt/scalapack > --with-mumps-dir=/usr/local/opt/mumps --with-x=0 > --prefix=/usr/local/Cellar/petsc/3.6.2/real --with-scalar-type=real > --with-hypre-dir=/usr/local/opt/hypre > --with-sundials-dir=/usr/local/opt/sundials > --with-hwloc-dir=/usr/local/opt/hwloc > [0]PETSC ERROR: #8 User provided function() line 0 in unknown file > -------------------------------------------------------------------------- > mpirun noticed that process rank 3 with PID 96754 on node MBP-Denis exited > on signal 6 (Abort trap: 6). > -------------------------------------------------------------------------- > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From davydden at gmail.com Tue Nov 3 06:32:27 2015 From: davydden at gmail.com (Denis Davydov) Date: Tue, 3 Nov 2015 13:32:27 +0100 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> Message-ID: <4E4A4E38-B9C3-4F41-8D52-1AFF3E4E8C8C@gmail.com> Hi Jose, > On 3 Nov 2015, at 12:20, Jose E. Roman wrote: > > I am answering the SLEPc-related questions: > - Having different number of iterations when changing the number of processes is normal. the change in iterations i mentioned are for different preconditioners, but the same number of MPI processes. > - Yes, if you do not destroy the EPS solver, then the preconditioner would be reused. > > Regarding the segmentation fault, I have no clue. Not sure if this is related to GAMG or not. Maybe running under valgrind could provide more information. will try that. Denis. 
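A minimal sketch of keeping one EPS object alive across the self-consistent loop, along the lines of Jose's answer above; the matrix names, the loop bound n_scf, and the update step are placeholders for whatever the application actually does.

#include <slepceps.h>

PetscErrorCode RunSCF(Mat A, Mat B, PetscInt n_scf)
{
  PetscErrorCode ierr;
  EPS            eps;
  PetscInt       iter;

  PetscFunctionBeginUser;
  ierr = EPSCreate(PETSC_COMM_WORLD, &eps);CHKERRQ(ierr);
  ierr = EPSSetProblemType(eps, EPS_GHEP);CHKERRQ(ierr);
  ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);        /* -st_ksp_type cg -st_pc_type gamg ... */
  for (iter = 0; iter < n_scf; iter++) {
    /* ... update the entries of A from the current eigenvectors ... */
    ierr = EPSSetOperators(eps, A, B);CHKERRQ(ierr);  /* B unchanged; per Jose's reply the
                                                         preconditioner built for B is reused */
    ierr = EPSSolve(eps);CHKERRQ(ierr);
    /* ... extract the eigenpairs with EPSGetEigenpair() ... */
  }
  ierr = EPSDestroy(&eps);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}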
From zonexo at gmail.com Tue Nov 3 06:49:06 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Tue, 3 Nov 2015 20:49:06 +0800 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <589BE4CF-18B5-4F04-A442-7FB0F2F3AFC1@mcs.anl.gov> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> <56372A12.90900@gmail.com> <563839F8.9080000@gmail.com> <589BE4CF-18B5-4F04-A442-7FB0F2F3AFC1@mcs.anl.gov> Message-ID: <5638AD42.9060609@gmail.com> Hi, I tried and have attached the log. Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate? Thank you Yours sincerely, TAY wee-beng On 3/11/2015 12:45 PM, Barry Smith wrote: >> On Nov 2, 2015, at 10:37 PM, TAY wee-beng wrote: >> >> Hi, >> >> I tried : >> >> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg >> >> 2. -poisson_pc_type gamg > Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason > Does your poisson have Neumann boundary conditions? Do you have any zeros on the diagonal for the matrix (you shouldn't). > > There may be something wrong with your poisson discretization that was also messing up hypre > > > >> Both options give: >> >> 1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN >> M Diverged but why?, time = 2 >> reason = -9 >> >> How can I check what's wrong? >> >> Thank you >> >> Yours sincerely, >> >> TAY wee-beng >> >> On 3/11/2015 3:18 AM, Barry Smith wrote: >>> hypre is just not scaling well here. I do not know why. Since hypre is a block box for us there is no way to determine why the poor scaling. >>> >>> If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly. >>> >>> Barry >>> >>> >>> >>>> On Nov 2, 2015, at 3:17 AM, TAY wee-beng wrote: >>>> >>>> Hi, >>>> >>>> I have attached the 2 files. >>>> >>>> Thank you >>>> >>>> Yours sincerely, >>>> >>>> TAY wee-beng >>>> >>>> On 2/11/2015 2:55 PM, Barry Smith wrote: >>>>> Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results >>>>> >>>>> Barry >>>>> >>>>> >>>>>> On Nov 2, 2015, at 12:19 AM, TAY wee-beng wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I have attached the new results. >>>>>> >>>>>> Thank you >>>>>> >>>>>> Yours sincerely, >>>>>> >>>>>> TAY wee-beng >>>>>> >>>>>> On 2/11/2015 12:27 PM, Barry Smith wrote: >>>>>>> Run without the -momentum_ksp_view -poisson_ksp_view and send the new results >>>>>>> >>>>>>> >>>>>>> You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time. >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> Something makes no sense with the output: it gives >>>>>>> >>>>>>> KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165 >>>>>>> >>>>>>> 90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO. 
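A minimal sketch of attaching the constant null space for an all-Neumann Poisson operator, plus the zero-diagonal check Barry asks about above. MatSetNullSpace() on the operator is the route recent PETSc releases prefer over KSPSetNullSpace(); the matrix name A is a placeholder.

#include <petscksp.h>

PetscErrorCode PreparePoisson(Mat A)
{
  PetscErrorCode ierr;
  MatNullSpace   nsp;
  Vec            diag;
  PetscReal      dmin;

  PetscFunctionBeginUser;
  /* The constant vector spans the null space of the pure-Neumann Poisson operator. */
  ierr = MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nsp);CHKERRQ(ierr);
  ierr = MatSetNullSpace(A, nsp);CHKERRQ(ierr);   /* the KSP picks this up from the operator */
  ierr = MatNullSpaceDestroy(&nsp);CHKERRQ(ierr);

  /* Quick sanity check for zero diagonal entries. */
  ierr = MatCreateVecs(A, &diag, NULL);CHKERRQ(ierr);
  ierr = MatGetDiagonal(A, diag);CHKERRQ(ierr);
  ierr = VecAbs(diag);CHKERRQ(ierr);
  ierr = VecMin(diag, NULL, &dmin);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "smallest |diagonal| entry = %g\n", (double)dmin);CHKERRQ(ierr);
  ierr = VecDestroy(&diag);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}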
>>>>>>> >>>>>>> >>>>>>> >>>>>>>> On Nov 1, 2015, at 10:02 PM, TAY wee-beng wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I have attached the new run with 100 time steps for 48 and 96 cores. >>>>>>>> >>>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>>> >>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run. >>>>>>>> >>>>>>>> Thank you >>>>>>>> >>>>>>>> Yours sincerely, >>>>>>>> >>>>>>>> TAY wee-beng >>>>>>>> >>>>>>>> On 2/11/2015 9:49 AM, Barry Smith wrote: >>>>>>>>> If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps. >>>>>>>>> >>>>>>>>> Barry >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng wrote: >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96). >>>>>>>>>> >>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? >>>>>>>>>> >>>>>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>>>>> >>>>>>>>>> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep? >>>>>>>>>> >>>>>>>>>> Also, what about momentum eqn? Is it working well? >>>>>>>>>> >>>>>>>>>> I will try the gamg later too. >>>>>>>>>> >>>>>>>>>> Thank you >>>>>>>>>> >>>>>>>>>> Yours sincerely, >>>>>>>>>> >>>>>>>>>> TAY wee-beng >>>>>>>>>> >>>>>>>>>> On 2/11/2015 12:30 AM, Barry Smith wrote: >>>>>>>>>>> You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results >>>>>>>>>>> >>>>>>>>>>> Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds. >>>>>>>>>>> >>>>>>>>>>> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 >>>>>>>>>>> >>>>>>>>>>> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 >>>>>>>>>>> >>>>>>>>>>> Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large? >>>>>>>>>>> >>>>>>>>>>> You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine. 
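To make the reuse advice quoted above concrete: a minimal sketch, with placeholder names rather than the actual application code, of a time loop in which the Poisson matrix is fixed, the KSP is created once, and only the right-hand side changes, so the BoomerAMG/GAMG setup cost is paid only at the first solve.

/* Illustrative sketch: reuse one KSP (and its AMG preconditioner) for every
 * time step when only the right-hand side changes.  A, rhs, phi and nsteps
 * are placeholders for the application's objects. */
#include <petscksp.h>

PetscErrorCode solve_poisson_each_step(Mat A, Vec rhs, Vec phi, PetscInt nsteps)
{
  KSP            ksp;
  PetscInt       step;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOptionsPrefix(ksp, "poisson_");CHKERRQ(ierr); /* picks up -poisson_* options */
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);           /* set once, outside the time loop */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);               /* e.g. -poisson_pc_type gamg or hypre */

  for (step = 0; step < nsteps; step++) {
    /* ... update rhs for this time step; A is not touched ... */
    ierr = KSPSolve(ksp, rhs, phi);CHKERRQ(ierr);            /* the PC is built only on the first call */
  }
  /* If A did change but the old preconditioner is still acceptable,
     KSPSetReusePreconditioner(ksp, PETSC_TRUE) keeps the existing setup. */
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}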
>>>>>>>>>>> >>>>>>>>>>> Barry >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 1/11/2015 10:00 AM, Barry Smith wrote: >>>>>>>>>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>>>>>>>>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng wrote: >>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. >>>>>>>>>>>>>>> Its specs are: >>>>>>>>>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 8 cores / processor >>>>>>>>>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>>>>>>>>>>>>>> Each cabinet contains 96 computing nodes, >>>>>>>>>>>>>>> One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data >>>>>>>>>>>>>>> There are 2 ways to give performance: >>>>>>>>>>>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed >>>>>>>>>>>>>>> problem. >>>>>>>>>>>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a >>>>>>>>>>>>>>> fixed problem size per processor. >>>>>>>>>>>>>>> I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling. >>>>>>>>>>>>>>> Cluster specs: >>>>>>>>>>>>>>> CPU: AMD 6234 2.4GHz >>>>>>>>>>>>>>> 8 cores / processor (CPU) >>>>>>>>>>>>>>> 6 CPU / node >>>>>>>>>>>>>>> So 48 Cores / CPU >>>>>>>>>>>>>>> Not sure abt the memory / node >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The parallel efficiency ?En? for a given degree of parallelism ?n? indicates how much the program is >>>>>>>>>>>>>>> efficiently accelerated by parallel processing. ?En? is given by the following formulae. Although their >>>>>>>>>>>>>>> derivation processes are different depending on strong and weak scaling, derived formulae are the >>>>>>>>>>>>>>> same. >>>>>>>>>>>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. >>>>>>>>>>>>>>> So is my results acceptable? >>>>>>>>>>>>>>> For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%. >>>>>>>>>>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function >>>>>>>>>>>>>>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a >>>>>>>>>>>>>>> model of this dependence. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current >>>>>>>>>>>>>>> cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific >>>>>>>>>>>>>>> applications, but neither does requiring a certain parallel efficiency. >>>>>>>>>>>>>> Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. 
So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case. >>>>>>>>>>>>>> >>>>>>>>>>>>>> However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so? >>>>>>>>>>>>> What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors? >>>>>>>>>>>>> >>>>>>>>>>>>> Barry >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> I have attached the output >>>>>>>>>>>> >>>>>>>>>>>> 48 cores: log48 >>>>>>>>>>>> 96 cores: log96 >>>>>>>>>>>> >>>>>>>>>>>> There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG. >>>>>>>>>>>> >>>>>>>>>>>> Problem size doubled from 158x266x150 to 158x266x300. >>>>>>>>>>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Matt >>>>>>>>>>>>>>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores? >>>>>>>>>>>>>>> Btw, I do not have access to the system. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sent using CloudMagic Email >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >>>> -------------- next part -------------- -------------------------------------------------------------------------- [[25197,1],4]: A high-performance Open MPI point-to-point messaging module was unable to find any relevant network interfaces: Module: OpenFabrics (openib) Host: n12-02 Another transport will be used instead, although this may result in lower performance. 
-------------------------------------------------------------------------- 0.000000000000000E+000 0.353000000000000 0.000000000000000E+000 90.0000000000000 0.000000000000000E+000 0.000000000000000E+000 1.00000000000000 0.400000000000000 0 -400000 AB,AA,BB -2.00050000002375 2.00050000002375 2.61200002906844 2.53550002543489 size_x,size_y,size_z 79 133 75 body_cg_ini 0.523700833348298 0.778648765134454 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 4.300000000000000E-002 maximum ngh_surfaces and ngh_vertics are 149 68 minimum ngh_surfaces and ngh_vertics are 54 22 body_cg_ini 0.896813342835977 -0.976707581163755 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 4.300000000000000E-002 maximum ngh_surfaces and ngh_vertics are 149 68 minimum ngh_surfaces and ngh_vertics are 54 22 min IIB_cell_no 0 max IIB_cell_no 265 final initial IIB_cell_no 1325 min I_cell_no 0 max I_cell_no 94 final initial I_cell_no 470 size(IIB_cell_u),size(I_cell_u),size(IIB_equal_cell_u),size(I_equal_cell_u) 1325 470 1325 470 IIB_I_cell_no_uvw_total1 265 270 255 94 91 95 IIB_I_cell_no_uvw_total2 273 280 267 97 94 98 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 1.678498777182e+03 true resid norm 2.123983986436e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.350343925107e+02 true resid norm 9.147383437308e-01 ||r(i)||/||b|| 4.306710170945e-02 2 KSP preconditioned resid norm 4.471579782509e+01 true resid norm 7.513793497533e-02 ||r(i)||/||b|| 3.537594231179e-03 3 KSP preconditioned resid norm 1.073330756059e+01 true resid norm 1.090215770399e-02 ||r(i)||/||b|| 5.132881308716e-04 4 KSP preconditioned resid norm 2.949234784674e+00 true resid norm 2.391637504235e-03 ||r(i)||/||b|| 1.126014847338e-04 5 KSP preconditioned resid norm 8.648961034707e-01 true resid norm 6.033562244592e-04 ||r(i)||/||b|| 2.840681607358e-05 6 KSP preconditioned resid norm 2.610725758706e-01 true resid norm 1.591221044985e-04 ||r(i)||/||b|| 7.491680987930e-06 7 KSP preconditioned resid norm 7.979574646710e-02 true resid norm 4.287560296087e-05 ||r(i)||/||b|| 2.018640594029e-06 8 KSP preconditioned resid norm 2.451830505638e-02 true resid norm 1.173408327840e-05 ||r(i)||/||b|| 5.524562969086e-07 9 KSP preconditioned resid norm 7.549360596120e-03 true resid norm 3.257120962443e-06 ||r(i)||/||b|| 1.533496007146e-07 1 0.00150000 0.14647307 0.14738629 1.08799982 0.19042331E+02 0.17694812E+00 0.78750669E+06 Residual norms for poisson_ solve. 
0 KSP preconditioned resid norm 4.104947230977e+02 true resid norm 5.281418042914e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.930661367232e+01 true resid norm 1.010817808071e+00 ||r(i)||/||b|| 1.913913649436e-02 2 KSP preconditioned resid norm 2.952959662686e+00 true resid norm 4.964610570452e-02 ||r(i)||/||b|| 9.400146949384e-04 3 KSP preconditioned resid norm 4.029733209724e-01 true resid norm 3.218570506209e-03 ||r(i)||/||b|| 6.094140778208e-05 4 KSP preconditioned resid norm 7.054873087480e-02 true resid norm 3.231841592353e-04 ||r(i)||/||b|| 6.119268662493e-06 5 KSP preconditioned resid norm 1.531632953418e-02 true resid norm 5.294497307913e-05 ||r(i)||/||b|| 1.002476468421e-06 6 KSP preconditioned resid norm 3.956719065028e-03 true resid norm 1.083434318852e-05 ||r(i)||/||b|| 2.051407993172e-07 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 3.420306071229e+02 true resid norm 4.777262987644e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.428376023450e+01 true resid norm 8.435502776376e-01 ||r(i)||/||b|| 1.765760603549e-02 2 KSP preconditioned resid norm 2.564213461836e+00 true resid norm 4.062413787010e-02 ||r(i)||/||b|| 8.503642771012e-04 3 KSP preconditioned resid norm 4.100273414880e-01 true resid norm 2.573317457767e-03 ||r(i)||/||b|| 5.386593671780e-05 4 KSP preconditioned resid norm 8.958898167542e-02 true resid norm 2.691660582452e-04 ||r(i)||/||b|| 5.634315275114e-06 5 KSP preconditioned resid norm 2.327109356318e-02 true resid norm 4.752826596451e-05 ||r(i)||/||b|| 9.948848553541e-07 6 KSP preconditioned resid norm 6.587988976862e-03 true resid norm 1.017397977336e-05 ||r(i)||/||b|| 2.129667091738e-07 7 KSP preconditioned resid norm 1.945171782222e-03 true resid norm 2.309441092248e-06 ||r(i)||/||b|| 4.834234787201e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 2.972092856822e+02 true resid norm 4.331381247139e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.125636929385e+01 true resid norm 7.231693673840e-01 ||r(i)||/||b|| 1.669604512098e-02 2 KSP preconditioned resid norm 2.307022456820e+00 true resid norm 3.470289137234e-02 ||r(i)||/||b|| 8.011968790618e-04 3 KSP preconditioned resid norm 3.894052849080e-01 true resid norm 2.199991810076e-03 ||r(i)||/||b|| 5.079192258889e-05 4 KSP preconditioned resid norm 8.966151573963e-02 true resid norm 2.339821487464e-04 ||r(i)||/||b|| 5.402021558387e-06 5 KSP preconditioned resid norm 2.409076065741e-02 true resid norm 4.198022448829e-05 ||r(i)||/||b|| 9.692110228352e-07 6 KSP preconditioned resid norm 6.946768360801e-03 true resid norm 9.091333654905e-06 ||r(i)||/||b|| 2.098945610228e-07 7 KSP preconditioned resid norm 2.070683142546e-03 true resid norm 2.092665569576e-06 ||r(i)||/||b|| 4.831404695576e-08 Residual norms for poisson_ solve. 
0 KSP preconditioned resid norm 2.629657400932e+02 true resid norm 3.956358392949e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.887556876142e+01 true resid norm 6.327939570295e-01 ||r(i)||/||b|| 1.599435375110e-02 2 KSP preconditioned resid norm 2.054176697024e+00 true resid norm 3.033147721793e-02 ||r(i)||/||b|| 7.666514052920e-04 3 KSP preconditioned resid norm 3.464068550624e-01 true resid norm 1.924667356854e-03 ||r(i)||/||b|| 4.864744711408e-05 4 KSP preconditioned resid norm 7.952705467505e-02 true resid norm 2.057541836205e-04 ||r(i)||/||b|| 5.200595173258e-06 5 KSP preconditioned resid norm 2.131517044421e-02 true resid norm 3.694790401596e-05 ||r(i)||/||b|| 9.338866792707e-07 6 KSP preconditioned resid norm 6.136276802982e-03 true resid norm 7.991827005621e-06 ||r(i)||/||b|| 2.019995716228e-07 7 KSP preconditioned resid norm 1.827093511736e-03 true resid norm 1.838038505251e-06 ||r(i)||/||b|| 4.645783629023e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 2.378307965377e+02 true resid norm 3.648733311990e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.710190896669e+01 true resid norm 5.649368936947e-01 ||r(i)||/||b|| 1.548309633478e-02 2 KSP preconditioned resid norm 1.822255743455e+00 true resid norm 2.717599208239e-02 ||r(i)||/||b|| 7.448062041992e-04 3 KSP preconditioned resid norm 2.885962304638e-01 true resid norm 1.732973401584e-03 ||r(i)||/||b|| 4.749520596337e-05 4 KSP preconditioned resid norm 6.183154741675e-02 true resid norm 1.819458565598e-04 ||r(i)||/||b|| 4.986548508817e-06 5 KSP preconditioned resid norm 1.582554421127e-02 true resid norm 3.168007020311e-05 ||r(i)||/||b|| 8.682484438915e-07 6 KSP preconditioned resid norm 4.443321996051e-03 true resid norm 6.710659949979e-06 ||r(i)||/||b|| 1.839175235945e-07 7 KSP preconditioned resid norm 1.306049292085e-03 true resid norm 1.516410531721e-06 ||r(i)||/||b|| 4.155991688232e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 2.145295333178e+02 true resid norm 3.362186245560e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.521833780773e+01 true resid norm 5.080729889584e-01 ||r(i)||/||b|| 1.511138740839e-02 2 KSP preconditioned resid norm 1.540203633479e+00 true resid norm 2.447409373579e-02 ||r(i)||/||b|| 7.279220111053e-04 3 KSP preconditioned resid norm 2.186762828066e-01 true resid norm 1.564982470977e-03 ||r(i)||/||b|| 4.654657287483e-05 4 KSP preconditioned resid norm 4.054357120803e-02 true resid norm 1.639731139827e-04 ||r(i)||/||b|| 4.876978906186e-06 5 KSP preconditioned resid norm 9.084232214911e-03 true resid norm 2.819050882959e-05 ||r(i)||/||b|| 8.384576811238e-07 6 KSP preconditioned resid norm 2.323003412898e-03 true resid norm 5.861572928863e-06 ||r(i)||/||b|| 1.743381389595e-07 7 KSP preconditioned resid norm 6.459217074927e-04 true resid norm 1.291487117742e-06 ||r(i)||/||b|| 3.841212304784e-08 Residual norms for poisson_ solve. 
0 KSP preconditioned resid norm 1.958392379201e+02 true resid norm 3.141889323849e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.408463611475e+01 true resid norm 4.595554429637e-01 ||r(i)||/||b|| 1.462672282806e-02 2 KSP preconditioned resid norm 1.517929147841e+00 true resid norm 2.200153509672e-02 ||r(i)||/||b|| 7.002644851210e-04 3 KSP preconditioned resid norm 2.512214122292e-01 true resid norm 1.400636574302e-03 ||r(i)||/||b|| 4.457943708171e-05 4 KSP preconditioned resid norm 5.684390461381e-02 true resid norm 1.510679173961e-04 ||r(i)||/||b|| 4.808187107336e-06 5 KSP preconditioned resid norm 1.511727527840e-02 true resid norm 2.710164250127e-05 ||r(i)||/||b|| 8.625906169116e-07 6 KSP preconditioned resid norm 4.335174856056e-03 true resid norm 5.834206129224e-06 ||r(i)||/||b|| 1.856910135230e-07 7 KSP preconditioned resid norm 1.288245809697e-03 true resid norm 1.335866527435e-06 ||r(i)||/||b|| 4.251793713084e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 1.820352733618e+02 true resid norm 3.073155059512e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.316836534494e+01 true resid norm 4.239722750302e-01 ||r(i)||/||b|| 1.379599359030e-02 2 KSP preconditioned resid norm 1.444071686051e+00 true resid norm 2.026722374984e-02 ||r(i)||/||b|| 6.594923899823e-04 3 KSP preconditioned resid norm 2.424633999719e-01 true resid norm 1.292246225044e-03 ||r(i)||/||b|| 4.204949636511e-05 4 KSP preconditioned resid norm 5.503779257978e-02 true resid norm 1.407623934665e-04 ||r(i)||/||b|| 4.580386955447e-06 5 KSP preconditioned resid norm 1.461603474483e-02 true resid norm 2.545258238226e-05 ||r(i)||/||b|| 8.282231742094e-07 6 KSP preconditioned resid norm 4.183704633554e-03 true resid norm 5.497313634594e-06 ||r(i)||/||b|| 1.788817527310e-07 7 KSP preconditioned resid norm 1.241559608635e-03 true resid norm 1.260521620917e-06 ||r(i)||/||b|| 4.101718255367e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 1.647906925076e+02 true resid norm 2.901694020337e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.179687080578e+01 true resid norm 3.825956990419e-01 ||r(i)||/||b|| 1.318525304048e-02 2 KSP preconditioned resid norm 1.242164084005e+00 true resid norm 1.828668514657e-02 ||r(i)||/||b|| 6.302072175220e-04 3 KSP preconditioned resid norm 1.938659650601e-01 true resid norm 1.164821593402e-03 ||r(i)||/||b|| 4.014281262041e-05 4 KSP preconditioned resid norm 4.107133923342e-02 true resid norm 1.253776048671e-04 ||r(i)||/||b|| 4.320841687249e-06 5 KSP preconditioned resid norm 1.042749388794e-02 true resid norm 2.226382821379e-05 ||r(i)||/||b|| 7.672700173674e-07 6 KSP preconditioned resid norm 2.911838969918e-03 true resid norm 4.729087344538e-06 ||r(i)||/||b|| 1.629767753386e-07 7 KSP preconditioned resid norm 8.530429563637e-04 true resid norm 1.065059408484e-06 ||r(i)||/||b|| 3.670474560787e-08 Residual norms for poisson_ solve. 
0 KSP preconditioned resid norm 1.541373151856e+02 true resid norm 2.897719736622e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.102941561433e+01 true resid norm 3.556156422028e-01 ||r(i)||/||b|| 1.227225800026e-02 2 KSP preconditioned resid norm 1.147292721889e+00 true resid norm 1.701217601352e-02 ||r(i)||/||b|| 5.870883853437e-04 3 KSP preconditioned resid norm 1.747985531740e-01 true resid norm 1.089011249732e-03 ||r(i)||/||b|| 3.758166243507e-05 4 KSP preconditioned resid norm 3.629728329991e-02 true resid norm 1.169220687546e-04 ||r(i)||/||b|| 4.034968160549e-06 5 KSP preconditioned resid norm 9.120353742156e-03 true resid norm 2.052432646255e-05 ||r(i)||/||b|| 7.082923238973e-07 6 KSP preconditioned resid norm 2.536543625934e-03 true resid norm 4.326773211928e-06 ||r(i)||/||b|| 1.493164834834e-07 7 KSP preconditioned resid norm 7.421921360317e-04 true resid norm 9.706344843151e-07 ||r(i)||/||b|| 3.349649284739e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 1.402882015246e+02 true resid norm 2.629761832838e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 9.997724720205e+00 true resid norm 3.245590429518e-01 ||r(i)||/||b|| 1.234176566482e-02 2 KSP preconditioned resid norm 1.025672935887e+00 true resid norm 1.557791362214e-02 ||r(i)||/||b|| 5.923697510405e-04 3 KSP preconditioned resid norm 1.535317951334e-01 true resid norm 9.977613667856e-04 ||r(i)||/||b|| 3.794113042202e-05 4 KSP preconditioned resid norm 3.150283687656e-02 true resid norm 1.058916899617e-04 ||r(i)||/||b|| 4.026664644662e-06 5 KSP preconditioned resid norm 7.870322255682e-03 true resid norm 1.834747704849e-05 ||r(i)||/||b|| 6.976858824013e-07 6 KSP preconditioned resid norm 2.183496128742e-03 true resid norm 3.843169881527e-06 ||r(i)||/||b|| 1.461413666263e-07 7 KSP preconditioned resid norm 6.382250562470e-04 true resid norm 8.591963757110e-07 ||r(i)||/||b|| 3.267202242355e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 1.283300845917e+02 true resid norm 2.442650096060e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 9.201215379584e+00 true resid norm 2.971660648680e-01 ||r(i)||/||b|| 1.216572383197e-02 2 KSP preconditioned resid norm 9.587957055096e-01 true resid norm 1.424107815052e-02 ||r(i)||/||b|| 5.830175256574e-04 3 KSP preconditioned resid norm 1.453811408070e-01 true resid norm 9.093615375182e-04 ||r(i)||/||b|| 3.722848143437e-05 4 KSP preconditioned resid norm 2.984978540941e-02 true resid norm 9.612488190780e-05 ||r(i)||/||b|| 3.935270224044e-06 5 KSP preconditioned resid norm 7.418946546413e-03 true resid norm 1.664413492623e-05 ||r(i)||/||b|| 6.813966090797e-07 6 KSP preconditioned resid norm 2.048171595654e-03 true resid norm 3.487964386969e-06 ||r(i)||/||b|| 1.427942705587e-07 7 KSP preconditioned resid norm 5.966347574758e-04 true resid norm 7.800511968971e-07 ||r(i)||/||b|| 3.193462699203e-08 Residual norms for poisson_ solve. 
0 KSP preconditioned resid norm 1.178216646076e+02 true resid norm 2.347344510009e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 8.375257621677e+00 true resid norm 2.722448317697e-01 ||r(i)||/||b|| 1.159799214001e-02 2 KSP preconditioned resid norm 8.588236590644e-01 true resid norm 1.299321062748e-02 ||r(i)||/||b|| 5.535280642478e-04 3 KSP preconditioned resid norm 1.264099287701e-01 true resid norm 8.252312404541e-04 ||r(i)||/||b|| 3.515594907076e-05 4 KSP preconditioned resid norm 2.501513839566e-02 true resid norm 8.716292721462e-05 ||r(i)||/||b|| 3.713256696789e-06 5 KSP preconditioned resid norm 6.027509834236e-03 true resid norm 1.511951118476e-05 ||r(i)||/||b|| 6.441112976937e-07 6 KSP preconditioned resid norm 1.631235191881e-03 true resid norm 3.160102647078e-06 ||r(i)||/||b|| 1.346245782672e-07 7 KSP preconditioned resid norm 4.699101910099e-04 true resid norm 7.023075500576e-07 ||r(i)||/||b|| 2.991923627159e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 1.101699090169e+02 true resid norm 2.271756941147e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 7.844714294933e+00 true resid norm 2.545750537301e-01 ||r(i)||/||b|| 1.120608675687e-02 2 KSP preconditioned resid norm 8.085157003061e-01 true resid norm 1.215246780615e-02 ||r(i)||/||b|| 5.349369726154e-04 3 KSP preconditioned resid norm 1.187317786881e-01 true resid norm 7.704668963293e-04 ||r(i)||/||b|| 3.391502331848e-05 4 KSP preconditioned resid norm 2.317147312501e-02 true resid norm 8.040436173546e-05 ||r(i)||/||b|| 3.539303007251e-06 5 KSP preconditioned resid norm 5.495201634895e-03 true resid norm 1.378929421373e-05 ||r(i)||/||b|| 6.069880964803e-07 6 KSP preconditioned resid norm 1.470955494886e-03 true resid norm 2.869794131672e-06 ||r(i)||/||b|| 1.263248756807e-07 7 KSP preconditioned resid norm 4.211857610045e-04 true resid norm 6.367685510275e-07 ||r(i)||/||b|| 2.802978344620e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 1.031630778115e+02 true resid norm 2.155449752249e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 7.304823826349e+00 true resid norm 2.377327398314e-01 ||r(i)||/||b|| 1.102937981196e-02 2 KSP preconditioned resid norm 7.327545910834e-01 true resid norm 1.136486567555e-02 ||r(i)||/||b|| 5.272619166226e-04 3 KSP preconditioned resid norm 9.951635340299e-02 true resid norm 7.239911061861e-04 ||r(i)||/||b|| 3.358886494249e-05 4 KSP preconditioned resid norm 1.682136846344e-02 true resid norm 7.542366872347e-05 ||r(i)||/||b|| 3.499207933044e-06 5 KSP preconditioned resid norm 3.303240583076e-03 true resid norm 1.277804105139e-05 ||r(i)||/||b|| 5.928248170971e-07 6 KSP preconditioned resid norm 7.349737360535e-04 true resid norm 2.621240286592e-06 ||r(i)||/||b|| 1.216098999226e-07 Residual norms for poisson_ solve. 
0 KSP preconditioned resid norm 9.948167224649e+01 true resid norm 2.168218121988e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 7.057300314112e+00 true resid norm 2.291670217337e-01 ||r(i)||/||b|| 1.056937119977e-02 2 KSP preconditioned resid norm 7.046686251152e-01 true resid norm 1.101783881879e-02 ||r(i)||/||b|| 5.081517725112e-04 3 KSP preconditioned resid norm 9.415273979324e-02 true resid norm 7.132997415800e-04 ||r(i)||/||b|| 3.289796973590e-05 4 KSP preconditioned resid norm 1.551805572416e-02 true resid norm 7.440988464738e-05 ||r(i)||/||b|| 3.431844974118e-06 5 KSP preconditioned resid norm 2.942186771019e-03 true resid norm 1.237269291722e-05 ||r(i)||/||b|| 5.706387559326e-07 6 KSP preconditioned resid norm 6.260498929470e-04 true resid norm 2.507915793873e-06 ||r(i)||/||b|| 1.156671355359e-07 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 9.262349209093e+01 true resid norm 2.055792414931e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 6.555321848570e+00 true resid norm 2.137203445716e-01 ||r(i)||/||b|| 1.039600803172e-02 2 KSP preconditioned resid norm 6.664045892829e-01 true resid norm 1.020467221046e-02 ||r(i)||/||b|| 4.963863148998e-04 3 KSP preconditioned resid norm 9.629845241478e-02 true resid norm 6.502200642501e-04 ||r(i)||/||b|| 3.162868291213e-05 4 KSP preconditioned resid norm 1.862875000957e-02 true resid norm 6.785485389817e-05 ||r(i)||/||b|| 3.300666614263e-06 5 KSP preconditioned resid norm 4.403130421939e-03 true resid norm 1.154162622648e-05 ||r(i)||/||b|| 5.614198273453e-07 6 KSP preconditioned resid norm 1.176793752371e-03 true resid norm 2.386472662832e-06 ||r(i)||/||b|| 1.160852937047e-07 7 KSP preconditioned resid norm 3.365853813200e-04 true resid norm 5.274116254371e-07 ||r(i)||/||b|| 2.565490667281e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 8.693698331042e+01 true resid norm 1.958393820643e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 6.150168211009e+00 true resid norm 2.015506318834e-01 ||r(i)||/||b|| 1.029162928104e-02 2 KSP preconditioned resid norm 6.388979074654e-01 true resid norm 9.604266828299e-03 ||r(i)||/||b|| 4.904154990207e-04 3 KSP preconditioned resid norm 9.817619557123e-02 true resid norm 6.117523116381e-04 ||r(i)||/||b|| 3.123745107800e-05 4 KSP preconditioned resid norm 2.069382902410e-02 true resid norm 6.421388289967e-05 ||r(i)||/||b|| 3.278905510362e-06 5 KSP preconditioned resid norm 5.265401427292e-03 true resid norm 1.105430288784e-05 ||r(i)||/||b|| 5.644576066015e-07 6 KSP preconditioned resid norm 1.475267405481e-03 true resid norm 2.316844999492e-06 ||r(i)||/||b|| 1.183033246464e-07 7 KSP preconditioned resid norm 4.331879847381e-04 true resid norm 5.204924428437e-07 ||r(i)||/||b|| 2.657751660351e-08 Residual norms for poisson_ solve. 
0 KSP preconditioned resid norm 8.112784869858e+01 true resid norm 1.892382018582e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 5.712614643333e+00 true resid norm 1.875796419053e-01 ||r(i)||/||b|| 9.912355965304e-03 2 KSP preconditioned resid norm 5.667577313428e-01 true resid norm 8.967804004244e-03 ||r(i)||/||b|| 4.738897281936e-04 3 KSP preconditioned resid norm 7.473002285292e-02 true resid norm 5.739150468959e-04 ||r(i)||/||b|| 3.032765272869e-05 4 KSP preconditioned resid norm 1.192788398891e-02 true resid norm 5.909621942051e-05 ||r(i)||/||b|| 3.122848285400e-06 5 KSP preconditioned resid norm 2.118143540732e-03 true resid norm 9.797990127139e-06 ||r(i)||/||b|| 5.177596294473e-07 6 KSP preconditioned resid norm 4.023312994478e-04 true resid norm 1.982577734446e-06 ||r(i)||/||b|| 1.047662530598e-07 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 7.678752608675e+01 true resid norm 1.787223654359e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 5.402532307566e+00 true resid norm 1.788603523803e-01 ||r(i)||/||b|| 1.000772074296e-02 2 KSP preconditioned resid norm 5.461014564841e-01 true resid norm 8.573681774753e-03 ||r(i)||/||b|| 4.797206971742e-04 3 KSP preconditioned resid norm 7.688485365391e-02 true resid norm 5.523007345519e-04 ||r(i)||/||b|| 3.090272072020e-05 4 KSP preconditioned resid norm 1.411385748780e-02 true resid norm 5.705705081951e-05 ||r(i)||/||b|| 3.192496399673e-06 5 KSP preconditioned resid norm 3.139571197606e-03 true resid norm 9.464693230140e-06 ||r(i)||/||b|| 5.295751993354e-07 6 KSP preconditioned resid norm 7.993279321755e-04 true resid norm 1.929119211937e-06 ||r(i)||/||b|| 1.079394404406e-07 7 KSP preconditioned resid norm 2.215988083672e-04 true resid norm 4.224770660263e-07 ||r(i)||/||b|| 2.363873514072e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 7.262621438155e+01 true resid norm 1.721887843249e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 5.108227284039e+00 true resid norm 1.690751072139e-01 ||r(i)||/||b|| 9.819170736165e-03 2 KSP preconditioned resid norm 5.201807406336e-01 true resid norm 8.052819798426e-03 ||r(i)||/||b|| 4.676738865425e-04 3 KSP preconditioned resid norm 7.407845885622e-02 true resid norm 5.127908088504e-04 ||r(i)||/||b|| 2.978073228526e-05 4 KSP preconditioned resid norm 1.375241388414e-02 true resid norm 5.297500450637e-05 ||r(i)||/||b|| 3.076565335778e-06 5 KSP preconditioned resid norm 3.088336188407e-03 true resid norm 8.887757557594e-06 ||r(i)||/||b|| 5.161635580645e-07 6 KSP preconditioned resid norm 7.915993187476e-04 true resid norm 1.820586062526e-06 ||r(i)||/||b|| 1.057319772402e-07 7 KSP preconditioned resid norm 2.203349289202e-04 true resid norm 3.990747905507e-07 ||r(i)||/||b|| 2.317658447473e-08 Residual norms for poisson_ solve. 
0 KSP preconditioned resid norm 7.048564689447e+01 true resid norm 1.703528637492e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 4.991507652032e+00 true resid norm 1.623784196894e-01 ||r(i)||/||b|| 9.531886703615e-03 2 KSP preconditioned resid norm 5.044751640373e-01 true resid norm 7.782205546678e-03 ||r(i)||/||b|| 4.568285719068e-04 3 KSP preconditioned resid norm 6.776042608052e-02 true resid norm 4.970956587066e-04 ||r(i)||/||b|| 2.918035234432e-05 4 KSP preconditioned resid norm 1.110239891223e-02 true resid norm 5.006327720845e-05 ||r(i)||/||b|| 2.938798685659e-06 5 KSP preconditioned resid norm 2.137537163250e-03 true resid norm 8.102739547064e-06 ||r(i)||/||b|| 4.756444575533e-07 6 KSP preconditioned resid norm 4.924651258116e-04 true resid norm 1.629359603545e-06 ||r(i)||/||b|| 9.564615279632e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 6.682160944790e+01 true resid norm 1.599432470390e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 4.762668662071e+00 true resid norm 1.542331495627e-01 ||r(i)||/||b|| 9.642992274949e-03 2 KSP preconditioned resid norm 4.892049185935e-01 true resid norm 7.437235062909e-03 ||r(i)||/||b|| 4.649921269322e-04 3 KSP preconditioned resid norm 6.840475435384e-02 true resid norm 4.793567618681e-04 ||r(i)||/||b|| 2.997042830769e-05 4 KSP preconditioned resid norm 1.236624426952e-02 true resid norm 4.837391212835e-05 ||r(i)||/||b|| 3.024442295870e-06 5 KSP preconditioned resid norm 2.824089718157e-03 true resid norm 7.836783425823e-06 ||r(i)||/||b|| 4.899727603950e-07 6 KSP preconditioned resid norm 7.707249657812e-04 true resid norm 1.589084610362e-06 ||r(i)||/||b|| 9.935302926384e-08 7 KSP preconditioned resid norm 2.298906589147e-04 true resid norm 3.497272567890e-07 ||r(i)||/||b|| 2.186570944779e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 6.452978694232e+01 true resid norm 1.550628020640e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 4.685179718034e+00 true resid norm 1.493392754498e-01 ||r(i)||/||b|| 9.630889772530e-03 2 KSP preconditioned resid norm 5.046263455989e-01 true resid norm 7.290778835460e-03 ||r(i)||/||b|| 4.701823221570e-04 3 KSP preconditioned resid norm 7.848648383700e-02 true resid norm 4.838354196030e-04 ||r(i)||/||b|| 3.120254588224e-05 4 KSP preconditioned resid norm 1.683352817808e-02 true resid norm 4.978046612113e-05 ||r(i)||/||b|| 3.210342226409e-06 5 KSP preconditioned resid norm 4.491722460457e-03 true resid norm 8.075510307918e-06 ||r(i)||/||b|| 5.207896542839e-07 6 KSP preconditioned resid norm 1.328053853929e-03 true resid norm 1.654324404274e-06 ||r(i)||/||b|| 1.066873797103e-07 7 KSP preconditioned resid norm 4.068720658665e-04 true resid norm 3.724955339607e-07 ||r(i)||/||b|| 2.402223673264e-08 Residual norms for poisson_ solve. 
0 KSP preconditioned resid norm 5.861121501319e+01 true resid norm 1.443270378995e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 4.109394216080e+00 true resid norm 1.366857625054e-01 ||r(i)||/||b|| 9.470558288643e-03 2 KSP preconditioned resid norm 4.119857620789e-01 true resid norm 6.532267762689e-03 ||r(i)||/||b|| 4.526018033598e-04 3 KSP preconditioned resid norm 5.595949769453e-02 true resid norm 4.180829283713e-04 ||r(i)||/||b|| 2.896774814033e-05 4 KSP preconditioned resid norm 9.470175241578e-03 true resid norm 4.273498844363e-05 ||r(i)||/||b|| 2.960982852941e-06 5 KSP preconditioned resid norm 1.864282408594e-03 true resid norm 7.011023346741e-06 ||r(i)||/||b|| 4.857733830595e-07 6 KSP preconditioned resid norm 4.158239030941e-04 true resid norm 1.413704198285e-06 ||r(i)||/||b|| 9.795144547134e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 5.673888172431e+01 true resid norm 1.406944648695e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 3.946683239185e+00 true resid norm 1.327893428612e-01 ||r(i)||/||b|| 9.438135536063e-03 2 KSP preconditioned resid norm 3.980193773331e-01 true resid norm 6.262166368503e-03 ||r(i)||/||b|| 4.450897463742e-04 3 KSP preconditioned resid norm 5.674075563051e-02 true resid norm 3.948533405614e-04 ||r(i)||/||b|| 2.806459663695e-05 4 KSP preconditioned resid norm 1.077612851708e-02 true resid norm 4.046798903274e-05 ||r(i)||/||b|| 2.876302850313e-06 5 KSP preconditioned resid norm 2.506360466446e-03 true resid norm 6.770229373280e-06 ||r(i)||/||b|| 4.812008332779e-07 6 KSP preconditioned resid norm 6.629347767994e-04 true resid norm 1.388033843435e-06 ||r(i)||/||b|| 9.865589557641e-08 7 KSP preconditioned resid norm 1.885033094363e-04 true resid norm 3.057498590455e-07 ||r(i)||/||b|| 2.173147744860e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 5.516988800277e+01 true resid norm 1.377778935547e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 3.868808660791e+00 true resid norm 1.308938230888e-01 ||r(i)||/||b|| 9.500350144114e-03 2 KSP preconditioned resid norm 3.955129144300e-01 true resid norm 6.337488773974e-03 ||r(i)||/||b|| 4.599786373898e-04 3 KSP preconditioned resid norm 5.623197729999e-02 true resid norm 4.176266383124e-04 ||r(i)||/||b|| 3.031158537393e-05 4 KSP preconditioned resid norm 1.065603701486e-02 true resid norm 4.184583245159e-05 ||r(i)||/||b|| 3.037194964442e-06 5 KSP preconditioned resid norm 2.564185095588e-03 true resid norm 6.514753865387e-06 ||r(i)||/||b|| 4.728446412776e-07 6 KSP preconditioned resid norm 7.192628037654e-04 true resid norm 1.294683295796e-06 ||r(i)||/||b|| 9.396886992482e-08 7 KSP preconditioned resid norm 2.160241686176e-04 true resid norm 2.842684895108e-07 ||r(i)||/||b|| 2.063237303000e-08 Residual norms for poisson_ solve. 
0 KSP preconditioned resid norm 5.104876617190e+01 true resid norm 1.328192391004e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 3.567137404083e+00 true resid norm 1.195952488440e-01 ||r(i)||/||b|| 9.004361842003e-03 2 KSP preconditioned resid norm 3.609562129934e-01 true resid norm 5.684481894378e-03 ||r(i)||/||b|| 4.279863318657e-04 3 KSP preconditioned resid norm 4.950104589952e-02 true resid norm 3.629552229674e-04 ||r(i)||/||b|| 2.732700664645e-05 4 KSP preconditioned resid norm 8.644616980557e-03 true resid norm 3.582931065499e-05 ||r(i)||/||b|| 2.697599451530e-06 5 KSP preconditioned resid norm 1.866966520358e-03 true resid norm 5.659455126192e-06 ||r(i)||/||b|| 4.261020590483e-07 6 KSP preconditioned resid norm 4.840536132616e-04 true resid norm 1.135765957424e-06 ||r(i)||/||b|| 8.551215660592e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 5.121270711628e+01 true resid norm 1.472606367960e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 3.586430656262e+00 true resid norm 1.196921951900e-01 ||r(i)||/||b|| 8.127915089478e-03 2 KSP preconditioned resid norm 3.744954103015e-01 true resid norm 5.695956796010e-03 ||r(i)||/||b|| 3.867942526896e-04 3 KSP preconditioned resid norm 5.416867421543e-02 true resid norm 3.687656994489e-04 ||r(i)||/||b|| 2.504170207819e-05 4 KSP preconditioned resid norm 1.023092158056e-02 true resid norm 3.712284801499e-05 ||r(i)||/||b|| 2.520894165793e-06 5 KSP preconditioned resid norm 2.406602672142e-03 true resid norm 5.957102715217e-06 ||r(i)||/||b|| 4.045278388596e-07 6 KSP preconditioned resid norm 6.593686001716e-04 true resid norm 1.212405307363e-06 ||r(i)||/||b|| 8.233057616361e-08 7 KSP preconditioned resid norm 1.950813607501e-04 true resid norm 2.700424494642e-07 ||r(i)||/||b|| 1.833772115479e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 4.874904589316e+01 true resid norm 1.343144975348e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 3.360157436658e+00 true resid norm 1.146352340061e-01 ||r(i)||/||b|| 8.534836976660e-03 2 KSP preconditioned resid norm 3.377751305762e-01 true resid norm 5.437987328719e-03 ||r(i)||/||b|| 4.048697220723e-04 3 KSP preconditioned resid norm 4.621442833423e-02 true resid norm 3.488061668582e-04 ||r(i)||/||b|| 2.596936095955e-05 4 KSP preconditioned resid norm 8.138334755720e-03 true resid norm 3.513004963049e-05 ||r(i)||/||b|| 2.615506909177e-06 5 KSP preconditioned resid norm 1.797943414609e-03 true resid norm 5.653117091879e-06 ||r(i)||/||b|| 4.208865904751e-07 6 KSP preconditioned resid norm 4.774598524766e-04 true resid norm 1.134885299732e-06 ||r(i)||/||b|| 8.449462422613e-08 Residual norms for poisson_ solve. 
0 KSP preconditioned resid norm 4.810211910951e+01 true resid norm 1.277235525019e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 3.447380947730e+00 true resid norm 1.108413799723e-01 ||r(i)||/||b|| 8.678225573996e-03 2 KSP preconditioned resid norm 3.684615967428e-01 true resid norm 5.290907485284e-03 ||r(i)||/||b|| 4.142468152228e-04 3 KSP preconditioned resid norm 5.556496536113e-02 true resid norm 3.429252099171e-04 ||r(i)||/||b|| 2.684901908848e-05 4 KSP preconditioned resid norm 1.127711867600e-02 true resid norm 3.531599143158e-05 ||r(i)||/||b|| 2.765033601071e-06 5 KSP preconditioned resid norm 2.869194631779e-03 true resid norm 5.800736344501e-06 ||r(i)||/||b|| 4.541634045464e-07 6 KSP preconditioned resid norm 8.274850256941e-04 true resid norm 1.187831650058e-06 ||r(i)||/||b|| 9.300020448782e-08 7 KSP preconditioned resid norm 2.508945751017e-04 true resid norm 2.652454183568e-07 ||r(i)||/||b|| 2.076715008009e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 4.725257576004e+01 true resid norm 1.361330334029e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 3.291197538319e+00 true resid norm 1.104464836556e-01 ||r(i)||/||b|| 8.113128819268e-03 2 KSP preconditioned resid norm 3.405176394870e-01 true resid norm 5.219556145529e-03 ||r(i)||/||b|| 3.834158407445e-04 3 KSP preconditioned resid norm 4.706487764669e-02 true resid norm 3.342967507104e-04 ||r(i)||/||b|| 2.455662247097e-05 4 KSP preconditioned resid norm 7.746992871270e-03 true resid norm 3.370134200456e-05 ||r(i)||/||b|| 2.475618236230e-06 5 KSP preconditioned resid norm 1.395833809561e-03 true resid norm 5.454321226140e-06 ||r(i)||/||b|| 4.006611099303e-07 6 KSP preconditioned resid norm 2.649982357288e-04 true resid norm 1.100936634093e-06 ||r(i)||/||b|| 8.087211506073e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 4.495902182684e+01 true resid norm 1.196669975374e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 3.237685071057e+00 true resid norm 1.045645129285e-01 ||r(i)||/||b|| 8.737957421868e-03 2 KSP preconditioned resid norm 3.552542726170e-01 true resid norm 5.001849709155e-03 ||r(i)||/||b|| 4.179807141556e-04 3 KSP preconditioned resid norm 5.618991517558e-02 true resid norm 3.260137183412e-04 ||r(i)||/||b|| 2.724341088605e-05 4 KSP preconditioned resid norm 1.201085584933e-02 true resid norm 3.327225894567e-05 ||r(i)||/||b|| 2.780403923417e-06 5 KSP preconditioned resid norm 3.153208135357e-03 true resid norm 5.465112584457e-06 ||r(i)||/||b|| 4.566933822123e-07 6 KSP preconditioned resid norm 9.189768090976e-04 true resid norm 1.142999746455e-06 ||r(i)||/||b|| 9.551503505366e-08 7 KSP preconditioned resid norm 2.790095109775e-04 true resid norm 2.623423286706e-07 ||r(i)||/||b|| 2.192269665565e-08 Residual norms for poisson_ solve. 
0 KSP preconditioned resid norm 4.262465432297e+01 true resid norm 1.163690177019e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.977309158072e+00 true resid norm 1.001862475497e-01 ||r(i)||/||b|| 8.609357501524e-03 2 KSP preconditioned resid norm 3.104087197967e-01 true resid norm 4.732758696446e-03 ||r(i)||/||b|| 4.067026421561e-04 3 KSP preconditioned resid norm 4.551635300260e-02 true resid norm 3.018781137203e-04 ||r(i)||/||b|| 2.594145071274e-05 4 KSP preconditioned resid norm 8.890255686451e-03 true resid norm 3.005160254716e-05 ||r(i)||/||b|| 2.582440166690e-06 5 KSP preconditioned resid norm 2.164789314128e-03 true resid norm 4.843372486952e-06 ||r(i)||/||b|| 4.162080751906e-07 6 KSP preconditioned resid norm 6.039061763272e-04 true resid norm 9.922138068132e-07 ||r(i)||/||b|| 8.526443089476e-08 7 KSP preconditioned resid norm 1.794958078069e-04 true resid norm 2.222862602717e-07 ||r(i)||/||b|| 1.910184211069e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 3.988547356419e+01 true resid norm 1.088506537662e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.749489180416e+00 true resid norm 9.368805412525e-02 ||r(i)||/||b|| 8.607027232606e-03 2 KSP preconditioned resid norm 2.747559898624e-01 true resid norm 4.417910776206e-03 ||r(i)||/||b|| 4.058690162483e-04 3 KSP preconditioned resid norm 3.675725252692e-02 true resid norm 2.805845579099e-04 ||r(i)||/||b|| 2.577702091827e-05 4 KSP preconditioned resid norm 5.963904877795e-03 true resid norm 2.830468141233e-05 ||r(i)||/||b|| 2.600322591826e-06 5 KSP preconditioned resid norm 1.079132582697e-03 true resid norm 4.586339426435e-06 ||r(i)||/||b|| 4.213423868162e-07 6 KSP preconditioned resid norm 2.098034658704e-04 true resid norm 9.194727401642e-07 ||r(i)||/||b|| 8.447103516156e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 3.962423330122e+01 true resid norm 1.063793110731e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.842877945709e+00 true resid norm 9.255234545963e-02 ||r(i)||/||b|| 8.700220421243e-03 2 KSP preconditioned resid norm 3.084363653956e-01 true resid norm 4.466042778224e-03 ||r(i)||/||b|| 4.198224949166e-04 3 KSP preconditioned resid norm 4.815599825978e-02 true resid norm 2.946050433317e-04 ||r(i)||/||b|| 2.769382884320e-05 4 KSP preconditioned resid norm 1.023824082966e-02 true resid norm 2.995277315037e-05 ||r(i)||/||b|| 2.815657748505e-06 5 KSP preconditioned resid norm 2.694370618619e-03 true resid norm 4.773027547370e-06 ||r(i)||/||b|| 4.486800581072e-07 6 KSP preconditioned resid norm 7.885487340630e-04 true resid norm 9.731685103004e-07 ||r(i)||/||b|| 9.148099385898e-08 7 KSP preconditioned resid norm 2.401798558986e-04 true resid norm 2.203010079732e-07 ||r(i)||/||b|| 2.070900871145e-08 Residual norms for poisson_ solve. 
0 KSP preconditioned resid norm 3.762707027554e+01 true resid norm 1.022574519474e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.607361866092e+00 true resid norm 8.889418887329e-02 ||r(i)||/||b|| 8.693174646968e-03 2 KSP preconditioned resid norm 2.636602793849e-01 true resid norm 4.206093957835e-03 ||r(i)||/||b|| 4.113239551478e-04 3 KSP preconditioned resid norm 3.560433093664e-02 true resid norm 2.688441547065e-04 ||r(i)||/||b|| 2.629091079296e-05 4 KSP preconditioned resid norm 5.773739419690e-03 true resid norm 2.710258924026e-05 ||r(i)||/||b|| 2.650426812335e-06 5 KSP preconditioned resid norm 1.043953813078e-03 true resid norm 4.348030450935e-06 ||r(i)||/||b|| 4.252042631739e-07 6 KSP preconditioned resid norm 2.079808295667e-04 true resid norm 8.657058507465e-07 ||r(i)||/||b|| 8.465943892202e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 3.824342303183e+01 true resid norm 1.008441088033e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.814906220650e+00 true resid norm 8.941137638661e-02 ||r(i)||/||b|| 8.866296449803e-03 2 KSP preconditioned resid norm 3.216685781905e-01 true resid norm 4.305352654678e-03 ||r(i)||/||b|| 4.269314990998e-04 3 KSP preconditioned resid norm 5.202981717757e-02 true resid norm 2.828728180615e-04 ||r(i)||/||b|| 2.805050502388e-05 4 KSP preconditioned resid norm 1.096499861333e-02 true resid norm 2.866243622356e-05 ||r(i)||/||b|| 2.842251923656e-06 5 KSP preconditioned resid norm 2.807405846490e-03 true resid norm 4.658592969966e-06 ||r(i)||/||b|| 4.619598532080e-07 6 KSP preconditioned resid norm 8.051241968520e-04 true resid norm 9.791481201679e-07 ||r(i)||/||b|| 9.709522269446e-08 7 KSP preconditioned resid norm 2.427109328860e-04 true resid norm 2.275731198675e-07 ||r(i)||/||b|| 2.256682344344e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 3.508366370304e+01 true resid norm 9.649255370745e+00 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.437121465680e+00 true resid norm 8.377359107846e-02 ||r(i)||/||b|| 8.681871072916e-03 2 KSP preconditioned resid norm 2.492674901339e-01 true resid norm 3.962386752030e-03 ||r(i)||/||b|| 4.106417127319e-04 3 KSP preconditioned resid norm 3.407497684019e-02 true resid norm 2.513560371266e-04 ||r(i)||/||b|| 2.604926778999e-05 4 KSP preconditioned resid norm 5.573711057742e-03 true resid norm 2.468400800066e-05 ||r(i)||/||b|| 2.558125684548e-06 5 KSP preconditioned resid norm 1.003045995624e-03 true resid norm 3.874431778017e-06 ||r(i)||/||b|| 4.015265042900e-07 6 KSP preconditioned resid norm 1.911437979868e-04 true resid norm 7.679452025769e-07 ||r(i)||/||b|| 7.958595488158e-08 Residual norms for poisson_ solve. 
0 KSP preconditioned resid norm 3.396957180883e+01 true resid norm 9.287122764435e+00 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.378222762415e+00 true resid norm 8.075889355151e-02 ||r(i)||/||b|| 8.695792615208e-03 2 KSP preconditioned resid norm 2.449082171154e-01 true resid norm 3.838412508672e-03 ||r(i)||/||b|| 4.133048098999e-04 3 KSP preconditioned resid norm 3.362186515916e-02 true resid norm 2.438341345806e-04 ||r(i)||/||b|| 2.625507821587e-05 4 KSP preconditioned resid norm 5.510151353513e-03 true resid norm 2.372968256069e-05 ||r(i)||/||b|| 2.555116709726e-06 5 KSP preconditioned resid norm 9.975657786817e-04 true resid norm 3.706035019609e-06 ||r(i)||/||b|| 3.990509346771e-07 6 KSP preconditioned resid norm 1.953175021674e-04 true resid norm 7.384466547354e-07 ||r(i)||/||b|| 7.951296364502e-08 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 3.606885365295e+01 true resid norm 9.189754208489e+00 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.806038235446e+00 true resid norm 8.156994374646e-02 ||r(i)||/||b|| 8.876183399019e-03 2 KSP preconditioned resid norm 3.552014861051e-01 true resid norm 3.927428306704e-03 ||r(i)||/||b|| 4.273703319591e-04 3 KSP preconditioned resid norm 6.668475176842e-02 true resid norm 2.619021858323e-04 ||r(i)||/||b|| 2.849936787105e-05 4 KSP preconditioned resid norm 1.631141510841e-02 true resid norm 2.878717008698e-05 ||r(i)||/||b|| 3.132528839606e-06 5 KSP preconditioned resid norm 4.585814970672e-03 true resid norm 5.137489164680e-06 ||r(i)||/||b|| 5.590453289746e-07 6 KSP preconditioned resid norm 1.368983288216e-03 true resid norm 1.148795549040e-06 ||r(i)||/||b|| 1.250082997844e-07 7 KSP preconditioned resid norm 4.180985081938e-04 true resid norm 2.813854850658e-07 ||r(i)||/||b|| 3.061947889812e-08 8 KSP preconditioned resid norm 1.287012599561e-04 true resid norm 7.218588191859e-08 ||r(i)||/||b|| 7.855039458173e-09 Residual norms for poisson_ solve. 0 KSP preconditioned resid norm 3.193173296926e+01 true resid norm 8.702001300167e+00 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.236460065698e+00 true resid norm 7.605817986439e-02 ||r(i)||/||b|| 8.740308952026e-03 2 KSP preconditioned resid norm 2.303347198266e-01 true resid norm 3.628082560218e-03 ||r(i)||/||b|| 4.169250767807e-04 3 KSP preconditioned resid norm 3.181561623873e-02 true resid norm 2.314889894551e-04 ||r(i)||/||b|| 2.660181048820e-05 4 KSP preconditioned resid norm 5.333904042821e-03 true resid norm 2.268949527040e-05 ||r(i)||/||b|| 2.607388172875e-06 5 KSP preconditioned resid norm 1.027485721033e-03 true resid norm 3.554746437176e-06 ||r(i)||/||b|| 4.084975759665e-07 6 KSP preconditioned resid norm 2.284948941707e-04 true resid norm 7.091062547101e-07 ||r(i)||/||b|| 8.148772107130e-08 Residual norms for poisson_ solve. 
0 KSP preconditioned resid norm 3.137092309335e+01 true resid norm 8.666898072681e+00 ||r(i)||/||b|| 1.000000000000e+00
1 KSP preconditioned resid norm 2.176754408570e+00 true resid norm 7.639396309089e-02 ||r(i)||/||b|| 8.814452696946e-03
2 KSP preconditioned resid norm 2.283730068528e-01 true resid norm 3.682575484086e-03 ||r(i)||/||b|| 4.249012106989e-04
3 KSP preconditioned resid norm 3.428585983628e-02 true resid norm 2.406143396904e-04 ||r(i)||/||b|| 2.776245176447e-05
4 KSP preconditioned resid norm 6.918220571675e-03 true resid norm 2.377469863892e-05 ||r(i)||/||b|| 2.743161214029e-06
5 KSP preconditioned resid norm 1.738677032322e-03 true resid norm 3.657626592274e-06 ||r(i)||/||b|| 4.220225692746e-07
6 KSP preconditioned resid norm 4.960750999416e-04 true resid norm 7.242983031468e-07 ||r(i)||/||b|| 8.357064973798e-08
7 KSP preconditioned resid norm 1.494685142182e-04 true resid norm 1.601111717650e-07 ||r(i)||/||b|| 1.847387270766e-08
[... further "Residual norms for poisson_ solve." blocks follow with the same pattern: every solve starts from a preconditioned residual of roughly 2e+01 to 4e+01 and converges in 6 to 8 iterations, with the final ||r(i)||/||b|| below 1e-07 ...]
Residual norms for poisson_ solve.
0 KSP preconditioned resid norm 2.169273471129e+01 true resid norm 5.633528528137e+00 ||r(i)||/||b|| 1.000000000000e+00
1 KSP preconditioned resid norm 1.461001754611e+00 true resid norm 5.545968870228e-02 ||r(i)||/||b|| 9.844574040104e-03
2 KSP preconditioned resid norm 1.506258444525e-01 true resid norm 2.664134578651e-03 ||r(i)||/||b|| 4.729069117773e-04
3 KSP preconditioned resid norm 2.181448653132e-02 true resid norm 1.716180957136e-04 ||r(i)||/||b|| 3.046369515241e-05
4 KSP preconditioned resid norm 4.025019609685e-03 true resid norm 1.662603988500e-05 ||r(i)||/||b|| 2.951265765667e-06
5 KSP preconditioned resid norm 8.841256523281e-04 true resid norm 2.514562175279e-06 ||r(i)||/||b|| 4.463565175395e-07
6 KSP preconditioned resid norm 2.217615861548e-04 true resid norm 4.835418462089e-07 ||r(i)||/||b|| 8.583285658248e-08
7 KSP preconditioned resid norm 6.092049793034e-05 true resid norm 1.022597738209e-07 ||r(i)||/||b|| 1.815199360580e-08
escape_time reached, so abort
body 1
implicit forces and moment 1
 0.869079103902895 -0.476901469532182 8.158433911816894E-002
 0.428147562839886 0.558124919160805 -0.928673720033728
body 2
implicit forces and moment 2
 0.551071771603911 0.775546452816806 0.135476343629996
 -0.634587258760693 0.290234856330845 0.936523165974780

From knepley at gmail.com  Tue Nov 3 06:52:30 2015
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 3 Nov 2015 06:52:30 -0600
Subject: [petsc-users] Scaling with number of cores
In-Reply-To: <5638AD42.9060609@gmail.com>
References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com>
	<0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com>
	<61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com>
	<5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov>
	<56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov>
	<56372A12.90900@gmail.com> <563839F8.9080000@gmail.com>
	<589BE4CF-18B5-4F04-A442-7FB0F2F3AFC1@mcs.anl.gov> <5638AD42.9060609@gmail.com>
Message-ID: 

On Tue, Nov 3, 2015 at 6:49 AM, TAY wee-beng wrote:

> Hi,
>
> I tried and have attached the log.
>
> Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify
> some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate?

Yes, you need to attach the constant null space to the matrix.

  Thanks,

    Matt

> Thank you
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 3/11/2015 12:45 PM, Barry Smith wrote:
>
>> On Nov 2, 2015, at 10:37 PM, TAY wee-beng wrote:
>>>
>>> Hi,
>>>
>>> I tried:
>>>
>>> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg
>>>
>>> 2. -poisson_pc_type gamg
>>
>> Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason
>> Does your poisson have Neumann boundary conditions? Do you have any zeros
>> on the diagonal for the matrix (you shouldn't).
>>
>> There may be something wrong with your poisson discretization that was
>> also messing up hypre
>>
>>> Both options give:
>>>
>>> 1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN
>>> M Diverged but why?, time = 2
>>> reason = -9
>>>
>>> How can I check what's wrong?
>>>
>>> Thank you
>>>
>>> Yours sincerely,
>>>
>>> TAY wee-beng
>>>
>>> On 3/11/2015 3:18 AM, Barry Smith wrote:
>>>
>>>> hypre is just not scaling well here. I do not know why. Since hypre
>>>> is a black box for us there is no way to determine why the poor scaling.
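A minimal C sketch of the constant-null-space attachment Matt suggests above. This is not code from the thread: the matrix argument "A" stands for the already assembled Poisson matrix, and the helper name is made up for illustration.

#include <petscksp.h>

/* Attach the constant null space of the singular Neumann Poisson operator
   to its matrix so the Krylov solver projects it out.                      */
PetscErrorCode AttachConstantNullSpace(Mat A)
{
  MatNullSpace   nullsp;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* PETSC_TRUE: the null space consists of the constant vector */
  ierr = MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nullsp);CHKERRQ(ierr);
  ierr = MatSetNullSpace(A, nullsp);CHKERRQ(ierr);
  ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Called once, after the Poisson matrix has been assembled and before the first KSPSolve, this should work for either of the gamg option sets tried above.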
>>>> >>>> If you make the same two runs with -pc_type gamg there will be a >>>> lot more information in the log summary about in what routines it is >>>> scaling well or poorly. >>>> >>>> Barry >>>> >>>> >>>> >>>> On Nov 2, 2015, at 3:17 AM, TAY wee-beng wrote: >>>>> >>>>> Hi, >>>>> >>>>> I have attached the 2 files. >>>>> >>>>> Thank you >>>>> >>>>> Yours sincerely, >>>>> >>>>> TAY wee-beng >>>>> >>>>> On 2/11/2015 2:55 PM, Barry Smith wrote: >>>>> >>>>>> Run (158/2)x(266/2)x(150/2) grid on 8 processes and then >>>>>> (158)x(266)x(150) on 64 processors and send the two -log_summary results >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>>>>> On Nov 2, 2015, at 12:19 AM, TAY wee-beng wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I have attached the new results. >>>>>>> >>>>>>> Thank you >>>>>>> >>>>>>> Yours sincerely, >>>>>>> >>>>>>> TAY wee-beng >>>>>>> >>>>>>> On 2/11/2015 12:27 PM, Barry Smith wrote: >>>>>>> >>>>>>>> Run without the -momentum_ksp_view -poisson_ksp_view and send >>>>>>>> the new results >>>>>>>> >>>>>>>> >>>>>>>> You can see from the log summary that the PCSetUp is taking a >>>>>>>> much smaller percentage of the time meaning that it is reusing the >>>>>>>> preconditioner and not rebuilding it each time. >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> Something makes no sense with the output: it gives >>>>>>>> >>>>>>>> KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 >>>>>>>> 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165 >>>>>>>> >>>>>>>> 90% of the time is in the solve but there is no significant amount >>>>>>>> of time in other events of the code which is just not possible. I hope it >>>>>>>> is due to your IO. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Nov 1, 2015, at 10:02 PM, TAY wee-beng wrote: >>>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I have attached the new run with 100 time steps for 48 and 96 >>>>>>>>> cores. >>>>>>>>> >>>>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want >>>>>>>>> to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>>>> >>>>>>>>> Why does the number of processes increase so much? Is there >>>>>>>>> something wrong with my coding? Seems to be so too for my new run. >>>>>>>>> >>>>>>>>> Thank you >>>>>>>>> >>>>>>>>> Yours sincerely, >>>>>>>>> >>>>>>>>> TAY wee-beng >>>>>>>>> >>>>>>>>> On 2/11/2015 9:49 AM, Barry Smith wrote: >>>>>>>>> >>>>>>>>>> If you are doing many time steps with the same linear solver >>>>>>>>>> then you MUST do your weak scaling studies with MANY time steps since the >>>>>>>>>> setup time of AMG only takes place in the first stimestep. So run both 48 >>>>>>>>>> and 96 processes with the same large number of time steps. >>>>>>>>>> >>>>>>>>>> Barry >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> Sorry I forgot and use the old a.out. I have attached the new >>>>>>>>>>> log for 48cores (log48), together with the 96cores log (log96). >>>>>>>>>>> >>>>>>>>>>> Why does the number of processes increase so much? Is there >>>>>>>>>>> something wrong with my coding? >>>>>>>>>>> >>>>>>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I >>>>>>>>>>> want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>>>>>> >>>>>>>>>>> Lastly, I only simulated 2 time steps previously. Now I run for >>>>>>>>>>> 10 timesteps (log48_10). Is it building the preconditioner at every >>>>>>>>>>> timestep? >>>>>>>>>>> >>>>>>>>>>> Also, what about momentum eqn? Is it working well? 
>>>>>>>>>>> >>>>>>>>>>> I will try the gamg later too. >>>>>>>>>>> >>>>>>>>>>> Thank you >>>>>>>>>>> >>>>>>>>>>> Yours sincerely, >>>>>>>>>>> >>>>>>>>>>> TAY wee-beng >>>>>>>>>>> >>>>>>>>>>> On 2/11/2015 12:30 AM, Barry Smith wrote: >>>>>>>>>>> >>>>>>>>>>>> You used gmres with 48 processes but richardson with 96. You >>>>>>>>>>>> need to be careful and make sure you don't change the solvers when you >>>>>>>>>>>> change the number of processors since you can get very different >>>>>>>>>>>> inconsistent results >>>>>>>>>>>> >>>>>>>>>>>> Anyways all the time is being spent in the BoomerAMG >>>>>>>>>>>> algebraic multigrid setup and it is is scaling badly. When you double the >>>>>>>>>>>> problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 >>>>>>>>>>>> seconds. >>>>>>>>>>>> >>>>>>>>>>>> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 >>>>>>>>>>>> 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 >>>>>>>>>>>> >>>>>>>>>>>> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 >>>>>>>>>>>> 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 >>>>>>>>>>>> >>>>>>>>>>>> Now is the Poisson problem changing at each timestep or can >>>>>>>>>>>> you use the same preconditioner built with BoomerAMG for all the time >>>>>>>>>>>> steps? Algebraic multigrid has a large set up time that you often doesn't >>>>>>>>>>>> matter if you have many time steps but if you have to rebuild it each >>>>>>>>>>>> timestep it is too large? >>>>>>>>>>>> >>>>>>>>>>>> You might also try -pc_type gamg and see how PETSc's >>>>>>>>>>>> algebraic multigrid scales for your problem/machine. >>>>>>>>>>>> >>>>>>>>>>>> Barry >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 1/11/2015 10:00 AM, Barry Smith wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng< >>>>>>>>>>>>>>>> zonexo at gmail.com> wrote: >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I understand that as mentioned in the faq, due to the >>>>>>>>>>>>>>>> limitations in memory, the scaling is not linear. So, I am trying to write >>>>>>>>>>>>>>>> a proposal to use a supercomputer. >>>>>>>>>>>>>>>> Its specs are: >>>>>>>>>>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory >>>>>>>>>>>>>>>> per node) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 8 cores / processor >>>>>>>>>>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>>>>>>>>>>>>>>> Each cabinet contains 96 computing nodes, >>>>>>>>>>>>>>>> One of the requirement is to give the performance of my >>>>>>>>>>>>>>>> current code with my current set of data, and there is a formula to >>>>>>>>>>>>>>>> calculate the estimated parallel efficiency when using the new large set of >>>>>>>>>>>>>>>> data >>>>>>>>>>>>>>>> There are 2 ways to give performance: >>>>>>>>>>>>>>>> 1. Strong scaling, which is defined as how the elapsed time >>>>>>>>>>>>>>>> varies with the number of processors for a fixed >>>>>>>>>>>>>>>> problem. >>>>>>>>>>>>>>>> 2. Weak scaling, which is defined as how the elapsed time >>>>>>>>>>>>>>>> varies with the number of processors for a >>>>>>>>>>>>>>>> fixed problem size per processor. >>>>>>>>>>>>>>>> I ran my cases with 48 and 96 cores with my current >>>>>>>>>>>>>>>> cluster, giving 140 and 90 mins respectively. 
This is classified as strong >>>>>>>>>>>>>>>> scaling. >>>>>>>>>>>>>>>> Cluster specs: >>>>>>>>>>>>>>>> CPU: AMD 6234 2.4GHz >>>>>>>>>>>>>>>> 8 cores / processor (CPU) >>>>>>>>>>>>>>>> 6 CPU / node >>>>>>>>>>>>>>>> So 48 Cores / CPU >>>>>>>>>>>>>>>> Not sure abt the memory / node >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The parallel efficiency ?En? for a given degree of >>>>>>>>>>>>>>>> parallelism ?n? indicates how much the program is >>>>>>>>>>>>>>>> efficiently accelerated by parallel processing. ?En? is >>>>>>>>>>>>>>>> given by the following formulae. Although their >>>>>>>>>>>>>>>> derivation processes are different depending on strong and >>>>>>>>>>>>>>>> weak scaling, derived formulae are the >>>>>>>>>>>>>>>> same. >>>>>>>>>>>>>>>> From the estimated time, my parallel efficiency using >>>>>>>>>>>>>>>> Amdahl's law on the current old cluster was 52.7%. >>>>>>>>>>>>>>>> So is my results acceptable? >>>>>>>>>>>>>>>> For the large data set, if using 2205 nodes (2205X8cores), >>>>>>>>>>>>>>>> my expected parallel efficiency is only 0.5%. The proposal recommends value >>>>>>>>>>>>>>>> of > 50%. >>>>>>>>>>>>>>>> The problem with this analysis is that the estimated serial >>>>>>>>>>>>>>>> fraction from Amdahl's Law changes as a function >>>>>>>>>>>>>>>> of problem size, so you cannot take the strong scaling from >>>>>>>>>>>>>>>> one problem and apply it to another without a >>>>>>>>>>>>>>>> model of this dependence. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Weak scaling does model changes with problem size, so I >>>>>>>>>>>>>>>> would measure weak scaling on your current >>>>>>>>>>>>>>>> cluster, and extrapolate to the big machine. I realize that >>>>>>>>>>>>>>>> this does not make sense for many scientific >>>>>>>>>>>>>>>> applications, but neither does requiring a certain parallel >>>>>>>>>>>>>>>> efficiency. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Ok I check the results for my weak scaling it is even worse >>>>>>>>>>>>>>> for the expected parallel efficiency. From the formula used, it's obvious >>>>>>>>>>>>>>> it's doing some sort of exponential extrapolation decrease. So unless I can >>>>>>>>>>>>>>> achieve a near > 90% speed up when I double the cores and problem size for >>>>>>>>>>>>>>> my current 48/96 cores setup, extrapolating from about 96 nodes to >>>>>>>>>>>>>>> 10,000 nodes will give a much lower expected parallel efficiency for the >>>>>>>>>>>>>>> new case. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> However, it's mentioned in the FAQ that due to memory >>>>>>>>>>>>>>> requirement, it's impossible to get >90% speed when I double the cores and >>>>>>>>>>>>>>> problem size (ie linear increase in performance), which means that I can't >>>>>>>>>>>>>>> get >90% speed up when I double the cores and problem size for my current >>>>>>>>>>>>>>> 48/96 cores setup. Is that so? >>>>>>>>>>>>>>> >>>>>>>>>>>>>> What is the output of -ksp_view -log_summary on the >>>>>>>>>>>>>> problem and then on the problem doubled in size and number of processors? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Barry >>>>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I have attached the output >>>>>>>>>>>>> >>>>>>>>>>>>> 48 cores: log48 >>>>>>>>>>>>> 96 cores: log96 >>>>>>>>>>>>> >>>>>>>>>>>>> There are 2 solvers - The momentum linear eqn uses bcgs, while >>>>>>>>>>>>> the Poisson eqn uses hypre BoomerAMG. >>>>>>>>>>>>> >>>>>>>>>>>>> Problem size doubled from 158x266x150 to 158x266x300. 
>>>>>>>>>>>>> >>>>>>>>>>>>>> So is it fair to say that the main problem does not lie in my >>>>>>>>>>>>>>> programming skills, but rather the way the linear equations are solved? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Matt >>>>>>>>>>>>>>>> Is it possible for this type of scaling in PETSc (>50%), >>>>>>>>>>>>>>>> when using 17640 (2205X8) cores? >>>>>>>>>>>>>>>> Btw, I do not have access to the system. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Sent using CloudMagic Email >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> What most experimenters take for granted before they begin >>>>>>>>>>>>>>>> their experiments is infinitely more interesting than any results to which >>>>>>>>>>>>>>>> their experiments lead. >>>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From zonexo at gmail.com Tue Nov 3 06:58:23 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Tue, 3 Nov 2015 20:58:23 +0800 Subject: [petsc-users] Scaling with number of cores In-Reply-To: References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> <56372A12.90900@gmail.com> <563839F8.9080000@gmail.com> <589BE4CF-18B5-4F04-A442-7FB0F2F3AFC1@mcs.anl.gov> <5638AD42.9060609@gmail.com> Message-ID: <5638AF6F.80405@gmail.com> On 3/11/2015 8:52 PM, Matthew Knepley wrote: > On Tue, Nov 3, 2015 at 6:49 AM, TAY wee-beng > wrote: > > Hi, > > I tried and have attached the log. > > Ya, my Poisson eqn has Neumann boundary condition. Do I need to > specify some null space stuff? Like KSPSetNullSpace or > MatNullSpaceCreate? > > > Yes, you need to attach the constant null space to the matrix. > > Thanks, > > Matt Ok so can you point me to a suitable example so that I know which one to use specifically? Thanks. > > > Thank you > > Yours sincerely, > > TAY wee-beng > > On 3/11/2015 12:45 PM, Barry Smith wrote: > > On Nov 2, 2015, at 10:37 PM, TAY wee-beng > wrote: > > Hi, > > I tried : > > 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg > > 2. -poisson_pc_type gamg > > Run with -poisson_ksp_monitor_true_residual > -poisson_ksp_monitor_converged_reason > Does your poisson have Neumann boundary conditions? Do you > have any zeros on the diagonal for the matrix (you shouldn't). > > There may be something wrong with your poisson > discretization that was also messing up hypre > > > > Both options give: > > 1 0.00150000 0.00000000 0.00000000 > 1.00000000 NaN NaN NaN > M Diverged but why?, time = 2 > reason = -9 > > How can I check what's wrong? > > Thank you > > Yours sincerely, > > TAY wee-beng > > On 3/11/2015 3:18 AM, Barry Smith wrote: > > hypre is just not scaling well here. I do not know > why. Since hypre is a block box for us there is no way > to determine why the poor scaling. 
> [...]
-------------- next part -------------- An HTML attachment was scrubbed...
URL: From knepley at gmail.com Tue Nov 3 07:01:18 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 3 Nov 2015 07:01:18 -0600 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <5638AF6F.80405@gmail.com> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> <56372A12.90900@gmail.com> <563839F8.9080000@gmail.com> <589BE4CF-18B5-4F04-A442-7FB0F2F3AFC1@mcs.anl.gov> <5638AD42.9060609@gmail.com> <5638AF6F.80405@gmail.com> Message-ID: On Tue, Nov 3, 2015 at 6:58 AM, TAY wee-beng wrote: > > On 3/11/2015 8:52 PM, Matthew Knepley wrote: > > On Tue, Nov 3, 2015 at 6:49 AM, TAY wee-beng wrote: > >> Hi, >> >> I tried and have attached the log. >> >> Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify >> some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate? > > > Yes, you need to attach the constant null space to the matrix. > > Thanks, > > Matt > > Ok so can you point me to a suitable example so that I know which one to > use specifically? > https://bitbucket.org/petsc/petsc/src/9ae8fd060698c4d6fc0d13188aca8a1828c138ab/src/snes/examples/tutorials/ex12.c?at=master&fileviewer=file-view-default#ex12.c-761 Matt > Thanks. > > > >> >> Thank you >> >> Yours sincerely, >> >> TAY wee-beng >> >> On 3/11/2015 12:45 PM, Barry Smith wrote: >> >>> On Nov 2, 2015, at 10:37 PM, TAY wee-beng< >>>> zonexo at gmail.com> wrote: >>>> >>>> Hi, >>>> >>>> I tried : >>>> >>>> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg >>>> >>>> 2. -poisson_pc_type gamg >>>> >>> Run with -poisson_ksp_monitor_true_residual >>> -poisson_ksp_monitor_converged_reason >>> Does your poisson have Neumann boundary conditions? Do you have any >>> zeros on the diagonal for the matrix (you shouldn't). >>> >>> There may be something wrong with your poisson discretization that >>> was also messing up hypre >>> >>> >>> >>> Both options give: >>>> >>>> 1 0.00150000 0.00000000 0.00000000 1.00000000 >>>> NaN NaN NaN >>>> M Diverged but why?, time = 2 >>>> reason = -9 >>>> >>>> How can I check what's wrong? >>>> >>>> Thank you >>>> >>>> Yours sincerely, >>>> >>>> TAY wee-beng >>>> >>>> On 3/11/2015 3:18 AM, Barry Smith wrote: >>>> >>>>> hypre is just not scaling well here. I do not know why. Since >>>>> hypre is a block box for us there is no way to determine why the poor >>>>> scaling. >>>>> >>>>> If you make the same two runs with -pc_type gamg there will be a >>>>> lot more information in the log summary about in what routines it is >>>>> scaling well or poorly. >>>>> >>>>> Barry >>>>> >>>>> >>>>> >>>>> On Nov 2, 2015, at 3:17 AM, TAY wee-beng< >>>>>> zonexo at gmail.com> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I have attached the 2 files. >>>>>> >>>>>> Thank you >>>>>> >>>>>> Yours sincerely, >>>>>> >>>>>> TAY wee-beng >>>>>> >>>>>> On 2/11/2015 2:55 PM, Barry Smith wrote: >>>>>> >>>>>>> Run (158/2)x(266/2)x(150/2) grid on 8 processes and then >>>>>>> (158)x(266)x(150) on 64 processors and send the two -log_summary results >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>> >>>>>>>> On Nov 2, 2015, at 12:19 AM, TAY wee-beng wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I have attached the new results. 
> [...]
>>>>>>>>>>>>>> >>>>>>>>>>>>>>> So is it fair to say that the main problem does not lie in >>>>>>>>>>>>>>>> my programming skills, but rather the way the linear equations are solved? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Matt >>>>>>>>>>>>>>>>> Is it possible for this type of scaling in PETSc (>50%), >>>>>>>>>>>>>>>>> when using 17640 (2205X8) cores? >>>>>>>>>>>>>>>>> Btw, I do not have access to the system. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Sent using CloudMagic Email >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> What most experimenters take for granted before they begin >>>>>>>>>>>>>>>>> their experiments is infinitely more interesting than any results to which >>>>>>>>>>>>>>>>> their experiments lead. >>>>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue Nov 3 07:47:39 2015 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 3 Nov 2015 08:47:39 -0500 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: <4E4A4E38-B9C3-4F41-8D52-1AFF3E4E8C8C@gmail.com> References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> <4E4A4E38-B9C3-4F41-8D52-1AFF3E4E8C8C@gmail.com> Message-ID: BTW, I think that our advice for segv is use a debugger. DDT or Totalview, and gdb if need be, will get you right to the source code and will get 90% of bugs diagnosed. Valgrind is noisy and cumbersome to use but can diagnose 90% of the other 10%. On Tue, Nov 3, 2015 at 7:32 AM, Denis Davydov wrote: > Hi Jose, > > > On 3 Nov 2015, at 12:20, Jose E. Roman wrote: > > > > I am answering the SLEPc-related questions: > > - Having different number of iterations when changing the number of > processes is normal. > the change in iterations i mentioned are for different preconditioners, > but the same number of MPI processes. > > > > - Yes, if you do not destroy the EPS solver, then the preconditioner > would be reused. > > > > Regarding the segmentation fault, I have no clue. Not sure if this is > related to GAMG or not. Maybe running under valgrind could provide more > information. > will try that. > > Denis. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mfadams at lbl.gov Tue Nov 3 08:00:20 2015 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 3 Nov 2015 09:00:20 -0500 Subject: [petsc-users] Scaling with number of cores In-Reply-To: References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> <56372A12.90900@gmail.com> <563839F8.9080000@gmail.com> <589BE4CF-18B5-4F04-A442-7FB0F2F3AFC1@mcs.anl.gov> <5638AD42.9060609@gmail.com> <5638AF6F.80405@gmail.com> Message-ID: If you clean the RHS of the null space then you can probably (only) use "*-mg_coarse_pc_type svd*", but you need this or the lu coarse grid solver will have problems. If your solution starts drifting then you need to set the null space but this is often not needed. Also, after you get this working you want to check these PCSetUp times. This setup will not scale great but this behavior indicates that there is something wrong. Hypre's default parameters are tune for 2D problems, you have a 3D problem, I assume. GAMG should be fine. As a rule of thumb the PCSetup should not be much more than a solve. An easy 3D Poisson solve might require relatively more setup and a hard 2D problem might require relatively less. On Tue, Nov 3, 2015 at 8:01 AM, Matthew Knepley wrote: > On Tue, Nov 3, 2015 at 6:58 AM, TAY wee-beng wrote: > >> >> On 3/11/2015 8:52 PM, Matthew Knepley wrote: >> >> On Tue, Nov 3, 2015 at 6:49 AM, TAY wee-beng wrote: >> >>> Hi, >>> >>> I tried and have attached the log. >>> >>> Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify >>> some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate? >> >> >> Yes, you need to attach the constant null space to the matrix. >> >> Thanks, >> >> Matt >> >> Ok so can you point me to a suitable example so that I know which one to >> use specifically? >> > > > https://bitbucket.org/petsc/petsc/src/9ae8fd060698c4d6fc0d13188aca8a1828c138ab/src/snes/examples/tutorials/ex12.c?at=master&fileviewer=file-view-default#ex12.c-761 > > Matt > > >> Thanks. >> >> >> >>> >>> Thank you >>> >>> Yours sincerely, >>> >>> TAY wee-beng >>> >>> On 3/11/2015 12:45 PM, Barry Smith wrote: >>> >>>> On Nov 2, 2015, at 10:37 PM, TAY wee-beng< >>>>> zonexo at gmail.com> wrote: >>>>> >>>>> Hi, >>>>> >>>>> I tried : >>>>> >>>>> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg >>>>> >>>>> 2. -poisson_pc_type gamg >>>>> >>>> Run with -poisson_ksp_monitor_true_residual >>>> -poisson_ksp_monitor_converged_reason >>>> Does your poisson have Neumann boundary conditions? Do you have any >>>> zeros on the diagonal for the matrix (you shouldn't). >>>> >>>> There may be something wrong with your poisson discretization that >>>> was also messing up hypre >>>> >>>> >>>> >>>> Both options give: >>>>> >>>>> 1 0.00150000 0.00000000 0.00000000 1.00000000 >>>>> NaN NaN NaN >>>>> M Diverged but why?, time = 2 >>>>> reason = -9 >>>>> >>>>> How can I check what's wrong? >>>>> >>>>> Thank you >>>>> >>>>> Yours sincerely, >>>>> >>>>> TAY wee-beng >>>>> >>>>> On 3/11/2015 3:18 AM, Barry Smith wrote: >>>>> >>>>>> hypre is just not scaling well here. I do not know why. Since >>>>>> hypre is a block box for us there is no way to determine why the poor >>>>>> scaling. 
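As a concrete way to try the suggestions above, a run could combine GAMG, the SVD coarse solve, the residual monitors, and -log_summary so the PCSetUp and KSPSolve times can be compared directly. This is only a sketch: it assumes the Poisson solve keeps the poisson_ options prefix seen in the logs (with a prefixed KSP, the multigrid coarse-grid option picks up the same prefix), so the exact option names should be adjusted to match the code:

    -poisson_pc_type gamg -poisson_pc_gamg_agg_nsmooths 1
    -poisson_mg_coarse_pc_type svd
    -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason
    -log_summary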
> [...]
-------------- next part -------------- An HTML attachment was scrubbed...
URL: From zonexo at gmail.com Tue Nov 3 09:04:56 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Tue, 3 Nov 2015 23:04:56 +0800 Subject: [petsc-users] Scaling with number of cores In-Reply-To: References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> <56372A12.90900@gmail.com> <563839F8.9080000@gmail.com> <589BE4CF-18B5-4F04-A442-7FB0F2F3AFC1@mcs.anl.gov> <5638AD42.9060609@gmail.com> <5638AF6F.80405@gmail.com> Message-ID: <5638CD18.80101@gmail.com> On 3/11/2015 9:01 PM, Matthew Knepley wrote: > On Tue, Nov 3, 2015 at 6:58 AM, TAY wee-beng > wrote: > > > On 3/11/2015 8:52 PM, Matthew Knepley wrote: >> On Tue, Nov 3, 2015 at 6:49 AM, TAY wee-beng > > wrote: >> >> Hi, >> >> I tried and have attached the log. >> >> Ya, my Poisson eqn has Neumann boundary condition. Do I need >> to specify some null space stuff? Like KSPSetNullSpace or >> MatNullSpaceCreate? >> >> >> Yes, you need to attach the constant null space to the matrix. >> >> Thanks, >> >> Matt > Ok so can you point me to a suitable example so that I know which > one to use specifically? > > > https://bitbucket.org/petsc/petsc/src/9ae8fd060698c4d6fc0d13188aca8a1828c138ab/src/snes/examples/tutorials/ex12.c?at=master&fileviewer=file-view-default#ex12.c-761 > > Matt Hi, Actually, I realised that for my Poisson eqn, I have neumann and dirichlet BC. Dirichlet BC is at the output grids by specifying pressure = 0. So do I still need the null space? My Poisson eqn LHS is fixed but RHS is changing with every timestep. If I need to use null space, how do I know if the null space contains the constant vector and what the the no. of vectors? I follow the example given and added: call MatNullSpaceCreate(MPI_COMM_WORLD,PETSC_TRUE,0,NULL,nullsp,ierr) call MatSetNullSpace(A,nullsp,ierr) call MatNullSpaceDestroy(nullsp,ierr) Is that all? Before this, I was using HYPRE geometric solver and the matrix / vector in the subroutine was written based on HYPRE. It worked pretty well and fast. However, it's a black box and it's hard to diagnose problems. I always had the PETSc subroutine to solve my Poisson eqn but I used KSPBCGS or KSPGMRES with HYPRE's boomeramg as the PC. It worked but was slow. Matt: Thanks, I will see how it goes using the nullspace and may try "/-mg_coarse_pc_type svd/" later. > > Thanks. >> >> >> Thank you >> >> Yours sincerely, >> >> TAY wee-beng >> >> On 3/11/2015 12:45 PM, Barry Smith wrote: >> >> On Nov 2, 2015, at 10:37 PM, TAY >> wee-beng> >> wrote: >> >> Hi, >> >> I tried : >> >> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg >> >> 2. -poisson_pc_type gamg >> >> Run with -poisson_ksp_monitor_true_residual >> -poisson_ksp_monitor_converged_reason >> Does your poisson have Neumann boundary conditions? Do >> you have any zeros on the diagonal for the matrix (you >> shouldn't). >> >> There may be something wrong with your poisson >> discretization that was also messing up hypre >> >> >> >> Both options give: >> >> 1 0.00150000 0.00000000 0.00000000 >> 1.00000000 NaN NaN NaN >> M Diverged but why?, time = 2 >> reason = -9 >> >> How can I check what's wrong? 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ling.zou at inl.gov Tue Nov 3 09:12:50 2015
From: ling.zou at inl.gov (Zou (Non-US), Ling)
Date: Tue, 3 Nov 2015 08:12:50 -0700
Subject: [petsc-users] How do I know it is steady state?
In-Reply-To: 
References: <30E80E96-BC25-489D-8110-D81D9123ED19@mcs.anl.gov>
Message-ID: 

Matt, thanks for the reply.
The simulation is a transient simulation, which eventually converges to a steady-state solution, given enough simulation time.
My code runs fine and I could tell the simulation reaches steady state by looking at the residual monitored by SNES monitor function.

See an example screen output

Solving time step 90, using BDF1, dt = 0.1. Current time (the starting time of this time step) = 8.85. NL step = 0, SNES Function norm = 1.47538E-02 NL step = 1, SNES Function norm = 8.06971E-04 total_FunctionCall_number: 0 converged, time step increased = 0.1
Solving time step 91, using BDF1, dt = 0.1. Current time (the starting time of this time step) = 8.95. NL step = 0, SNES Function norm = 1.10861E-02 NL step = 1, SNES Function norm = 6.26584E-04 total_FunctionCall_number: 0 converged, time step increased = 0.1
Solving time step 92, using BDF1, dt = 0.1. Current time (the starting time of this time step) = 9.05. NL step = 0, SNES Function norm = 7.21253E-03 NL step = 1, SNES Function norm = 9.93402E-04 total_FunctionCall_number: 0 converged, time step increased = 0.1
Solving time step 93, using BDF1, dt = 0.1. Current time (the starting time of this time step) = 9.15. NL step = 0, SNES Function norm = 5.40260E-03 NL step = 1, SNES Function norm = 6.21162E-04 total_FunctionCall_number: 0 converged, time step increased = 0.1
Solving time step 94, using BDF1, dt = 0.1. Current time (the starting time of this time step) = 9.25. NL step = 0, SNES Function norm = 3.40214E-03 NL step = 1, SNES Function norm = 6.16805E-04 total_FunctionCall_number: 0 converged, time step increased = 0.1
Solving time step 95, using BDF1, dt = 0.1. Current time (the starting time of this time step) = 9.35. NL step = 0, SNES Function norm = 2.29656E-03 NL step = 1, SNES Function norm = 6.19337E-04 total_FunctionCall_number: 0 converged, time step increased = 0.1
Solving time step 96, using BDF1, dt = 0.1. Current time (the starting time of this time step) = 9.45. NL step = 0, SNES Function norm = 1.53218E-03 NL step = 1, SNES Function norm = 5.94845E-04 total_FunctionCall_number: 0 converged, time step increased = 0.1
Solving time step 97, using BDF1, dt = 0.1. Current time (the starting time of this time step) = 9.55. NL step = 0, SNES Function norm = 1.32136E-03 NL step = 1, SNES Function norm = 6.19933E-04 total_FunctionCall_number: 0 converged, time step increased = 0.1
Solving time step 98, using BDF1, dt = 0.1.
Current time (the starting time of this time step) = 9.65. NL step = 0, SNES Function norm = 7.09342E-04 NL step = 1, SNES Function norm = 6.18694E-04 total_FunctionCall_number: 0 converged, time step increased = 0.1 Solving time step 99, using BDF1, dt = 0.1. Current time (the starting time of this time step) = 9.75. NL step = 0, SNES Function norm = 5.49192E-04 total_FunctionCall_number: 0 converged, time step increased = 0.1 Solving time step 100, using BDF1, dt = 0.1. Current time (the starting time of this time step) = 9.85. NL step = 0, SNES Function norm = 5.49192E-04 total_FunctionCall_number: 0 converged, time step increased = 0.1 Solving time step 101, using BDF1, dt = 0.1. Current time (the starting time of this time step) = 9.95. NL step = 0, SNES Function norm = 5.49192E-04 total_FunctionCall_number: 0 I observed that after time step 99, the residual never changed, so I believe the transient simulation converges at time step 99. I wonder can I use the criterion "SNES converges and it takes 0 iteration" to say the simulation reaches a steady state. Such that I don't have to look at the screen and the code knows it converges and should stop. Put it another way, what's the common way people would implement a scheme to detect a transient simulation reaches steady state. Thanks, Ling On Tue, Nov 3, 2015 at 5:25 AM, Matthew Knepley wrote: > On Mon, Nov 2, 2015 at 7:29 PM, Barry Smith wrote: > >> >> > On Oct 30, 2015, at 12:23 PM, Zou (Non-US), Ling >> wrote: >> > >> > Hi All, >> > >> > From physics point of view, I know my simulation converges if nothing >> changes any more. >> > >> > I wonder how normally you do to detect if your simulation reaches >> steady state from numerical point of view. >> > Is it a good practice to use SNES convergence as a criterion, i.e., >> > SNES converges and it takes 0 iteration(s) >> >> Depends on the time integrator and SNES tolerance you are using. If >> you use a -snes_rtol 1.e-5 it will always try to squeeze 5 MORE digits out >> of the residual so won't take 0 iterations even if there is only a small >> change in the solution. >> > > There are two different situations here: > > 1) Solving for a mathematical steady state. You remove the time > derivative and solve the algebraic system with SNES. Then > the SNES tolerance is a good measure. > > 2) Use timestepping to advance until nothing looks like it is changing. > This is a "physical" steady state. > > You can use 1) with a timestepping preconditioner TSPSEUDO, which is what > I would recommend if you > want a true steady state. > > Thanks, > > Matt > > >> > >> > Thanks, >> > >> > Ling >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... 
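[Editorial note: on Ling's question of how codes commonly detect that a transient run has reached a "physical" steady state, one widespread hand-rolled check is to monitor the relative change of the solution per unit time between consecutive steps and stop once it drops below a tolerance. The sketch below is only an illustration under that assumption; the names u_new, u_old, dt and tol are placeholders, not anything from this thread.]

  /* Sketch: flag steady state when ||u_new - u_old|| / dt falls below
     tol * ||u_new|| (with a guard for very small solutions). */
  #include <petscvec.h>

  PetscErrorCode CheckSteadyState(Vec u_new, Vec u_old, PetscReal dt,
                                  PetscReal tol, PetscBool *steady)
  {
    Vec            du;
    PetscReal      ndu, nu;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = VecDuplicate(u_new, &du);CHKERRQ(ierr);
    ierr = VecWAXPY(du, -1.0, u_old, u_new);CHKERRQ(ierr);  /* du = u_new - u_old */
    ierr = VecNorm(du, NORM_2, &ndu);CHKERRQ(ierr);
    ierr = VecNorm(u_new, NORM_2, &nu);CHKERRQ(ierr);
    *steady = (ndu <= tol * dt * PetscMax(nu, 1.0)) ? PETSC_TRUE : PETSC_FALSE;
    ierr = VecDestroy(&du);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

[Matt's alternative above -- solving directly for the mathematical steady state with pseudo-timestepping -- corresponds to the TSPSEUDO integrator (TSSetType(ts, TSPSEUDO), or -ts_type pseudo on the command line), assuming the problem is driven through TS rather than a hand-written time loop.]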
URL: From zonexo at gmail.com Tue Nov 3 09:16:23 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Tue, 3 Nov 2015 23:16:23 +0800 Subject: [petsc-users] Scaling with number of cores In-Reply-To: References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> <56372A12.90900@gmail.com> <563839F8.9080000@gmail.com> <589BE4CF-18B5-4F04-A442-7FB0F2F3AFC1@mcs.anl.gov> <5638AD42.9060609@gmail.com> <5638AF6F.80405@gmail.com> Message-ID: <5638CFC7.3000300@gmail.com> On 3/11/2015 9:01 PM, Matthew Knepley wrote: > On Tue, Nov 3, 2015 at 6:58 AM, TAY wee-beng > wrote: > > > On 3/11/2015 8:52 PM, Matthew Knepley wrote: >> On Tue, Nov 3, 2015 at 6:49 AM, TAY wee-beng > > wrote: >> >> Hi, >> >> I tried and have attached the log. >> >> Ya, my Poisson eqn has Neumann boundary condition. Do I need >> to specify some null space stuff? Like KSPSetNullSpace or >> MatNullSpaceCreate? >> >> >> Yes, you need to attach the constant null space to the matrix. >> >> Thanks, >> >> Matt > Ok so can you point me to a suitable example so that I know which > one to use specifically? > > > https://bitbucket.org/petsc/petsc/src/9ae8fd060698c4d6fc0d13188aca8a1828c138ab/src/snes/examples/tutorials/ex12.c?at=master&fileviewer=file-view-default#ex12.c-761 > > Matt Oh ya, How do I call: call MatNullSpaceCreate(MPI_COMM_WORLD,PETSC_TRUE,0,NULL,nullsp,ierr) But it says NULL is not defined. How do I define it? Thanks > Thanks. >> >> >> Thank you >> >> Yours sincerely, >> >> TAY wee-beng >> >> On 3/11/2015 12:45 PM, Barry Smith wrote: >> >> On Nov 2, 2015, at 10:37 PM, TAY >> wee-beng> >> wrote: >> >> Hi, >> >> I tried : >> >> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg >> >> 2. -poisson_pc_type gamg >> >> Run with -poisson_ksp_monitor_true_residual >> -poisson_ksp_monitor_converged_reason >> Does your poisson have Neumann boundary conditions? Do >> you have any zeros on the diagonal for the matrix (you >> shouldn't). >> >> There may be something wrong with your poisson >> discretization that was also messing up hypre >> >> >> >> Both options give: >> >> 1 0.00150000 0.00000000 0.00000000 >> 1.00000000 NaN NaN NaN >> M Diverged but why?, time = 2 >> reason = -9 >> >> How can I check what's wrong? >> >> Thank you >> >> Yours sincerely, >> >> TAY wee-beng >> >> On 3/11/2015 3:18 AM, Barry Smith wrote: >> >> hypre is just not scaling well here. I do not >> know why. Since hypre is a block box for us there >> is no way to determine why the poor scaling. >> >> If you make the same two runs with -pc_type >> gamg there will be a lot more information in the >> log summary about in what routines it is scaling >> well or poorly. >> >> Barry >> >> >> >> On Nov 2, 2015, at 3:17 AM, TAY >> wee-beng> > wrote: >> >> Hi, >> >> I have attached the 2 files. >> >> Thank you >> >> Yours sincerely, >> >> TAY wee-beng >> >> On 2/11/2015 2:55 PM, Barry Smith wrote: >> >> Run (158/2)x(266/2)x(150/2) grid on 8 >> processes and then (158)x(266)x(150) on >> 64 processors and send the two >> -log_summary results >> >> Barry >> >> >> On Nov 2, 2015, at 12:19 AM, TAY >> wee-beng> > wrote: >> >> Hi, >> >> I have attached the new results. 
[rest of the earlier quoted thread trimmed]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ling.zou at inl.gov Tue Nov 3 09:19:54 2015
From: ling.zou at inl.gov (Zou (Non-US), Ling)
Date: Tue, 3 Nov 2015 08:19:54 -0700
Subject: [petsc-users] How do I know it is steady state?
In-Reply-To: <30E80E96-BC25-489D-8110-D81D9123ED19@mcs.anl.gov> References: <30E80E96-BC25-489D-8110-D81D9123ED19@mcs.anl.gov> Message-ID: Barry, thanks. True and not true. SNES can converge under other conditions, such as SNORM condition, e.g., Solving time step 300, using BDF1, dt = 0.1. Current time (the starting time of this time step) = 29.85. NL step = 0, SNES Function norm = 5.49192E-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 0 In this case, snes_rtol is ignored. Ling On Mon, Nov 2, 2015 at 6:29 PM, Barry Smith wrote: > > > On Oct 30, 2015, at 12:23 PM, Zou (Non-US), Ling > wrote: > > > > Hi All, > > > > From physics point of view, I know my simulation converges if nothing > changes any more. > > > > I wonder how normally you do to detect if your simulation reaches steady > state from numerical point of view. > > Is it a good practice to use SNES convergence as a criterion, i.e., > > SNES converges and it takes 0 iteration(s) > > Depends on the time integrator and SNES tolerance you are using. If you > use a -snes_rtol 1.e-5 it will always try to squeeze 5 MORE digits out of > the residual so won't take 0 iterations even if there is only a small > change in the solution. > > > > Thanks, > > > > Ling > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zonexo at gmail.com Tue Nov 3 09:21:20 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Tue, 3 Nov 2015 23:21:20 +0800 Subject: [petsc-users] Scaling with number of cores In-Reply-To: References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> <56372A12.90900@gmail.com> <563839F8.9080000@gmail.com> <589BE4CF-18B5-4F04-A442-7FB0F2F3AFC1@mcs.anl.gov> <5638AD42.9060609@gmail.com> <5638AF6F.80405@gmail.com> Message-ID: <5638D0F0.60807@gmail.com> On 3/11/2015 9:01 PM, Matthew Knepley wrote: > On Tue, Nov 3, 2015 at 6:58 AM, TAY wee-beng > wrote: > > > On 3/11/2015 8:52 PM, Matthew Knepley wrote: >> On Tue, Nov 3, 2015 at 6:49 AM, TAY wee-beng > > wrote: >> >> Hi, >> >> I tried and have attached the log. >> >> Ya, my Poisson eqn has Neumann boundary condition. Do I need >> to specify some null space stuff? Like KSPSetNullSpace or >> MatNullSpaceCreate? >> >> >> Yes, you need to attach the constant null space to the matrix. >> >> Thanks, >> >> Matt > Ok so can you point me to a suitable example so that I know which > one to use specifically? > > > https://bitbucket.org/petsc/petsc/src/9ae8fd060698c4d6fc0d13188aca8a1828c138ab/src/snes/examples/tutorials/ex12.c?at=master&fileviewer=file-view-default#ex12.c-761 > > Matt Ok did a search and found the ans for the MatNullSpaceCreate: http://petsc-users.mcs.anl.narkive.com/jtIlVll0/pass-petsc-null-integer-to-dynamic-array-of-vec-in-frotran90 > Thanks. >> >> >> Thank you >> >> Yours sincerely, >> >> TAY wee-beng >> >> On 3/11/2015 12:45 PM, Barry Smith wrote: >> >> On Nov 2, 2015, at 10:37 PM, TAY >> wee-beng> >> wrote: >> >> Hi, >> >> I tried : >> >> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg >> >> 2. -poisson_pc_type gamg >> >> Run with -poisson_ksp_monitor_true_residual >> -poisson_ksp_monitor_converged_reason >> Does your poisson have Neumann boundary conditions? 
Do >> you have any zeros on the diagonal for the matrix (you >> shouldn't). >> >> There may be something wrong with your poisson >> discretization that was also messing up hypre >> >> >> >> Both options give: >> >> 1 0.00150000 0.00000000 0.00000000 >> 1.00000000 NaN NaN NaN >> M Diverged but why?, time = 2 >> reason = -9 >> >> How can I check what's wrong? >> >> Thank you >> >> Yours sincerely, >> >> TAY wee-beng >> >> On 3/11/2015 3:18 AM, Barry Smith wrote: >> >> hypre is just not scaling well here. I do not >> know why. Since hypre is a block box for us there >> is no way to determine why the poor scaling. >> >> If you make the same two runs with -pc_type >> gamg there will be a lot more information in the >> log summary about in what routines it is scaling >> well or poorly. >> >> Barry >> >> >> >> On Nov 2, 2015, at 3:17 AM, TAY >> wee-beng> > wrote: >> >> Hi, >> >> I have attached the 2 files. >> >> Thank you >> >> Yours sincerely, >> >> TAY wee-beng >> >> On 2/11/2015 2:55 PM, Barry Smith wrote: >> >> Run (158/2)x(266/2)x(150/2) grid on 8 >> processes and then (158)x(266)x(150) on >> 64 processors and send the two >> -log_summary results >> >> Barry >> >> >> On Nov 2, 2015, at 12:19 AM, TAY >> wee-beng> > wrote: >> >> Hi, >> >> I have attached the new results. >> >> Thank you >> >> Yours sincerely, >> >> TAY wee-beng >> >> On 2/11/2015 12:27 PM, Barry Smith wrote: >> >> Run without the >> -momentum_ksp_view >> -poisson_ksp_view and send the >> new results >> >> >> You can see from the log >> summary that the PCSetUp is >> taking a much smaller percentage >> of the time meaning that it is >> reusing the preconditioner and >> not rebuilding it each time. >> >> Barry >> >> Something makes no sense with >> the output: it gives >> >> KSPSolve 199 1.0 >> 2.3298e+03 1.0 5.20e+09 1.8 >> 3.8e+04 9.9e+05 5.0e+02 90100 >> 66100 24 90100 66100 24 165 >> >> 90% of the time is in the solve >> but there is no significant >> amount of time in other events of >> the code which is just not >> possible. I hope it is due to >> your IO. >> >> >> >> On Nov 1, 2015, at 10:02 PM, >> TAY wee-beng> > wrote: >> >> Hi, >> >> I have attached the new run >> with 100 time steps for 48 >> and 96 cores. >> >> Only the Poisson eqn 's RHS >> changes, the LHS doesn't. So >> if I want to reuse the >> preconditioner, what must I >> do? Or what must I not do? >> >> Why does the number of >> processes increase so much? >> Is there something wrong with >> my coding? Seems to be so too >> for my new run. >> >> Thank you >> >> Yours sincerely, >> >> TAY wee-beng >> >> On 2/11/2015 9:49 AM, Barry >> Smith wrote: >> >> If you are doing many >> time steps with the same >> linear solver then you >> MUST do your weak scaling >> studies with MANY time >> steps since the setup >> time of AMG only takes >> place in the first >> stimestep. So run both 48 >> and 96 processes with the >> same large number of time >> steps. >> >> Barry >> >> >> >> On Nov 1, 2015, at >> 7:35 PM, TAY >> wee-beng> > >> wrote: >> >> Hi, >> >> Sorry I forgot and >> use the old a.out. I >> have attached the new >> log for 48cores >> (log48), together >> with the 96cores log >> (log96). >> >> Why does the number >> of processes increase >> so much? Is there >> something wrong with >> my coding? >> >> Only the Poisson eqn >> 's RHS changes, the >> LHS doesn't. So if I >> want to reuse the >> preconditioner, what >> must I do? Or what >> must I not do? >> >> Lastly, I only >> simulated 2 time >> steps previously. Now >> I run for 10 >> timesteps (log48_10). 
>> Is it building the >> preconditioner at >> every timestep? >> >> Also, what about >> momentum eqn? Is it >> working well? >> >> I will try the gamg >> later too. >> >> Thank you >> >> Yours sincerely, >> >> TAY wee-beng >> >> On 2/11/2015 12:30 >> AM, Barry Smith wrote: >> >> You used gmres >> with 48 processes >> but richardson >> with 96. You need >> to be careful and >> make sure you >> don't change the >> solvers when you >> change the number >> of processors >> since you can get >> very different >> inconsistent results >> >> Anyways all >> the time is being >> spent in the >> BoomerAMG >> algebraic >> multigrid setup >> and it is is >> scaling badly. >> When you double >> the problem size >> and number of >> processes it went >> from 3.2445e+01 >> to 4.3599e+02 >> seconds. >> >> PCSetUp 3 1.0 >> 3.2445e+01 1.0 >> 9.58e+06 2.0 >> 0.0e+00 0.0e+00 >> 4.0e+00 62 8 0 >> 0 4 62 8 0 0 >> 5 11 >> >> PCSetUp 3 1.0 >> 4.3599e+02 1.0 >> 9.58e+06 2.0 >> 0.0e+00 0.0e+00 >> 4.0e+00 85 18 0 >> 0 6 85 18 0 0 >> 6 2 >> >> Now is the >> Poisson problem >> changing at each >> timestep or can >> you use the same >> preconditioner >> built with >> BoomerAMG for all >> the time steps? >> Algebraic >> multigrid has a >> large set up time >> that you often >> doesn't matter if >> you have many >> time steps but if >> you have to >> rebuild it each >> timestep it is >> too large? >> >> You might also >> try -pc_type gamg >> and see how >> PETSc's algebraic >> multigrid scales >> for your >> problem/machine. >> >> Barry >> >> >> >> On Nov 1, >> 2015, at 7:30 >> AM, TAY >> wee-beng> > >> wrote: >> >> >> On 1/11/2015 >> 10:00 AM, >> Barry Smith >> wrote: >> >> On >> Oct >> 31, >> 2015, >> at >> 8:43 >> PM, >> TAY >> wee-beng> > >> wrote: >> >> >> On >> 1/11/2015 >> 12:47 >> AM, >> Matthew >> Knepley >> wrote: >> >> On Sat, >> Oct >> 31, >> 2015 >> at 11:34 >> AM, >> TAY >> wee-beng> > >> wrote: >> Hi, >> >> I >> understand >> that >> as mentioned >> in the >> faq, >> due >> to the >> limitations >> in memory, >> the >> scaling >> is not >> linear. >> So, >> I >> am trying >> to write >> a >> proposal >> to use >> a >> supercomputer. >> Its >> specs >> are: >> Compute >> nodes: >> 82,944 >> nodes >> (SPARC64 >> VIIIfx; >> 16GB >> of memory >> per >> node) >> >> 8 >> cores >> / >> processor >> Interconnect: >> Tofu >> (6-dimensional >> mesh/torus) >> Interconnect >> Each >> cabinet >> contains >> 96 computing >> nodes, >> One >> of the >> requirement >> is to >> give >> the >> performance >> of my >> current >> code >> with >> my current >> set >> of data, >> and >> there >> is a >> formula >> to calculate >> the >> estimated >> parallel >> efficiency >> when >> using >> the >> new >> large >> set >> of data >> There >> are >> 2 >> ways >> to give >> performance: >> 1. Strong >> scaling, >> which >> is defined >> as how >> the >> elapsed >> time >> varies >> with >> the >> number >> of processors >> for >> a >> fixed >> problem. >> 2. Weak >> scaling, >> which >> is defined >> as how >> the >> elapsed >> time >> varies >> with >> the >> number >> of processors >> for a >> fixed >> problem >> size >> per >> processor. >> I >> ran >> my cases >> with >> 48 and >> 96 cores >> with >> my current >> cluster, >> giving >> 140 >> and >> 90 mins >> respectively. >> This >> is classified >> as strong >> scaling. 
>> Cluster >> specs: >> CPU: >> AMD >> 6234 >> 2.4GHz >> 8 >> cores >> / >> processor >> (CPU) >> 6 >> CPU >> / >> node >> So 48 >> Cores >> / CPU >> Not >> sure >> abt >> the >> memory >> / >> node >> >> The >> parallel >> efficiency >> ?En? >> for >> a >> given >> degree >> of parallelism >> ?n? >> indicates >> how >> much >> the >> program >> is >> efficiently >> accelerated >> by parallel >> processing. >> ?En? >> is given >> by the >> following >> formulae. >> Although >> their >> derivation >> processes >> are >> different >> depending >> on strong >> and >> weak >> scaling, >> derived >> formulae >> are >> the >> same. >> From >> the >> estimated >> time, >> my parallel >> efficiency >> using >> Amdahl's >> law >> on the >> current >> old >> cluster >> was >> 52.7%. >> So is >> my results >> acceptable? >> For >> the >> large >> data >> set, >> if using >> 2205 >> nodes >> (2205X8cores), >> my expected >> parallel >> efficiency >> is only >> 0.5%. >> The >> proposal >> recommends >> value >> of > >> 50%. >> The >> problem >> with >> this >> analysis >> is that >> the >> estimated >> serial >> fraction >> from >> Amdahl's >> Law >> changes >> as a >> function >> of problem >> size, >> so you >> cannot >> take >> the >> strong >> scaling >> from >> one >> problem >> and >> apply >> it to >> another >> without >> a >> model >> of this >> dependence. >> >> Weak >> scaling >> does >> model >> changes >> with >> problem >> size, >> so I >> would >> measure >> weak >> scaling >> on your >> current >> cluster, >> and >> extrapolate >> to the >> big >> machine. >> I >> realize >> that >> this >> does >> not >> make >> sense >> for >> many >> scientific >> applications, >> but >> neither >> does >> requiring >> a >> certain >> parallel >> efficiency. >> >> Ok I >> check >> the >> results >> for >> my >> weak >> scaling >> it is >> even >> worse >> for >> the >> expected >> parallel >> efficiency. >> From >> the >> formula >> used, >> it's >> obvious >> it's >> doing >> some >> sort >> of >> exponential >> extrapolation >> decrease. >> So >> unless I >> can >> achieve >> a >> near >> > 90% >> speed >> up >> when >> I >> double the >> cores >> and >> problem >> size >> for >> my >> current >> 48/96 >> cores >> setup, extrapolating >> from >> about >> 96 >> nodes >> to >> 10,000 nodes >> will >> give >> a >> much >> lower >> expected >> parallel >> efficiency >> for >> the >> new case. >> >> However, >> it's >> mentioned >> in >> the >> FAQ >> that >> due >> to >> memory requirement, >> it's >> impossible >> to >> get >> >90% >> speed >> when >> I >> double the >> cores >> and >> problem >> size >> (ie >> linear increase >> in >> performance), >> which >> means >> that >> I >> can't >> get >> >90% >> speed >> up >> when >> I >> double the >> cores >> and >> problem >> size >> for >> my >> current >> 48/96 >> cores >> setup. Is >> that so? >> >> What >> is the >> output of >> -ksp_view >> -log_summary >> on the >> problem >> and then >> on the >> problem >> doubled >> in size >> and >> number of >> processors? >> >> Barry >> >> Hi, >> >> I have >> attached the >> output >> >> 48 cores: log48 >> 96 cores: log96 >> >> There are 2 >> solvers - The >> momentum >> linear eqn >> uses bcgs, >> while the >> Poisson eqn >> uses hypre >> BoomerAMG. >> >> Problem size >> doubled from >> 158x266x150 >> to 158x266x300. 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From knepley at gmail.com Tue Nov 3 09:24:52 2015
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 3 Nov 2015 09:24:52 -0600
Subject: [petsc-users] How do I know it is steady state?
In-Reply-To: 
References: <30E80E96-BC25-489D-8110-D81D9123ED19@mcs.anl.gov>
Message-ID: 

On Tue, Nov 3, 2015 at 9:12 AM, Zou (Non-US), Ling wrote:

> Matt, thanks for the reply.
> [the SNES monitor output for time steps 90-101 is quoted in full; trimmed here -- see the original message above]
> I observed that after time step 99, the residual never changed, so I believe the transient simulation converges at time step 99.
> I wonder can I use the criterion "SNES converges and it takes 0 iteration" to say the simulation reaches a steady state. Such that I don't have to look at the screen and the code knows it converges and should stop.
>
> Put it another way, what's the common way people would implement a scheme to detect a transient simulation reaches steady state.

I don't think so. The above makes no sense to me. You are signaling SNES convergence with a relative residual norm of 5e-4? That does not sound precise enough to me.

As I said, I think the believable way to find steady states is to look for solutions to the algebraic equations, perhaps by using timestepping as a preconditioner.

  Thanks,

     Matt

> Thanks,
> Ling
>
> On Tue, Nov 3, 2015 at 5:25 AM, Matthew Knepley wrote:
[earlier quoted messages trimmed]
>> There are two different situations here:
>>
>> 1) Solving for a mathematical steady state. You remove the time derivative and solve the algebraic system with SNES.
Then >> the SNES tolerance is a good measure. >> >> 2) Use timestepping to advance until nothing looks like it is changing. >> This is a "physical" steady state. >> >> You can use 1) with a timestepping preconditioner TSPSEUDO, which is what >> I would recommend if you >> want a true steady state. >> >> Thanks, >> >> Matt >> >> >>> > >>> > Thanks, >>> > >>> > Ling >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From ling.zou at inl.gov Tue Nov 3 09:38:35 2015 From: ling.zou at inl.gov (Zou (Non-US), Ling) Date: Tue, 3 Nov 2015 08:38:35 -0700 Subject: [petsc-users] How do I know it is steady state? In-Reply-To: References: <30E80E96-BC25-489D-8110-D81D9123ED19@mcs.anl.gov> Message-ID: On Tue, Nov 3, 2015 at 8:24 AM, Matthew Knepley wrote: > On Tue, Nov 3, 2015 at 9:12 AM, Zou (Non-US), Ling > wrote: > >> Matt, thanks for the reply. >> The simulation is a transient simulation, which eventually converges to a >> steady-state solution, given enough simulation time. >> My code runs fine and I could tell the simulation reaches steady state by >> looking at the residual monitored by SNES monitor function. >> >> See an example screen output >> >> Solving time step 90, using BDF1, dt = 0.1. >> >> Current time (the starting time of this time step) = 8.85. >> >> NL step = 0, SNES Function norm = 1.47538E-02 >> >> NL step = 1, SNES Function norm = 8.06971E-04 >> >> total_FunctionCall_number: 0 >> >> converged, time step increased = 0.1 >> >> Solving time step 91, using BDF1, dt = 0.1. >> >> Current time (the starting time of this time step) = 8.95. >> >> NL step = 0, SNES Function norm = 1.10861E-02 >> >> NL step = 1, SNES Function norm = 6.26584E-04 >> >> total_FunctionCall_number: 0 >> >> converged, time step increased = 0.1 >> >> Solving time step 92, using BDF1, dt = 0.1. >> >> Current time (the starting time of this time step) = 9.05. >> >> NL step = 0, SNES Function norm = 7.21253E-03 >> >> NL step = 1, SNES Function norm = 9.93402E-04 >> >> total_FunctionCall_number: 0 >> >> converged, time step increased = 0.1 >> >> Solving time step 93, using BDF1, dt = 0.1. >> >> Current time (the starting time of this time step) = 9.15. >> >> NL step = 0, SNES Function norm = 5.40260E-03 >> >> NL step = 1, SNES Function norm = 6.21162E-04 >> >> total_FunctionCall_number: 0 >> >> converged, time step increased = 0.1 >> >> Solving time step 94, using BDF1, dt = 0.1. >> >> Current time (the starting time of this time step) = 9.25. >> >> NL step = 0, SNES Function norm = 3.40214E-03 >> >> NL step = 1, SNES Function norm = 6.16805E-04 >> >> total_FunctionCall_number: 0 >> >> converged, time step increased = 0.1 >> >> Solving time step 95, using BDF1, dt = 0.1. >> >> Current time (the starting time of this time step) = 9.35. >> >> NL step = 0, SNES Function norm = 2.29656E-03 >> >> NL step = 1, SNES Function norm = 6.19337E-04 >> >> total_FunctionCall_number: 0 >> >> converged, time step increased = 0.1 >> >> Solving time step 96, using BDF1, dt = 0.1. >> >> Current time (the starting time of this time step) = 9.45. 
>> >> NL step = 0, SNES Function norm = 1.53218E-03 >> >> NL step = 1, SNES Function norm = 5.94845E-04 >> >> total_FunctionCall_number: 0 >> >> converged, time step increased = 0.1 >> >> Solving time step 97, using BDF1, dt = 0.1. >> >> Current time (the starting time of this time step) = 9.55. >> >> NL step = 0, SNES Function norm = 1.32136E-03 >> >> NL step = 1, SNES Function norm = 6.19933E-04 >> >> total_FunctionCall_number: 0 >> >> converged, time step increased = 0.1 >> >> Solving time step 98, using BDF1, dt = 0.1. >> >> Current time (the starting time of this time step) = 9.65. >> >> NL step = 0, SNES Function norm = 7.09342E-04 >> >> NL step = 1, SNES Function norm = 6.18694E-04 >> >> total_FunctionCall_number: 0 >> >> converged, time step increased = 0.1 >> >> Solving time step 99, using BDF1, dt = 0.1. >> >> Current time (the starting time of this time step) = 9.75. >> >> NL step = 0, SNES Function norm = 5.49192E-04 >> >> total_FunctionCall_number: 0 >> >> converged, time step increased = 0.1 >> >> Solving time step 100, using BDF1, dt = 0.1. >> >> Current time (the starting time of this time step) = 9.85. >> >> NL step = 0, SNES Function norm = 5.49192E-04 >> >> total_FunctionCall_number: 0 >> >> converged, time step increased = 0.1 >> >> Solving time step 101, using BDF1, dt = 0.1. >> >> Current time (the starting time of this time step) = 9.95. >> >> NL step = 0, SNES Function norm = 5.49192E-04 >> >> total_FunctionCall_number: 0 >> >> I observed that after time step 99, the residual never changed, so I >> believe the transient simulation converges at time step 99. >> I wonder can I use the criterion "SNES converges and it takes 0 >> iteration" to say the simulation reaches a steady state. Such that I don't >> have to look at the screen and the code knows it converges and should stop. >> >> Put it another way, what's the common way people would implement a scheme >> to detect a transient simulation reaches steady state. >> > > I don't think so. The above makes no sense to me. You are signaling SNES > convergence with a relative > residual norm of 5e-4? That does not sound precise enough to me. > > I would argue that number (5.e-4) depends on the problem you are solving (actually I am solving). The initial residual of the problem starts at ~1e8. But you might be right, and I have to think about this issue more carefully. > As I said, I think the believable way to find steady states is to look for > solutions to the algebraic equations, > perhaps by using timestepping as a preconditioner. > > You still need a numerical criterion to let the code understand it converges, right? For example, "a set of solutions have already been found to satisfy the algebraic equations because ___residuals drops below (a number here)__". Thanks, Ling > Thanks, > > Matt > > >> Thanks, >> >> Ling >> >> >> On Tue, Nov 3, 2015 at 5:25 AM, Matthew Knepley >> wrote: >> >>> On Mon, Nov 2, 2015 at 7:29 PM, Barry Smith wrote: >>> >>>> >>>> > On Oct 30, 2015, at 12:23 PM, Zou (Non-US), Ling >>>> wrote: >>>> > >>>> > Hi All, >>>> > >>>> > From physics point of view, I know my simulation converges if nothing >>>> changes any more. >>>> > >>>> > I wonder how normally you do to detect if your simulation reaches >>>> steady state from numerical point of view. >>>> > Is it a good practice to use SNES convergence as a criterion, i.e., >>>> > SNES converges and it takes 0 iteration(s) >>>> >>>> Depends on the time integrator and SNES tolerance you are using. 
If >>>> you use a -snes_rtol 1.e-5 it will always try to squeeze 5 MORE digits out >>>> of the residual so won't take 0 iterations even if there is only a small >>>> change in the solution. >>>> >>> >>> There are two different situations here: >>> >>> 1) Solving for a mathematical steady state. You remove the time >>> derivative and solve the algebraic system with SNES. Then >>> the SNES tolerance is a good measure. >>> >>> 2) Use timestepping to advance until nothing looks like it is >>> changing. This is a "physical" steady state. >>> >>> You can use 1) with a timestepping preconditioner TSPSEUDO, which is >>> what I would recommend if you >>> want a true steady state. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> > >>>> > Thanks, >>>> > >>>> > Ling >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Nov 3 10:55:59 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 3 Nov 2015 10:55:59 -0600 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> <4E4A4E38-B9C3-4F41-8D52-1AFF3E4E8C8C@gmail.com> Message-ID: <5ED4A81C-002B-4A6B-BC6D-F93BE200F0A5@mcs.anl.gov> I am more optimistic about valgrind than Mark. I first try valgrind and if that fails to be helpful then use the debugger. valgrind has the advantage that it finds the FIRST place that something is wrong, while in the debugger it is kind of late at the crash. Valgrind should not be noisy, if it is then the applications/libraries should be cleaned up so that they are valgrind clean and then valgrind is useful. Barry > On Nov 3, 2015, at 7:47 AM, Mark Adams wrote: > > BTW, I think that our advice for segv is use a debugger. DDT or Totalview, and gdb if need be, will get you right to the source code and will get 90% of bugs diagnosed. Valgrind is noisy and cumbersome to use but can diagnose 90% of the other 10%. > > On Tue, Nov 3, 2015 at 7:32 AM, Denis Davydov wrote: > Hi Jose, > > > On 3 Nov 2015, at 12:20, Jose E. Roman wrote: > > > > I am answering the SLEPc-related questions: > > - Having different number of iterations when changing the number of processes is normal. > the change in iterations i mentioned are for different preconditioners, but the same number of MPI processes. > > > > - Yes, if you do not destroy the EPS solver, then the preconditioner would be reused. > > > > Regarding the segmentation fault, I have no clue. Not sure if this is related to GAMG or not. Maybe running under valgrind could provide more information. > will try that. > > Denis. 
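[Editorial note: a typical way to follow Barry's "try valgrind first" advice on an MPI-parallel PETSc run is a command along the lines below; the executable name and process count are placeholders. Writing one log file per rank (%p expands to the process id) keeps the reports from different ranks from interleaving, which helps with the noisiness Mark mentions.]

  mpiexec -n 2 valgrind --tool=memcheck -q --num-callers=20 --track-origins=yes --log-file=valgrind.%p.log ./my_app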
> From bsmith at mcs.anl.gov Tue Nov 3 11:04:58 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 3 Nov 2015 11:04:58 -0600 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <5638CD18.80101@gmail.com> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> <56372A12.90900@gmail.com> <563839F8.9080000@gmail.com> <589BE4CF-18B5-4F04-A442-7FB0F2F3AFC1@mcs.anl.gov> <5638AD42.9060609@gmail.com> <5638AF6F.80405@gmail.com> <5638CD18.80101@gmail.com> Message-ID: <4BAAADC0-4665-4CC4-92D0-3B548ECA4346@mcs.anl.gov> > On Nov 3, 2015, at 9:04 AM, TAY wee-beng wrote: > > > On 3/11/2015 9:01 PM, Matthew Knepley wrote: >> On Tue, Nov 3, 2015 at 6:58 AM, TAY wee-beng wrote: >> >> On 3/11/2015 8:52 PM, Matthew Knepley wrote: >>> On Tue, Nov 3, 2015 at 6:49 AM, TAY wee-beng wrote: >>> Hi, >>> >>> I tried and have attached the log. >>> >>> Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate? >>> >>> Yes, you need to attach the constant null space to the matrix. >>> >>> Thanks, >>> >>> Matt >> Ok so can you point me to a suitable example so that I know which one to use specifically? >> >> https://bitbucket.org/petsc/petsc/src/9ae8fd060698c4d6fc0d13188aca8a1828c138ab/src/snes/examples/tutorials/ex12.c?at=master&fileviewer=file-view-default#ex12.c-761 >> >> Matt > Hi, > > Actually, I realised that for my Poisson eqn, I have neumann and dirichlet BC. Dirichlet BC is at the output grids by specifying pressure = 0. So do I still need the null space? No, > > My Poisson eqn LHS is fixed but RHS is changing with every timestep. > > If I need to use null space, how do I know if the null space contains the constant vector and what the the no. of vectors? I follow the example given and added: > > call MatNullSpaceCreate(MPI_COMM_WORLD,PETSC_TRUE,0,NULL,nullsp,ierr) > > call MatSetNullSpace(A,nullsp,ierr) > > call MatNullSpaceDestroy(nullsp,ierr) > > Is that all? > > Before this, I was using HYPRE geometric solver and the matrix / vector in the subroutine was written based on HYPRE. It worked pretty well and fast. > > However, it's a black box and it's hard to diagnose problems. > > I always had the PETSc subroutine to solve my Poisson eqn but I used KSPBCGS or KSPGMRES with HYPRE's boomeramg as the PC. It worked but was slow. > > Matt: Thanks, I will see how it goes using the nullspace and may try "-mg_coarse_pc_type svd" later. >> >> Thanks. >>> >>> >>> Thank you >>> >>> Yours sincerely, >>> >>> TAY wee-beng >>> >>> On 3/11/2015 12:45 PM, Barry Smith wrote: >>> On Nov 2, 2015, at 10:37 PM, TAY wee-beng wrote: >>> >>> Hi, >>> >>> I tried : >>> >>> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg >>> >>> 2. -poisson_pc_type gamg >>> Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason >>> Does your poisson have Neumann boundary conditions? Do you have any zeros on the diagonal for the matrix (you shouldn't). 
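For reference, the constant-null-space attachment suggested above looks like the following in C (PETSc 3.6-era calls; A is the assembled Poisson matrix, and error checking is abbreviated to CHKERRQ):

    MatNullSpace   nullsp;
    PetscErrorCode ierr;

    /* PETSC_TRUE, 0, NULL: the null space is spanned by the constant vector only */
    ierr = MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,NULL,&nullsp);CHKERRQ(ierr);
    ierr = MatSetNullSpace(A,nullsp);CHKERRQ(ierr);      /* lets the Krylov solver remove it from the residual */
    ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);   /* the matrix keeps its own reference */

As noted further down in the thread, this is only needed when the operator really is singular (all-Neumann boundary conditions); with a Dirichlet pressure point the system is nonsingular and no null space should be attached.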
>>> >>> There may be something wrong with your poisson discretization that was also messing up hypre >>> >>> >>> >>> Both options give: >>> >>> 1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN >>> M Diverged but why?, time = 2 >>> reason = -9 >>> >>> How can I check what's wrong? >>> >>> Thank you >>> >>> Yours sincerely, >>> >>> TAY wee-beng >>> >>> On 3/11/2015 3:18 AM, Barry Smith wrote: >>> hypre is just not scaling well here. I do not know why. Since hypre is a block box for us there is no way to determine why the poor scaling. >>> >>> If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly. >>> >>> Barry >>> >>> >>> >>> On Nov 2, 2015, at 3:17 AM, TAY wee-beng wrote: >>> >>> Hi, >>> >>> I have attached the 2 files. >>> >>> Thank you >>> >>> Yours sincerely, >>> >>> TAY wee-beng >>> >>> On 2/11/2015 2:55 PM, Barry Smith wrote: >>> Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results >>> >>> Barry >>> >>> >>> On Nov 2, 2015, at 12:19 AM, TAY wee-beng wrote: >>> >>> Hi, >>> >>> I have attached the new results. >>> >>> Thank you >>> >>> Yours sincerely, >>> >>> TAY wee-beng >>> >>> On 2/11/2015 12:27 PM, Barry Smith wrote: >>> Run without the -momentum_ksp_view -poisson_ksp_view and send the new results >>> >>> >>> You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time. >>> >>> Barry >>> >>> Something makes no sense with the output: it gives >>> >>> KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165 >>> >>> 90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO. >>> >>> >>> >>> On Nov 1, 2015, at 10:02 PM, TAY wee-beng wrote: >>> >>> Hi, >>> >>> I have attached the new run with 100 time steps for 48 and 96 cores. >>> >>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>> >>> Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run. >>> >>> Thank you >>> >>> Yours sincerely, >>> >>> TAY wee-beng >>> >>> On 2/11/2015 9:49 AM, Barry Smith wrote: >>> If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps. >>> >>> Barry >>> >>> >>> >>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng wrote: >>> >>> Hi, >>> >>> Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96). >>> >>> Why does the number of processes increase so much? Is there something wrong with my coding? >>> >>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>> >>> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep? >>> >>> Also, what about momentum eqn? Is it working well? >>> >>> I will try the gamg later too. 
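On the recurring question of reusing the preconditioner when only the right-hand side changes: with a fixed matrix it is enough to keep one KSP, set the operator once, and call KSPSolve repeatedly; the expensive PCSetUp then runs only on the first solve. A rough sketch, where A, b, x and nsteps are placeholders:

    KSP      ksp;
    PetscInt step;

    KSPCreate(PETSC_COMM_WORLD,&ksp);
    KSPSetOperators(ksp,A,A);              /* A is never changed between time steps         */
    KSPSetFromOptions(ksp);
    for (step = 0; step < nsteps; step++) {
      /* ... fill b with this time step's right-hand side ... */
      KSPSolve(ksp,b,x);                   /* the preconditioner is built once, then reused */
    }
    KSPDestroy(&ksp);

If the matrix did change but the old preconditioner were still acceptable, KSPSetReusePreconditioner(ksp,PETSC_TRUE) would be the relevant switch; it is not needed in the fixed-LHS case above.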
>>> >>> Thank you >>> >>> Yours sincerely, >>> >>> TAY wee-beng >>> >>> On 2/11/2015 12:30 AM, Barry Smith wrote: >>> You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results >>> >>> Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds. >>> >>> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 >>> >>> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 >>> >>> Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large? >>> >>> You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine. >>> >>> Barry >>> >>> >>> >>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng wrote: >>> >>> >>> On 1/11/2015 10:00 AM, Barry Smith wrote: >>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng wrote: >>> >>> >>> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng wrote: >>> Hi, >>> >>> I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. >>> Its specs are: >>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node) >>> >>> 8 cores / processor >>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>> Each cabinet contains 96 computing nodes, >>> One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data >>> There are 2 ways to give performance: >>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed >>> problem. >>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a >>> fixed problem size per processor. >>> I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling. >>> Cluster specs: >>> CPU: AMD 6234 2.4GHz >>> 8 cores / processor (CPU) >>> 6 CPU / node >>> So 48 Cores / CPU >>> Not sure abt the memory / node >>> >>> The parallel efficiency ?En? for a given degree of parallelism ?n? indicates how much the program is >>> efficiently accelerated by parallel processing. ?En? is given by the following formulae. Although their >>> derivation processes are different depending on strong and weak scaling, derived formulae are the >>> same. >>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. >>> So is my results acceptable? >>> For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%. 
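For concreteness, the textbook strong-scaling efficiency for the timings quoted in this thread (140 minutes on 48 cores, 90 minutes on 96 cores) can be evaluated as below; the proposal's own formula is not reproduced in the thread and may be defined differently, so the 52.7% figure need not match this simple estimate.

    /* Strong scaling: efficiency of run 2 relative to run 1 is E = (n1*T1)/(n2*T2).
       With n1 = 48, T1 = 140 min and n2 = 96, T2 = 90 min this gives
       E = (48*140)/(96*90) = 140/180 ~ 0.78, i.e. a 1.56x speedup for 2x the cores. */
    double strong_scaling_efficiency(double n1,double T1,double n2,double T2)
    {
      return (n1*T1)/(n2*T2);
    }

Amdahl's law then models the speedup as S(n) = 1/(s + (1-s)/n) for a serial fraction s, which is why an efficiency measured at 96 cores extrapolates so pessimistically to thousands of cores.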
>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function >>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a >>> model of this dependence. >>> >>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current >>> cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific >>> applications, but neither does requiring a certain parallel efficiency. >>> Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case. >>> >>> However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so? >>> What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors? >>> >>> Barry >>> Hi, >>> >>> I have attached the output >>> >>> 48 cores: log48 >>> 96 cores: log96 >>> >>> There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG. >>> >>> Problem size doubled from 158x266x150 to 158x266x300. >>> So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved? >>> >>> Thanks. >>> Thanks, >>> >>> Matt >>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores? >>> Btw, I do not have access to the system. >>> >>> >>> >>> Sent using CloudMagic Email >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener > From bsmith at mcs.anl.gov Tue Nov 3 11:11:36 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 3 Nov 2015 11:11:36 -0600 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <5638AD42.9060609@gmail.com> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> <56372A12.90900@gmail.com> <563839F8.9080000@gmail.com> <589BE4CF-18B5-4F04-A442-7FB0F2F3AFC1@mcs.anl.gov> <5638AD42.9060609@gmail.com> Message-ID: <2057DCBD-BDB9-4424-A4E3-C1BA5BB8C214@mcs.anl.gov> Ok, the convergence looks good. 
Now run on 8 and 64 processes as before with -log_summary and not -ksp_monitor to see how it scales. Barry > On Nov 3, 2015, at 6:49 AM, TAY wee-beng wrote: > > Hi, > > I tried and have attached the log. > > Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate? > > Thank you > > Yours sincerely, > > TAY wee-beng > > On 3/11/2015 12:45 PM, Barry Smith wrote: >>> On Nov 2, 2015, at 10:37 PM, TAY wee-beng wrote: >>> >>> Hi, >>> >>> I tried : >>> >>> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg >>> >>> 2. -poisson_pc_type gamg >> Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason >> Does your poisson have Neumann boundary conditions? Do you have any zeros on the diagonal for the matrix (you shouldn't). >> >> There may be something wrong with your poisson discretization that was also messing up hypre >> >> >> >>> Both options give: >>> >>> 1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN >>> M Diverged but why?, time = 2 >>> reason = -9 >>> >>> How can I check what's wrong? >>> >>> Thank you >>> >>> Yours sincerely, >>> >>> TAY wee-beng >>> >>> On 3/11/2015 3:18 AM, Barry Smith wrote: >>>> hypre is just not scaling well here. I do not know why. Since hypre is a block box for us there is no way to determine why the poor scaling. >>>> >>>> If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly. >>>> >>>> Barry >>>> >>>> >>>> >>>>> On Nov 2, 2015, at 3:17 AM, TAY wee-beng wrote: >>>>> >>>>> Hi, >>>>> >>>>> I have attached the 2 files. >>>>> >>>>> Thank you >>>>> >>>>> Yours sincerely, >>>>> >>>>> TAY wee-beng >>>>> >>>>> On 2/11/2015 2:55 PM, Barry Smith wrote: >>>>>> Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>>> On Nov 2, 2015, at 12:19 AM, TAY wee-beng wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I have attached the new results. >>>>>>> >>>>>>> Thank you >>>>>>> >>>>>>> Yours sincerely, >>>>>>> >>>>>>> TAY wee-beng >>>>>>> >>>>>>> On 2/11/2015 12:27 PM, Barry Smith wrote: >>>>>>>> Run without the -momentum_ksp_view -poisson_ksp_view and send the new results >>>>>>>> >>>>>>>> >>>>>>>> You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time. >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> Something makes no sense with the output: it gives >>>>>>>> >>>>>>>> KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165 >>>>>>>> >>>>>>>> 90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> On Nov 1, 2015, at 10:02 PM, TAY wee-beng wrote: >>>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I have attached the new run with 100 time steps for 48 and 96 cores. >>>>>>>>> >>>>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>>>> >>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run. 
>>>>>>>>> >>>>>>>>> Thank you >>>>>>>>> >>>>>>>>> Yours sincerely, >>>>>>>>> >>>>>>>>> TAY wee-beng >>>>>>>>> >>>>>>>>> On 2/11/2015 9:49 AM, Barry Smith wrote: >>>>>>>>>> If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps. >>>>>>>>>> >>>>>>>>>> Barry >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng wrote: >>>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96). >>>>>>>>>>> >>>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? >>>>>>>>>>> >>>>>>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>>>>>> >>>>>>>>>>> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep? >>>>>>>>>>> >>>>>>>>>>> Also, what about momentum eqn? Is it working well? >>>>>>>>>>> >>>>>>>>>>> I will try the gamg later too. >>>>>>>>>>> >>>>>>>>>>> Thank you >>>>>>>>>>> >>>>>>>>>>> Yours sincerely, >>>>>>>>>>> >>>>>>>>>>> TAY wee-beng >>>>>>>>>>> >>>>>>>>>>> On 2/11/2015 12:30 AM, Barry Smith wrote: >>>>>>>>>>>> You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results >>>>>>>>>>>> >>>>>>>>>>>> Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds. >>>>>>>>>>>> >>>>>>>>>>>> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 >>>>>>>>>>>> >>>>>>>>>>>> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 >>>>>>>>>>>> >>>>>>>>>>>> Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large? >>>>>>>>>>>> >>>>>>>>>>>> You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine. >>>>>>>>>>>> >>>>>>>>>>>> Barry >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 1/11/2015 10:00 AM, Barry Smith wrote: >>>>>>>>>>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>>>>>>>>>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng wrote: >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. 
>>>>>>>>>>>>>>>> Its specs are: >>>>>>>>>>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 8 cores / processor >>>>>>>>>>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>>>>>>>>>>>>>>> Each cabinet contains 96 computing nodes, >>>>>>>>>>>>>>>> One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data >>>>>>>>>>>>>>>> There are 2 ways to give performance: >>>>>>>>>>>>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed >>>>>>>>>>>>>>>> problem. >>>>>>>>>>>>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a >>>>>>>>>>>>>>>> fixed problem size per processor. >>>>>>>>>>>>>>>> I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling. >>>>>>>>>>>>>>>> Cluster specs: >>>>>>>>>>>>>>>> CPU: AMD 6234 2.4GHz >>>>>>>>>>>>>>>> 8 cores / processor (CPU) >>>>>>>>>>>>>>>> 6 CPU / node >>>>>>>>>>>>>>>> So 48 Cores / CPU >>>>>>>>>>>>>>>> Not sure abt the memory / node >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The parallel efficiency ?En? for a given degree of parallelism ?n? indicates how much the program is >>>>>>>>>>>>>>>> efficiently accelerated by parallel processing. ?En? is given by the following formulae. Although their >>>>>>>>>>>>>>>> derivation processes are different depending on strong and weak scaling, derived formulae are the >>>>>>>>>>>>>>>> same. >>>>>>>>>>>>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. >>>>>>>>>>>>>>>> So is my results acceptable? >>>>>>>>>>>>>>>> For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%. >>>>>>>>>>>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function >>>>>>>>>>>>>>>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a >>>>>>>>>>>>>>>> model of this dependence. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current >>>>>>>>>>>>>>>> cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific >>>>>>>>>>>>>>>> applications, but neither does requiring a certain parallel efficiency. >>>>>>>>>>>>>>> Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so? 
>>>>>>>>>>>>>> What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Barry >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I have attached the output >>>>>>>>>>>>> >>>>>>>>>>>>> 48 cores: log48 >>>>>>>>>>>>> 96 cores: log96 >>>>>>>>>>>>> >>>>>>>>>>>>> There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG. >>>>>>>>>>>>> >>>>>>>>>>>>> Problem size doubled from 158x266x150 to 158x266x300. >>>>>>>>>>>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Matt >>>>>>>>>>>>>>>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores? >>>>>>>>>>>>>>>> Btw, I do not have access to the system. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Sent using CloudMagic Email >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> > > From ecoon at lanl.gov Tue Nov 3 11:22:37 2015 From: ecoon at lanl.gov (Coon, Ethan) Date: Tue, 3 Nov 2015 17:22:37 +0000 Subject: [petsc-users] DMDA and local-to-global scatters In-Reply-To: References: Message-ID: <1332BF48-C225-4716-B61D-4AE033837EB9@lanl.gov> Thanks much for your thoughts Barry, I?ll holler again as I have more. Ethan ------------------------------------------------------------------------ Ethan Coon Research Scientist Computational Earth Science -- EES-16 Los Alamos National Laboratory 505-665-8289 http://www.lanl.gov/expertise/profiles/view/ethan-coon ------------------------------------------------------------------------ > On Oct 29, 2015, at 4:37 PM, Barry Smith wrote: > > > Ethan, > > The truth is I had to introduce the restriction because the previous code was a memory hog and difficult to maintain. I couldn't figure any way to eliminate the memory hog business except by eliminating the localtolocal feature. > > My recommendation is to not try to rewrite the local to local map which is horrible but instead try to use a global vector for the computed result. Hopefully you can do this in a way to that doesn't mess up the entire code. In the simplest case one would have something like > > for t=0 to many time steps > DMGlobalToLocal (xglobal, xlocal) > DMDAVecGetArray on xlocal and global > update xglobal arrays based on xlocal arrays > restore arrays > end > > If the timesteping is controlled elsewhere so your routine can only do a single time step at a time then something like > > function whose input is a local (ghosted) vector > DMDAGetGlobalVector( ... &xglobal); > DMDAVecGetArray on xlocal and global > update xglobal arrays based on xlocal arrays > restore arrays > DMDAGlobalToLocal(xglobal, xlocal); > DMDARestoreGlobalVector(xglobal); > > thus you can "hide" the global vector in the routine that does only the update. > > If this doesn't help then send more specifics of exactly where your localtolocal() map is used and I may have suggestions. 
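A minimal C sketch of that pattern for a 1-D DMDA follows; da and the ghosted local vector xlocal are placeholders, the averaging stencil is only an example, and a periodic or ghosted boundary is assumed so that i-1 and i+1 are always available in the local array:

    Vec          xglobal;
    PetscScalar *g,*l;
    PetscInt     i,xs,xm;

    DMGetGlobalVector(da,&xglobal);                         /* borrowed work vector, not created */
    DMDAVecGetArray(da,xlocal,&l);                          /* ghosted input values              */
    DMDAVecGetArray(da,xglobal,&g);                         /* owned output values               */
    DMDAGetCorners(da,&xs,NULL,NULL,&xm,NULL,NULL);
    for (i = xs; i < xs+xm; i++) {
      g[i] = 0.5*(l[i-1] + l[i+1]);                         /* example update using ghost data   */
    }
    DMDAVecRestoreArray(da,xglobal,&g);
    DMDAVecRestoreArray(da,xlocal,&l);
    DMGlobalToLocalBegin(da,xglobal,INSERT_VALUES,xlocal);  /* refresh the ghosts for next step  */
    DMGlobalToLocalEnd(da,xglobal,INSERT_VALUES,xlocal);
    DMRestoreGlobalVector(da,&xglobal);

The global vector stays hidden inside the update routine, so the rest of the code keeps working with the single local vector, which is the point of the workaround.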
I convinced myself that one could > always use a global intermediate vector in the computation without additional communication to replace the localtolocal but I could be wrong. > > Sorry about the change > > > Barry > > >> On Oct 29, 2015, at 5:18 PM, Coon, Ethan wrote: >> >> Hi Barry, all, >> >> I?m trying to understand some extremely old code (4 years now!) that people insist on using and therefore having me support, despite the fact that the DOE doesn?t think it is important enough to pay me to support. >> >> Barry ? can you explain to me the history and logic in your change: >> >> https://bitbucket.org/petsc/petsc/commits/bd1fc5ae41626b6cf1674a6070035cfd93e0c1dd >> >> that removed DMDA?s local-to-global map, in favor of doing the reverse scatter on global-to-local? When I wrote the aforementioned LBM code, the local-to-global scatter had different semantics than the reverse global-to-local scatter. The latter merged owned and ghosted values from the local Vec into the owned global Vec, while the former ignored ghost values and did a direct copy from owned local values to owned global values. Under INSERT_VALUES, this was important as the latter was a race condition while the former was a well-posed operation. >> >> Now I see that this L2G scatter has been removed (in 2014), which introduced a race condition. It was tweaked a bit for 1 process by this commit: >> >> https://bitbucket.org/petsc/petsc/commits/1eb28f2e8c580cb49316c983b5b6ec6c58d77ab8 >> >> which refers to a Lisandro email that I?m having trouble finding. Fortunately this caused some errors that I did see, as opposed to the previous race conditions which I didn?t see. >> >> Additionally there was documentation added to not do INSERT_VALUES with DMDAs at some point. (Maybe because it causes race conditions!) This documentation suggests to "simply compute the values directly into a global vector instead of a local one." This isn?t a good choice in my application, where I do many time steps in local vectors, using repeated calls to ?DMLocalToLocalBegin()?, and only go back to the global vector when I want to do i/o. Computing into global vectors requires managing two Vecs throughout the entire code, when otherwise I only manage one (except in the i/o portion of the code). I guess the answer is to create and use a L2G forward scatter myself? Is there a better solution I?m not thinking of? 
>> >> Thanks, >> >> Ethan >> >> >> >> ------------------------------------------------------------------------ >> Ethan Coon >> Research Scientist >> Computational Earth Science -- EES-16 >> Los Alamos National Laboratory >> 505-665-8289 >> >> http://www.lanl.gov/expertise/profiles/view/ethan-coon >> ------------------------------------------------------------------------ >> > From davydden at gmail.com Tue Nov 3 12:46:00 2015 From: davydden at gmail.com (Denis Davydov) Date: Tue, 3 Nov 2015 19:46:00 +0100 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> Message-ID: <2F2F9DA5-7597-4093-84CC-61E848187924@gmail.com> Jose, Even when I have PETSc --with-debugging=1 and SLEPc picks it up during configure, i don?t seem to have debug symbols in resulting SLEPc lib (make stage): warning: no debug symbols in executable (-arch x86_64) Same when starting a debugger: warning: (x86_64) /usr/local/opt/slepc/real/lib/libslepc.3.6.dylib empty dSYM file detected, dSYM was created with an executable with no debug info. C/Fortran flags seems to have debug flags: Using C/C++ linker: /usr/local/bin/mpicc Using C/C++ flags: -Wl,-multiply_defined,suppress -Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs -Wl,-search_paths_first -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 Using Fortran linker: /usr/local/bin/mpif90 Using Fortran flags: -Wl,-multiply_defined,suppress -Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs -Wl,-search_paths_first -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -g -O0 Any ideas? Kind regards, Denis > On 3 Nov 2015, at 12:20, Jose E. Roman wrote: > > I am answering the SLEPc-related questions: > - Having different number of iterations when changing the number of processes is normal. > - Yes, if you do not destroy the EPS solver, then the preconditioner would be reused. > > Regarding the segmentation fault, I have no clue. Not sure if this is related to GAMG or not. Maybe running under valgrind could provide more information. > > Jose -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Tue Nov 3 12:54:43 2015 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 3 Nov 2015 19:54:43 +0100 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: <2F2F9DA5-7597-4093-84CC-61E848187924@gmail.com> References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> <2F2F9DA5-7597-4093-84CC-61E848187924@gmail.com> Message-ID: <10EAE81C-1783-4018-9707-A8C8201C27E8@dsic.upv.es> In MacOSX you have to keep the *.o files, and not delete them. With PETSc's makefiles, this can be done easily with e.g. $ make ex1 RM=echo Jose > El 3/11/2015, a las 19:46, Denis Davydov escribi?: > > Jose, > > Even when I have PETSc --with-debugging=1 and SLEPc picks it up during configure, > i don?t seem to have debug symbols in resulting SLEPc lib (make stage): > > warning: no debug symbols in executable (-arch x86_64) > > Same when starting a debugger: > warning: (x86_64) /usr/local/opt/slepc/real/lib/libslepc.3.6.dylib empty dSYM file detected, dSYM was created with an executable with no debug info. 
> > C/Fortran flags seems to have debug flags: > > Using C/C++ linker: /usr/local/bin/mpicc > Using C/C++ flags: -Wl,-multiply_defined,suppress -Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs -Wl,-search_paths_first -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 > Using Fortran linker: /usr/local/bin/mpif90 > Using Fortran flags: -Wl,-multiply_defined,suppress -Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs -Wl,-search_paths_first -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -g -O0 > > Any ideas? > > Kind regards, > Denis From bsmith at mcs.anl.gov Tue Nov 3 12:55:20 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 3 Nov 2015 12:55:20 -0600 Subject: [petsc-users] How do I know it is steady state? In-Reply-To: References: <30E80E96-BC25-489D-8110-D81D9123ED19@mcs.anl.gov> Message-ID: <50BFDE7D-D3A6-4E4B-B034-79E433C2F8D5@mcs.anl.gov> > On Nov 3, 2015, at 9:38 AM, Zou (Non-US), Ling wrote: > > > > On Tue, Nov 3, 2015 at 8:24 AM, Matthew Knepley wrote: > On Tue, Nov 3, 2015 at 9:12 AM, Zou (Non-US), Ling wrote: > Matt, thanks for the reply. > The simulation is a transient simulation, which eventually converges to a steady-state solution, given enough simulation time. > My code runs fine and I could tell the simulation reaches steady state by looking at the residual monitored by SNES monitor function. > > See an example screen output > > Solving time step 90, using BDF1, dt = 0.1. > Current time (the starting time of this time step) = 8.85. > NL step = 0, SNES Function norm = 1.47538E-02 > NL step = 1, SNES Function norm = 8.06971E-04 > total_FunctionCall_number: 0 > converged, time step increased = 0.1 > Solving time step 91, using BDF1, dt = 0.1. > Current time (the starting time of this time step) = 8.95. > NL step = 0, SNES Function norm = 1.10861E-02 > NL step = 1, SNES Function norm = 6.26584E-04 > total_FunctionCall_number: 0 > converged, time step increased = 0.1 > Solving time step 92, using BDF1, dt = 0.1. > Current time (the starting time of this time step) = 9.05. > NL step = 0, SNES Function norm = 7.21253E-03 > NL step = 1, SNES Function norm = 9.93402E-04 > total_FunctionCall_number: 0 > converged, time step increased = 0.1 > Solving time step 93, using BDF1, dt = 0.1. > Current time (the starting time of this time step) = 9.15. > NL step = 0, SNES Function norm = 5.40260E-03 > NL step = 1, SNES Function norm = 6.21162E-04 > total_FunctionCall_number: 0 > converged, time step increased = 0.1 > Solving time step 94, using BDF1, dt = 0.1. > Current time (the starting time of this time step) = 9.25. > NL step = 0, SNES Function norm = 3.40214E-03 > NL step = 1, SNES Function norm = 6.16805E-04 > total_FunctionCall_number: 0 > converged, time step increased = 0.1 > Solving time step 95, using BDF1, dt = 0.1. > Current time (the starting time of this time step) = 9.35. > NL step = 0, SNES Function norm = 2.29656E-03 > NL step = 1, SNES Function norm = 6.19337E-04 > total_FunctionCall_number: 0 > converged, time step increased = 0.1 > Solving time step 96, using BDF1, dt = 0.1. > Current time (the starting time of this time step) = 9.45. > NL step = 0, SNES Function norm = 1.53218E-03 > NL step = 1, SNES Function norm = 5.94845E-04 > total_FunctionCall_number: 0 > converged, time step increased = 0.1 > Solving time step 97, using BDF1, dt = 0.1. > Current time (the starting time of this time step) = 9.55. 
> NL step = 0, SNES Function norm = 1.32136E-03 > NL step = 1, SNES Function norm = 6.19933E-04 > total_FunctionCall_number: 0 > converged, time step increased = 0.1 > Solving time step 98, using BDF1, dt = 0.1. > Current time (the starting time of this time step) = 9.65. > NL step = 0, SNES Function norm = 7.09342E-04 > NL step = 1, SNES Function norm = 6.18694E-04 > total_FunctionCall_number: 0 > converged, time step increased = 0.1 > Solving time step 99, using BDF1, dt = 0.1. > Current time (the starting time of this time step) = 9.75. > NL step = 0, SNES Function norm = 5.49192E-04 > total_FunctionCall_number: 0 > converged, time step increased = 0.1 > Solving time step 100, using BDF1, dt = 0.1. > Current time (the starting time of this time step) = 9.85. > NL step = 0, SNES Function norm = 5.49192E-04 > total_FunctionCall_number: 0 > converged, time step increased = 0.1 > Solving time step 101, using BDF1, dt = 0.1. > Current time (the starting time of this time step) = 9.95. > NL step = 0, SNES Function norm = 5.49192E-04 > total_FunctionCall_number: 0 > > I observed that after time step 99, the residual never changed, so I believe the transient simulation converges at time step 99. > I wonder can I use the criterion "SNES converges and it takes 0 iteration" to say the simulation reaches a steady state. Such that I don't have to look at the screen and the code knows it converges and should stop. > > Put it another way, what's the common way people would implement a scheme to detect a transient simulation reaches steady state. > > I don't think so. The above makes no sense to me. You are signaling SNES convergence with a relative > residual norm of 5e-4? That does not sound precise enough to me. > > I would argue that number (5.e-4) depends on the problem you are solving (actually I am solving). > The initial residual of the problem starts at ~1e8. > But you might be right, and I have to think about this issue more carefully. > > As I said, I think the believable way to find steady states is to look for solutions to the algebraic equations, > perhaps by using timestepping as a preconditioner. > > You still need a numerical criterion to let the code understand it converges, right? For example, "a set of solutions have already been found to satisfy the algebraic equations because ___residuals drops below (a number here)__". After each SNESSolve you could call SNESGetConvergedReason() and if the number of iterations was 0 and the reason was snorm then declare it steady state. Barry > > Thanks, > > Ling > > > Thanks, > > Matt > > Thanks, > > Ling > > > On Tue, Nov 3, 2015 at 5:25 AM, Matthew Knepley wrote: > On Mon, Nov 2, 2015 at 7:29 PM, Barry Smith wrote: > > > On Oct 30, 2015, at 12:23 PM, Zou (Non-US), Ling wrote: > > > > Hi All, > > > > From physics point of view, I know my simulation converges if nothing changes any more. > > > > I wonder how normally you do to detect if your simulation reaches steady state from numerical point of view. > > Is it a good practice to use SNES convergence as a criterion, i.e., > > SNES converges and it takes 0 iteration(s) > > Depends on the time integrator and SNES tolerance you are using. If you use a -snes_rtol 1.e-5 it will always try to squeeze 5 MORE digits out of the residual so won't take 0 iterations even if there is only a small change in the solution. > > There are two different situations here: > > 1) Solving for a mathematical steady state. You remove the time derivative and solve the algebraic system with SNES. 
Then > the SNES tolerance is a good measure. > > 2) Use timestepping to advance until nothing looks like it is changing. This is a "physical" steady state. > > You can use 1) with a timestepping preconditioner TSPSEUDO, which is what I would recommend if you > want a true steady state. > > Thanks, > > Matt > > > > > Thanks, > > > > Ling > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > From ling.zou at inl.gov Tue Nov 3 13:05:09 2015 From: ling.zou at inl.gov (Zou (Non-US), Ling) Date: Tue, 3 Nov 2015 12:05:09 -0700 Subject: [petsc-users] How do I know it is steady state? In-Reply-To: <50BFDE7D-D3A6-4E4B-B034-79E433C2F8D5@mcs.anl.gov> References: <30E80E96-BC25-489D-8110-D81D9123ED19@mcs.anl.gov> <50BFDE7D-D3A6-4E4B-B034-79E433C2F8D5@mcs.anl.gov> Message-ID: Barry, thanks for the discussion and help. Ling On Tue, Nov 3, 2015 at 11:55 AM, Barry Smith wrote: > > > On Nov 3, 2015, at 9:38 AM, Zou (Non-US), Ling wrote: > > > > > > > > On Tue, Nov 3, 2015 at 8:24 AM, Matthew Knepley > wrote: > > On Tue, Nov 3, 2015 at 9:12 AM, Zou (Non-US), Ling > wrote: > > Matt, thanks for the reply. > > The simulation is a transient simulation, which eventually converges to > a steady-state solution, given enough simulation time. > > My code runs fine and I could tell the simulation reaches steady state > by looking at the residual monitored by SNES monitor function. > > > > See an example screen output > > > > Solving time step 90, using BDF1, dt = 0.1. > > Current time (the starting time of this time step) = 8.85. > > NL step = 0, SNES Function norm = 1.47538E-02 > > NL step = 1, SNES Function norm = 8.06971E-04 > > total_FunctionCall_number: 0 > > converged, time step increased = 0.1 > > Solving time step 91, using BDF1, dt = 0.1. > > Current time (the starting time of this time step) = 8.95. > > NL step = 0, SNES Function norm = 1.10861E-02 > > NL step = 1, SNES Function norm = 6.26584E-04 > > total_FunctionCall_number: 0 > > converged, time step increased = 0.1 > > Solving time step 92, using BDF1, dt = 0.1. > > Current time (the starting time of this time step) = 9.05. > > NL step = 0, SNES Function norm = 7.21253E-03 > > NL step = 1, SNES Function norm = 9.93402E-04 > > total_FunctionCall_number: 0 > > converged, time step increased = 0.1 > > Solving time step 93, using BDF1, dt = 0.1. > > Current time (the starting time of this time step) = 9.15. > > NL step = 0, SNES Function norm = 5.40260E-03 > > NL step = 1, SNES Function norm = 6.21162E-04 > > total_FunctionCall_number: 0 > > converged, time step increased = 0.1 > > Solving time step 94, using BDF1, dt = 0.1. > > Current time (the starting time of this time step) = 9.25. > > NL step = 0, SNES Function norm = 3.40214E-03 > > NL step = 1, SNES Function norm = 6.16805E-04 > > total_FunctionCall_number: 0 > > converged, time step increased = 0.1 > > Solving time step 95, using BDF1, dt = 0.1. > > Current time (the starting time of this time step) = 9.35. > > NL step = 0, SNES Function norm = 2.29656E-03 > > NL step = 1, SNES Function norm = 6.19337E-04 > > total_FunctionCall_number: 0 > > converged, time step increased = 0.1 > > Solving time step 96, using BDF1, dt = 0.1. 
> > Current time (the starting time of this time step) = 9.45. > > NL step = 0, SNES Function norm = 1.53218E-03 > > NL step = 1, SNES Function norm = 5.94845E-04 > > total_FunctionCall_number: 0 > > converged, time step increased = 0.1 > > Solving time step 97, using BDF1, dt = 0.1. > > Current time (the starting time of this time step) = 9.55. > > NL step = 0, SNES Function norm = 1.32136E-03 > > NL step = 1, SNES Function norm = 6.19933E-04 > > total_FunctionCall_number: 0 > > converged, time step increased = 0.1 > > Solving time step 98, using BDF1, dt = 0.1. > > Current time (the starting time of this time step) = 9.65. > > NL step = 0, SNES Function norm = 7.09342E-04 > > NL step = 1, SNES Function norm = 6.18694E-04 > > total_FunctionCall_number: 0 > > converged, time step increased = 0.1 > > Solving time step 99, using BDF1, dt = 0.1. > > Current time (the starting time of this time step) = 9.75. > > NL step = 0, SNES Function norm = 5.49192E-04 > > total_FunctionCall_number: 0 > > converged, time step increased = 0.1 > > Solving time step 100, using BDF1, dt = 0.1. > > Current time (the starting time of this time step) = 9.85. > > NL step = 0, SNES Function norm = 5.49192E-04 > > total_FunctionCall_number: 0 > > converged, time step increased = 0.1 > > Solving time step 101, using BDF1, dt = 0.1. > > Current time (the starting time of this time step) = 9.95. > > NL step = 0, SNES Function norm = 5.49192E-04 > > total_FunctionCall_number: 0 > > > > I observed that after time step 99, the residual never changed, so I > believe the transient simulation converges at time step 99. > > I wonder can I use the criterion "SNES converges and it takes 0 > iteration" to say the simulation reaches a steady state. Such that I don't > have to look at the screen and the code knows it converges and should stop. > > > > Put it another way, what's the common way people would implement a > scheme to detect a transient simulation reaches steady state. > > > > I don't think so. The above makes no sense to me. You are signaling SNES > convergence with a relative > > residual norm of 5e-4? That does not sound precise enough to me. > > > > I would argue that number (5.e-4) depends on the problem you are solving > (actually I am solving). > > The initial residual of the problem starts at ~1e8. > > But you might be right, and I have to think about this issue more > carefully. > > > > As I said, I think the believable way to find steady states is to look > for solutions to the algebraic equations, > > perhaps by using timestepping as a preconditioner. > > > > You still need a numerical criterion to let the code understand it > converges, right? For example, "a set of solutions have already been found > to satisfy the algebraic equations because ___residuals drops below (a > number here)__". > > After each SNESSolve you could call SNESGetConvergedReason() and if the > number of iterations was 0 and the reason was snorm then declare it steady > state. > > Barry > > > > > Thanks, > > > > Ling > > > > > > Thanks, > > > > Matt > > > > Thanks, > > > > Ling > > > > > > On Tue, Nov 3, 2015 at 5:25 AM, Matthew Knepley > wrote: > > On Mon, Nov 2, 2015 at 7:29 PM, Barry Smith wrote: > > > > > On Oct 30, 2015, at 12:23 PM, Zou (Non-US), Ling > wrote: > > > > > > Hi All, > > > > > > From physics point of view, I know my simulation converges if nothing > changes any more. > > > > > > I wonder how normally you do to detect if your simulation reaches > steady state from numerical point of view. 
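One concrete form of the zero-iteration check discussed in this thread is sketched below; snes, x and the steady flag are placeholders, and with the default convergence test a converged solve with 0 iterations means the function norm was already below the absolute tolerance at entry:

    SNESConvergedReason reason;
    PetscInt            its;
    PetscBool           steady = PETSC_FALSE;

    SNESSolve(snes,NULL,x);
    SNESGetConvergedReason(snes,&reason);
    SNESGetIterationNumber(snes,&its);
    if (reason > 0 && its == 0) {
      /* the initial guess already satisfied the nonlinear tolerance:
         nothing changed over the last time step, so flag steady state */
      steady = PETSC_TRUE;
    }

The alternative raised earlier in the thread is to drop the time derivative and solve the steady algebraic problem directly, for instance with -ts_type pseudo, in which case the SNES tolerance itself is the stopping criterion.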
> > > Is it a good practice to use SNES convergence as a criterion, i.e., > > > SNES converges and it takes 0 iteration(s) > > > > Depends on the time integrator and SNES tolerance you are using. If > you use a -snes_rtol 1.e-5 it will always try to squeeze 5 MORE digits out > of the residual so won't take 0 iterations even if there is only a small > change in the solution. > > > > There are two different situations here: > > > > 1) Solving for a mathematical steady state. You remove the time > derivative and solve the algebraic system with SNES. Then > > the SNES tolerance is a good measure. > > > > 2) Use timestepping to advance until nothing looks like it is > changing. This is a "physical" steady state. > > > > You can use 1) with a timestepping preconditioner TSPSEUDO, which is > what I would recommend if you > > want a true steady state. > > > > Thanks, > > > > Matt > > > > > > > > Thanks, > > > > > > Ling > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcrean01 at gmail.com Tue Nov 3 21:01:14 2015 From: jcrean01 at gmail.com (Jared Crean) Date: Tue, 03 Nov 2015 22:01:14 -0500 Subject: [petsc-users] PetscOptionsGetString Not Finding Option In-Reply-To: References: <56356233.3060207@gmail.com> Message-ID: <563974FA.5060906@gmail.com> Hell Hong, Is there a way to directly query the options database to get the value of Petsc options? This is related to the discussion here: http://lists.mcs.anl.gov/pipermail/petsc-dev/2015-July/017944.html . I though PetscOptionsGetString and PetscOptionsSetValue would provide this capability, but now it seems like petsc options cannot be accessed. Jared Crean On 11/01/2015 10:19 AM, Hong wrote: > Jared : > Either call KSPSetPCSide() or change > const char name[] = "-ksp_pc_side" > to a non-petsc option name, e.g., "-my_ksp_pc_side". > > Hong > > Hello, > I am trying to use PetscOptionsGetString to retrieve the value > of an option in the options database, but the value returned in > the last argument indicates the option was not found. In the > attached code (a modified version of ksp example 2), the string > "-ksp_pc_side" is passed in as the argument name. If I run the code as > > ./jc2 -pc_type ilu -ksp_pc_side right > > I get the output: > > option -ksp_pc_side was found > > from line 71 of the file. Petsc does not complain of unused > options when the program finishes. Am I using this function > incorrectly? > > Jared Crean > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Nov 3 21:04:17 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 3 Nov 2015 21:04:17 -0600 Subject: [petsc-users] PetscOptionsGetString Not Finding Option In-Reply-To: <563974FA.5060906@gmail.com> References: <56356233.3060207@gmail.com> <563974FA.5060906@gmail.com> Message-ID: On Tue, Nov 3, 2015 at 9:01 PM, Jared Crean wrote: > Hell Hong, > Is there a way to directly query the options database to get the > value of Petsc options? This is related to the discussion here: > http://lists.mcs.anl.gov/pipermail/petsc-dev/2015-July/017944.html . 
I > though PetscOptionsGetString and PetscOptionsSetValue would provide this > capability, but now it seems like petsc options cannot be accessed. > I am not sure what you mean. This http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscOptionsGetString.html will get the raw input string for any option. The other functions return processed versions. What else do you want? Matt Jared Crean > > On 11/01/2015 10:19 AM, Hong wrote: > > Jared : > Either call KSPSetPCSide() or change > const char name[] = "-ksp_pc_side" > to a non-petsc option name, e.g., "-my_ksp_pc_side". > > Hong > > Hello, >> I am trying to use PetscOptionsGetString to retrieve the value of an >> option in the options database, but the value returned in the last argument >> indicates the option was not found. In the attached code (a modified >> version of ksp example 2), the string "-ksp_pc_side" is passed in as the >> argument name. If I run the code as >> >> ./jc2 -pc_type ilu -ksp_pc_side right >> >> I get the output: >> >> option -ksp_pc_side was found >> >> from line 71 of the file. Petsc does not complain of unused options >> when the program finishes. Am I using this function incorrectly? >> >> Jared Crean >> > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Nov 3 21:07:35 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 3 Nov 2015 21:07:35 -0600 Subject: [petsc-users] PetscOptionsGetString Not Finding Option In-Reply-To: <56356233.3060207@gmail.com> References: <56356233.3060207@gmail.com> Message-ID: On Sat, Oct 31, 2015 at 7:52 PM, Jared Crean wrote: > Hello, > I am trying to use PetscOptionsGetString to retrieve the value of an > option in the options database, but the value returned in the last argument > indicates the option was not found. In the attached code (a modified > version of ksp example 2), the string "-ksp_pc_side" is passed in as the > argument name. If I run the code as > > ./jc2 -pc_type ilu -ksp_pc_side right > > I get the output: > > option -ksp_pc_side was found > You gave the options and the program says it was found. What is the problem here? Matt > from line 71 of the file. Petsc does not complain of unused options > when the program finishes. Am I using this function incorrectly? > > Jared Crean > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcrean01 at gmail.com Tue Nov 3 21:37:15 2015 From: jcrean01 at gmail.com (Jared Crean) Date: Tue, 03 Nov 2015 22:37:15 -0500 Subject: [petsc-users] PetscOptionsGetString Not Finding Option In-Reply-To: References: <56356233.3060207@gmail.com> Message-ID: <56397D6B.3070402@gmail.com> Hello Matt, Ok, something weird is going on. Some of my test cases are behaving strangely (the output of the test in my previous message is the expected behavior, but previously the test gave different results). Let me figure out what is going on with the tests before proceeding. 
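For reference, a direct query of the options database with the 3.6-era signatures used in this thread looks as follows; the option name and buffer length are arbitrary, and later PETSc releases add a leading PetscOptions argument to these calls:

    char           side[64];
    PetscBool      set;
    PetscErrorCode ierr;

    ierr = PetscOptionsGetString(NULL,"-ksp_pc_side",side,sizeof(side),&set);CHKERRQ(ierr);
    if (set) {
      ierr = PetscPrintf(PETSC_COMM_WORLD,"-ksp_pc_side = %s\n",side);CHKERRQ(ierr);
    }

PetscOptionsSetValue("-ksp_pc_side","right") is the matching way to put a value into the database programmatically.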
Jared Crean On 11/03/2015 10:07 PM, Matthew Knepley wrote: > On Sat, Oct 31, 2015 at 7:52 PM, Jared Crean > wrote: > > Hello, > I am trying to use PetscOptionsGetString to retrieve the value > of an option in the options database, but the value returned in > the last argument indicates the option was not found. In the > attached code (a modified version of ksp example 2), the string > "-ksp_pc_side" is passed in as the argument name. If I run the code as > > ./jc2 -pc_type ilu -ksp_pc_side right > > I get the output: > > option -ksp_pc_side was found > > > You gave the options and the program says it was found. What is the > problem here? > > Matt > > from line 71 of the file. Petsc does not complain of unused > options when the program finishes. Am I using this function > incorrectly? > > Jared Crean > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcrean01 at gmail.com Tue Nov 3 21:50:40 2015 From: jcrean01 at gmail.com (Jared Crean) Date: Tue, 03 Nov 2015 22:50:40 -0500 Subject: [petsc-users] PetscOptionsGetString Not Finding Option In-Reply-To: <56397D6B.3070402@gmail.com> References: <56356233.3060207@gmail.com> <56397D6B.3070402@gmail.com> Message-ID: <56398090.5070705@gmail.com> Hello Matt, There was a problem with the way I was doing the tests, not with PetscOptionsGetString. PetscOptionsGetString now behaves as expected in all tests. Sorry for the confusion. Jared Crean On 11/03/2015 10:37 PM, Jared Crean wrote: > Hello Matt, > Ok, something weird is going on. Some of my test cases are > behaving strangely (the output of the test in my previous message is > the expected behavior, but previously the test gave different > results). Let me figure out what is going on with the tests before > proceeding. > > Jared Crean > > On 11/03/2015 10:07 PM, Matthew Knepley wrote: >> On Sat, Oct 31, 2015 at 7:52 PM, Jared Crean > > wrote: >> >> Hello, >> I am trying to use PetscOptionsGetString to retrieve the >> value of an option in the options database, but the value >> returned in the last argument indicates the option was not >> found. In the attached code (a modified version of ksp example >> 2), the string "-ksp_pc_side" is passed in as the argument name. >> If I run the code as >> >> ./jc2 -pc_type ilu -ksp_pc_side right >> >> I get the output: >> >> option -ksp_pc_side was found >> >> >> You gave the options and the program says it was found. What is the >> problem here? >> >> Matt >> >> from line 71 of the file. Petsc does not complain of unused >> options when the program finishes. Am I using this function >> incorrectly? >> >> Jared Crean >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gianmail at gmail.com Wed Nov 4 14:18:39 2015 From: gianmail at gmail.com (Gianluca Meneghello) Date: Wed, 4 Nov 2015 12:18:39 -0800 Subject: [petsc-users] MUMPS with symmetric matrices Message-ID: Dear all, I am trying to solve a linear system for a symmetric matrix with MUMPS. Is there a way to tell MUMPS that the matrix is indeed symmetric? 
The way I build the matrix is Mat A,AT,ATA MatHermitianTranspose(A,MAT_INITIAL_MATRIX,&AT); MatMatMult(AT,A,MAT_INITIAL_MATRIX,7,&ATA); MatSetOption(ATA,MAT_SYMMETRY_ETERNAL,PETSC_TRUE); but MUMPS returns L U Solver for unsymmetric matrices Of course, any suggestion of a better/more efficient way to build ATA or store only half of it, that is more than welcome. Thanks for your help, Gianluca -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Nov 4 14:46:35 2015 From: jed at jedbrown.org (Jed Brown) Date: Wed, 04 Nov 2015 13:46:35 -0700 Subject: [petsc-users] MUMPS with symmetric matrices In-Reply-To: References: Message-ID: <87bnb9jxw4.fsf@jedbrown.org> Gianluca Meneghello writes: > Dear all, > > I am trying to solve a linear system for a symmetric matrix with MUMPS. > > Is there a way to tell MUMPS that the matrix is indeed symmetric? > > The way I build the matrix is > > Mat A,AT,ATA > MatHermitianTranspose(A,MAT_INITIAL_MATRIX,&AT); > MatMatMult(AT,A,MAT_INITIAL_MATRIX,7,&ATA); > MatSetOption(ATA,MAT_SYMMETRY_ETERNAL,PETSC_TRUE); > > but MUMPS returns > > L U Solver for unsymmetric matrices You're probably using -pc_type lu rather than -pc_type cholesky. > Of course, any suggestion of a better/more efficient way to build ATA or > store only half of it, that is more than welcome. Where does A come from? There is MatTransposeMatMult() -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From gianmail at gmail.com Wed Nov 4 15:06:15 2015 From: gianmail at gmail.com (Gianluca Meneghello) Date: Wed, 4 Nov 2015 13:06:15 -0800 Subject: [petsc-users] MUMPS with symmetric matrices In-Reply-To: <87bnb9jxw4.fsf@jedbrown.org> References: <87bnb9jxw4.fsf@jedbrown.org> Message-ID: That is correct... I will try with -pc_type cholesky and use MatTransposeMatMult. Using cholesky I do not need to specify mumps as a solver, am I right? A is a linearization of the Navier Stokes equation. Thanks! Gianluca On Wed, Nov 4, 2015 at 12:46 PM, Jed Brown wrote: > Gianluca Meneghello writes: > > > Dear all, > > > > I am trying to solve a linear system for a symmetric matrix with MUMPS. > > > > Is there a way to tell MUMPS that the matrix is indeed symmetric? > > > > The way I build the matrix is > > > > Mat A,AT,ATA > > MatHermitianTranspose(A,MAT_INITIAL_MATRIX,&AT); > > MatMatMult(AT,A,MAT_INITIAL_MATRIX,7,&ATA); > > MatSetOption(ATA,MAT_SYMMETRY_ETERNAL,PETSC_TRUE); > > > > but MUMPS returns > > > > L U Solver for unsymmetric matrices > > You're probably using -pc_type lu rather than -pc_type cholesky. > > > Of course, any suggestion of a better/more efficient way to build ATA or > > store only half of it, that is more than welcome. > > Where does A come from? > > There is MatTransposeMatMult() > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Nov 4 15:16:35 2015 From: jed at jedbrown.org (Jed Brown) Date: Wed, 04 Nov 2015 14:16:35 -0700 Subject: [petsc-users] MUMPS with symmetric matrices In-Reply-To: References: <87bnb9jxw4.fsf@jedbrown.org> Message-ID: <878u6djwi4.fsf@jedbrown.org> Gianluca Meneghello writes: > That is correct... I will try with -pc_type cholesky and use > MatTransposeMatMult. > > Using cholesky I do not need to specify mumps as a solver, am I right? Of course you do. > A is a linearization of the Navier Stokes equation. 
Of the differential operator, its inverse, or a map from some parameters to observations? -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From gianmail at gmail.com Wed Nov 4 17:37:47 2015 From: gianmail at gmail.com (Gianluca Meneghello) Date: Wed, 4 Nov 2015 15:37:47 -0800 Subject: [petsc-users] MUMPS with symmetric matrices In-Reply-To: <878u6djwi4.fsf@jedbrown.org> References: <87bnb9jxw4.fsf@jedbrown.org> <878u6djwi4.fsf@jedbrown.org> Message-ID: It is a discretization of the differential operator, of which I would need the inverse (or LU decomposition). My goal is frequency response (resolvent) analysis of the linearized Navier-Stokes operator. The reason I was not using MatTransposeMatMult is that the matrix is complex and I would need MatTransposeHermitianMatMult (or something like that). It seems to me that it is not available (or does MatTransposeMatMult compute the Hermitian transpose?) Any suggestion is of course welcome! Thanks Gianluca On Wed, Nov 4, 2015 at 1:16 PM, Jed Brown wrote: > Gianluca Meneghello writes: > > > That is correct... I will try with -pc_type cholesky and use > > MatTransposeMatMult. > > > > Using cholesky I do not need to specify mumps as a solver, am I right? > > Of course you do. > > > A is a linearization of the Navier Stokes equation. > > Of the differential operator, its inverse, or a map from some parameters > to observations? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gianmail at gmail.com Wed Nov 4 18:18:33 2015 From: gianmail at gmail.com (Gianluca Meneghello) Date: Wed, 4 Nov 2015 16:18:33 -0800 Subject: [petsc-users] MUMPS with symmetric matrices In-Reply-To: References: <87bnb9jxw4.fsf@jedbrown.org> <878u6djwi4.fsf@jedbrown.org> Message-ID: I have just read that there is no special algorithm for Hermitian matrices in MUMPS (sorry, I meant Hermitian, not symmetric... the matrix is complex). Sorry for this. In any case, if there is any suggestion it is more than welcome! Thanks for your help and your work, Gianluca On Wed, Nov 4, 2015 at 3:37 PM, Gianluca Meneghello wrote: > It is a discretization of the differential operator, of which I would need > the inverse (or LU decomposition). My goal is frequency response > (resolvent) analysis of the linearized Navier-Stokes operator. > > The reason I was not using MatTransposeMatMult is that the > matrix is complex and I would need MatTransposeHermitianMatMult (or > something like that). It seems to me that it is not available (or does > MatTransposeMatMult compute the Hermitian transpose?) > > Any suggestion is of course welcome! > > Thanks > > Gianluca > > > > > > On Wed, Nov 4, 2015 at 1:16 PM, Jed Brown wrote: > >> Gianluca Meneghello writes: >> >> > That is correct... I will try with -pc_type cholesky and use >> > MatTransposeMatMult. >> > >> > Using cholesky I do not need to specify mumps as a solver, am I right? >> >> Of course you do. >> >> > A is a linearization of the Navier Stokes equation. >> >> Of the differential operator, its inverse, or a map from some parameters >> to observations? >> > > -------------- next part -------------- An HTML attachment was scrubbed...
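Condensing the exchange above: a sketch, assuming the PETSc 3.6-era C API and with error checking omitted, of building ATA = A^H A explicitly and then requesting a Cholesky factorization performed by MUMPS through the options Jed points to. Two caveats from the thread apply: MatTransposeMatMult() computes A^T A without conjugation, which is why the explicit Hermitian transpose is used for a complex A, and MUMPS has no special algorithm for complex Hermitian matrices, so LU may remain the practical choice in that case.

  Mat A, AT, ATA;   /* A is assembled elsewhere */
  KSP ksp;

  MatHermitianTranspose(A, MAT_INITIAL_MATRIX, &AT);
  MatMatMult(AT, A, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &ATA);
  MatSetOption(ATA, MAT_SYMMETRIC, PETSC_TRUE);        /* MAT_HERMITIAN for complex scalars */
  MatSetOption(ATA, MAT_SYMMETRY_ETERNAL, PETSC_TRUE);

  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, ATA, ATA);
  KSPSetFromOptions(ksp);

  /* then run with, e.g.:
     -ksp_type preonly -pc_type cholesky -pc_factor_mat_solver_package mumps */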
URL: From zonexo at gmail.com Wed Nov 4 21:30:39 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Thu, 5 Nov 2015 11:30:39 +0800 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <2057DCBD-BDB9-4424-A4E3-C1BA5BB8C214@mcs.anl.gov> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> <56372A12.90900@gmail.com> <563839F8.9080000@gmail.com> <589BE4CF-18B5-4F04-A442-7FB0F2F3AFC1@mcs.anl.gov> <5638AD42.9060609@gmail.com> <2057DCBD-BDB9-4424-A4E3-C1BA5BB8C214@mcs.anl.gov> Message-ID: <563ACD5F.6060301@gmail.com> Hi, I have attached the 2 logs. Thank you Yours sincerely, TAY wee-beng On 4/11/2015 1:11 AM, Barry Smith wrote: > Ok, the convergence looks good. Now run on 8 and 64 processes as before with -log_summary and not -ksp_monitor to see how it scales. > > Barry > >> On Nov 3, 2015, at 6:49 AM, TAY wee-beng wrote: >> >> Hi, >> >> I tried and have attached the log. >> >> Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate? >> >> Thank you >> >> Yours sincerely, >> >> TAY wee-beng >> >> On 3/11/2015 12:45 PM, Barry Smith wrote: >>>> On Nov 2, 2015, at 10:37 PM, TAY wee-beng wrote: >>>> >>>> Hi, >>>> >>>> I tried : >>>> >>>> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg >>>> >>>> 2. -poisson_pc_type gamg >>> Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason >>> Does your poisson have Neumann boundary conditions? Do you have any zeros on the diagonal for the matrix (you shouldn't). >>> >>> There may be something wrong with your poisson discretization that was also messing up hypre >>> >>> >>> >>>> Both options give: >>>> >>>> 1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN >>>> M Diverged but why?, time = 2 >>>> reason = -9 >>>> >>>> How can I check what's wrong? >>>> >>>> Thank you >>>> >>>> Yours sincerely, >>>> >>>> TAY wee-beng >>>> >>>> On 3/11/2015 3:18 AM, Barry Smith wrote: >>>>> hypre is just not scaling well here. I do not know why. Since hypre is a block box for us there is no way to determine why the poor scaling. >>>>> >>>>> If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly. >>>>> >>>>> Barry >>>>> >>>>> >>>>> >>>>>> On Nov 2, 2015, at 3:17 AM, TAY wee-beng wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I have attached the 2 files. >>>>>> >>>>>> Thank you >>>>>> >>>>>> Yours sincerely, >>>>>> >>>>>> TAY wee-beng >>>>>> >>>>>> On 2/11/2015 2:55 PM, Barry Smith wrote: >>>>>>> Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>>> On Nov 2, 2015, at 12:19 AM, TAY wee-beng wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I have attached the new results. 
>>>>>>>> >>>>>>>> Thank you >>>>>>>> >>>>>>>> Yours sincerely, >>>>>>>> >>>>>>>> TAY wee-beng >>>>>>>> >>>>>>>> On 2/11/2015 12:27 PM, Barry Smith wrote: >>>>>>>>> Run without the -momentum_ksp_view -poisson_ksp_view and send the new results >>>>>>>>> >>>>>>>>> >>>>>>>>> You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time. >>>>>>>>> >>>>>>>>> Barry >>>>>>>>> >>>>>>>>> Something makes no sense with the output: it gives >>>>>>>>> >>>>>>>>> KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165 >>>>>>>>> >>>>>>>>> 90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Nov 1, 2015, at 10:02 PM, TAY wee-beng wrote: >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I have attached the new run with 100 time steps for 48 and 96 cores. >>>>>>>>>> >>>>>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>>>>> >>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run. >>>>>>>>>> >>>>>>>>>> Thank you >>>>>>>>>> >>>>>>>>>> Yours sincerely, >>>>>>>>>> >>>>>>>>>> TAY wee-beng >>>>>>>>>> >>>>>>>>>> On 2/11/2015 9:49 AM, Barry Smith wrote: >>>>>>>>>>> If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps. >>>>>>>>>>> >>>>>>>>>>> Barry >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96). >>>>>>>>>>>> >>>>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? >>>>>>>>>>>> >>>>>>>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>>>>>>> >>>>>>>>>>>> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep? >>>>>>>>>>>> >>>>>>>>>>>> Also, what about momentum eqn? Is it working well? >>>>>>>>>>>> >>>>>>>>>>>> I will try the gamg later too. >>>>>>>>>>>> >>>>>>>>>>>> Thank you >>>>>>>>>>>> >>>>>>>>>>>> Yours sincerely, >>>>>>>>>>>> >>>>>>>>>>>> TAY wee-beng >>>>>>>>>>>> >>>>>>>>>>>> On 2/11/2015 12:30 AM, Barry Smith wrote: >>>>>>>>>>>>> You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results >>>>>>>>>>>>> >>>>>>>>>>>>> Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds. 
>>>>>>>>>>>>> >>>>>>>>>>>>> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 >>>>>>>>>>>>> >>>>>>>>>>>>> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 >>>>>>>>>>>>> >>>>>>>>>>>>> Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large? >>>>>>>>>>>>> >>>>>>>>>>>>> You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine. >>>>>>>>>>>>> >>>>>>>>>>>>> Barry >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 1/11/2015 10:00 AM, Barry Smith wrote: >>>>>>>>>>>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>>>>>>>>>>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. >>>>>>>>>>>>>>>>> Its specs are: >>>>>>>>>>>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 8 cores / processor >>>>>>>>>>>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>>>>>>>>>>>>>>>> Each cabinet contains 96 computing nodes, >>>>>>>>>>>>>>>>> One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data >>>>>>>>>>>>>>>>> There are 2 ways to give performance: >>>>>>>>>>>>>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed >>>>>>>>>>>>>>>>> problem. >>>>>>>>>>>>>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a >>>>>>>>>>>>>>>>> fixed problem size per processor. >>>>>>>>>>>>>>>>> I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling. >>>>>>>>>>>>>>>>> Cluster specs: >>>>>>>>>>>>>>>>> CPU: AMD 6234 2.4GHz >>>>>>>>>>>>>>>>> 8 cores / processor (CPU) >>>>>>>>>>>>>>>>> 6 CPU / node >>>>>>>>>>>>>>>>> So 48 Cores / CPU >>>>>>>>>>>>>>>>> Not sure abt the memory / node >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The parallel efficiency ?En? for a given degree of parallelism ?n? indicates how much the program is >>>>>>>>>>>>>>>>> efficiently accelerated by parallel processing. ?En? is given by the following formulae. Although their >>>>>>>>>>>>>>>>> derivation processes are different depending on strong and weak scaling, derived formulae are the >>>>>>>>>>>>>>>>> same. >>>>>>>>>>>>>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. >>>>>>>>>>>>>>>>> So is my results acceptable? >>>>>>>>>>>>>>>>> For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%. 
>>>>>>>>>>>>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function >>>>>>>>>>>>>>>>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a >>>>>>>>>>>>>>>>> model of this dependence. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current >>>>>>>>>>>>>>>>> cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific >>>>>>>>>>>>>>>>> applications, but neither does requiring a certain parallel efficiency. >>>>>>>>>>>>>>>> Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so? >>>>>>>>>>>>>>> What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Barry >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I have attached the output >>>>>>>>>>>>>> >>>>>>>>>>>>>> 48 cores: log48 >>>>>>>>>>>>>> 96 cores: log96 >>>>>>>>>>>>>> >>>>>>>>>>>>>> There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Problem size doubled from 158x266x150 to 158x266x300. >>>>>>>>>>>>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Matt >>>>>>>>>>>>>>>>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores? >>>>>>>>>>>>>>>>> Btw, I do not have access to the system. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Sent using CloudMagic Email >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >> -------------- next part -------------- 0.000000000000000E+000 0.353000000000000 0.000000000000000E+000 90.0000000000000 0.000000000000000E+000 0.000000000000000E+000 1.00000000000000 0.400000000000000 0 -400000 z grid divid too small! myid,each procs z size 45 2 z grid divid too small! myid,each procs z size 61 2 z grid divid too small! myid,each procs z size 60 2 z grid divid too small! myid,each procs z size 57 2 z grid divid too small! myid,each procs z size 47 2 z grid divid too small! myid,each procs z size 53 2 z grid divid too small! myid,each procs z size 32 2 z grid divid too small! myid,each procs z size 55 2 z grid divid too small! 
myid,each procs z size 40 2 z grid divid too small! myid,each procs z size 59 2 z grid divid too small! myid,each procs z size 56 2 z grid divid too small! myid,each procs z size 62 2 z grid divid too small! myid,each procs z size 37 2 z grid divid too small! myid,each procs z size 39 2 z grid divid too small! myid,each procs z size 49 2 z grid divid too small! myid,each procs z size 51 2 z grid divid too small! myid,each procs z size 23 2 z grid divid too small! myid,each procs z size 48 2 z grid divid too small! myid,each procs z size 31 2 z grid divid too small! myid,each procs z size 41 2 z grid divid too small! myid,each procs z size 44 2 z grid divid too small! myid,each procs z size 25 2 z grid divid too small! myid,each procs z size 42 2 z grid divid too small! myid,each procs z size 50 2 z grid divid too small! myid,each procs z size 33 2 z grid divid too small! myid,each procs z size 52 2 z grid divid too small! myid,each procs z size 36 2 z grid divid too small! myid,each procs z size 54 2 z grid divid too small! myid,each procs z size 63 2 z grid divid too small! myid,each procs z size 46 2 z grid divid too small! myid,each procs z size 43 2 AB,AA,BB -2.47900002275128 2.50750002410496 3.46600006963126 3.40250006661518 size_x,size_y,size_z 158 266 150 z grid divid too small! myid,each procs z size 27 2 z grid divid too small! myid,each procs z size 28 2 z grid divid too small! myid,each procs z size 35 2 z grid divid too small! myid,each procs z size 58 2 z grid divid too small! myid,each procs z size 29 2 z grid divid too small! myid,each procs z size 38 2 z grid divid too small! myid,each procs z size 34 2 z grid divid too small! myid,each procs z size 26 2 z grid divid too small! myid,each procs z size 30 2 z grid divid too small! myid,each procs z size 22 2 z grid divid too small! myid,each procs z size 24 2 body_cg_ini 0.523700833348298 0.778648765134454 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 1.800000000000000E-002 maximum ngh_surfaces and ngh_vertics are 42 22 minimum ngh_surfaces and ngh_vertics are 28 10 body_cg_ini 0.896813342835977 -0.976707581163755 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 1.800000000000000E-002 maximum ngh_surfaces and ngh_vertics are 42 22 minimum ngh_surfaces and ngh_vertics are 28 10 min IIB_cell_no 0 max IIB_cell_no 429 final initial IIB_cell_no 2145 min I_cell_no 0 max I_cell_no 460 final initial I_cell_no 2300 size(IIB_cell_u),size(I_cell_u),size(IIB_equal_cell_u),size(I_equal_cell_u) 2145 2300 2145 2300 IIB_I_cell_no_uvw_total1 3090 3094 3078 3080 3074 3073 IIB_I_cell_no_uvw_total2 3102 3108 3089 3077 3060 3086 1 0.00150000 0.26453723 0.26151046 1.18591392 -0.76723714E+03 -0.33383947E+02 0.62972365E+07 escape_time reached, so abort body 1 implicit forces and moment 1 0.862588231110303 -0.514914387215313 0.188666130472786 0.478398637226279 0.368390123384182 -1.05426820824698 body 2 implicit forces and moment 2 0.527317470000801 0.731529851430443 0.148470855991251 -0.515187365220847 0.158119906556628 0.961551831458363 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./a.out on a petsc-3.6.2_shared_rel named n12-04 with 64 processors, by wtay Thu Nov 5 04:24:51 2015 Using Petsc Release Version 3.6.2, Oct, 02, 2015 Max Max/Min Avg Total Time (sec): 7.490e+02 1.00000 7.490e+02 Objects: 5.700e+01 1.00000 5.700e+01 Flops: 6.176e+09 1.99202 4.747e+09 3.038e+11 Flops/sec: 8.245e+06 1.99202 6.338e+06 4.056e+08 MPI Messages: 1.552e+03 2.00000 1.528e+03 9.778e+04 MPI Message Lengths: 7.812e+08 2.00000 5.034e+05 4.922e+10 MPI Reductions: 3.844e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 7.4898e+02 100.0% 3.0379e+11 100.0% 9.778e+04 100.0% 5.034e+05 100.0% 3.843e+03 100.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 772 1.0 1.4625e+01 2.0 1.90e+09 2.1 9.7e+04 5.1e+05 0.0e+00 1 31 99100 0 1 31 99100 0 6464 MatSolve 297 1.0 5.3322e+00 2.5 1.30e+09 2.9 0.0e+00 0.0e+00 0.0e+00 0 21 0 0 0 0 21 0 0 0 11684 MatLUFactorNum 99 1.0 7.0492e+00 3.3 6.77e+08 3.4 0.0e+00 0.0e+00 0.0e+00 1 10 0 0 0 1 10 0 0 0 4504 MatILUFactorSym 1 1.0 7.0585e-02 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatConvert 1 1.0 6.0066e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 100 1.0 2.3072e+0173.7 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02 2 0 0 0 5 2 0 0 0 5 0 MatAssemblyEnd 100 1.0 2.2332e+00 2.1 0.00e+00 0.0 5.0e+02 1.7e+05 1.6e+01 0 0 1 0 0 0 0 1 0 0 0 MatGetRowIJ 3 1.0 1.5020e-0515.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 8.7328e-03 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 574 1.0 2.5287e+00 1.5 9.80e+08 1.5 0.0e+00 0.0e+00 5.7e+02 0 16 0 0 15 0 16 0 0 15 19386 KSPSetUp 199 1.0 4.9613e-0210.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 199 1.0 5.8122e+02 1.0 6.18e+09 2.0 9.7e+04 5.1e+05 2.4e+03 78100 99100 63 78100 99100 63 523 VecDot 198 1.0 3.8677e+00 8.2 1.50e+08 1.5 0.0e+00 0.0e+00 2.0e+02 0 2 0 0 5 0 2 0 0 5 1936 VecDotNorm2 99 1.0 3.5481e+00 9.6 1.50e+08 1.5 0.0e+00 0.0e+00 9.9e+01 0 2 0 0 3 0 2 0 0 3 2111 VecMDot 574 1.0 1.3885e+00 1.2 4.90e+08 1.5 0.0e+00 0.0e+00 5.7e+02 0 8 0 0 15 0 8 0 0 15 17652 VecNorm 872 1.0 8.3005e+00 8.6 3.20e+08 1.5 0.0e+00 0.0e+00 8.7e+02 1 5 0 0 23 1 5 0 0 23 1926 VecScale 674 1.0 1.5629e-01 3.4 8.50e+07 1.5 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 27187 VecCopy 298 1.0 5.1306e-01 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 1470 1.0 1.2767e+00 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 100 1.0 1.0121e-01 3.7 2.52e+07 1.5 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 12458 VecAXPBYCZ 198 1.0 1.0368e+00 3.0 3.00e+08 1.5 0.0e+00 0.0e+00 0.0e+00 0 5 0 0 0 0 5 0 0 0 14447 VecWAXPY 198 1.0 1.0037e+00 2.9 1.50e+08 1.5 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 7462 VecMAXPY 674 1.0 1.6812e+00 3.6 6.35e+08 1.5 0.0e+00 0.0e+00 0.0e+00 0 10 0 0 0 0 10 0 0 0 18884 VecAssemblyBegin 398 1.0 3.4004e+00 5.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+03 0 0 0 0 31 0 0 0 0 31 0 VecAssemblyEnd 398 1.0 1.8644e-03 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 772 1.0 9.4321e-01 3.3 0.00e+00 0.0 9.7e+04 5.1e+05 0.0e+00 0 0 99100 0 0 0 99100 0 0 VecScatterEnd 772 1.0 6.1952e+00 4.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 674 1.0 1.2847e+00 2.8 2.55e+08 1.5 0.0e+00 0.0e+00 6.7e+02 0 4 0 0 18 0 4 0 0 18 9922 PCSetUp 199 1.0 9.9825e+01 1.1 6.77e+08 3.4 0.0e+00 0.0e+00 4.0e+00 13 10 0 0 0 13 10 0 0 0 318 PCSetUpOnBlocks 99 1.0 7.1259e+00 3.3 6.77e+08 3.4 0.0e+00 
0.0e+00 0.0e+00 1 10 0 0 0 1 10 0 0 0 4456 PCApply 971 1.0 4.5812e+02 1.2 1.30e+09 2.9 0.0e+00 0.0e+00 0.0e+00 57 21 0 0 0 57 21 0 0 0 136 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Matrix 7 7 136037064 0 Matrix Null Space 1 1 592 0 Krylov Solver 3 3 20664 0 Vector 33 33 44756160 0 Vector Scatter 2 2 2176 0 Index Set 7 7 3696940 0 Preconditioner 3 3 3208 0 Viewer 1 0 0 0 ======================================================================================================================== Average time to get PetscTime(): 9.53674e-08 Average time for MPI_Barrier(): 0.000209427 Average time for zero size MPI_Send(): 2.06716e-05 #PETSc Option Table entries: -log_summary #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-mpi-dir=/opt/ud/openmpi-1.8.8/ --with-blas-lapack-dir=/opt/ud/intel_xe_2013sp1/mkl/lib/intel64/ --with-debugging=0 --download-hypre=1 --prefix=/home/wtay/Lib/petsc-3.6.2_shared_rel --known-mpi-shared=1 --with-shared-libraries --with-fortran-interfaces=1 ----------------------------------------- Libraries compiled on Sun Oct 18 17:34:07 2015 on hpc12 Machine characteristics: Linux-3.10.0-123.20.1.el7.x86_64-x86_64-with-centos-7.1.1503-Core Using PETSc directory: /home/wtay/Codes/petsc-3.6.2 Using PETSc arch: petsc-3.6.2_shared_rel ----------------------------------------- Using C compiler: /opt/ud/openmpi-1.8.8/bin/mpicc -fPIC -wd1572 -O3 ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /opt/ud/openmpi-1.8.8/bin/mpif90 -fPIC -O3 ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Lib/petsc-3.6.2_shared_rel/include -I/opt/ud/openmpi-1.8.8/include ----------------------------------------- Using C linker: /opt/ud/openmpi-1.8.8/bin/mpicc Using Fortran linker: /opt/ud/openmpi-1.8.8/bin/mpif90 Using libraries: -Wl,-rpath,/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -L/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -lpetsc -Wl,-rpath,/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -L/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -lHYPRE -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -lmpi_cxx -Wl,-rpath,/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -L/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lX11 -lhwloc -lssl -lcrypto -lmpi_usempi -lmpi_mpifh -lifport -lifcore -lm -lmpi_cxx -ldl -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -lmpi -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -limf -lsvml -lirng -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lpthread -lirc_s -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -ldl ----------------------------------------- -------------- next part -------------- 0.000000000000000E+000 0.353000000000000 0.000000000000000E+000 90.0000000000000 0.000000000000000E+000 0.000000000000000E+000 1.00000000000000 0.400000000000000 0 -400000 AB,AA,BB -2.00050000002375 2.00050000002375 2.61200002906844 2.53550002543489 size_x,size_y,size_z 79 133 75 body_cg_ini 0.523700833348298 0.778648765134454 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 4.300000000000000E-002 maximum ngh_surfaces and ngh_vertics are 149 68 minimum ngh_surfaces and ngh_vertics are 54 22 body_cg_ini 0.896813342835977 -0.976707581163755 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 4.300000000000000E-002 maximum ngh_surfaces and ngh_vertics are 149 68 minimum ngh_surfaces and ngh_vertics are 54 22 min IIB_cell_no 0 max IIB_cell_no 265 final initial IIB_cell_no 1325 min I_cell_no 0 max I_cell_no 94 final initial I_cell_no 470 size(IIB_cell_u),size(I_cell_u),size(IIB_equal_cell_u),size(I_equal_cell_u) 1325 470 1325 470 IIB_I_cell_no_uvw_total1 265 270 255 94 91 95 IIB_I_cell_no_uvw_total2 273 280 267 97 94 98 1 0.00150000 0.14647508 0.14738746 1.08799843 0.18763287E+02 0.12408027E+00 0.78750180E+06 escape_time reached, so abort body 1 implicit forces and moment 1 0.869079034253505 -0.476901544372401 8.158481275554146E-002 0.428147881055389 0.558124364151253 -0.928673736540968 body 2 implicit forces and moment 2 0.551071762807368 0.775547234778320 0.135476932751926 -0.634587666384900 0.290233967077166 0.936524191625998 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./a.out on a petsc-3.6.2_shared_rel named n12-01 with 8 processors, by wtay Thu Nov 5 04:14:33 2015 Using Petsc Release Version 3.6.2, Oct, 02, 2015 Max Max/Min Avg Total Time (sec): 1.329e+02 1.00000 1.329e+02 Objects: 5.700e+01 1.00000 5.700e+01 Flops: 4.822e+09 1.17398 4.486e+09 3.589e+10 Flops/sec: 3.628e+07 1.17398 3.374e+07 2.700e+08 MPI Messages: 1.340e+03 2.00000 1.172e+03 9.380e+03 MPI Message Lengths: 1.761e+08 2.00000 1.314e+05 1.233e+09 MPI Reductions: 3.526e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 1.3294e+02 100.0% 3.5886e+10 100.0% 9.380e+03 100.0% 1.314e+05 100.0% 3.525e+03 100.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 666 1.0 3.5888e+00 1.2 1.43e+09 1.2 9.3e+03 1.3e+05 0.0e+00 2 30 99100 0 2 30 99100 0 2953 MatSolve 297 1.0 2.2749e+00 1.3 1.15e+09 1.2 0.0e+00 0.0e+00 0.0e+00 2 24 0 0 0 2 24 0 0 0 3746 MatLUFactorNum 99 1.0 4.7327e+00 1.2 6.34e+08 1.2 0.0e+00 0.0e+00 0.0e+00 3 13 0 0 0 3 13 0 0 0 991 MatILUFactorSym 1 1.0 3.1299e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatConvert 1 1.0 2.0472e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 100 1.0 3.2489e+00111.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02 1 0 0 0 6 1 0 0 0 6 0 MatAssemblyEnd 100 1.0 1.0889e+00 1.2 0.00e+00 0.0 5.6e+01 4.1e+04 1.6e+01 1 0 1 0 0 1 0 1 0 0 0 MatGetRowIJ 3 1.0 3.0994e-06 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 5.3592e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 468 1.0 4.5374e-01 1.1 5.63e+08 1.1 0.0e+00 0.0e+00 4.7e+02 0 12 0 0 13 0 12 0 0 13 9309 KSPSetUp 199 1.0 2.7158e-02 4.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 199 1.0 9.0667e+01 1.0 4.82e+09 1.2 9.3e+03 1.3e+05 2.1e+03 68100 99100 60 68100 99100 60 396 VecDot 198 1.0 5.7577e-01 3.0 1.25e+08 1.1 0.0e+00 0.0e+00 2.0e+02 0 3 0 0 6 0 3 0 0 6 1626 VecDotNorm2 99 1.0 4.6992e-01 4.9 1.25e+08 1.1 0.0e+00 0.0e+00 9.9e+01 0 3 0 0 3 0 3 0 0 3 1992 VecMDot 468 1.0 2.4218e-01 1.0 2.82e+08 1.1 0.0e+00 0.0e+00 4.7e+02 0 6 0 0 13 0 6 0 0 13 8720 VecNorm 766 1.0 1.3641e+00 5.9 2.44e+08 1.1 0.0e+00 0.0e+00 7.7e+02 1 5 0 0 22 1 5 0 0 22 1343 VecScale 568 1.0 4.0681e-02 1.1 5.97e+07 1.1 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 11003 VecCopy 298 1.0 1.3690e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 1364 1.0 4.0399e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 100 1.0 2.4372e-02 1.3 2.10e+07 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6467 VecAXPBYCZ 198 1.0 3.0735e-01 1.4 2.50e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 5 0 0 0 0 5 0 0 0 6092 VecWAXPY 198 1.0 3.0173e-01 1.5 1.25e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 3103 VecMAXPY 568 1.0 2.9665e-01 1.2 3.80e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 9606 VecAssemblyBegin 398 1.0 4.4876e-0117.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+03 0 0 0 0 34 0 0 0 0 34 0 VecAssemblyEnd 398 1.0 8.7547e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 666 1.0 9.3587e-02 2.4 0.00e+00 0.0 9.3e+03 1.3e+05 0.0e+00 0 0 99100 0 0 0 99100 0 0 VecScatterEnd 666 1.0 6.4374e-01 4.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 568 1.0 1.9372e-01 1.2 1.79e+08 1.1 0.0e+00 0.0e+00 5.7e+02 0 4 0 0 16 0 4 0 0 16 6932 PCSetUp 199 1.0 9.0977e+00 1.1 6.34e+08 1.2 0.0e+00 0.0e+00 4.0e+00 7 13 0 0 0 7 13 0 0 0 515 PCSetUpOnBlocks 99 1.0 4.7691e+00 1.2 6.34e+08 1.2 0.0e+00 0.0e+00 0.0e+00 
3 13 0 0 0 3 13 0 0 0 983 PCApply 865 1.0 7.5746e+01 1.0 1.15e+09 1.2 0.0e+00 0.0e+00 0.0e+00 56 24 0 0 0 56 24 0 0 0 113 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Matrix 7 7 114426392 0 Matrix Null Space 1 1 592 0 Krylov Solver 3 3 20664 0 Vector 33 33 36525656 0 Vector Scatter 2 2 2176 0 Index Set 7 7 2691760 0 Preconditioner 3 3 3208 0 Viewer 1 0 0 0 ======================================================================================================================== Average time to get PetscTime(): 9.53674e-08 Average time for MPI_Barrier(): 3.38554e-06 Average time for zero size MPI_Send(): 3.72529e-06 #PETSc Option Table entries: -log_summary #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-mpi-dir=/opt/ud/openmpi-1.8.8/ --with-blas-lapack-dir=/opt/ud/intel_xe_2013sp1/mkl/lib/intel64/ --with-debugging=0 --download-hypre=1 --prefix=/home/wtay/Lib/petsc-3.6.2_shared_rel --known-mpi-shared=1 --with-shared-libraries --with-fortran-interfaces=1 ----------------------------------------- Libraries compiled on Sun Oct 18 17:34:07 2015 on hpc12 Machine characteristics: Linux-3.10.0-123.20.1.el7.x86_64-x86_64-with-centos-7.1.1503-Core Using PETSc directory: /home/wtay/Codes/petsc-3.6.2 Using PETSc arch: petsc-3.6.2_shared_rel ----------------------------------------- Using C compiler: /opt/ud/openmpi-1.8.8/bin/mpicc -fPIC -wd1572 -O3 ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /opt/ud/openmpi-1.8.8/bin/mpif90 -fPIC -O3 ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Lib/petsc-3.6.2_shared_rel/include -I/opt/ud/openmpi-1.8.8/include ----------------------------------------- Using C linker: /opt/ud/openmpi-1.8.8/bin/mpicc Using Fortran linker: /opt/ud/openmpi-1.8.8/bin/mpif90 Using libraries: -Wl,-rpath,/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -L/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -lpetsc -Wl,-rpath,/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -L/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -lHYPRE -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -lmpi_cxx -Wl,-rpath,/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -L/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lX11 -lhwloc -lssl -lcrypto -lmpi_usempi -lmpi_mpifh -lifport -lifcore -lm -lmpi_cxx -ldl -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -lmpi -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -limf -lsvml -lirng -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lpthread -lirc_s -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -ldl ----------------------------------------- From bsmith at mcs.anl.gov Wed Nov 4 22:03:54 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 4 Nov 2015 22:03:54 -0600 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <563ACD5F.6060301@gmail.com> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> <56372A12.90900@gmail.com> <563839F8.9080000@gmail.com> <589BE4CF-18B5-4F04-A442-7FB0F2F3AFC1@mcs.anl.gov> <5638AD42.9060609@gmail.com> <2057DCBD-BDB9-4424-A4E3-C1BA5BB8C214@mcs.anl.gov> <563ACD5F.6060301@gmail.com> Message-ID: <56A89BE9-5AEC-4A99-A888-2B7EF35C7BC9@mcs.anl.gov> There is a problem here. The -log_summary doesn't show all the events associated with the -pc_type gamg preconditioner it should have rows like VecDot 2 1.0 6.1989e-06 1.0 1.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1613 VecMDot 134 1.0 5.4145e-04 1.0 1.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 3025 VecNorm 154 1.0 2.4176e-04 1.0 3.82e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1578 VecScale 148 1.0 1.6928e-04 1.0 1.76e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1039 VecCopy 106 1.0 1.2255e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 474 1.0 5.1236e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 54 1.0 1.3471e-04 1.0 2.35e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1742 VecAYPX 384 1.0 5.7459e-04 1.0 4.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 860 VecAXPBYCZ 192 1.0 4.7398e-04 1.0 9.88e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2085 VecWAXPY 2 1.0 7.8678e-06 1.0 5.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 636 VecMAXPY 148 1.0 8.1539e-04 1.0 1.96e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 2399 VecPointwiseMult 66 1.0 1.1253e-04 1.0 6.79e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 604 VecScatterBegin 45 1.0 6.3419e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSetRandom 6 1.0 3.0994e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecReduceArith 4 1.0 1.3113e-05 1.0 2.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1525 VecReduceComm 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 148 1.0 4.4799e-04 1.0 5.27e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1177 MatMult 424 1.0 8.9276e-03 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 37 0 0 0 7 37 0 0 0 2343 MatMultAdd 48 1.0 5.0926e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2069 MatMultTranspose 48 1.0 9.8586e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1069 MatSolve 16 1.0 2.2173e-05 1.0 1.02e+04 1.0 0.0e+00 
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 460 MatSOR 354 1.0 1.0547e-02 1.0 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 31 0 0 0 9 31 0 0 0 1631 MatLUFactorSym 2 1.0 4.7922e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 2 1.0 2.5272e-05 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 307 MatScale 18 1.0 1.7142e-04 1.0 1.50e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 874 MatResidual 48 1.0 1.0548e-03 1.0 2.33e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2212 MatAssemblyBegin 57 1.0 4.7684e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 57 1.0 1.9786e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 MatGetRow 21616 1.0 1.8497e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 MatGetRowIJ 2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 2 1.0 6.0797e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCoarsen 6 1.0 9.3222e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 2 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAXPY 6 1.0 1.7998e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 MatFDColorCreate 1 1.0 3.2902e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatFDColorSetUp 1 1.0 1.6739e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatFDColorApply 2 1.0 1.3199e-03 1.0 2.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 1826 MatFDColorFunc 42 1.0 7.4601e-04 1.0 2.20e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2956 MatMatMult 6 1.0 5.1048e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 2 0 0 0 4 2 0 0 0 241 MatMatMultSym 6 1.0 3.2601e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 MatMatMultNum 6 1.0 1.8158e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 679 MatPtAP 6 1.0 2.1328e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283 MatPtAPSymbolic 6 1.0 1.0073e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 0 MatPtAPNumeric 6 1.0 1.1230e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 537 MatTrnMatMult 2 1.0 7.2789e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 75 MatTrnMatMultSym 2 1.0 5.7006e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatTrnMatMultNum 2 1.0 1.5473e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 352 MatGetSymTrans 8 1.0 3.1638e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 134 1.0 1.3156e-03 1.0 3.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 1 6 0 0 0 2491 KSPSetUp 24 1.0 4.6754e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 2 1.0 1.1291e-01 1.0 5.32e+07 1.0 0.0e+00 0.0e+00 0.0e+00 94 95 0 0 0 94 95 0 0 0 471 PCGAMGGraph_AGG 6 1.0 1.2108e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2 PCGAMGCoarse_AGG 6 1.0 1.1127e-03 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 49 PCGAMGProl_AGG 6 1.0 4.1062e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0 PCGAMGPOpt_AGG 6 1.0 1.1200e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534 GAMG: createProl 6 1.0 6.5530e-02 1.0 6.06e+06 1.0 0.0e+00 0.0e+00 0.0e+00 55 11 0 0 0 55 11 0 0 0 92 Graph 12 1.0 1.1692e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2 MIS/Agg 6 1.0 1.4496e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 
0 0 0 0 0 0 0 SA: col data 6 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SA: frmProl0 6 1.0 4.0917e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0 SA: smooth 6 1.0 1.1198e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534 GAMG: partLevel 6 1.0 2.1341e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283 PCSetUp 4 1.0 8.8020e-02 1.0 1.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 74 22 0 0 0 74 22 0 0 0 137 PCSetUpOnBlocks 16 1.0 1.8382e-04 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 42 PCApply 16 1.0 2.3858e-02 1.0 3.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 20 70 0 0 0 20 70 0 0 0 1637 Are you sure you ran with -pc_type gamg ? What about running with -info does it print anything about gamg? What about -ksp_view does it indicate it is using the gamg preconditioner? > On Nov 4, 2015, at 9:30 PM, TAY wee-beng wrote: > > Hi, > > I have attached the 2 logs. > > Thank you > > Yours sincerely, > > TAY wee-beng > > On 4/11/2015 1:11 AM, Barry Smith wrote: >> Ok, the convergence looks good. Now run on 8 and 64 processes as before with -log_summary and not -ksp_monitor to see how it scales. >> >> Barry >> >>> On Nov 3, 2015, at 6:49 AM, TAY wee-beng wrote: >>> >>> Hi, >>> >>> I tried and have attached the log. >>> >>> Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate? >>> >>> Thank you >>> >>> Yours sincerely, >>> >>> TAY wee-beng >>> >>> On 3/11/2015 12:45 PM, Barry Smith wrote: >>>>> On Nov 2, 2015, at 10:37 PM, TAY wee-beng wrote: >>>>> >>>>> Hi, >>>>> >>>>> I tried : >>>>> >>>>> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg >>>>> >>>>> 2. -poisson_pc_type gamg >>>> Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason >>>> Does your poisson have Neumann boundary conditions? Do you have any zeros on the diagonal for the matrix (you shouldn't). >>>> >>>> There may be something wrong with your poisson discretization that was also messing up hypre >>>> >>>> >>>> >>>>> Both options give: >>>>> >>>>> 1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN >>>>> M Diverged but why?, time = 2 >>>>> reason = -9 >>>>> >>>>> How can I check what's wrong? >>>>> >>>>> Thank you >>>>> >>>>> Yours sincerely, >>>>> >>>>> TAY wee-beng >>>>> >>>>> On 3/11/2015 3:18 AM, Barry Smith wrote: >>>>>> hypre is just not scaling well here. I do not know why. Since hypre is a block box for us there is no way to determine why the poor scaling. >>>>>> >>>>>> If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly. >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>>>>> On Nov 2, 2015, at 3:17 AM, TAY wee-beng wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I have attached the 2 files. >>>>>>> >>>>>>> Thank you >>>>>>> >>>>>>> Yours sincerely, >>>>>>> >>>>>>> TAY wee-beng >>>>>>> >>>>>>> On 2/11/2015 2:55 PM, Barry Smith wrote: >>>>>>>> Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> >>>>>>>>> On Nov 2, 2015, at 12:19 AM, TAY wee-beng wrote: >>>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I have attached the new results. 
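A side note for readers working through this thread: the Poisson solve here is driven through a prefixed KSP, so an algebraic multigrid option only takes effect if it carries that prefix (for example -poisson_pc_type gamg rather than -pc_type gamg) and if KSPSetFromOptions is called after the prefix is set. The minimal C sketch below only illustrates that mechanism; the "poisson_" prefix and the bare-bones structure are assumptions for illustration, not the poster's actual code.

    #include <petscksp.h>
    /* Minimal sketch, not the poster's code: a KSP with the "poisson_" prefix only sees
       options spelled with that prefix, e.g. -poisson_pc_type gamg. Running with
       -poisson_ksp_view or -info is how one confirms which preconditioner was really built. */
    int main(int argc, char **argv)
    {
      KSP            ksp;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
      ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
      ierr = KSPSetOptionsPrefix(ksp, "poisson_");CHKERRQ(ierr); /* options now read -poisson_... */
      /* ... KSPSetOperators(ksp, A, A) would go here in a real code ... */
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);               /* picks up -poisson_pc_type gamg */
      ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }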
From zonexo at gmail.com Thu Nov 5 09:58:15 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Thu, 5 Nov 2015 23:58:15 +0800 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <56A89BE9-5AEC-4A99-A888-2B7EF35C7BC9@mcs.anl.gov> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> <56372A12.90900@gmail.com> <563839F8.9080000@gmail.com> <589BE4CF-18B5-4F04-A442-7FB0F2F3AFC1@mcs.anl.gov> <5638AD42.9060609@gmail.com> <2057DCBD-BDB9-4424-A4E3-C1BA5BB8C214@mcs.anl.gov> <563ACD5F.6060301@gmail.com> <56A89BE9-5AEC-4A99-A888-2B7EF35C7BC9@mcs.anl.gov> Message-ID: <563B7C97.1070604@gmail.com> Sorry, I realised that I didn't actually use gamg, which is why those events are missing. When I do use gamg, the 8 core case works, but in the 64 core case the Poisson solve for p diverges. Why is this so? By the way, I have also added the null space in my code. Thank you. Yours sincerely, TAY wee-beng On 5/11/2015 12:03 PM, Barry Smith wrote: > There is a problem here. The -log_summary doesn't show all the events associated with the -pc_type gamg preconditioner. > Are you sure you ran with -pc_type gamg? What about running with -info, does it print anything about gamg? What about -ksp_view, does it indicate it is using the gamg preconditioner?
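Before the replies below, a brief aside on diagnosing the divergence just reported: the reason a KSP stopped can be queried directly in code, which gives the same information as the -ksp_converged_reason and -ksp_monitor_true_residual options requested later in this thread. The sketch below uses illustrative names and is an assumption about how one might instrument the Poisson solve, not the poster's code.

    #include <petscksp.h>
    /* Minimal sketch with illustrative names: report why the Poisson solve stopped. */
    PetscErrorCode SolvePoissonAndReport(KSP ksp, Vec b, Vec x)
    {
      KSPConvergedReason reason;
      PetscErrorCode     ierr;

      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
      ierr = KSPGetConvergedReason(ksp, &reason);CHKERRQ(ierr);
      if (reason < 0) {
        /* A negative reason means divergence; the "reason = -9" reported earlier in this
           thread is KSP_DIVERGED_NANORINF in this PETSc series, i.e. a NaN or Inf appeared. */
        ierr = PetscPrintf(PetscObjectComm((PetscObject)ksp),
                           "Poisson solve diverged, reason %d\n", (int)reason);CHKERRQ(ierr);
      }
      return 0;
    }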
From bsmith at mcs.anl.gov Thu Nov 5 10:06:30 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 5 Nov 2015 10:06:30 -0600 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <563B7C97.1070604@gmail.com> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> <56372A12.90900@gmail.com> <563839F8.9080000@gmail.com> <589BE4CF-18B5-4F04-A442-7FB0F2F3AFC1@mcs.anl.gov> <5638AD42.9060609@gmail.com> <2057DCBD-BDB9-4424-A4E3-C1BA5BB8C214@mcs.anl.gov> <563ACD5F.6060301@gmail.com> <56A89BE9-5AEC-4A99-A888-2B7EF35C7BC9@mcs.anl.gov> <563B7C97.1070604@gmail.com> Message-ID: <9C6B69AD-1A8B-4A37-8A17-0DBD52641F20@mcs.anl.gov> > On Nov 5, 2015, at 9:58 AM, TAY wee-beng wrote: > > Sorry, I realised that I didn't actually use gamg, which is why those events are missing. When I do use gamg, the 8 core case works, but in the 64 core case the Poisson solve for p diverges. > > Why is this so? By the way, I have also added the null space in my code. You don't need the null space and should not add it. > > Thank you. > > Yours sincerely, > > TAY wee-beng
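For readers following the null-space exchange above: with all-Neumann boundary conditions the pressure Poisson matrix is singular and has the constant vector in its null space, and "adding the null space" in PETSc normally means attaching that constant vector to the matrix so the Krylov solver can project it out. The sketch below, with illustrative names, shows what such a call sequence typically looks like; it is an assumption about what was added, and per the reply above it is not wanted in this particular setup.

    #include <petscmat.h>
    /* Minimal sketch (illustrative, not the poster's code): attach the constant null space
       to a singular all-Neumann Poisson matrix A. Per the advice above, this is not needed here. */
    PetscErrorCode AttachConstantNullSpace(Mat A)
    {
      MatNullSpace   nullsp;
      PetscErrorCode ierr;

      ierr = MatNullSpaceCreate(PetscObjectComm((PetscObject)A), PETSC_TRUE, 0, NULL, &nullsp);CHKERRQ(ierr);
      ierr = MatSetNullSpace(A, nullsp);CHKERRQ(ierr);    /* the KSP removes this space during the solve */
      ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);  /* the matrix keeps its own reference */
      return 0;
    }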
From bsmith at mcs.anl.gov Thu Nov 5 10:07:46 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 5 Nov 2015 10:07:46 -0600 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <563B7C97.1070604@gmail.com> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> <56372A12.90900@gmail.com> <563839F8.9080000@gmail.com> <589BE4CF-18B5-4F04-A442-7FB0F2F3AFC1@mcs.anl.gov> <5638AD42.9060609@gmail.com> <2057DCBD-BDB9-4424-A4E3-C1BA5BB8C214@mcs.anl.gov> <563ACD5F.6060301@gmail.com> <56A89BE9-5AEC-4A99-A888-2B7EF35C7BC9@mcs.anl.gov> <563B7C97.1070604@gmail.com> Message-ID: <0F2039C3-52A4-4BB5-BE77-6E75163BFDD9@mcs.anl.gov> > On Nov 5, 2015, at 9:58 AM, TAY wee-beng wrote: > > Sorry, I realised that I didn't actually use gamg, which is why those events are missing. When I do use gamg, the 8 core case works, but in the 64 core case the Poisson solve for p diverges. Where is the log file for the 8 core case? And where is all the output from the 64 core run that fails? Include -ksp_monitor_true_residual and -ksp_converged_reason. Barry > > Why is this so? By the way, I have also added the null space in my code. > > Thank you. > > Yours sincerely, > > TAY wee-beng > > On 5/11/2015 12:03 PM, Barry Smith wrote: >> There is a problem here.
The -log_summary doesn't show all the events associated with the -pc_type gamg preconditioner it should have rows like >> >> VecDot 2 1.0 6.1989e-06 1.0 1.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1613 >> VecMDot 134 1.0 5.4145e-04 1.0 1.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 3025 >> VecNorm 154 1.0 2.4176e-04 1.0 3.82e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1578 >> VecScale 148 1.0 1.6928e-04 1.0 1.76e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1039 >> VecCopy 106 1.0 1.2255e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 474 1.0 5.1236e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecAXPY 54 1.0 1.3471e-04 1.0 2.35e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1742 >> VecAYPX 384 1.0 5.7459e-04 1.0 4.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 860 >> VecAXPBYCZ 192 1.0 4.7398e-04 1.0 9.88e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2085 >> VecWAXPY 2 1.0 7.8678e-06 1.0 5.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 636 >> VecMAXPY 148 1.0 8.1539e-04 1.0 1.96e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 2399 >> VecPointwiseMult 66 1.0 1.1253e-04 1.0 6.79e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 604 >> VecScatterBegin 45 1.0 6.3419e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSetRandom 6 1.0 3.0994e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecReduceArith 4 1.0 1.3113e-05 1.0 2.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1525 >> VecReduceComm 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecNormalize 148 1.0 4.4799e-04 1.0 5.27e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1177 >> MatMult 424 1.0 8.9276e-03 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 37 0 0 0 7 37 0 0 0 2343 >> MatMultAdd 48 1.0 5.0926e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2069 >> MatMultTranspose 48 1.0 9.8586e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1069 >> MatSolve 16 1.0 2.2173e-05 1.0 1.02e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 460 >> MatSOR 354 1.0 1.0547e-02 1.0 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 31 0 0 0 9 31 0 0 0 1631 >> MatLUFactorSym 2 1.0 4.7922e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatLUFactorNum 2 1.0 2.5272e-05 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 307 >> MatScale 18 1.0 1.7142e-04 1.0 1.50e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 874 >> MatResidual 48 1.0 1.0548e-03 1.0 2.33e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2212 >> MatAssemblyBegin 57 1.0 4.7684e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 57 1.0 1.9786e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> MatGetRow 21616 1.0 1.8497e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> MatGetRowIJ 2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatGetOrdering 2 1.0 6.0797e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatCoarsen 6 1.0 9.3222e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatZeroEntries 2 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAXPY 6 1.0 1.7998e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> MatFDColorCreate 1 1.0 3.2902e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatFDColorSetUp 1 1.0 
1.6739e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> MatFDColorApply 2 1.0 1.3199e-03 1.0 2.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 1826 >> MatFDColorFunc 42 1.0 7.4601e-04 1.0 2.20e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2956 >> MatMatMult 6 1.0 5.1048e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 2 0 0 0 4 2 0 0 0 241 >> MatMatMultSym 6 1.0 3.2601e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >> MatMatMultNum 6 1.0 1.8158e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 679 >> MatPtAP 6 1.0 2.1328e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283 >> MatPtAPSymbolic 6 1.0 1.0073e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 0 >> MatPtAPNumeric 6 1.0 1.1230e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 537 >> MatTrnMatMult 2 1.0 7.2789e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 75 >> MatTrnMatMultSym 2 1.0 5.7006e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatTrnMatMultNum 2 1.0 1.5473e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 352 >> MatGetSymTrans 8 1.0 3.1638e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPGMRESOrthog 134 1.0 1.3156e-03 1.0 3.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 1 6 0 0 0 2491 >> KSPSetUp 24 1.0 4.6754e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSolve 2 1.0 1.1291e-01 1.0 5.32e+07 1.0 0.0e+00 0.0e+00 0.0e+00 94 95 0 0 0 94 95 0 0 0 471 >> PCGAMGGraph_AGG 6 1.0 1.2108e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2 >> PCGAMGCoarse_AGG 6 1.0 1.1127e-03 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 49 >> PCGAMGProl_AGG 6 1.0 4.1062e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0 >> PCGAMGPOpt_AGG 6 1.0 1.1200e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534 >> GAMG: createProl 6 1.0 6.5530e-02 1.0 6.06e+06 1.0 0.0e+00 0.0e+00 0.0e+00 55 11 0 0 0 55 11 0 0 0 92 >> Graph 12 1.0 1.1692e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2 >> MIS/Agg 6 1.0 1.4496e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> SA: col data 6 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> SA: frmProl0 6 1.0 4.0917e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0 >> SA: smooth 6 1.0 1.1198e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534 >> GAMG: partLevel 6 1.0 2.1341e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283 >> PCSetUp 4 1.0 8.8020e-02 1.0 1.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 74 22 0 0 0 74 22 0 0 0 137 >> PCSetUpOnBlocks 16 1.0 1.8382e-04 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 42 >> PCApply 16 1.0 2.3858e-02 1.0 3.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 20 70 0 0 0 20 70 0 0 0 1637 >> >> >> Are you sure you ran with -pc_type gamg ? What about running with -info does it print anything about gamg? What about -ksp_view does it indicate it is using the gamg preconditioner? >> >> >>> On Nov 4, 2015, at 9:30 PM, TAY wee-beng wrote: >>> >>> Hi, >>> >>> I have attached the 2 logs. >>> >>> Thank you >>> >>> Yours sincerely, >>> >>> TAY wee-beng >>> >>> On 4/11/2015 1:11 AM, Barry Smith wrote: >>>> Ok, the convergence looks good. Now run on 8 and 64 processes as before with -log_summary and not -ksp_monitor to see how it scales. 
>>>> >>>> Barry >>>> >>>>> On Nov 3, 2015, at 6:49 AM, TAY wee-beng wrote: >>>>> >>>>> Hi, >>>>> >>>>> I tried and have attached the log. >>>>> >>>>> Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate? >>>>> >>>>> Thank you >>>>> >>>>> Yours sincerely, >>>>> >>>>> TAY wee-beng >>>>> >>>>> On 3/11/2015 12:45 PM, Barry Smith wrote: >>>>>>> On Nov 2, 2015, at 10:37 PM, TAY wee-beng wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I tried : >>>>>>> >>>>>>> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg >>>>>>> >>>>>>> 2. -poisson_pc_type gamg >>>>>> Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason >>>>>> Does your poisson have Neumann boundary conditions? Do you have any zeros on the diagonal for the matrix (you shouldn't). >>>>>> >>>>>> There may be something wrong with your poisson discretization that was also messing up hypre >>>>>> >>>>>> >>>>>> >>>>>>> Both options give: >>>>>>> >>>>>>> 1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN >>>>>>> M Diverged but why?, time = 2 >>>>>>> reason = -9 >>>>>>> >>>>>>> How can I check what's wrong? >>>>>>> >>>>>>> Thank you >>>>>>> >>>>>>> Yours sincerely, >>>>>>> >>>>>>> TAY wee-beng >>>>>>> >>>>>>> On 3/11/2015 3:18 AM, Barry Smith wrote: >>>>>>>> hypre is just not scaling well here. I do not know why. Since hypre is a block box for us there is no way to determine why the poor scaling. >>>>>>>> >>>>>>>> If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly. >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> On Nov 2, 2015, at 3:17 AM, TAY wee-beng wrote: >>>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I have attached the 2 files. >>>>>>>>> >>>>>>>>> Thank you >>>>>>>>> >>>>>>>>> Yours sincerely, >>>>>>>>> >>>>>>>>> TAY wee-beng >>>>>>>>> >>>>>>>>> On 2/11/2015 2:55 PM, Barry Smith wrote: >>>>>>>>>> Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results >>>>>>>>>> >>>>>>>>>> Barry >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Nov 2, 2015, at 12:19 AM, TAY wee-beng wrote: >>>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I have attached the new results. >>>>>>>>>>> >>>>>>>>>>> Thank you >>>>>>>>>>> >>>>>>>>>>> Yours sincerely, >>>>>>>>>>> >>>>>>>>>>> TAY wee-beng >>>>>>>>>>> >>>>>>>>>>> On 2/11/2015 12:27 PM, Barry Smith wrote: >>>>>>>>>>>> Run without the -momentum_ksp_view -poisson_ksp_view and send the new results >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time. >>>>>>>>>>>> >>>>>>>>>>>> Barry >>>>>>>>>>>> >>>>>>>>>>>> Something makes no sense with the output: it gives >>>>>>>>>>>> >>>>>>>>>>>> KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165 >>>>>>>>>>>> >>>>>>>>>>>> 90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> On Nov 1, 2015, at 10:02 PM, TAY wee-beng wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I have attached the new run with 100 time steps for 48 and 96 cores. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>>>>>>>> >>>>>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run. >>>>>>>>>>>>> >>>>>>>>>>>>> Thank you >>>>>>>>>>>>> >>>>>>>>>>>>> Yours sincerely, >>>>>>>>>>>>> >>>>>>>>>>>>> TAY wee-beng >>>>>>>>>>>>> >>>>>>>>>>>>> On 2/11/2015 9:49 AM, Barry Smith wrote: >>>>>>>>>>>>>> If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Barry >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Also, what about momentum eqn? Is it working well? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I will try the gamg later too. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Yours sincerely, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> TAY wee-beng >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 2/11/2015 12:30 AM, Barry Smith wrote: >>>>>>>>>>>>>>>> You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine. 
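A minimal C sketch of the reuse pattern being discussed here (the application code in this thread appears to be Fortran; the names poisson_ksp, A_poisson, rhs, phi and nsteps are placeholders, not taken from the actual code): build the Poisson KSP once outside the time loop, keep the same matrix, and let only the right-hand side change. With an unchanged operator PETSc sets up the preconditioner only on the first solve; KSPSetReusePreconditioner additionally forces reuse even if the matrix values were to change.

   #include <petscksp.h>

   /* Assumed helper, not from the thread: solve the pressure Poisson system once
      per time step while reusing the preconditioner built on the first solve. */
   PetscErrorCode solve_poisson_over_time(Mat A_poisson, Vec rhs, Vec phi, PetscInt nsteps)
   {
     KSP            poisson_ksp;
     PetscInt       step;
     PetscErrorCode ierr;

     ierr = KSPCreate(PETSC_COMM_WORLD, &poisson_ksp);CHKERRQ(ierr);
     ierr = KSPSetOptionsPrefix(poisson_ksp, "poisson_");CHKERRQ(ierr);       /* picks up -poisson_pc_type gamg etc. */
     ierr = KSPSetOperators(poisson_ksp, A_poisson, A_poisson);CHKERRQ(ierr);
     ierr = KSPSetReusePreconditioner(poisson_ksp, PETSC_TRUE);CHKERRQ(ierr); /* never rebuild the PC */
     ierr = KSPSetFromOptions(poisson_ksp);CHKERRQ(ierr);

     for (step = 0; step < nsteps; step++) {
       /* ... update only the entries of rhs for this time step ... */
       ierr = KSPSolve(poisson_ksp, rhs, phi);CHKERRQ(ierr);                  /* AMG setup runs only on the first solve */
     }
     ierr = KSPDestroy(&poisson_ksp);CHKERRQ(ierr);
     return 0;
   }

In a -log_summary from such a run the PCSetUp count stays small and fixed while the KSPSolve count grows with the number of time steps; the quoted remark that PCSetUp takes a much smaller percentage of the time is exactly this effect.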
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Barry >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 1/11/2015 10:00 AM, Barry Smith wrote: >>>>>>>>>>>>>>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>>>>>>>>>>>>>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. >>>>>>>>>>>>>>>>>>>> Its specs are: >>>>>>>>>>>>>>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node) >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 8 cores / processor >>>>>>>>>>>>>>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>>>>>>>>>>>>>>>>>>> Each cabinet contains 96 computing nodes, >>>>>>>>>>>>>>>>>>>> One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data >>>>>>>>>>>>>>>>>>>> There are 2 ways to give performance: >>>>>>>>>>>>>>>>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed >>>>>>>>>>>>>>>>>>>> problem. >>>>>>>>>>>>>>>>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a >>>>>>>>>>>>>>>>>>>> fixed problem size per processor. >>>>>>>>>>>>>>>>>>>> I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling. >>>>>>>>>>>>>>>>>>>> Cluster specs: >>>>>>>>>>>>>>>>>>>> CPU: AMD 6234 2.4GHz >>>>>>>>>>>>>>>>>>>> 8 cores / processor (CPU) >>>>>>>>>>>>>>>>>>>> 6 CPU / node >>>>>>>>>>>>>>>>>>>> So 48 Cores / CPU >>>>>>>>>>>>>>>>>>>> Not sure abt the memory / node >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The parallel efficiency ?En? for a given degree of parallelism ?n? indicates how much the program is >>>>>>>>>>>>>>>>>>>> efficiently accelerated by parallel processing. ?En? is given by the following formulae. Although their >>>>>>>>>>>>>>>>>>>> derivation processes are different depending on strong and weak scaling, derived formulae are the >>>>>>>>>>>>>>>>>>>> same. >>>>>>>>>>>>>>>>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. >>>>>>>>>>>>>>>>>>>> So is my results acceptable? >>>>>>>>>>>>>>>>>>>> For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%. >>>>>>>>>>>>>>>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function >>>>>>>>>>>>>>>>>>>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a >>>>>>>>>>>>>>>>>>>> model of this dependence. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current >>>>>>>>>>>>>>>>>>>> cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific >>>>>>>>>>>>>>>>>>>> applications, but neither does requiring a certain parallel efficiency. 
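As a rough illustration only (the proposal's actual formula is not quoted anywhere in this thread, so plain Amdahl's law is assumed here as a stand-in), the 140 min / 90 min timings on 48 / 96 cores above imply:

   \[ S_{48\to 96} = \frac{140}{90} \approx 1.56, \qquad
      \frac{T_{96}}{T_{48}} = f + \frac{1-f}{2} = \frac{90}{140} \;\Rightarrow\; f \approx 0.29, \]
   \[ E(n) \approx \frac{1}{f\,(n/48) + (1-f)}, \qquad
      E(17640) \approx \frac{1}{0.29 \cdot 367.5 + 0.71} \approx 0.9\%. \]

With an effective serial fraction near 29% relative to the 48-core run, the achievable speedup saturates around 1/f, about 3.5, no matter how many cores are added, so any strong-scaling extrapolation to thousands of cores lands below 1% efficiency, which is the same order as the 0.5% figure quoted here.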
>>>>>>>>>>>>>>>>>>> Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so? >>>>>>>>>>>>>>>>>> What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Barry >>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I have attached the output >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 48 cores: log48 >>>>>>>>>>>>>>>>> 96 cores: log96 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Problem size doubled from 158x266x150 to 158x266x300. >>>>>>>>>>>>>>>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Matt >>>>>>>>>>>>>>>>>>>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores? >>>>>>>>>>>>>>>>>>>> Btw, I do not have access to the system. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Sent using CloudMagic Email >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>> >>> > From zonexo at gmail.com Thu Nov 5 20:47:39 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Fri, 6 Nov 2015 10:47:39 +0800 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <0F2039C3-52A4-4BB5-BE77-6E75163BFDD9@mcs.anl.gov> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> <56372A12.90900@gmail.com> <563839F8.9080000@gmail.com> <589BE4CF-18B5-4F04-A442-7FB0F2F3AFC1@mcs.anl.gov> <5638AD42.9060609@gmail.com> <2057DCBD-BDB9-4424-A4E3-C1BA5BB8C214@mcs.anl.gov> <563ACD5F.6060301@gmail.com> <56A89BE9-5AEC-4A99-A888-2B7EF35C7BC9@mcs.anl.gov> <563B7C97.1070604@gmail.com> <0F2039C3-52A4-4BB5-BE77-6E75163BFDD9@mcs.anl.gov> Message-ID: <563C14CB.8050205@gmail.com> Hi, I have removed the nullspace and attached the new logs. 
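For reference, the null-space attachment asked about earlier in the thread (the question about KSPSetNullSpace / MatNullSpaceCreate) looks roughly like the following C sketch. It only applies if the pressure Poisson matrix really is singular, i.e. pure Neumann boundary conditions on every boundary; A_poisson is a placeholder name, and the right-hand side must also be consistent (mean-free) for the singular system to have a solution.

   #include <petscmat.h>

   /* Assumed helper, not from the thread: attach the constant null space of a
      pure-Neumann Poisson matrix so the Krylov solver and GAMG can handle it. */
   PetscErrorCode attach_constant_nullspace(Mat A_poisson)
   {
     MatNullSpace   nullsp;
     PetscErrorCode ierr;

     /* PETSC_TRUE: the null space consists of the constant vector only */
     ierr = MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nullsp);CHKERRQ(ierr);
     ierr = MatSetNullSpace(A_poisson, nullsp);CHKERRQ(ierr);
     ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);
     return 0;
   }

If instead the pressure is pinned at one cell (a Dirichlet point), the matrix is nonsingular and no null space should be attached.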
Thank you Yours sincerely, TAY wee-beng On 6/11/2015 12:07 AM, Barry Smith wrote: >> On Nov 5, 2015, at 9:58 AM, TAY wee-beng wrote: >> >> Sorry I realised that I didn't use gamg and that's why. But if I use gamg, the 8 core case worked, but the 64 core case shows p diverged. > Where is the log file for the 8 core case? And where is all the output from where it fails with 64 cores? Include -ksp_monitor_true_residual and -ksp_converged_reason > > Barry > >> Why is this so? Btw, I have also added nullspace in my code. >> >> Thank you. >> >> Yours sincerely, >> >> TAY wee-beng >> >> On 5/11/2015 12:03 PM, Barry Smith wrote: >>> There is a problem here. The -log_summary doesn't show all the events associated with the -pc_type gamg preconditioner it should have rows like >>> >>> VecDot 2 1.0 6.1989e-06 1.0 1.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1613 >>> VecMDot 134 1.0 5.4145e-04 1.0 1.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 3025 >>> VecNorm 154 1.0 2.4176e-04 1.0 3.82e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1578 >>> VecScale 148 1.0 1.6928e-04 1.0 1.76e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1039 >>> VecCopy 106 1.0 1.2255e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecSet 474 1.0 5.1236e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecAXPY 54 1.0 1.3471e-04 1.0 2.35e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1742 >>> VecAYPX 384 1.0 5.7459e-04 1.0 4.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 860 >>> VecAXPBYCZ 192 1.0 4.7398e-04 1.0 9.88e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2085 >>> VecWAXPY 2 1.0 7.8678e-06 1.0 5.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 636 >>> VecMAXPY 148 1.0 8.1539e-04 1.0 1.96e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 2399 >>> VecPointwiseMult 66 1.0 1.1253e-04 1.0 6.79e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 604 >>> VecScatterBegin 45 1.0 6.3419e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecSetRandom 6 1.0 3.0994e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecReduceArith 4 1.0 1.3113e-05 1.0 2.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1525 >>> VecReduceComm 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecNormalize 148 1.0 4.4799e-04 1.0 5.27e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1177 >>> MatMult 424 1.0 8.9276e-03 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 37 0 0 0 7 37 0 0 0 2343 >>> MatMultAdd 48 1.0 5.0926e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2069 >>> MatMultTranspose 48 1.0 9.8586e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1069 >>> MatSolve 16 1.0 2.2173e-05 1.0 1.02e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 460 >>> MatSOR 354 1.0 1.0547e-02 1.0 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 31 0 0 0 9 31 0 0 0 1631 >>> MatLUFactorSym 2 1.0 4.7922e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatLUFactorNum 2 1.0 2.5272e-05 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 307 >>> MatScale 18 1.0 1.7142e-04 1.0 1.50e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 874 >>> MatResidual 48 1.0 1.0548e-03 1.0 2.33e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2212 >>> MatAssemblyBegin 57 1.0 4.7684e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAssemblyEnd 57 1.0 1.9786e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>> MatGetRow 
21616 1.0 1.8497e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>> MatGetRowIJ 2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatGetOrdering 2 1.0 6.0797e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatCoarsen 6 1.0 9.3222e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatZeroEntries 2 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAXPY 6 1.0 1.7998e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>> MatFDColorCreate 1 1.0 3.2902e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatFDColorSetUp 1 1.0 1.6739e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatFDColorApply 2 1.0 1.3199e-03 1.0 2.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 1826 >>> MatFDColorFunc 42 1.0 7.4601e-04 1.0 2.20e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2956 >>> MatMatMult 6 1.0 5.1048e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 2 0 0 0 4 2 0 0 0 241 >>> MatMatMultSym 6 1.0 3.2601e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >>> MatMatMultNum 6 1.0 1.8158e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 679 >>> MatPtAP 6 1.0 2.1328e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283 >>> MatPtAPSymbolic 6 1.0 1.0073e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 0 >>> MatPtAPNumeric 6 1.0 1.1230e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 537 >>> MatTrnMatMult 2 1.0 7.2789e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 75 >>> MatTrnMatMultSym 2 1.0 5.7006e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatTrnMatMultNum 2 1.0 1.5473e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 352 >>> MatGetSymTrans 8 1.0 3.1638e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> KSPGMRESOrthog 134 1.0 1.3156e-03 1.0 3.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 1 6 0 0 0 2491 >>> KSPSetUp 24 1.0 4.6754e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> KSPSolve 2 1.0 1.1291e-01 1.0 5.32e+07 1.0 0.0e+00 0.0e+00 0.0e+00 94 95 0 0 0 94 95 0 0 0 471 >>> PCGAMGGraph_AGG 6 1.0 1.2108e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2 >>> PCGAMGCoarse_AGG 6 1.0 1.1127e-03 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 49 >>> PCGAMGProl_AGG 6 1.0 4.1062e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0 >>> PCGAMGPOpt_AGG 6 1.0 1.1200e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534 >>> GAMG: createProl 6 1.0 6.5530e-02 1.0 6.06e+06 1.0 0.0e+00 0.0e+00 0.0e+00 55 11 0 0 0 55 11 0 0 0 92 >>> Graph 12 1.0 1.1692e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2 >>> MIS/Agg 6 1.0 1.4496e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> SA: col data 6 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> SA: frmProl0 6 1.0 4.0917e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0 >>> SA: smooth 6 1.0 1.1198e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534 >>> GAMG: partLevel 6 1.0 2.1341e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283 >>> PCSetUp 4 1.0 8.8020e-02 1.0 1.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 74 22 0 0 0 74 22 0 0 0 137 >>> PCSetUpOnBlocks 16 1.0 1.8382e-04 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 
0 0 0 0 0 42 >>> PCApply 16 1.0 2.3858e-02 1.0 3.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 20 70 0 0 0 20 70 0 0 0 1637 >>> >>> >>> Are you sure you ran with -pc_type gamg ? What about running with -info does it print anything about gamg? What about -ksp_view does it indicate it is using the gamg preconditioner? >>> >>> >>>> On Nov 4, 2015, at 9:30 PM, TAY wee-beng wrote: >>>> >>>> Hi, >>>> >>>> I have attached the 2 logs. >>>> >>>> Thank you >>>> >>>> Yours sincerely, >>>> >>>> TAY wee-beng >>>> >>>> On 4/11/2015 1:11 AM, Barry Smith wrote: >>>>> Ok, the convergence looks good. Now run on 8 and 64 processes as before with -log_summary and not -ksp_monitor to see how it scales. >>>>> >>>>> Barry >>>>> >>>>>> On Nov 3, 2015, at 6:49 AM, TAY wee-beng wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I tried and have attached the log. >>>>>> >>>>>> Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate? >>>>>> >>>>>> Thank you >>>>>> >>>>>> Yours sincerely, >>>>>> >>>>>> TAY wee-beng >>>>>> >>>>>> On 3/11/2015 12:45 PM, Barry Smith wrote: >>>>>>>> On Nov 2, 2015, at 10:37 PM, TAY wee-beng wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I tried : >>>>>>>> >>>>>>>> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg >>>>>>>> >>>>>>>> 2. -poisson_pc_type gamg >>>>>>> Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason >>>>>>> Does your poisson have Neumann boundary conditions? Do you have any zeros on the diagonal for the matrix (you shouldn't). >>>>>>> >>>>>>> There may be something wrong with your poisson discretization that was also messing up hypre >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Both options give: >>>>>>>> >>>>>>>> 1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN >>>>>>>> M Diverged but why?, time = 2 >>>>>>>> reason = -9 >>>>>>>> >>>>>>>> How can I check what's wrong? >>>>>>>> >>>>>>>> Thank you >>>>>>>> >>>>>>>> Yours sincerely, >>>>>>>> >>>>>>>> TAY wee-beng >>>>>>>> >>>>>>>> On 3/11/2015 3:18 AM, Barry Smith wrote: >>>>>>>>> hypre is just not scaling well here. I do not know why. Since hypre is a block box for us there is no way to determine why the poor scaling. >>>>>>>>> >>>>>>>>> If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly. >>>>>>>>> >>>>>>>>> Barry >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Nov 2, 2015, at 3:17 AM, TAY wee-beng wrote: >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I have attached the 2 files. >>>>>>>>>> >>>>>>>>>> Thank you >>>>>>>>>> >>>>>>>>>> Yours sincerely, >>>>>>>>>> >>>>>>>>>> TAY wee-beng >>>>>>>>>> >>>>>>>>>> On 2/11/2015 2:55 PM, Barry Smith wrote: >>>>>>>>>>> Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results >>>>>>>>>>> >>>>>>>>>>> Barry >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On Nov 2, 2015, at 12:19 AM, TAY wee-beng wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> I have attached the new results. 
>>>>>>>>>>>> >>>>>>>>>>>> Thank you >>>>>>>>>>>> >>>>>>>>>>>> Yours sincerely, >>>>>>>>>>>> >>>>>>>>>>>> TAY wee-beng >>>>>>>>>>>> >>>>>>>>>>>> On 2/11/2015 12:27 PM, Barry Smith wrote: >>>>>>>>>>>>> Run without the -momentum_ksp_view -poisson_ksp_view and send the new results >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time. >>>>>>>>>>>>> >>>>>>>>>>>>> Barry >>>>>>>>>>>>> >>>>>>>>>>>>> Something makes no sense with the output: it gives >>>>>>>>>>>>> >>>>>>>>>>>>> KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165 >>>>>>>>>>>>> >>>>>>>>>>>>> 90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> On Nov 1, 2015, at 10:02 PM, TAY wee-beng wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I have attached the new run with 100 time steps for 48 and 96 cores. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yours sincerely, >>>>>>>>>>>>>> >>>>>>>>>>>>>> TAY wee-beng >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 2/11/2015 9:49 AM, Barry Smith wrote: >>>>>>>>>>>>>>> If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Barry >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Also, what about momentum eqn? Is it working well? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I will try the gamg later too. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Yours sincerely, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> TAY wee-beng >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 2/11/2015 12:30 AM, Barry Smith wrote: >>>>>>>>>>>>>>>>> You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. 
When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Barry >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On 1/11/2015 10:00 AM, Barry Smith wrote: >>>>>>>>>>>>>>>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>>>>>>>>>>>>>>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. >>>>>>>>>>>>>>>>>>>>> Its specs are: >>>>>>>>>>>>>>>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node) >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 8 cores / processor >>>>>>>>>>>>>>>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>>>>>>>>>>>>>>>>>>>> Each cabinet contains 96 computing nodes, >>>>>>>>>>>>>>>>>>>>> One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data >>>>>>>>>>>>>>>>>>>>> There are 2 ways to give performance: >>>>>>>>>>>>>>>>>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed >>>>>>>>>>>>>>>>>>>>> problem. >>>>>>>>>>>>>>>>>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a >>>>>>>>>>>>>>>>>>>>> fixed problem size per processor. >>>>>>>>>>>>>>>>>>>>> I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling. >>>>>>>>>>>>>>>>>>>>> Cluster specs: >>>>>>>>>>>>>>>>>>>>> CPU: AMD 6234 2.4GHz >>>>>>>>>>>>>>>>>>>>> 8 cores / processor (CPU) >>>>>>>>>>>>>>>>>>>>> 6 CPU / node >>>>>>>>>>>>>>>>>>>>> So 48 Cores / CPU >>>>>>>>>>>>>>>>>>>>> Not sure abt the memory / node >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The parallel efficiency ?En? for a given degree of parallelism ?n? indicates how much the program is >>>>>>>>>>>>>>>>>>>>> efficiently accelerated by parallel processing. ?En? is given by the following formulae. Although their >>>>>>>>>>>>>>>>>>>>> derivation processes are different depending on strong and weak scaling, derived formulae are the >>>>>>>>>>>>>>>>>>>>> same. >>>>>>>>>>>>>>>>>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. 
>>>>>>>>>>>>>>>>>>>>> So is my results acceptable? >>>>>>>>>>>>>>>>>>>>> For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%. >>>>>>>>>>>>>>>>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function >>>>>>>>>>>>>>>>>>>>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a >>>>>>>>>>>>>>>>>>>>> model of this dependence. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current >>>>>>>>>>>>>>>>>>>>> cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific >>>>>>>>>>>>>>>>>>>>> applications, but neither does requiring a certain parallel efficiency. >>>>>>>>>>>>>>>>>>>> Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so? >>>>>>>>>>>>>>>>>>> What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Barry >>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I have attached the output >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 48 cores: log48 >>>>>>>>>>>>>>>>>> 96 cores: log96 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Problem size doubled from 158x266x150 to 158x266x300. >>>>>>>>>>>>>>>>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Matt >>>>>>>>>>>>>>>>>>>>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores? >>>>>>>>>>>>>>>>>>>>> Btw, I do not have access to the system. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Sent using CloudMagic Email >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>>>>>>>>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>> >>>> -------------- next part -------------- 0.000000000000000E+000 0.353000000000000 0.000000000000000E+000 90.0000000000000 0.000000000000000E+000 0.000000000000000E+000 1.00000000000000 0.400000000000000 0 -400000 AB,AA,BB -2.00050000002375 2.00050000002375 2.61200002906844 2.53550002543489 size_x,size_y,size_z 79 133 75 body_cg_ini 0.523700833348298 0.778648765134454 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 4.300000000000000E-002 maximum ngh_surfaces and ngh_vertics are 149 68 minimum ngh_surfaces and ngh_vertics are 54 22 body_cg_ini 0.896813342835977 -0.976707581163755 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 4.300000000000000E-002 maximum ngh_surfaces and ngh_vertics are 149 68 minimum ngh_surfaces and ngh_vertics are 54 22 min IIB_cell_no 0 max IIB_cell_no 265 final initial IIB_cell_no 1325 min I_cell_no 0 max I_cell_no 94 final initial I_cell_no 470 size(IIB_cell_u),size(I_cell_u),size(IIB_equal_cell_u),size(I_equal_cell_u) 1325 470 1325 470 IIB_I_cell_no_uvw_total1 265 270 255 94 91 95 IIB_I_cell_no_uvw_total2 273 280 267 97 94 98 1 0.00150000 0.14647311 0.14738627 1.08799969 0.19042093E+02 0.17803989E+00 0.78750668E+06 escape_time reached, so abort body 1 implicit forces and moment 1 0.869079172306081 -0.476901556137086 8.158436217625725E-002 0.428147637893997 0.558124953374670 -0.928673815311215 body 2 implicit forces and moment 2 0.551071812724179 0.775546545440679 0.135476357173946 -0.634587321947283 0.290234875219091 0.936523266880710 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./a.out on a petsc-3.6.2_shared_rel named n12-04 with 8 processors, by wtay Fri Nov 6 03:01:51 2015 Using Petsc Release Version 3.6.2, Oct, 02, 2015 Max Max/Min Avg Total Time (sec): 3.802e+03 1.00000 3.802e+03 Objects: 5.560e+02 1.00361 5.542e+02 Flops: 1.594e+12 1.12475 1.484e+12 1.187e+13 Flops/sec: 4.192e+08 1.12475 3.902e+08 3.122e+09 MPI Messages: 3.857e+06 2.54046 2.913e+06 2.331e+07 MPI Message Lengths: 7.441e+10 2.00621 2.218e+04 5.169e+11 MPI Reductions: 2.063e+05 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 3.8019e+03 100.0% 1.1869e+13 100.0% 2.331e+07 100.0% 2.218e+04 100.0% 2.063e+05 100.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. 
Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1066750 1.0 1.6362e+03 1.2 5.96e+11 1.2 1.9e+07 2.6e+04 0.0e+00 37 37 81 95 0 37 37 81 95 0 2695 MatMultAdd 164088 1.0 1.4247e+02 1.2 3.77e+10 1.2 2.3e+06 5.8e+03 0.0e+00 3 2 10 3 0 3 2 10 3 0 1961 MatMultTranspose 164088 1.0 2.0693e+02 1.4 3.77e+10 1.2 2.3e+06 5.8e+03 0.0e+00 4 2 10 3 0 4 2 10 3 0 1350 MatSolve 82341277.2 2.5719e+00 1.4 1.16e+09 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3316 MatSOR 984572 1.0 1.6424e+03 1.2 5.00e+11 1.1 0.0e+00 0.0e+00 0.0e+00 39 31 0 0 0 39 31 0 0 0 2275 MatLUFactorSym 1 1.0 3.5048e-05 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 100 1.0 5.1047e+00 1.2 6.34e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 919 MatILUFactorSym 1 1.0 3.7481e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatConvert 4 1.0 1.7271e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatScale 12 1.0 1.4342e-02 1.2 3.23e+06 1.2 7.4e+01 2.4e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 1665 MatResidual 164088 1.0 2.8096e+02 1.3 9.48e+10 1.2 3.0e+06 2.4e+04 0.0e+00 6 6 13 14 0 6 6 13 14 0 2493 MatAssemblyBegin 153 1.0 2.7167e+00 8.5 0.00e+00 0.0 1.1e+02 6.9e+03 2.5e+02 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 153 1.0 1.3169e+00 1.1 0.00e+00 0.0 7.5e+02 6.5e+03 1.8e+02 0 0 0 0 0 0 0 0 0 0 0 MatGetRow 462356 1.1 1.5225e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRowIJ 2 2.0 9.0599e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetSubMatrix 4 1.0 3.0942e-03 1.0 0.00e+00 0.0 1.2e+02 1.6e+03 6.4e+01 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 2 2.0 6.7458e-03 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCoarsen 4 1.0 5.3186e-02 1.3 0.00e+00 0.0 6.4e+02 1.3e+04 1.6e+01 0 0 0 0 0 0 0 0 0 0 0 MatAXPY 4 1.0 1.0154e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMatMult 4 1.0 1.4702e-01 1.0 2.31e+06 1.2 4.5e+02 1.2e+04 6.4e+01 0 0 0 0 0 0 0 0 0 0 116 MatMatMultSym 4 1.0 9.9177e-02 1.0 0.00e+00 0.0 3.8e+02 1.0e+04 5.6e+01 0 0 0 0 0 0 0 0 0 0 0 MatMatMultNum 4 1.0 4.8954e-02 1.0 2.31e+06 1.2 7.4e+01 2.4e+04 8.0e+00 0 0 0 0 0 0 0 0 0 0 349 MatPtAP 4 1.0 6.1620e-01 1.0 2.66e+07 1.4 7.5e+02 4.0e+04 6.8e+01 0 0 0 0 0 0 0 0 0 0 307 MatPtAPSymbolic 4 1.0 2.3832e-01 1.0 0.00e+00 0.0 4.5e+02 4.6e+04 2.8e+01 0 0 0 0 0 0 0 0 0 0 0 MatPtAPNumeric 4 1.0 3.7817e-01 1.0 2.66e+07 1.4 3.0e+02 3.0e+04 4.0e+01 0 0 0 0 0 0 0 0 0 0 500 MatTrnMatMult 1 1.0 6.1167e-01 1.0 1.05e+07 1.2 8.4e+01 
1.7e+05 1.9e+01 0 0 0 0 0 0 0 0 0 0 127 MatTrnMatMultSym 1 1.0 2.9542e-01 1.0 0.00e+00 0.0 7.0e+01 9.2e+04 1.7e+01 0 0 0 0 0 0 0 0 0 0 0 MatTrnMatMultNum 1 1.0 3.1680e-01 1.0 1.05e+07 1.2 1.4e+01 5.8e+05 2.0e+00 0 0 0 0 0 0 0 0 0 0 246 MatGetLocalMat 14 1.0 4.1928e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetBrAoCol 12 1.0 3.7147e-02 1.4 0.00e+00 0.0 5.2e+02 4.1e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetSymTrans 8 1.0 7.8440e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 80756 1.0 3.9613e+02 1.8 2.55e+11 1.1 0.0e+00 0.0e+00 8.1e+04 8 16 0 0 39 8 16 0 0 39 4833 KSPSetUp 213 1.0 4.7638e-02 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 199 1.0 3.7658e+03 1.0 1.59e+12 1.1 2.3e+07 2.2e+04 2.0e+05 99100100100 99 99100100100 99 3152 VecDot 198 1.0 7.2550e-01 2.1 1.25e+08 1.1 0.0e+00 0.0e+00 2.0e+02 0 0 0 0 0 0 0 0 0 0 1290 VecDotNorm2 99 1.0 5.3831e-01 3.5 1.25e+08 1.1 0.0e+00 0.0e+00 9.9e+01 0 0 0 0 0 0 0 0 0 0 1739 VecMDot 80756 1.0 3.0185e+02 2.8 1.28e+11 1.1 0.0e+00 0.0e+00 8.1e+04 5 8 0 0 39 5 8 0 0 39 3171 VecNorm 123352 1.0 4.6411e+01 4.4 8.75e+09 1.1 0.0e+00 0.0e+00 1.2e+05 1 1 0 0 60 1 1 0 0 60 1414 VecScale 123124 1.0 3.6298e+00 1.3 4.31e+09 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 8911 VecCopy 206684 1.0 1.3012e+01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 699500 1.0 1.0120e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 43666 1.0 6.9073e-01 1.2 5.55e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6020 VecAYPX 1312704 1.0 1.2019e+02 1.3 4.74e+10 1.1 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 2960 VecAXPBYCZ 656550 1.0 8.2020e+01 1.3 9.51e+10 1.1 0.0e+00 0.0e+00 0.0e+00 2 6 0 0 0 2 6 0 0 0 8696 VecWAXPY 198 1.0 3.3118e-01 1.4 1.25e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2827 VecMAXPY 123154 1.0 1.1971e+02 1.2 1.36e+11 1.1 0.0e+00 0.0e+00 0.0e+00 3 9 0 0 0 3 9 0 0 0 8519 VecAssemblyBegin 416 1.0 5.0510e-01 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+03 0 0 0 0 1 0 0 0 0 1 0 VecAssemblyEnd 416 1.0 1.0114e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 44 1.0 7.5576e-03 1.9 1.27e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1262 VecScatterBegin 1394945 1.0 4.2240e+01 2.3 0.00e+00 0.0 2.3e+07 2.2e+04 0.0e+00 1 0100100 0 1 0100100 0 0 VecScatterEnd 1394945 1.0 7.3695e+02 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 11 0 0 0 0 11 0 0 0 0 0 VecSetRandom 4 1.0 3.2659e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 123154 1.0 4.8155e+01 3.5 1.29e+10 1.1 0.0e+00 0.0e+00 1.2e+05 1 1 0 0 60 1 1 0 0 60 2015 PCGAMGGraph_AGG 4 1.0 3.8426e-01 1.0 2.31e+06 1.2 2.2e+02 1.2e+04 4.8e+01 0 0 0 0 0 0 0 0 0 0 44 PCGAMGCoarse_AGG 4 1.0 7.2042e-01 1.0 1.05e+07 1.2 7.9e+02 4.2e+04 5.1e+01 0 0 0 0 0 0 0 0 0 0 108 PCGAMGProl_AGG 4 1.0 1.6819e-01 1.0 0.00e+00 0.0 3.4e+02 2.0e+04 9.6e+01 0 0 0 0 0 0 0 0 0 0 0 PCGAMGPOpt_AGG 4 1.0 4.5917e-01 1.0 5.82e+07 1.1 1.2e+03 2.0e+04 2.0e+02 0 0 0 0 0 0 0 0 0 0 945 GAMG: createProl 4 1.0 1.7316e+00 1.0 7.10e+07 1.1 2.5e+03 2.6e+04 4.0e+02 0 0 0 0 0 0 0 0 0 0 305 Graph 8 1.0 3.8322e-01 1.0 2.31e+06 1.2 2.2e+02 1.2e+04 4.8e+01 0 0 0 0 0 0 0 0 0 0 45 MIS/Agg 4 1.0 5.3304e-02 1.3 0.00e+00 0.0 6.4e+02 1.3e+04 1.6e+01 0 0 0 0 0 0 0 0 0 0 0 SA: col data 4 1.0 2.2745e-02 1.1 0.00e+00 0.0 1.5e+02 4.0e+04 4.0e+01 0 0 0 0 0 0 0 0 0 0 0 SA: frmProl0 4 1.0 1.3956e-01 1.0 0.00e+00 0.0 1.9e+02 4.6e+03 4.0e+01 0 0 0 0 0 0 0 0 0 0 0 SA: smooth 4 1.0 4.5917e-01 1.0 
5.82e+07 1.1 1.2e+03 2.0e+04 2.0e+02 0 0 0 0 0 0 0 0 0 0 945 GAMG: partLevel 4 1.0 6.2054e-01 1.0 2.66e+07 1.4 8.9e+02 3.3e+04 1.7e+02 0 0 0 0 0 0 0 0 0 0 305 repartition 2 1.0 2.2912e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 Invert-Sort 2 1.0 3.1805e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 0 0 Move A 2 1.0 1.4431e-03 1.1 0.00e+00 0.0 6.9e+01 2.6e+03 3.4e+01 0 0 0 0 0 0 0 0 0 0 0 Move P 2 1.0 1.9579e-03 1.0 0.00e+00 0.0 4.8e+01 6.1e+01 3.4e+01 0 0 0 0 0 0 0 0 0 0 0 PCSetUp 200 1.0 7.5039e+00 1.2 7.32e+08 1.2 3.4e+03 2.8e+04 5.9e+02 0 0 0 0 0 0 0 0 0 0 721 PCSetUpOnBlocks 41121 1.0 5.1800e+00 1.2 6.34e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 905 PCApply 41319 1.0 3.3777e+03 1.1 1.26e+12 1.1 2.3e+07 2.1e+04 1.2e+05 87 79 98 91 60 87 79 98 91 60 2771 SFSetGraph 4 1.0 2.0399e-03 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFBcastBegin 24 1.0 1.8663e-02 4.9 0.00e+00 0.0 6.4e+02 1.3e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFBcastEnd 24 1.0 1.0988e-02 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Matrix 106 106 220273512 0 Matrix Coarsen 4 4 2512 0 Krylov Solver 18 18 287944 0 Vector 294 294 93849960 0 Vector Scatter 29 29 31760 0 Index Set 78 78 2936420 0 Preconditioner 18 18 17444 0 Star Forest Bipartite Graph 4 4 3424 0 PetscRandom 4 4 2496 0 Viewer 1 0 0 0 ======================================================================================================================== Average time to get PetscTime(): 1.90735e-07 Average time for MPI_Barrier(): 3.19481e-06 Average time for zero size MPI_Send(): 1.31428e-05 #PETSc Option Table entries: -log_summary -poisson_pc_type gamg #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-mpi-dir=/opt/ud/openmpi-1.8.8/ --with-blas-lapack-dir=/opt/ud/intel_xe_2013sp1/mkl/lib/intel64/ --with-debugging=0 --download-hypre=1 --prefix=/home/wtay/Lib/petsc-3.6.2_shared_rel --known-mpi-shared=1 --with-shared-libraries --with-fortran-interfaces=1 ----------------------------------------- Libraries compiled on Sun Oct 18 17:34:07 2015 on hpc12 Machine characteristics: Linux-3.10.0-123.20.1.el7.x86_64-x86_64-with-centos-7.1.1503-Core Using PETSc directory: /home/wtay/Codes/petsc-3.6.2 Using PETSc arch: petsc-3.6.2_shared_rel ----------------------------------------- Using C compiler: /opt/ud/openmpi-1.8.8/bin/mpicc -fPIC -wd1572 -O3 ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /opt/ud/openmpi-1.8.8/bin/mpif90 -fPIC -O3 ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Lib/petsc-3.6.2_shared_rel/include -I/opt/ud/openmpi-1.8.8/include ----------------------------------------- Using C linker: /opt/ud/openmpi-1.8.8/bin/mpicc Using Fortran linker: /opt/ud/openmpi-1.8.8/bin/mpif90 Using libraries: -Wl,-rpath,/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib 
-L/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -lpetsc -Wl,-rpath,/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -L/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -lHYPRE -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -lmpi_cxx -Wl,-rpath,/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -L/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lX11 -lhwloc -lssl -lcrypto -lmpi_usempi -lmpi_mpifh -lifport -lifcore -lm -lmpi_cxx -ldl -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -lmpi -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -limf -lsvml -lirng -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lpthread -lirc_s -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -ldl ----------------------------------------- -------------- next part -------------- 0.000000000000000E+000 0.353000000000000 0.000000000000000E+000 90.0000000000000 0.000000000000000E+000 0.000000000000000E+000 1.00000000000000 0.400000000000000 0 -400000 z grid divid too small! myid,each procs z size 23 2 z grid divid too small! myid,each procs z size 50 2 z grid divid too small! myid,each procs z size 31 2 z grid divid too small! myid,each procs z size 25 2 z grid divid too small! myid,each procs z size 56 2 z grid divid too small! myid,each procs z size 28 2 z grid divid too small! myid,each procs z size 32 2 z grid divid too small! myid,each procs z size 57 2 z grid divid too small! myid,each procs z size 62 2 z grid divid too small! myid,each procs z size 39 2 z grid divid too small! myid,each procs z size 49 2 z grid divid too small! myid,each procs z size 47 2 z grid divid too small! myid,each procs z size 33 2 z grid divid too small! myid,each procs z size 43 2 z grid divid too small! myid,each procs z size 46 2 z grid divid too small! myid,each procs z size 37 2 z grid divid too small! myid,each procs z size 40 2 z grid divid too small! myid,each procs z size 44 2 z grid divid too small! myid,each procs z size 38 2 z grid divid too small! myid,each procs z size 41 2 z grid divid too small! myid,each procs z size 51 2 z grid divid too small! myid,each procs z size 59 2 z grid divid too small! myid,each procs z size 48 2 z grid divid too small! myid,each procs z size 36 2 z grid divid too small! myid,each procs z size 42 2 z grid divid too small! myid,each procs z size 45 2 z grid divid too small! myid,each procs z size 29 2 z grid divid too small! myid,each procs z size 27 2 z grid divid too small! myid,each procs z size 35 2 z grid divid too small! myid,each procs z size 34 2 z grid divid too small! myid,each procs z size 54 2 z grid divid too small! myid,each procs z size 60 2 z grid divid too small! 
myid,each procs z size 53 2 z grid divid too small! myid,each procs z size 58 2 z grid divid too small! myid,each procs z size 63 2 z grid divid too small! myid,each procs z size 52 2 z grid divid too small! myid,each procs z size 61 2 z grid divid too small! myid,each procs z size 55 2 z grid divid too small! myid,each procs z size 30 2 AB,AA,BB -2.47900002275128 2.50750002410496 3.46600006963126 3.40250006661518 size_x,size_y,size_z 158 266 150 z grid divid too small! myid,each procs z size 24 2 z grid divid too small! myid,each procs z size 26 2 z grid divid too small! myid,each procs z size 22 2 body_cg_ini 0.523700833348298 0.778648765134454 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 1.800000000000000E-002 maximum ngh_surfaces and ngh_vertics are 42 22 minimum ngh_surfaces and ngh_vertics are 28 10 body_cg_ini 0.896813342835977 -0.976707581163755 7.03282656467989 Warning - length difference between element and cell max_element_length,min_element_length,min_delta 0.000000000000000E+000 10000000000.0000 1.800000000000000E-002 maximum ngh_surfaces and ngh_vertics are 42 22 minimum ngh_surfaces and ngh_vertics are 28 10 min IIB_cell_no 0 max IIB_cell_no 429 final initial IIB_cell_no 2145 min I_cell_no 0 max I_cell_no 460 final initial I_cell_no 2300 size(IIB_cell_u),size(I_cell_u),size(IIB_equal_cell_u),size(I_equal_cell_u) 2145 2300 2145 2300 IIB_I_cell_no_uvw_total1 3090 3094 3078 3080 3074 3073 IIB_I_cell_no_uvw_total2 3102 3108 3089 3077 3060 3086 P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged Linear poisson_ solve did not converge due to DIVERGED_ITS iterations 10000 P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged P Diverged -------------------------------------------------------------------------- mpiexec has exited due to process rank 37 with PID 0 on node n12-10 exiting improperly. There are three reasons this could occur: 1. this process did not call "init" before exiting, but others in the job did. This can cause a job to hang indefinitely while it waits for all processes to call "init". By rule, if one process calls "init", then ALL processes must call "init" prior to termination. 2. this process called "init", but exited without calling "finalize". By rule, all processes that call "init" MUST call "finalize" prior to exiting or it will be considered an "abnormal termination" 3. this process called "MPI_Abort" or "orte_abort" and the mca parameter orte_create_session_dirs is set to false. In this case, the run-time cannot detect that the abort call was an abnormal termination. Hence, the only error message you will receive is this one. This may have caused other processes in the application to be terminated by signals sent by mpiexec (as reported here). You can avoid this message by specifying -quiet on the mpiexec command line. 
-------------------------------------------------------------------------- From bsmith at mcs.anl.gov Thu Nov 5 22:08:31 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 5 Nov 2015 22:08:31 -0600 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <563C14CB.8050205@gmail.com> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> <56372A12.90900@gmail.com> <563839F8.9080000@gmail.com> <589BE4CF-18B5-4F04-A442-7FB0F2F3AFC1@mcs.anl.gov> <5638AD42.9060609@gmail.com> <2057DCBD-BDB9-4424-A4E3-C1BA5BB8C214@mcs.anl.gov> <563ACD5F.6060301@gmail.com> <56A89BE9-5AEC-4A99-A888-2B7EF35C7BC9@mcs.anl.gov> <563B7C97.1070604@gmail.com> <0F2039C3-52A4-4BB5-BE77-6E75163BFDD9@mcs.anl.gov> <563C14CB.8050205@gmail.com> Message-ID: <108ACE20-8B56-4FA6-8CEE-D455B975DEBF@mcs.anl.gov> Ok the 64 case not converging makes no sense. Run it with ksp_monitor and ksp_converged_reason for the pressure solve turned on and -info You need to figure out why it is not converging. Barry > On Nov 5, 2015, at 8:47 PM, TAY wee-beng wrote: > > Hi, > > I have removed the nullspace and attached the new logs. > > Thank you > > Yours sincerely, > > TAY wee-beng > > On 6/11/2015 12:07 AM, Barry Smith wrote: >>> On Nov 5, 2015, at 9:58 AM, TAY wee-beng wrote: >>> >>> Sorry I realised that I didn't use gamg and that's why. But if I use gamg, the 8 core case worked, but the 64 core case shows p diverged. >> Where is the log file for the 8 core case? And where is all the output from where it fails with 64 cores? Include -ksp_monitor_true_residual and -ksp_converged_reason >> >> Barry >> >>> Why is this so? Btw, I have also added nullspace in my code. >>> >>> Thank you. >>> >>> Yours sincerely, >>> >>> TAY wee-beng >>> >>> On 5/11/2015 12:03 PM, Barry Smith wrote: >>>> There is a problem here. 
The -log_summary doesn't show all the events associated with the -pc_type gamg preconditioner it should have rows like >>>> >>>> VecDot 2 1.0 6.1989e-06 1.0 1.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1613 >>>> VecMDot 134 1.0 5.4145e-04 1.0 1.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 3025 >>>> VecNorm 154 1.0 2.4176e-04 1.0 3.82e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1578 >>>> VecScale 148 1.0 1.6928e-04 1.0 1.76e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1039 >>>> VecCopy 106 1.0 1.2255e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecSet 474 1.0 5.1236e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecAXPY 54 1.0 1.3471e-04 1.0 2.35e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1742 >>>> VecAYPX 384 1.0 5.7459e-04 1.0 4.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 860 >>>> VecAXPBYCZ 192 1.0 4.7398e-04 1.0 9.88e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2085 >>>> VecWAXPY 2 1.0 7.8678e-06 1.0 5.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 636 >>>> VecMAXPY 148 1.0 8.1539e-04 1.0 1.96e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 2399 >>>> VecPointwiseMult 66 1.0 1.1253e-04 1.0 6.79e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 604 >>>> VecScatterBegin 45 1.0 6.3419e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecSetRandom 6 1.0 3.0994e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecReduceArith 4 1.0 1.3113e-05 1.0 2.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1525 >>>> VecReduceComm 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecNormalize 148 1.0 4.4799e-04 1.0 5.27e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1177 >>>> MatMult 424 1.0 8.9276e-03 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 37 0 0 0 7 37 0 0 0 2343 >>>> MatMultAdd 48 1.0 5.0926e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2069 >>>> MatMultTranspose 48 1.0 9.8586e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1069 >>>> MatSolve 16 1.0 2.2173e-05 1.0 1.02e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 460 >>>> MatSOR 354 1.0 1.0547e-02 1.0 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 31 0 0 0 9 31 0 0 0 1631 >>>> MatLUFactorSym 2 1.0 4.7922e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatLUFactorNum 2 1.0 2.5272e-05 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 307 >>>> MatScale 18 1.0 1.7142e-04 1.0 1.50e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 874 >>>> MatResidual 48 1.0 1.0548e-03 1.0 2.33e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2212 >>>> MatAssemblyBegin 57 1.0 4.7684e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatAssemblyEnd 57 1.0 1.9786e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>>> MatGetRow 21616 1.0 1.8497e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>>> MatGetRowIJ 2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatGetOrdering 2 1.0 6.0797e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatCoarsen 6 1.0 9.3222e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatZeroEntries 2 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatAXPY 6 1.0 1.7998e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>>> MatFDColorCreate 1 1.0 3.2902e-05 1.0 0.00e+00 0.0 
0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatFDColorSetUp 1 1.0 1.6739e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>> MatFDColorApply 2 1.0 1.3199e-03 1.0 2.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 1826 >>>> MatFDColorFunc 42 1.0 7.4601e-04 1.0 2.20e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2956 >>>> MatMatMult 6 1.0 5.1048e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 2 0 0 0 4 2 0 0 0 241 >>>> MatMatMultSym 6 1.0 3.2601e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >>>> MatMatMultNum 6 1.0 1.8158e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 679 >>>> MatPtAP 6 1.0 2.1328e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283 >>>> MatPtAPSymbolic 6 1.0 1.0073e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 0 >>>> MatPtAPNumeric 6 1.0 1.1230e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 537 >>>> MatTrnMatMult 2 1.0 7.2789e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 75 >>>> MatTrnMatMultSym 2 1.0 5.7006e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatTrnMatMultNum 2 1.0 1.5473e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 352 >>>> MatGetSymTrans 8 1.0 3.1638e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> KSPGMRESOrthog 134 1.0 1.3156e-03 1.0 3.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 1 6 0 0 0 2491 >>>> KSPSetUp 24 1.0 4.6754e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> KSPSolve 2 1.0 1.1291e-01 1.0 5.32e+07 1.0 0.0e+00 0.0e+00 0.0e+00 94 95 0 0 0 94 95 0 0 0 471 >>>> PCGAMGGraph_AGG 6 1.0 1.2108e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2 >>>> PCGAMGCoarse_AGG 6 1.0 1.1127e-03 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 49 >>>> PCGAMGProl_AGG 6 1.0 4.1062e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0 >>>> PCGAMGPOpt_AGG 6 1.0 1.1200e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534 >>>> GAMG: createProl 6 1.0 6.5530e-02 1.0 6.06e+06 1.0 0.0e+00 0.0e+00 0.0e+00 55 11 0 0 0 55 11 0 0 0 92 >>>> Graph 12 1.0 1.1692e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2 >>>> MIS/Agg 6 1.0 1.4496e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> SA: col data 6 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> SA: frmProl0 6 1.0 4.0917e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0 >>>> SA: smooth 6 1.0 1.1198e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534 >>>> GAMG: partLevel 6 1.0 2.1341e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283 >>>> PCSetUp 4 1.0 8.8020e-02 1.0 1.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 74 22 0 0 0 74 22 0 0 0 137 >>>> PCSetUpOnBlocks 16 1.0 1.8382e-04 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 42 >>>> PCApply 16 1.0 2.3858e-02 1.0 3.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 20 70 0 0 0 20 70 0 0 0 1637 >>>> >>>> >>>> Are you sure you ran with -pc_type gamg ? What about running with -info does it print anything about gamg? What about -ksp_view does it indicate it is using the gamg preconditioner? >>>> >>>> >>>>> On Nov 4, 2015, at 9:30 PM, TAY wee-beng wrote: >>>>> >>>>> Hi, >>>>> >>>>> I have attached the 2 logs. 
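For reference, one common way an option such as -poisson_pc_type gamg ends up silently ignored (leaving a -log_summary with no GAMG events at all) is that the KSP never registers the prefix or never calls KSPSetFromOptions(). A minimal sketch, with a hypothetical routine name, assuming the pressure solver uses the "poisson_" prefix seen in the options quoted in this thread:

#include <petscksp.h>

/* Hypothetical setup routine; not the poster's code. */
PetscErrorCode SetupPressureKSP(Mat A, KSP *ksp)
{
  PetscErrorCode ierr;
  ierr = KSPCreate(PETSC_COMM_WORLD, ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(*ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetOptionsPrefix(*ksp, "poisson_");CHKERRQ(ierr); /* options must then be spelled -poisson_... */
  ierr = KSPSetFromOptions(*ksp);CHKERRQ(ierr);               /* without this call, -poisson_pc_type gamg is never read */
  return 0;
}

Running with -poisson_ksp_view (or -options_left) is a quick way to confirm that the PC object really is of type gamg before timing anything.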
>>>>> >>>>> Thank you >>>>> >>>>> Yours sincerely, >>>>> >>>>> TAY wee-beng >>>>> >>>>> On 4/11/2015 1:11 AM, Barry Smith wrote: >>>>>> Ok, the convergence looks good. Now run on 8 and 64 processes as before with -log_summary and not -ksp_monitor to see how it scales. >>>>>> >>>>>> Barry >>>>>> >>>>>>> On Nov 3, 2015, at 6:49 AM, TAY wee-beng wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I tried and have attached the log. >>>>>>> >>>>>>> Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate? >>>>>>> >>>>>>> Thank you >>>>>>> >>>>>>> Yours sincerely, >>>>>>> >>>>>>> TAY wee-beng >>>>>>> >>>>>>> On 3/11/2015 12:45 PM, Barry Smith wrote: >>>>>>>>> On Nov 2, 2015, at 10:37 PM, TAY wee-beng wrote: >>>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I tried : >>>>>>>>> >>>>>>>>> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg >>>>>>>>> >>>>>>>>> 2. -poisson_pc_type gamg >>>>>>>> Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason >>>>>>>> Does your poisson have Neumann boundary conditions? Do you have any zeros on the diagonal for the matrix (you shouldn't). >>>>>>>> >>>>>>>> There may be something wrong with your poisson discretization that was also messing up hypre >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Both options give: >>>>>>>>> >>>>>>>>> 1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN >>>>>>>>> M Diverged but why?, time = 2 >>>>>>>>> reason = -9 >>>>>>>>> >>>>>>>>> How can I check what's wrong? >>>>>>>>> >>>>>>>>> Thank you >>>>>>>>> >>>>>>>>> Yours sincerely, >>>>>>>>> >>>>>>>>> TAY wee-beng >>>>>>>>> >>>>>>>>> On 3/11/2015 3:18 AM, Barry Smith wrote: >>>>>>>>>> hypre is just not scaling well here. I do not know why. Since hypre is a block box for us there is no way to determine why the poor scaling. >>>>>>>>>> >>>>>>>>>> If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly. >>>>>>>>>> >>>>>>>>>> Barry >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Nov 2, 2015, at 3:17 AM, TAY wee-beng wrote: >>>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I have attached the 2 files. >>>>>>>>>>> >>>>>>>>>>> Thank you >>>>>>>>>>> >>>>>>>>>>> Yours sincerely, >>>>>>>>>>> >>>>>>>>>>> TAY wee-beng >>>>>>>>>>> >>>>>>>>>>> On 2/11/2015 2:55 PM, Barry Smith wrote: >>>>>>>>>>>> Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results >>>>>>>>>>>> >>>>>>>>>>>> Barry >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> On Nov 2, 2015, at 12:19 AM, TAY wee-beng wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I have attached the new results. >>>>>>>>>>>>> >>>>>>>>>>>>> Thank you >>>>>>>>>>>>> >>>>>>>>>>>>> Yours sincerely, >>>>>>>>>>>>> >>>>>>>>>>>>> TAY wee-beng >>>>>>>>>>>>> >>>>>>>>>>>>> On 2/11/2015 12:27 PM, Barry Smith wrote: >>>>>>>>>>>>>> Run without the -momentum_ksp_view -poisson_ksp_view and send the new results >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time. 
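On the null-space question above: an all-Neumann pressure Poisson equation is singular up to an additive constant, and NaN residuals with "reason = -9" (KSP_DIVERGED_NANORINF in the PETSc enum) can be one symptom of solving it without telling the solver so. A minimal sketch of attaching the constant null space, assuming the assembled Poisson matrix is called A and ierr is declared:

MatNullSpace nsp;
ierr = MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nsp);CHKERRQ(ierr); /* constant vector only */
ierr = MatSetNullSpace(A, nsp);CHKERRQ(ierr);   /* recent PETSc takes the null space from the operator; KSPSetNullSpace is the older entry point */
ierr = MatNullSpaceDestroy(&nsp);CHKERRQ(ierr);

The right-hand side should also be made consistent (its component in the null space removed, e.g. with MatNullSpaceRemove), otherwise the residual cannot be driven to zero.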
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Barry >>>>>>>>>>>>>> >>>>>>>>>>>>>> Something makes no sense with the output: it gives >>>>>>>>>>>>>> >>>>>>>>>>>>>> KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165 >>>>>>>>>>>>>> >>>>>>>>>>>>>> 90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Nov 1, 2015, at 10:02 PM, TAY wee-beng wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have attached the new run with 100 time steps for 48 and 96 cores. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Yours sincerely, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> TAY wee-beng >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 2/11/2015 9:49 AM, Barry Smith wrote: >>>>>>>>>>>>>>>> If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Barry >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Also, what about momentum eqn? Is it working well? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I will try the gamg later too. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yours sincerely, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> TAY wee-beng >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 2/11/2015 12:30 AM, Barry Smith wrote: >>>>>>>>>>>>>>>>>> You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Barry >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On 1/11/2015 10:00 AM, Barry Smith wrote: >>>>>>>>>>>>>>>>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>>>>>>>>>>>>>>>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. >>>>>>>>>>>>>>>>>>>>>> Its specs are: >>>>>>>>>>>>>>>>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node) >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 8 cores / processor >>>>>>>>>>>>>>>>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>>>>>>>>>>>>>>>>>>>>> Each cabinet contains 96 computing nodes, >>>>>>>>>>>>>>>>>>>>>> One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data >>>>>>>>>>>>>>>>>>>>>> There are 2 ways to give performance: >>>>>>>>>>>>>>>>>>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed >>>>>>>>>>>>>>>>>>>>>> problem. >>>>>>>>>>>>>>>>>>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a >>>>>>>>>>>>>>>>>>>>>> fixed problem size per processor. >>>>>>>>>>>>>>>>>>>>>> I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling. >>>>>>>>>>>>>>>>>>>>>> Cluster specs: >>>>>>>>>>>>>>>>>>>>>> CPU: AMD 6234 2.4GHz >>>>>>>>>>>>>>>>>>>>>> 8 cores / processor (CPU) >>>>>>>>>>>>>>>>>>>>>> 6 CPU / node >>>>>>>>>>>>>>>>>>>>>> So 48 Cores / CPU >>>>>>>>>>>>>>>>>>>>>> Not sure abt the memory / node >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> The parallel efficiency ?En? for a given degree of parallelism ?n? indicates how much the program is >>>>>>>>>>>>>>>>>>>>>> efficiently accelerated by parallel processing. ?En? is given by the following formulae. Although their >>>>>>>>>>>>>>>>>>>>>> derivation processes are different depending on strong and weak scaling, derived formulae are the >>>>>>>>>>>>>>>>>>>>>> same. >>>>>>>>>>>>>>>>>>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. >>>>>>>>>>>>>>>>>>>>>> So is my results acceptable? 
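The proposal's exact formula is not reproduced in the thread, but the usual Amdahl-type definitions behind a number like the 52.7% above are, with the 48-core run taken as the n = 1 baseline,

E_n = \frac{T_1}{n\,T_n}, \qquad
T_n \approx T_1\left(a + \frac{1-a}{n}\right)
\quad\Longrightarrow\quad
E_n \approx \frac{1}{a\,n + (1 - a)}
% a = estimated serial (non-parallelizable) fraction

Treating the measured 48-core and 96-core runs (140 and 90 minutes) as the two data points gives a = 2(90/140) - 1, about 0.29, and extrapolating to 2205 x 8 = 17640 cores gives an efficiency of order 1%, the same order as the 0.5% figure quoted for the proposal's formula. In this form the collapse is not exponential; it is the 1/(a n) falloff that any fixed serial fraction produces at large n.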
>>>>>>>>>>>>>>>>>>>>>> For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%. >>>>>>>>>>>>>>>>>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function >>>>>>>>>>>>>>>>>>>>>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a >>>>>>>>>>>>>>>>>>>>>> model of this dependence. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current >>>>>>>>>>>>>>>>>>>>>> cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific >>>>>>>>>>>>>>>>>>>>>> applications, but neither does requiring a certain parallel efficiency. >>>>>>>>>>>>>>>>>>>>> Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so? >>>>>>>>>>>>>>>>>>>> What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Barry >>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I have attached the output >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 48 cores: log48 >>>>>>>>>>>>>>>>>>> 96 cores: log96 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Problem size doubled from 158x266x150 to 158x266x300. >>>>>>>>>>>>>>>>>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Matt >>>>>>>>>>>>>>>>>>>>>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores? >>>>>>>>>>>>>>>>>>>>>> Btw, I do not have access to the system. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Sent using CloudMagic Email >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>>>>>>>>>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>> >>>>> > > From zonexo at gmail.com Thu Nov 5 23:16:52 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Fri, 6 Nov 2015 13:16:52 +0800 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <108ACE20-8B56-4FA6-8CEE-D455B975DEBF@mcs.anl.gov> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> <56372A12.90900@gmail.com> <563839F8.9080000@gmail.com> <589BE4CF-18B5-4F04-A442-7FB0F2F3AFC1@mcs.anl.gov> <5638AD42.9060609@gmail.com> <2057DCBD-BDB9-4424-A4E3-C1BA5BB8C214@mcs.anl.gov> <563ACD5F.6060301@gmail.com> <56A89BE9-5AEC-4A99-A888-2B7EF35C7BC9@mcs.anl.gov> <563B7C97.1070604@gmail.com> <0F2039C3-52A4-4BB5-BE77-6E75163BFDD9@mcs.anl.gov> <563C14CB.8050205@gmail.com> <108ACE20-8B56-4FA6-8CEE-D455B975DEBF@mcs.anl.gov> Message-ID: <563C37C4.3040904@gmail.com> On 6/11/2015 12:08 PM, Barry Smith wrote: > Ok the 64 case not converging makes no sense. > > Run it with ksp_monitor and ksp_converged_reason for the pressure solve turned on and -info > > You need to figure out why it is not converging. > > Barry Hi, I found out the reason. Because my partitioning is only in the z direction, and if using 64cores to partition 150 cells in the z direction, some partitions will be too small, leading to error. So how can I test now? The original problem has 158x266x300 with 96 cores. How should I reduce it to test for scaling? Thanks. > >> On Nov 5, 2015, at 8:47 PM, TAY wee-beng wrote: >> >> Hi, >> >> I have removed the nullspace and attached the new logs. >> >> Thank you >> >> Yours sincerely, >> >> TAY wee-beng >> >> On 6/11/2015 12:07 AM, Barry Smith wrote: >>>> On Nov 5, 2015, at 9:58 AM, TAY wee-beng wrote: >>>> >>>> Sorry I realised that I didn't use gamg and that's why. But if I use gamg, the 8 core case worked, but the 64 core case shows p diverged. >>> Where is the log file for the 8 core case? And where is all the output from where it fails with 64 cores? Include -ksp_monitor_true_residual and -ksp_converged_reason >>> >>> Barry >>> >>>> Why is this so? Btw, I have also added nullspace in my code. >>>> >>>> Thank you. >>>> >>>> Yours sincerely, >>>> >>>> TAY wee-beng >>>> >>>> On 5/11/2015 12:03 PM, Barry Smith wrote: >>>>> There is a problem here. 
The -log_summary doesn't show all the events associated with the -pc_type gamg preconditioner it should have rows like >>>>> >>>>> VecDot 2 1.0 6.1989e-06 1.0 1.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1613 >>>>> VecMDot 134 1.0 5.4145e-04 1.0 1.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 3025 >>>>> VecNorm 154 1.0 2.4176e-04 1.0 3.82e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1578 >>>>> VecScale 148 1.0 1.6928e-04 1.0 1.76e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1039 >>>>> VecCopy 106 1.0 1.2255e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> VecSet 474 1.0 5.1236e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> VecAXPY 54 1.0 1.3471e-04 1.0 2.35e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1742 >>>>> VecAYPX 384 1.0 5.7459e-04 1.0 4.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 860 >>>>> VecAXPBYCZ 192 1.0 4.7398e-04 1.0 9.88e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2085 >>>>> VecWAXPY 2 1.0 7.8678e-06 1.0 5.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 636 >>>>> VecMAXPY 148 1.0 8.1539e-04 1.0 1.96e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 2399 >>>>> VecPointwiseMult 66 1.0 1.1253e-04 1.0 6.79e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 604 >>>>> VecScatterBegin 45 1.0 6.3419e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> VecSetRandom 6 1.0 3.0994e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> VecReduceArith 4 1.0 1.3113e-05 1.0 2.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1525 >>>>> VecReduceComm 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> VecNormalize 148 1.0 4.4799e-04 1.0 5.27e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1177 >>>>> MatMult 424 1.0 8.9276e-03 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 37 0 0 0 7 37 0 0 0 2343 >>>>> MatMultAdd 48 1.0 5.0926e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2069 >>>>> MatMultTranspose 48 1.0 9.8586e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1069 >>>>> MatSolve 16 1.0 2.2173e-05 1.0 1.02e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 460 >>>>> MatSOR 354 1.0 1.0547e-02 1.0 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 31 0 0 0 9 31 0 0 0 1631 >>>>> MatLUFactorSym 2 1.0 4.7922e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> MatLUFactorNum 2 1.0 2.5272e-05 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 307 >>>>> MatScale 18 1.0 1.7142e-04 1.0 1.50e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 874 >>>>> MatResidual 48 1.0 1.0548e-03 1.0 2.33e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2212 >>>>> MatAssemblyBegin 57 1.0 4.7684e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> MatAssemblyEnd 57 1.0 1.9786e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>>>> MatGetRow 21616 1.0 1.8497e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>>>> MatGetRowIJ 2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> MatGetOrdering 2 1.0 6.0797e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> MatCoarsen 6 1.0 9.3222e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> MatZeroEntries 2 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> MatAXPY 6 1.0 1.7998e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>>>> MatFDColorCreate 1 
1.0 3.2902e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> MatFDColorSetUp 1 1.0 1.6739e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>>> MatFDColorApply 2 1.0 1.3199e-03 1.0 2.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 1826 >>>>> MatFDColorFunc 42 1.0 7.4601e-04 1.0 2.20e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2956 >>>>> MatMatMult 6 1.0 5.1048e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 2 0 0 0 4 2 0 0 0 241 >>>>> MatMatMultSym 6 1.0 3.2601e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >>>>> MatMatMultNum 6 1.0 1.8158e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 679 >>>>> MatPtAP 6 1.0 2.1328e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283 >>>>> MatPtAPSymbolic 6 1.0 1.0073e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 0 >>>>> MatPtAPNumeric 6 1.0 1.1230e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 537 >>>>> MatTrnMatMult 2 1.0 7.2789e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 75 >>>>> MatTrnMatMultSym 2 1.0 5.7006e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> MatTrnMatMultNum 2 1.0 1.5473e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 352 >>>>> MatGetSymTrans 8 1.0 3.1638e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> KSPGMRESOrthog 134 1.0 1.3156e-03 1.0 3.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 1 6 0 0 0 2491 >>>>> KSPSetUp 24 1.0 4.6754e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> KSPSolve 2 1.0 1.1291e-01 1.0 5.32e+07 1.0 0.0e+00 0.0e+00 0.0e+00 94 95 0 0 0 94 95 0 0 0 471 >>>>> PCGAMGGraph_AGG 6 1.0 1.2108e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2 >>>>> PCGAMGCoarse_AGG 6 1.0 1.1127e-03 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 49 >>>>> PCGAMGProl_AGG 6 1.0 4.1062e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0 >>>>> PCGAMGPOpt_AGG 6 1.0 1.1200e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534 >>>>> GAMG: createProl 6 1.0 6.5530e-02 1.0 6.06e+06 1.0 0.0e+00 0.0e+00 0.0e+00 55 11 0 0 0 55 11 0 0 0 92 >>>>> Graph 12 1.0 1.1692e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2 >>>>> MIS/Agg 6 1.0 1.4496e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> SA: col data 6 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> SA: frmProl0 6 1.0 4.0917e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0 >>>>> SA: smooth 6 1.0 1.1198e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534 >>>>> GAMG: partLevel 6 1.0 2.1341e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283 >>>>> PCSetUp 4 1.0 8.8020e-02 1.0 1.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 74 22 0 0 0 74 22 0 0 0 137 >>>>> PCSetUpOnBlocks 16 1.0 1.8382e-04 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 42 >>>>> PCApply 16 1.0 2.3858e-02 1.0 3.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 20 70 0 0 0 20 70 0 0 0 1637 >>>>> >>>>> >>>>> Are you sure you ran with -pc_type gamg ? What about running with -info does it print anything about gamg? What about -ksp_view does it indicate it is using the gamg preconditioner? >>>>> >>>>> >>>>>> On Nov 4, 2015, at 9:30 PM, TAY wee-beng wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I have attached the 2 logs. 
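On the earlier question of how to set up the 8-process versus 64-process comparison: the two runs asked for are an exact weak-scaling pair, because halving the grid in each direction divides the cell count by eight,

79 \times 133 \times 75 \;=\; \frac{158 \times 266 \times 150}{8} \;=\; 788\,025
\qquad\Rightarrow\qquad
\frac{788\,025}{8} \;=\; \frac{6\,304\,200}{64} \;\approx\; 9.85\times 10^{4}\ \text{cells per process.}

The original 48-core (158x266x150) and 96-core (158x266x300) runs form the same kind of pair at about 131,000 cells per process; what matters for a weak-scaling test is simply that the load per process stays fixed.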
>>>>>> >>>>>> Thank you >>>>>> >>>>>> Yours sincerely, >>>>>> >>>>>> TAY wee-beng >>>>>> >>>>>> On 4/11/2015 1:11 AM, Barry Smith wrote: >>>>>>> Ok, the convergence looks good. Now run on 8 and 64 processes as before with -log_summary and not -ksp_monitor to see how it scales. >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>>> On Nov 3, 2015, at 6:49 AM, TAY wee-beng wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I tried and have attached the log. >>>>>>>> >>>>>>>> Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate? >>>>>>>> >>>>>>>> Thank you >>>>>>>> >>>>>>>> Yours sincerely, >>>>>>>> >>>>>>>> TAY wee-beng >>>>>>>> >>>>>>>> On 3/11/2015 12:45 PM, Barry Smith wrote: >>>>>>>>>> On Nov 2, 2015, at 10:37 PM, TAY wee-beng wrote: >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I tried : >>>>>>>>>> >>>>>>>>>> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg >>>>>>>>>> >>>>>>>>>> 2. -poisson_pc_type gamg >>>>>>>>> Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason >>>>>>>>> Does your poisson have Neumann boundary conditions? Do you have any zeros on the diagonal for the matrix (you shouldn't). >>>>>>>>> >>>>>>>>> There may be something wrong with your poisson discretization that was also messing up hypre >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> Both options give: >>>>>>>>>> >>>>>>>>>> 1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN >>>>>>>>>> M Diverged but why?, time = 2 >>>>>>>>>> reason = -9 >>>>>>>>>> >>>>>>>>>> How can I check what's wrong? >>>>>>>>>> >>>>>>>>>> Thank you >>>>>>>>>> >>>>>>>>>> Yours sincerely, >>>>>>>>>> >>>>>>>>>> TAY wee-beng >>>>>>>>>> >>>>>>>>>> On 3/11/2015 3:18 AM, Barry Smith wrote: >>>>>>>>>>> hypre is just not scaling well here. I do not know why. Since hypre is a block box for us there is no way to determine why the poor scaling. >>>>>>>>>>> >>>>>>>>>>> If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly. >>>>>>>>>>> >>>>>>>>>>> Barry >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On Nov 2, 2015, at 3:17 AM, TAY wee-beng wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> I have attached the 2 files. >>>>>>>>>>>> >>>>>>>>>>>> Thank you >>>>>>>>>>>> >>>>>>>>>>>> Yours sincerely, >>>>>>>>>>>> >>>>>>>>>>>> TAY wee-beng >>>>>>>>>>>> >>>>>>>>>>>> On 2/11/2015 2:55 PM, Barry Smith wrote: >>>>>>>>>>>>> Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results >>>>>>>>>>>>> >>>>>>>>>>>>> Barry >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> On Nov 2, 2015, at 12:19 AM, TAY wee-beng wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I have attached the new results. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yours sincerely, >>>>>>>>>>>>>> >>>>>>>>>>>>>> TAY wee-beng >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 2/11/2015 12:27 PM, Barry Smith wrote: >>>>>>>>>>>>>>> Run without the -momentum_ksp_view -poisson_ksp_view and send the new results >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time. 
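As a concrete sketch of the reuse being described: if the Poisson matrix is assembled once and never modified afterwards, PETSc has nothing to rebuild, and only the first KSPSolve() pays the AMG setup cost. The names ksp_poisson, BuildPoissonRHS, b, p, step and nsteps below are placeholders, not the poster's routines or variables.

ierr = KSPSetOperators(ksp_poisson, A, A);CHKERRQ(ierr);  /* once, at setup time */

for (step = 0; step < nsteps; step++) {
  ierr = BuildPoissonRHS(step, b);CHKERRQ(ierr);          /* hypothetical routine: only the right-hand side changes */
  ierr = KSPSolve(ksp_poisson, b, p);CHKERRQ(ierr);       /* PCSetUp runs only on the first call */
}

If the matrix values do get re-assembled but the old preconditioner is still wanted, KSPSetReusePreconditioner(ksp_poisson, PETSC_TRUE) requests that explicitly.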
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Barry >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Something makes no sense with the output: it gives >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Nov 1, 2015, at 10:02 PM, TAY wee-beng wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I have attached the new run with 100 time steps for 48 and 96 cores. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Yours sincerely, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> TAY wee-beng >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 2/11/2015 9:49 AM, Barry Smith wrote: >>>>>>>>>>>>>>>>> If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Barry >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96). >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Also, what about momentum eqn? Is it working well? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I will try the gamg later too. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Yours sincerely, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> TAY wee-beng >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On 2/11/2015 12:30 AM, Barry Smith wrote: >>>>>>>>>>>>>>>>>>> You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Barry >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On 1/11/2015 10:00 AM, Barry Smith wrote: >>>>>>>>>>>>>>>>>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>>>>>>>>>>>>>>>>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. >>>>>>>>>>>>>>>>>>>>>>> Its specs are: >>>>>>>>>>>>>>>>>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node) >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> 8 cores / processor >>>>>>>>>>>>>>>>>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>>>>>>>>>>>>>>>>>>>>>> Each cabinet contains 96 computing nodes, >>>>>>>>>>>>>>>>>>>>>>> One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data >>>>>>>>>>>>>>>>>>>>>>> There are 2 ways to give performance: >>>>>>>>>>>>>>>>>>>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed >>>>>>>>>>>>>>>>>>>>>>> problem. >>>>>>>>>>>>>>>>>>>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a >>>>>>>>>>>>>>>>>>>>>>> fixed problem size per processor. >>>>>>>>>>>>>>>>>>>>>>> I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling. >>>>>>>>>>>>>>>>>>>>>>> Cluster specs: >>>>>>>>>>>>>>>>>>>>>>> CPU: AMD 6234 2.4GHz >>>>>>>>>>>>>>>>>>>>>>> 8 cores / processor (CPU) >>>>>>>>>>>>>>>>>>>>>>> 6 CPU / node >>>>>>>>>>>>>>>>>>>>>>> So 48 Cores / CPU >>>>>>>>>>>>>>>>>>>>>>> Not sure abt the memory / node >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> The parallel efficiency ?En? for a given degree of parallelism ?n? indicates how much the program is >>>>>>>>>>>>>>>>>>>>>>> efficiently accelerated by parallel processing. ?En? is given by the following formulae. Although their >>>>>>>>>>>>>>>>>>>>>>> derivation processes are different depending on strong and weak scaling, derived formulae are the >>>>>>>>>>>>>>>>>>>>>>> same. >>>>>>>>>>>>>>>>>>>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. 
>>>>>>>>>>>>>>>>>>>>>>> So is my results acceptable? >>>>>>>>>>>>>>>>>>>>>>> For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%. >>>>>>>>>>>>>>>>>>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function >>>>>>>>>>>>>>>>>>>>>>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a >>>>>>>>>>>>>>>>>>>>>>> model of this dependence. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current >>>>>>>>>>>>>>>>>>>>>>> cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific >>>>>>>>>>>>>>>>>>>>>>> applications, but neither does requiring a certain parallel efficiency. >>>>>>>>>>>>>>>>>>>>>> Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so? >>>>>>>>>>>>>>>>>>>>> What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Barry >>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I have attached the output >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 48 cores: log48 >>>>>>>>>>>>>>>>>>>> 96 cores: log96 >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Problem size doubled from 158x266x150 to 158x266x300. >>>>>>>>>>>>>>>>>>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved? >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Matt >>>>>>>>>>>>>>>>>>>>>>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores? >>>>>>>>>>>>>>>>>>>>>>> Btw, I do not have access to the system. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Sent using CloudMagic Email >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>>>>>>>>>>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>> >>>>>> >> From bsmith at mcs.anl.gov Thu Nov 5 23:26:14 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 5 Nov 2015 23:26:14 -0600 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <563C37C4.3040904@gmail.com> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> <56372A12.90900@gmail.com> <563839F8.9080000@gmail.com> <589BE4CF-18B5-4F04-A442-7FB0F2F3AFC1@mcs.anl.gov> <5638AD42.9060609@gmail.com> <2057DCBD-BDB9-4424-A4E3-C1BA5BB8C214@mcs.anl.gov> <563ACD5F.6060301@gmail.com> <56A89BE9-5AEC-4A99-A888-2B7EF35C7BC9@mcs.anl.gov> <563B7C97.1070604@gmail.com> <0F2039C3-52A4-4BB5-BE77-6E75163BFDD9@mcs.anl.gov> <563C14CB.8050205@gmail.com> <108ACE20-8B56-4FA6-8CEE-D455B975DEBF@mcs.anl.gov> <563C37C4.3040904@gmail.com> Message-ID: <8764B318-4058-4435-AC58-011989E8718B@mcs.anl.gov> > On Nov 5, 2015, at 11:16 PM, TAY wee-beng wrote: > > > On 6/11/2015 12:08 PM, Barry Smith wrote: >> Ok the 64 case not converging makes no sense. >> >> Run it with ksp_monitor and ksp_converged_reason for the pressure solve turned on and -info >> >> You need to figure out why it is not converging. >> >> Barry > Hi, > > I found out the reason. Because my partitioning is only in the z direction, and if using 64cores to partition 150 cells in the z direction, some partitions will be too small, leading to error. Oh, well this is your fundamental problem and why you don't get scaling! You need to have partitioning in all three directions or you will never get good scaling! This is fundamental, just fix your code to have partitioning in all dimensions Barry > > So how can I test now? The original problem has 158x266x300 with 96 cores. How should I reduce it to test for scaling? > > Thanks. >> >>> On Nov 5, 2015, at 8:47 PM, TAY wee-beng wrote: >>> >>> Hi, >>> >>> I have removed the nullspace and attached the new logs. >>> >>> Thank you >>> >>> Yours sincerely, >>> >>> TAY wee-beng >>> >>> On 6/11/2015 12:07 AM, Barry Smith wrote: >>>>> On Nov 5, 2015, at 9:58 AM, TAY wee-beng wrote: >>>>> >>>>> Sorry I realised that I didn't use gamg and that's why. But if I use gamg, the 8 core case worked, but the 64 core case shows p diverged. >>>> Where is the log file for the 8 core case? And where is all the output from where it fails with 64 cores? Include -ksp_monitor_true_residual and -ksp_converged_reason >>>> >>>> Barry >>>> >>>>> Why is this so? Btw, I have also added nullspace in my code. >>>>> >>>>> Thank you. >>>>> >>>>> Yours sincerely, >>>>> >>>>> TAY wee-beng >>>>> >>>>> On 5/11/2015 12:03 PM, Barry Smith wrote: >>>>>> There is a problem here. 
The -log_summary doesn't show all the events associated with the -pc_type gamg preconditioner it should have rows like >>>>>> >>>>>> VecDot 2 1.0 6.1989e-06 1.0 1.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1613 >>>>>> VecMDot 134 1.0 5.4145e-04 1.0 1.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 3025 >>>>>> VecNorm 154 1.0 2.4176e-04 1.0 3.82e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1578 >>>>>> VecScale 148 1.0 1.6928e-04 1.0 1.76e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1039 >>>>>> VecCopy 106 1.0 1.2255e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>> VecSet 474 1.0 5.1236e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>> VecAXPY 54 1.0 1.3471e-04 1.0 2.35e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1742 >>>>>> VecAYPX 384 1.0 5.7459e-04 1.0 4.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 860 >>>>>> VecAXPBYCZ 192 1.0 4.7398e-04 1.0 9.88e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2085 >>>>>> VecWAXPY 2 1.0 7.8678e-06 1.0 5.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 636 >>>>>> VecMAXPY 148 1.0 8.1539e-04 1.0 1.96e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 2399 >>>>>> VecPointwiseMult 66 1.0 1.1253e-04 1.0 6.79e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 604 >>>>>> VecScatterBegin 45 1.0 6.3419e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>> VecSetRandom 6 1.0 3.0994e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>> VecReduceArith 4 1.0 1.3113e-05 1.0 2.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1525 >>>>>> VecReduceComm 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>> VecNormalize 148 1.0 4.4799e-04 1.0 5.27e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1177 >>>>>> MatMult 424 1.0 8.9276e-03 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 37 0 0 0 7 37 0 0 0 2343 >>>>>> MatMultAdd 48 1.0 5.0926e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2069 >>>>>> MatMultTranspose 48 1.0 9.8586e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1069 >>>>>> MatSolve 16 1.0 2.2173e-05 1.0 1.02e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 460 >>>>>> MatSOR 354 1.0 1.0547e-02 1.0 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 31 0 0 0 9 31 0 0 0 1631 >>>>>> MatLUFactorSym 2 1.0 4.7922e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>> MatLUFactorNum 2 1.0 2.5272e-05 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 307 >>>>>> MatScale 18 1.0 1.7142e-04 1.0 1.50e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 874 >>>>>> MatResidual 48 1.0 1.0548e-03 1.0 2.33e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2212 >>>>>> MatAssemblyBegin 57 1.0 4.7684e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>> MatAssemblyEnd 57 1.0 1.9786e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>>>>> MatGetRow 21616 1.0 1.8497e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>>>>> MatGetRowIJ 2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>> MatGetOrdering 2 1.0 6.0797e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>> MatCoarsen 6 1.0 9.3222e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>> MatZeroEntries 2 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>> MatAXPY 6 1.0 1.7998e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 
0 0 0 0 0 >>>>>> MatFDColorCreate 1 1.0 3.2902e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>> MatFDColorSetUp 1 1.0 1.6739e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>>>> MatFDColorApply 2 1.0 1.3199e-03 1.0 2.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 1826 >>>>>> MatFDColorFunc 42 1.0 7.4601e-04 1.0 2.20e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2956 >>>>>> MatMatMult 6 1.0 5.1048e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 2 0 0 0 4 2 0 0 0 241 >>>>>> MatMatMultSym 6 1.0 3.2601e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >>>>>> MatMatMultNum 6 1.0 1.8158e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 679 >>>>>> MatPtAP 6 1.0 2.1328e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283 >>>>>> MatPtAPSymbolic 6 1.0 1.0073e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 0 >>>>>> MatPtAPNumeric 6 1.0 1.1230e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 537 >>>>>> MatTrnMatMult 2 1.0 7.2789e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 75 >>>>>> MatTrnMatMultSym 2 1.0 5.7006e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>> MatTrnMatMultNum 2 1.0 1.5473e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 352 >>>>>> MatGetSymTrans 8 1.0 3.1638e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>> KSPGMRESOrthog 134 1.0 1.3156e-03 1.0 3.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 1 6 0 0 0 2491 >>>>>> KSPSetUp 24 1.0 4.6754e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>> KSPSolve 2 1.0 1.1291e-01 1.0 5.32e+07 1.0 0.0e+00 0.0e+00 0.0e+00 94 95 0 0 0 94 95 0 0 0 471 >>>>>> PCGAMGGraph_AGG 6 1.0 1.2108e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2 >>>>>> PCGAMGCoarse_AGG 6 1.0 1.1127e-03 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 49 >>>>>> PCGAMGProl_AGG 6 1.0 4.1062e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0 >>>>>> PCGAMGPOpt_AGG 6 1.0 1.1200e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534 >>>>>> GAMG: createProl 6 1.0 6.5530e-02 1.0 6.06e+06 1.0 0.0e+00 0.0e+00 0.0e+00 55 11 0 0 0 55 11 0 0 0 92 >>>>>> Graph 12 1.0 1.1692e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2 >>>>>> MIS/Agg 6 1.0 1.4496e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>> SA: col data 6 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>> SA: frmProl0 6 1.0 4.0917e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0 >>>>>> SA: smooth 6 1.0 1.1198e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534 >>>>>> GAMG: partLevel 6 1.0 2.1341e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283 >>>>>> PCSetUp 4 1.0 8.8020e-02 1.0 1.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 74 22 0 0 0 74 22 0 0 0 137 >>>>>> PCSetUpOnBlocks 16 1.0 1.8382e-04 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 42 >>>>>> PCApply 16 1.0 2.3858e-02 1.0 3.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 20 70 0 0 0 20 70 0 0 0 1637 >>>>>> >>>>>> >>>>>> Are you sure you ran with -pc_type gamg ? What about running with -info does it print anything about gamg? What about -ksp_view does it indicate it is using the gamg preconditioner? >>>>>> >>>>>> >>>>>>> On Nov 4, 2015, at 9:30 PM, TAY wee-beng wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I have attached the 2 logs. 
>>>>>>> >>>>>>> Thank you >>>>>>> >>>>>>> Yours sincerely, >>>>>>> >>>>>>> TAY wee-beng >>>>>>> >>>>>>> On 4/11/2015 1:11 AM, Barry Smith wrote: >>>>>>>> Ok, the convergence looks good. Now run on 8 and 64 processes as before with -log_summary and not -ksp_monitor to see how it scales. >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>>> On Nov 3, 2015, at 6:49 AM, TAY wee-beng wrote: >>>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I tried and have attached the log. >>>>>>>>> >>>>>>>>> Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate? >>>>>>>>> >>>>>>>>> Thank you >>>>>>>>> >>>>>>>>> Yours sincerely, >>>>>>>>> >>>>>>>>> TAY wee-beng >>>>>>>>> >>>>>>>>> On 3/11/2015 12:45 PM, Barry Smith wrote: >>>>>>>>>>> On Nov 2, 2015, at 10:37 PM, TAY wee-beng wrote: >>>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I tried : >>>>>>>>>>> >>>>>>>>>>> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg >>>>>>>>>>> >>>>>>>>>>> 2. -poisson_pc_type gamg >>>>>>>>>> Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason >>>>>>>>>> Does your poisson have Neumann boundary conditions? Do you have any zeros on the diagonal for the matrix (you shouldn't). >>>>>>>>>> >>>>>>>>>> There may be something wrong with your poisson discretization that was also messing up hypre >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Both options give: >>>>>>>>>>> >>>>>>>>>>> 1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN >>>>>>>>>>> M Diverged but why?, time = 2 >>>>>>>>>>> reason = -9 >>>>>>>>>>> >>>>>>>>>>> How can I check what's wrong? >>>>>>>>>>> >>>>>>>>>>> Thank you >>>>>>>>>>> >>>>>>>>>>> Yours sincerely, >>>>>>>>>>> >>>>>>>>>>> TAY wee-beng >>>>>>>>>>> >>>>>>>>>>> On 3/11/2015 3:18 AM, Barry Smith wrote: >>>>>>>>>>>> hypre is just not scaling well here. I do not know why. Since hypre is a block box for us there is no way to determine why the poor scaling. >>>>>>>>>>>> >>>>>>>>>>>> If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly. >>>>>>>>>>>> >>>>>>>>>>>> Barry >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> On Nov 2, 2015, at 3:17 AM, TAY wee-beng wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I have attached the 2 files. >>>>>>>>>>>>> >>>>>>>>>>>>> Thank you >>>>>>>>>>>>> >>>>>>>>>>>>> Yours sincerely, >>>>>>>>>>>>> >>>>>>>>>>>>> TAY wee-beng >>>>>>>>>>>>> >>>>>>>>>>>>> On 2/11/2015 2:55 PM, Barry Smith wrote: >>>>>>>>>>>>>> Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results >>>>>>>>>>>>>> >>>>>>>>>>>>>> Barry >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Nov 2, 2015, at 12:19 AM, TAY wee-beng wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have attached the new results. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Yours sincerely, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> TAY wee-beng >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 2/11/2015 12:27 PM, Barry Smith wrote: >>>>>>>>>>>>>>>> Run without the -momentum_ksp_view -poisson_ksp_view and send the new results >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Barry >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Something makes no sense with the output: it gives >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Nov 1, 2015, at 10:02 PM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I have attached the new run with 100 time steps for 48 and 96 cores. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yours sincerely, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> TAY wee-beng >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 2/11/2015 9:49 AM, Barry Smith wrote: >>>>>>>>>>>>>>>>>> If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Barry >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96). >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Also, what about momentum eqn? Is it working well? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I will try the gamg later too. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Yours sincerely, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> TAY wee-beng >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On 2/11/2015 12:30 AM, Barry Smith wrote: >>>>>>>>>>>>>>>>>>>> You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds. 
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Barry >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On 1/11/2015 10:00 AM, Barry Smith wrote: >>>>>>>>>>>>>>>>>>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>>>>>>>>>>>>>>>>>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. >>>>>>>>>>>>>>>>>>>>>>>> Its specs are: >>>>>>>>>>>>>>>>>>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node) >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> 8 cores / processor >>>>>>>>>>>>>>>>>>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>>>>>>>>>>>>>>>>>>>>>>> Each cabinet contains 96 computing nodes, >>>>>>>>>>>>>>>>>>>>>>>> One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data >>>>>>>>>>>>>>>>>>>>>>>> There are 2 ways to give performance: >>>>>>>>>>>>>>>>>>>>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed >>>>>>>>>>>>>>>>>>>>>>>> problem. >>>>>>>>>>>>>>>>>>>>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a >>>>>>>>>>>>>>>>>>>>>>>> fixed problem size per processor. >>>>>>>>>>>>>>>>>>>>>>>> I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling. >>>>>>>>>>>>>>>>>>>>>>>> Cluster specs: >>>>>>>>>>>>>>>>>>>>>>>> CPU: AMD 6234 2.4GHz >>>>>>>>>>>>>>>>>>>>>>>> 8 cores / processor (CPU) >>>>>>>>>>>>>>>>>>>>>>>> 6 CPU / node >>>>>>>>>>>>>>>>>>>>>>>> So 48 Cores / CPU >>>>>>>>>>>>>>>>>>>>>>>> Not sure abt the memory / node >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> The parallel efficiency ?En? for a given degree of parallelism ?n? indicates how much the program is >>>>>>>>>>>>>>>>>>>>>>>> efficiently accelerated by parallel processing. ?En? is given by the following formulae. Although their >>>>>>>>>>>>>>>>>>>>>>>> derivation processes are different depending on strong and weak scaling, derived formulae are the >>>>>>>>>>>>>>>>>>>>>>>> same. 
>>>>>>>>>>>>>>>>>>>>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. >>>>>>>>>>>>>>>>>>>>>>>> So is my results acceptable? >>>>>>>>>>>>>>>>>>>>>>>> For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%. >>>>>>>>>>>>>>>>>>>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function >>>>>>>>>>>>>>>>>>>>>>>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a >>>>>>>>>>>>>>>>>>>>>>>> model of this dependence. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current >>>>>>>>>>>>>>>>>>>>>>>> cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific >>>>>>>>>>>>>>>>>>>>>>>> applications, but neither does requiring a certain parallel efficiency. >>>>>>>>>>>>>>>>>>>>>>> Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so? >>>>>>>>>>>>>>>>>>>>>> What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors? >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Barry >>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I have attached the output >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 48 cores: log48 >>>>>>>>>>>>>>>>>>>>> 96 cores: log96 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Problem size doubled from 158x266x150 to 158x266x300. >>>>>>>>>>>>>>>>>>>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Matt >>>>>>>>>>>>>>>>>>>>>>>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores? >>>>>>>>>>>>>>>>>>>>>>>> Btw, I do not have access to the system. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Sent using CloudMagic Email >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>>>>>>>>>>>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>> >>>>>>> >>> > From zonexo at gmail.com Thu Nov 5 23:59:11 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Fri, 6 Nov 2015 13:59:11 +0800 Subject: [petsc-users] Scaling with number of cores In-Reply-To: <8764B318-4058-4435-AC58-011989E8718B@mcs.anl.gov> References: <5634EDA4.5030203@gmail.com> <56356E2A.8090000@gmail.com> <0BEDF334-C0C6-49D1-9FE8-D178181C5F00@mcs.anl.gov> <5636140A.3040506@gmail.com> <61EEB3FD-A093-475C-9841-9E79BB120385@mcs.anl.gov> <5636BDF3.8000109@gmail.com> <5636E059.2010107@gmail.com> <501DC517-2887-4222-9ECD-9951D521A4E5@mcs.anl.gov> <56370085.7070502@gmail.com> <6FF57BB6-FB0D-49F0-BF84-E4EA760FC010@mcs.anl.gov> <56372A12.90900@gmail.com> <563839F8.9080000@gmail.com> <589BE4CF-18B5-4F04-A442-7FB0F2F3AFC1@mcs.anl.gov> <5638AD42.9060609@gmail.com> <2057DCBD-BDB9-4424-A4E3-C1BA5BB8C214@mcs.anl.gov> <563ACD5F.6060301@gmail.com> <56A89BE9-5AEC-4A99-A888-2B7EF35C7BC9@mcs.anl.gov> <563B7C97.1070604@gmail.com> <0F2039C3-52A4-4BB5-BE77-6E75163BFDD9@mcs.anl.gov> <563C14CB.8050205@gmail.com> <108ACE20-8B56-4FA6-8CEE-D455B975DEBF@mcs.anl.gov> <563C37C4.3040904@gmail.com> <8764B318-4058-4435-AC58-011989E8718B@mcs.anl.gov> Message-ID: <563C41AF.4080109@gmail.com> On 6/11/2015 1:26 PM, Barry Smith wrote: >> On Nov 5, 2015, at 11:16 PM, TAY wee-beng wrote: >> >> >> On 6/11/2015 12:08 PM, Barry Smith wrote: >>> Ok the 64 case not converging makes no sense. >>> >>> Run it with ksp_monitor and ksp_converged_reason for the pressure solve turned on and -info >>> >>> You need to figure out why it is not converging. >>> >>> Barry >> Hi, >> >> I found out the reason. Because my partitioning is only in the z direction, and if using 64cores to partition 150 cells in the z direction, some partitions will be too small, leading to error. > Oh, well this is your fundamental problem and why you don't get scaling! You need to have partitioning in all three directions or you will never get good scaling! This is fundamental, just fix your code to have partitioning in all dimensions > > Barry Hi, Ok, I'll make the change and compare again. Thanks > >> So how can I test now? The original problem has 158x266x300 with 96 cores. How should I reduce it to test for scaling? >> >> Thanks. >>>> On Nov 5, 2015, at 8:47 PM, TAY wee-beng wrote: >>>> >>>> Hi, >>>> >>>> I have removed the nullspace and attached the new logs. >>>> >>>> Thank you >>>> >>>> Yours sincerely, >>>> >>>> TAY wee-beng >>>> >>>> On 6/11/2015 12:07 AM, Barry Smith wrote: >>>>>> On Nov 5, 2015, at 9:58 AM, TAY wee-beng wrote: >>>>>> >>>>>> Sorry I realised that I didn't use gamg and that's why. But if I use gamg, the 8 core case worked, but the 64 core case shows p diverged. >>>>> Where is the log file for the 8 core case? And where is all the output from where it fails with 64 cores? Include -ksp_monitor_true_residual and -ksp_converged_reason >>>>> >>>>> Barry >>>>> >>>>>> Why is this so? Btw, I have also added nullspace in my code. >>>>>> >>>>>> Thank you. >>>>>> >>>>>> Yours sincerely, >>>>>> >>>>>> TAY wee-beng >>>>>> >>>>>> On 5/11/2015 12:03 PM, Barry Smith wrote: >>>>>>> There is a problem here. 
The -log_summary doesn't show all the events associated with the -pc_type gamg preconditioner it should have rows like >>>>>>> >>>>>>> VecDot 2 1.0 6.1989e-06 1.0 1.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1613 >>>>>>> VecMDot 134 1.0 5.4145e-04 1.0 1.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 3025 >>>>>>> VecNorm 154 1.0 2.4176e-04 1.0 3.82e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1578 >>>>>>> VecScale 148 1.0 1.6928e-04 1.0 1.76e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1039 >>>>>>> VecCopy 106 1.0 1.2255e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>>> VecSet 474 1.0 5.1236e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>>> VecAXPY 54 1.0 1.3471e-04 1.0 2.35e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1742 >>>>>>> VecAYPX 384 1.0 5.7459e-04 1.0 4.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 860 >>>>>>> VecAXPBYCZ 192 1.0 4.7398e-04 1.0 9.88e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2085 >>>>>>> VecWAXPY 2 1.0 7.8678e-06 1.0 5.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 636 >>>>>>> VecMAXPY 148 1.0 8.1539e-04 1.0 1.96e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 2399 >>>>>>> VecPointwiseMult 66 1.0 1.1253e-04 1.0 6.79e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 604 >>>>>>> VecScatterBegin 45 1.0 6.3419e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>>> VecSetRandom 6 1.0 3.0994e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>>> VecReduceArith 4 1.0 1.3113e-05 1.0 2.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1525 >>>>>>> VecReduceComm 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>>> VecNormalize 148 1.0 4.4799e-04 1.0 5.27e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1177 >>>>>>> MatMult 424 1.0 8.9276e-03 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 37 0 0 0 7 37 0 0 0 2343 >>>>>>> MatMultAdd 48 1.0 5.0926e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2069 >>>>>>> MatMultTranspose 48 1.0 9.8586e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1069 >>>>>>> MatSolve 16 1.0 2.2173e-05 1.0 1.02e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 460 >>>>>>> MatSOR 354 1.0 1.0547e-02 1.0 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 31 0 0 0 9 31 0 0 0 1631 >>>>>>> MatLUFactorSym 2 1.0 4.7922e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>>> MatLUFactorNum 2 1.0 2.5272e-05 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 307 >>>>>>> MatScale 18 1.0 1.7142e-04 1.0 1.50e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 874 >>>>>>> MatResidual 48 1.0 1.0548e-03 1.0 2.33e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2212 >>>>>>> MatAssemblyBegin 57 1.0 4.7684e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>>> MatAssemblyEnd 57 1.0 1.9786e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>>>>>> MatGetRow 21616 1.0 1.8497e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>>>>>> MatGetRowIJ 2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>>> MatGetOrdering 2 1.0 6.0797e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>>> MatCoarsen 6 1.0 9.3222e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>>> MatZeroEntries 2 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>>> MatAXPY 6 1.0 1.7998e-03 1.0 0.00e+00 0.0 
0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>>>>>> MatFDColorCreate 1 1.0 3.2902e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>>> MatFDColorSetUp 1 1.0 1.6739e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>>>>> MatFDColorApply 2 1.0 1.3199e-03 1.0 2.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 1826 >>>>>>> MatFDColorFunc 42 1.0 7.4601e-04 1.0 2.20e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2956 >>>>>>> MatMatMult 6 1.0 5.1048e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 2 0 0 0 4 2 0 0 0 241 >>>>>>> MatMatMultSym 6 1.0 3.2601e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >>>>>>> MatMatMultNum 6 1.0 1.8158e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 679 >>>>>>> MatPtAP 6 1.0 2.1328e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283 >>>>>>> MatPtAPSymbolic 6 1.0 1.0073e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 0 >>>>>>> MatPtAPNumeric 6 1.0 1.1230e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 537 >>>>>>> MatTrnMatMult 2 1.0 7.2789e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 75 >>>>>>> MatTrnMatMultSym 2 1.0 5.7006e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>>> MatTrnMatMultNum 2 1.0 1.5473e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 352 >>>>>>> MatGetSymTrans 8 1.0 3.1638e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>>> KSPGMRESOrthog 134 1.0 1.3156e-03 1.0 3.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 1 6 0 0 0 2491 >>>>>>> KSPSetUp 24 1.0 4.6754e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>>> KSPSolve 2 1.0 1.1291e-01 1.0 5.32e+07 1.0 0.0e+00 0.0e+00 0.0e+00 94 95 0 0 0 94 95 0 0 0 471 >>>>>>> PCGAMGGraph_AGG 6 1.0 1.2108e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2 >>>>>>> PCGAMGCoarse_AGG 6 1.0 1.1127e-03 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 49 >>>>>>> PCGAMGProl_AGG 6 1.0 4.1062e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0 >>>>>>> PCGAMGPOpt_AGG 6 1.0 1.1200e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534 >>>>>>> GAMG: createProl 6 1.0 6.5530e-02 1.0 6.06e+06 1.0 0.0e+00 0.0e+00 0.0e+00 55 11 0 0 0 55 11 0 0 0 92 >>>>>>> Graph 12 1.0 1.1692e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2 >>>>>>> MIS/Agg 6 1.0 1.4496e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>>> SA: col data 6 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>>> SA: frmProl0 6 1.0 4.0917e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0 >>>>>>> SA: smooth 6 1.0 1.1198e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534 >>>>>>> GAMG: partLevel 6 1.0 2.1341e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283 >>>>>>> PCSetUp 4 1.0 8.8020e-02 1.0 1.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 74 22 0 0 0 74 22 0 0 0 137 >>>>>>> PCSetUpOnBlocks 16 1.0 1.8382e-04 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 42 >>>>>>> PCApply 16 1.0 2.3858e-02 1.0 3.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 20 70 0 0 0 20 70 0 0 0 1637 >>>>>>> >>>>>>> >>>>>>> Are you sure you ran with -pc_type gamg ? What about running with -info does it print anything about gamg? What about -ksp_view does it indicate it is using the gamg preconditioner? 
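For reference, one common reason a bare -pc_type gamg is silently ignored: the solvers in this thread carry options prefixes (poisson_, momentum_), and a prefixed KSP only reads prefixed options. The fragment below is a hypothetical sketch of such a setup, not the poster's actual code; A, b and x stand for the already assembled Poisson system.

  #include <petscksp.h>

  Mat            A;      /* assumed: assembled Poisson matrix              */
  Vec            b, x;   /* assumed: right-hand side and solution vectors  */
  KSP            ksp;
  PetscErrorCode ierr;

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOptionsPrefix(ksp, "poisson_");CHKERRQ(ierr);  /* every option for this solver now needs the poisson_ prefix */
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);                /* reads -poisson_pc_type gamg, -poisson_ksp_view, ...         */
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

With such a prefix in place, running with -poisson_pc_type gamg -poisson_ksp_view both selects GAMG for the Poisson solve and prints the solver that was actually used, which is one way to check the question above.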
>>>>>>> >>>>>>> >>>>>>>> On Nov 4, 2015, at 9:30 PM, TAY wee-beng wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I have attached the 2 logs. >>>>>>>> >>>>>>>> Thank you >>>>>>>> >>>>>>>> Yours sincerely, >>>>>>>> >>>>>>>> TAY wee-beng >>>>>>>> >>>>>>>> On 4/11/2015 1:11 AM, Barry Smith wrote: >>>>>>>>> Ok, the convergence looks good. Now run on 8 and 64 processes as before with -log_summary and not -ksp_monitor to see how it scales. >>>>>>>>> >>>>>>>>> Barry >>>>>>>>> >>>>>>>>>> On Nov 3, 2015, at 6:49 AM, TAY wee-beng wrote: >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I tried and have attached the log. >>>>>>>>>> >>>>>>>>>> Ya, my Poisson eqn has Neumann boundary condition. Do I need to specify some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate? >>>>>>>>>> >>>>>>>>>> Thank you >>>>>>>>>> >>>>>>>>>> Yours sincerely, >>>>>>>>>> >>>>>>>>>> TAY wee-beng >>>>>>>>>> >>>>>>>>>> On 3/11/2015 12:45 PM, Barry Smith wrote: >>>>>>>>>>>> On Nov 2, 2015, at 10:37 PM, TAY wee-beng wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> I tried : >>>>>>>>>>>> >>>>>>>>>>>> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg >>>>>>>>>>>> >>>>>>>>>>>> 2. -poisson_pc_type gamg >>>>>>>>>>> Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason >>>>>>>>>>> Does your poisson have Neumann boundary conditions? Do you have any zeros on the diagonal for the matrix (you shouldn't). >>>>>>>>>>> >>>>>>>>>>> There may be something wrong with your poisson discretization that was also messing up hypre >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Both options give: >>>>>>>>>>>> >>>>>>>>>>>> 1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN >>>>>>>>>>>> M Diverged but why?, time = 2 >>>>>>>>>>>> reason = -9 >>>>>>>>>>>> >>>>>>>>>>>> How can I check what's wrong? >>>>>>>>>>>> >>>>>>>>>>>> Thank you >>>>>>>>>>>> >>>>>>>>>>>> Yours sincerely, >>>>>>>>>>>> >>>>>>>>>>>> TAY wee-beng >>>>>>>>>>>> >>>>>>>>>>>> On 3/11/2015 3:18 AM, Barry Smith wrote: >>>>>>>>>>>>> hypre is just not scaling well here. I do not know why. Since hypre is a block box for us there is no way to determine why the poor scaling. >>>>>>>>>>>>> >>>>>>>>>>>>> If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about in what routines it is scaling well or poorly. >>>>>>>>>>>>> >>>>>>>>>>>>> Barry >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> On Nov 2, 2015, at 3:17 AM, TAY wee-beng wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I have attached the 2 files. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yours sincerely, >>>>>>>>>>>>>> >>>>>>>>>>>>>> TAY wee-beng >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 2/11/2015 2:55 PM, Barry Smith wrote: >>>>>>>>>>>>>>> Run (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Barry >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Nov 2, 2015, at 12:19 AM, TAY wee-beng wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I have attached the new results. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Yours sincerely, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> TAY wee-beng >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 2/11/2015 12:27 PM, Barry Smith wrote: >>>>>>>>>>>>>>>>> Run without the -momentum_ksp_view -poisson_ksp_view and send the new results >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Barry >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Something makes no sense with the output: it gives >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 90% of the time is in the solve but there is no significant amount of time in other events of the code which is just not possible. I hope it is due to your IO. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Nov 1, 2015, at 10:02 PM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I have attached the new run with 100 time steps for 48 and 96 cores. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Yours sincerely, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> TAY wee-beng >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On 2/11/2015 9:49 AM, Barry Smith wrote: >>>>>>>>>>>>>>>>>>> If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps since the setup time of AMG only takes place in the first stimestep. So run both 48 and 96 processes with the same large number of time steps. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Barry >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Sorry I forgot and use the old a.out. I have attached the new log for 48cores (log48), together with the 96cores log (log96). >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Only the Poisson eqn 's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Also, what about momentum eqn? Is it working well? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I will try the gamg later too. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Yours sincerely, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> TAY wee-beng >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On 2/11/2015 12:30 AM, Barry Smith wrote: >>>>>>>>>>>>>>>>>>>>> You used gmres with 48 processes but richardson with 96. 
You need to be careful and make sure you don't change the solvers when you change the number of processors since you can get very different inconsistent results >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Anyways all the time is being spent in the BoomerAMG algebraic multigrid setup and it is is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Now is the Poisson problem changing at each timestep or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large set up time that you often doesn't matter if you have many time steps but if you have to rebuild it each timestep it is too large? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Barry >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On 1/11/2015 10:00 AM, Barry Smith wrote: >>>>>>>>>>>>>>>>>>>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote: >>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng wrote: >>>>>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I understand that as mentioned in the faq, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. >>>>>>>>>>>>>>>>>>>>>>>>> Its specs are: >>>>>>>>>>>>>>>>>>>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node) >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> 8 cores / processor >>>>>>>>>>>>>>>>>>>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect >>>>>>>>>>>>>>>>>>>>>>>>> Each cabinet contains 96 computing nodes, >>>>>>>>>>>>>>>>>>>>>>>>> One of the requirement is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data >>>>>>>>>>>>>>>>>>>>>>>>> There are 2 ways to give performance: >>>>>>>>>>>>>>>>>>>>>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed >>>>>>>>>>>>>>>>>>>>>>>>> problem. >>>>>>>>>>>>>>>>>>>>>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a >>>>>>>>>>>>>>>>>>>>>>>>> fixed problem size per processor. >>>>>>>>>>>>>>>>>>>>>>>>> I ran my cases with 48 and 96 cores with my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling. >>>>>>>>>>>>>>>>>>>>>>>>> Cluster specs: >>>>>>>>>>>>>>>>>>>>>>>>> CPU: AMD 6234 2.4GHz >>>>>>>>>>>>>>>>>>>>>>>>> 8 cores / processor (CPU) >>>>>>>>>>>>>>>>>>>>>>>>> 6 CPU / node >>>>>>>>>>>>>>>>>>>>>>>>> So 48 Cores / CPU >>>>>>>>>>>>>>>>>>>>>>>>> Not sure abt the memory / node >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> The parallel efficiency ?En? 
for a given degree of parallelism ?n? indicates how much the program is >>>>>>>>>>>>>>>>>>>>>>>>> efficiently accelerated by parallel processing. ?En? is given by the following formulae. Although their >>>>>>>>>>>>>>>>>>>>>>>>> derivation processes are different depending on strong and weak scaling, derived formulae are the >>>>>>>>>>>>>>>>>>>>>>>>> same. >>>>>>>>>>>>>>>>>>>>>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. >>>>>>>>>>>>>>>>>>>>>>>>> So is my results acceptable? >>>>>>>>>>>>>>>>>>>>>>>>> For the large data set, if using 2205 nodes (2205X8cores), my expected parallel efficiency is only 0.5%. The proposal recommends value of > 50%. >>>>>>>>>>>>>>>>>>>>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function >>>>>>>>>>>>>>>>>>>>>>>>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a >>>>>>>>>>>>>>>>>>>>>>>>> model of this dependence. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current >>>>>>>>>>>>>>>>>>>>>>>>> cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific >>>>>>>>>>>>>>>>>>>>>>>>> applications, but neither does requiring a certain parallel efficiency. >>>>>>>>>>>>>>>>>>>>>>>> Ok I check the results for my weak scaling it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> However, it's mentioned in the FAQ that due to memory requirement, it's impossible to get >90% speed when I double the cores and problem size (ie linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so? >>>>>>>>>>>>>>>>>>>>>>> What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Barry >>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I have attached the output >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 48 cores: log48 >>>>>>>>>>>>>>>>>>>>>> 96 cores: log96 >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Problem size doubled from 158x266x150 to 158x266x300. >>>>>>>>>>>>>>>>>>>>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather the way the linear equations are solved? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Matt >>>>>>>>>>>>>>>>>>>>>>>>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205X8) cores? >>>>>>>>>>>>>>>>>>>>>>>>> Btw, I do not have access to the system. 
>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Sent using CloudMagic Email >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>>>>>>>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>> From davydden at gmail.com Fri Nov 6 08:51:52 2015 From: davydden at gmail.com (Denis Davydov) Date: Fri, 6 Nov 2015 15:51:52 +0100 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> Message-ID: After running in debug mode it seems that the GAMG solver indeed did not converge, however throwing the error leads to SIGABRT (backtrace and frames are below). It is still very suspicious why would solving for (unchanged) mass matrix wouldn't converge inside SLEPc's spectral transformation. p.s. valgrind takes enormous amount of time on this problem, will try to leave it over the weekend. Denis. =============== Program received signal SIGABRT, Aborted. 0x00007fffea87fcc9 in __GI_raise (sig=sig at entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt #0 0x00007fffea87fcc9 in __GI_raise (sig=sig at entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x00007fffea8830d8 in __GI_abort () at abort.c:89 #2 0x00007fffeb790c91 in PetscTraceBackErrorHandler (comm=0x2a09bd0, line=798, fun=0x7fffed0e24b9 <__func__.20043> "KSPSolve", file=0x7fffed0e1620 "/home/davydden/.hashdist/tmp/petsc-hujktg3j6hq7/src/ksp/ksp/interface/itfunc.c", n=91, p=PETSC_ERROR_INITIAL, mess=0x7fffffffac30 "KSPSolve has not converged", ctx=0x0) at /home/davydden/.hashdist/tmp/petsc-hujktg3j6hq7/src/sys/error/errtrace.c:243 #3 0x00007fffeb78b8b9 in PetscError (comm=0x2a09bd0, line=798, func=0x7fffed0e24b9 <__func__.20043> "KSPSolve", file=0x7fffed0e1620 "/home/davydden/.hashdist/tmp/petsc-hujktg3j6hq7/src/ksp/ksp/interface/itfunc.c", n=91, p=PETSC_ERROR_INITIAL, mess=0x7fffed0e1e7a "KSPSolve has not converged") at /home/davydden/.hashdist/tmp/petsc-hujktg3j6hq7/src/sys/error/err.c:377 #4 0x00007fffec75e1e7 in KSPSolve (ksp=0x367227d0, b=0x35b285c0, x=0x35d89250) at /home/davydden/.hashdist/tmp/petsc-hujktg3j6hq7/src/ksp/ksp/interface/itfunc.c:798 #5 0x00007fffe32a8657 in STMatSolve (st=0x3672d820, b=0x35b285c0, x=0x35d89250) at /home/davydden/.hashdist/tmp/slepc-22nb32nbgvhx/src/sys/classes/st/interface/stsles.c:166 ---Type to continue, or q to quit---q Quit (gdb) f 5 #5 0x00007fffe32a8657 in STMatSolve (st=0x3672d820, b=0x35b285c0, x=0x35d89250) at /home/davydden/.hashdist/tmp/slepc-22nb32nbgvhx/src/sys/classes/st/interface/stsles.c:166 166 ierr = KSPSolve(st->ksp,b,x);CHKERRQ(ierr); (gdb) f 4 #4 0x00007fffec75e1e7 in KSPSolve (ksp=0x367227d0, b=0x35b285c0, x=0x35d89250) at /home/davydden/.hashdist/tmp/petsc-hujktg3j6hq7/src/ksp/ksp/interface/itfunc.c:798 798 if (ksp->errorifnotconverged && ksp->reason < 0) SETERRQ(comm,PETSC_ERR_NOT_CONVERGED,"KSPSolve has not converged"); (gdb) f 3 #3 0x00007fffeb78b8b9 in PetscError (comm=0x2a09bd0, line=798, 
func=0x7fffed0e24b9 <__func__.20043> "KSPSolve", file=0x7fffed0e1620 "/home/davydden/.hashdist/tmp/petsc-hujktg3j6hq7/src/ksp/ksp/interface/itfunc.c", n=91, p=PETSC_ERROR_INITIAL, mess=0x7fffed0e1e7a "KSPSolve has not converged") at /home/davydden/.hashdist/tmp/petsc-hujktg3j6hq7/src/sys/error/err.c:377 377 if (!eh) ierr = PetscTraceBackErrorHandler(comm,line,func,file,n,p,lbuf,0); (gdb) f 2 #2 0x00007fffeb790c91 in PetscTraceBackErrorHandler (comm=0x2a09bd0, line=798, fun=0x7fffed0e24b9 <__func__.20043> "KSPSolve", file=0x7fffed0e1620 "/home/davydden/.hashdist/tmp/petsc-hujktg3j6hq7/src/ksp/ksp/interface/itfunc.c", n=91, p=PETSC_ERROR_INITIAL, mess=0x7fffffffac30 "KSPSolve has not converged", ctx=0x0) at /home/davydden/.hashdist/tmp/petsc-hujktg3j6hq7/src/sys/error/errtrace.c:243 243 abort(); (gdb) f 1 #1 0x00007fffea8830d8 in __GI_abort () at abort.c:89 89 abort.c: No such file or directory. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Fri Nov 6 09:09:02 2015 From: hzhang at mcs.anl.gov (Hong) Date: Fri, 6 Nov 2015 09:09:02 -0600 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> Message-ID: Denis: Do you use shift-and-invert method for solving eigenvalue problem? If so, the linear problems would be extremely ill-conditioned, for which the direct solver, such LU or Cholesky are usually the only working option. You may run your petsc/slepc code with option '-ksp_monitor' to observe convergence behavior. Hong After running in debug mode it seems that the GAMG solver indeed did not > converge, however throwing the error leads to SIGABRT (backtrace and frames > are below). > It is still very suspicious why would solving for (unchanged) mass matrix > wouldn't converge inside SLEPc's spectral transformation. > > p.s. valgrind takes enormous amount of time on this problem, > will try to leave it over the weekend. > > Denis. > > =============== > Program received signal SIGABRT, Aborted. > 0x00007fffea87fcc9 in __GI_raise (sig=sig at entry=6) > at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 > 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory. 
> (gdb) bt > #0 0x00007fffea87fcc9 in __GI_raise (sig=sig at entry=6) > at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 > #1 0x00007fffea8830d8 in __GI_abort () at abort.c:89 > #2 0x00007fffeb790c91 in PetscTraceBackErrorHandler (comm=0x2a09bd0, > line=798, fun=0x7fffed0e24b9 <__func__.20043> "KSPSolve", > file=0x7fffed0e1620 > "/home/davydden/.hashdist/tmp/petsc-hujktg3j6hq7/src/ksp/ksp/interface/itfunc.c", > n=91, p=PETSC_ERROR_INITIAL, > mess=0x7fffffffac30 "KSPSolve has not converged", ctx=0x0) > at > /home/davydden/.hashdist/tmp/petsc-hujktg3j6hq7/src/sys/error/errtrace.c:243 > #3 0x00007fffeb78b8b9 in PetscError (comm=0x2a09bd0, line=798, > func=0x7fffed0e24b9 <__func__.20043> "KSPSolve", > file=0x7fffed0e1620 > "/home/davydden/.hashdist/tmp/petsc-hujktg3j6hq7/src/ksp/ksp/interface/itfunc.c", > n=91, p=PETSC_ERROR_INITIAL, > mess=0x7fffed0e1e7a "KSPSolve has not converged") > at > /home/davydden/.hashdist/tmp/petsc-hujktg3j6hq7/src/sys/error/err.c:377 > #4 0x00007fffec75e1e7 in KSPSolve (ksp=0x367227d0, b=0x35b285c0, > x=0x35d89250) > at > /home/davydden/.hashdist/tmp/petsc-hujktg3j6hq7/src/ksp/ksp/interface/itfunc.c:798 > #5 0x00007fffe32a8657 in STMatSolve (st=0x3672d820, b=0x35b285c0, > x=0x35d89250) > at > /home/davydden/.hashdist/tmp/slepc-22nb32nbgvhx/src/sys/classes/st/interface/stsles.c:166 > ---Type to continue, or q to quit---q > Quit > (gdb) f 5 > #5 0x00007fffe32a8657 in STMatSolve (st=0x3672d820, b=0x35b285c0, > x=0x35d89250) > at > /home/davydden/.hashdist/tmp/slepc-22nb32nbgvhx/src/sys/classes/st/interface/stsles.c:166 > 166 ierr = KSPSolve(st->ksp,b,x);CHKERRQ(ierr); > (gdb) f 4 > #4 0x00007fffec75e1e7 in KSPSolve (ksp=0x367227d0, b=0x35b285c0, > x=0x35d89250) > at > /home/davydden/.hashdist/tmp/petsc-hujktg3j6hq7/src/ksp/ksp/interface/itfunc.c:798 > 798 if (ksp->errorifnotconverged && ksp->reason < 0) > SETERRQ(comm,PETSC_ERR_NOT_CONVERGED,"KSPSolve has not converged"); > (gdb) f 3 > #3 0x00007fffeb78b8b9 in PetscError (comm=0x2a09bd0, line=798, > func=0x7fffed0e24b9 <__func__.20043> "KSPSolve", > file=0x7fffed0e1620 > "/home/davydden/.hashdist/tmp/petsc-hujktg3j6hq7/src/ksp/ksp/interface/itfunc.c", > n=91, p=PETSC_ERROR_INITIAL, > mess=0x7fffed0e1e7a "KSPSolve has not converged") > at > /home/davydden/.hashdist/tmp/petsc-hujktg3j6hq7/src/sys/error/err.c:377 > 377 if (!eh) ierr = > PetscTraceBackErrorHandler(comm,line,func,file,n,p,lbuf,0); > (gdb) f 2 > #2 0x00007fffeb790c91 in PetscTraceBackErrorHandler (comm=0x2a09bd0, > line=798, fun=0x7fffed0e24b9 <__func__.20043> "KSPSolve", > file=0x7fffed0e1620 > "/home/davydden/.hashdist/tmp/petsc-hujktg3j6hq7/src/ksp/ksp/interface/itfunc.c", > n=91, p=PETSC_ERROR_INITIAL, > mess=0x7fffffffac30 "KSPSolve has not converged", ctx=0x0) > at > /home/davydden/.hashdist/tmp/petsc-hujktg3j6hq7/src/sys/error/errtrace.c:243 > 243 abort(); > (gdb) f 1 > #1 0x00007fffea8830d8 in __GI_abort () at abort.c:89 > 89 abort.c: No such file or directory. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From davydden at gmail.com Fri Nov 6 09:15:18 2015 From: davydden at gmail.com (Denis Davydov) Date: Fri, 6 Nov 2015 16:15:18 +0100 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> Message-ID: Hi Hong, > On 6 Nov 2015, at 16:09, Hong wrote: > > Denis: > Do you use shift-and-invert method for solving eigenvalue problem? 
no, it?s just shift with zero value. So for GHEP one inverts B-matrix. > If so, the linear problems would be extremely ill-conditioned, for which the direct solver, such LU or Cholesky are usually the only working option. Depends on the shift, i would say. In any case the same problem works with jacobi preconditioner no no other changes, so i would not relate it to any settings on SLEPc part. > You may run your petsc/slepc code with option '-ksp_monitor' to observe convergence behavior. Will do, thanks. Regards, Denis. From knepley at gmail.com Fri Nov 6 09:22:03 2015 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 6 Nov 2015 09:22:03 -0600 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> Message-ID: On Fri, Nov 6, 2015 at 9:15 AM, Denis Davydov wrote: > Hi Hong, > > > On 6 Nov 2015, at 16:09, Hong wrote: > > > > Denis: > > Do you use shift-and-invert method for solving eigenvalue problem? > no, it?s just shift with zero value. So for GHEP one inverts B-matrix. > > > If so, the linear problems would be extremely ill-conditioned, for which > the direct solver, such LU or Cholesky are usually the only working option. > Depends on the shift, i would say. > In any case the same problem works with jacobi preconditioner no no other > changes, > so i would not relate it to any settings on SLEPc part. Is it possible that the matrix is rank deficient? Jacobi will just chug along and sometimes work, but AMG will fail spectacularly in that case. Matt > > > You may run your petsc/slepc code with option '-ksp_monitor' to observe > convergence behavior. > Will do, thanks. > > Regards, > Denis. > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From davydden at gmail.com Fri Nov 6 09:29:42 2015 From: davydden at gmail.com (Denis Davydov) Date: Fri, 6 Nov 2015 16:29:42 +0100 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> Message-ID: <2605F715-A65B-4DBD-A768-5117326EC251@gmail.com> > On 6 Nov 2015, at 16:22, Matthew Knepley wrote: > > Is it possible that the matrix is rank deficient? Jacobi will just chug along and sometimes work, but > AMG will fail spectacularly in that case. It should not. It is just a mass (overlap) matrix coming from linear FEs with zero Dirichlet BC assembled in deal.II. Due to elimination of some algebraic constraints on DoFs there are lines with only diagonal element, but it should still be SPD. More interestingly is that it does not fail immediately (i.e. the first time it?s used in SLEPc solvers), but only on the 4th step. So 3 times SLEPc worked just fine to solve GHEP with Gamg and zero shift. Regards, Denis. 
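As a concrete way to follow up on the '-ksp_monitor' suggestion above: in SLEPc the solve inside the spectral transformation has its own KSP, reachable programmatically or through the st_ options prefix. The fragment below is a hypothetical sketch (eps stands for the already configured EPS object); it switches the inner solve to the plain CG + Jacobi combination already reported to work on this mass matrix, so that the failure can be isolated to the GAMG setup rather than the eigensolver.

  #include <slepceps.h>

  EPS            eps;   /* assumed: the eigensolver, already created and configured */
  ST             st;
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  ierr = EPSGetST(eps, &st);CHKERRQ(ierr);     /* spectral transformation used by the EPS       */
  ierr = STGetKSP(st, &ksp);CHKERRQ(ierr);     /* inner linear solver, the one that failed here */
  ierr = KSPSetType(ksp, KSPCG);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCJACOBI);CHKERRQ(ierr);

The command-line equivalent would be -st_ksp_type cg -st_pc_type jacobi, with -st_ksp_monitor_true_residual -st_ksp_converged_reason added to watch each inner solve, since the ST's KSP picks up the st_ prefix.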
From knepley at gmail.com Fri Nov 6 09:32:59 2015 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 6 Nov 2015 09:32:59 -0600 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: <2605F715-A65B-4DBD-A768-5117326EC251@gmail.com> References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> <2605F715-A65B-4DBD-A768-5117326EC251@gmail.com> Message-ID: On Fri, Nov 6, 2015 at 9:29 AM, Denis Davydov wrote: > > > On 6 Nov 2015, at 16:22, Matthew Knepley wrote: > > > > Is it possible that the matrix is rank deficient? Jacobi will just chug > along and sometimes work, but > > AMG will fail spectacularly in that case. > > It should not. It is just a mass (overlap) matrix coming from linear FEs > with zero Dirichlet BC assembled in deal.II. > Due to elimination of some algebraic constraints on DoFs there are lines > with only diagonal element, > but it should still be SPD. > > More interestingly is that it does not fail immediately (i.e. the first > time it?s used in SLEPc solvers), > but only on the 4th step. So 3 times SLEPc worked just fine to solve GHEP > with Gamg and zero shift. > Then I think it is not doing what you suppose. I am not inclined to believe that it behaves differently on the same matrix. Matt > Regards, > Denis. > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Nov 6 10:39:50 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 6 Nov 2015 10:39:50 -0600 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: <2605F715-A65B-4DBD-A768-5117326EC251@gmail.com> References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> <2605F715-A65B-4DBD-A768-5117326EC251@gmail.com> Message-ID: <8C4E73A6-F414-42E4-86F2-DB45C4C7E09D@mcs.anl.gov> If it is a true mass matrix in the finite element sense of the word then it should be very well conditioned and one definitely would not use something like GAMG on. Jacobi + CG or maybe SSOR + CG should converge rapidly Barry > On Nov 6, 2015, at 9:29 AM, Denis Davydov wrote: > > >> On 6 Nov 2015, at 16:22, Matthew Knepley wrote: >> >> Is it possible that the matrix is rank deficient? Jacobi will just chug along and sometimes work, but >> AMG will fail spectacularly in that case. > > It should not. It is just a mass (overlap) matrix coming from linear FEs with zero Dirichlet BC assembled in deal.II. > Due to elimination of some algebraic constraints on DoFs there are lines with only diagonal element, > but it should still be SPD. > > More interestingly is that it does not fail immediately (i.e. the first time it?s used in SLEPc solvers), > but only on the 4th step. So 3 times SLEPc worked just fine to solve GHEP with Gamg and zero shift. > > Regards, > Denis. 
> > From davydden at gmail.com Fri Nov 6 11:35:32 2015 From: davydden at gmail.com (Denis Davydov) Date: Fri, 6 Nov 2015 18:35:32 +0100 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: <8C4E73A6-F414-42E4-86F2-DB45C4C7E09D@mcs.anl.gov> References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> <2605F715-A65B-4DBD-A768-5117326EC251@gmail.com> <8C4E73A6-F414-42E4-86F2-DB45C4C7E09D@mcs.anl.gov> Message-ID: <7F43D245-9C93-4B3E-B9AF-617B2D34A923@gmail.com> > On 6 Nov 2015, at 17:39, Barry Smith wrote: > > > If it is a true mass matrix in the finite element sense of the word then it should be very well conditioned and one definitely would not use something like GAMG on. Jacobi + CG or maybe SSOR + CG should converge rapidly That I understand and absolutely agree. It just does not explain why GAMG would fail, especially on 4 cores and not on 8. Regards, Denis. From ybay2 at illinois.edu Fri Nov 6 12:39:19 2015 From: ybay2 at illinois.edu (Bay, Yong Yi) Date: Fri, 6 Nov 2015 18:39:19 +0000 Subject: [petsc-users] Calling DMCompositeCreate() in Fortran Message-ID: Hi, I noticed that DMCompositeCreate() does not exist in fortran format. What would be the best way to call this function if my code is written in Fortran? I tried using iso_c_binding as follows and it compiles but crashes on execution. interface subroutine DMCompositeCreate(comm, dm) bind(C,name="DMCompositeCreate") import integer(c_int), value :: comm integer(c_long), value :: dm end subroutine DMCompositeCreate end interface -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Nov 6 12:39:42 2015 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 6 Nov 2015 10:39:42 -0800 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: <7F43D245-9C93-4B3E-B9AF-617B2D34A923@gmail.com> References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> <2605F715-A65B-4DBD-A768-5117326EC251@gmail.com> <8C4E73A6-F414-42E4-86F2-DB45C4C7E09D@mcs.anl.gov> <7F43D245-9C93-4B3E-B9AF-617B2D34A923@gmail.com> Message-ID: You can run with -info and grep on GAMG, and send this. If you are shifting a matrix then it can/will get indefinite. If it is just a mass matrix then Jacobi should converge quickly - does it? On Fri, Nov 6, 2015 at 9:35 AM, Denis Davydov wrote: > > > On 6 Nov 2015, at 17:39, Barry Smith wrote: > > > > > > If it is a true mass matrix in the finite element sense of the word > then it should be very well conditioned and one definitely would not use > something like GAMG on. Jacobi + CG or maybe SSOR + CG should converge > rapidly > > That I understand and absolutely agree. > It just does not explain why GAMG would fail, especially on 4 cores and > not on 8. > > Regards, > Denis. > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Nov 6 13:33:44 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 6 Nov 2015 13:33:44 -0600 Subject: [petsc-users] Calling DMCompositeCreate() in Fortran In-Reply-To: References: Message-ID: <8B7ADE77-759F-4D26-892D-D448E2A7BDAA@mcs.anl.gov> When something crashes it is time to run in the debugger and determine what is causing the crash. The code looks ok and we could only speculate wildly why it crashes. Running in the debugger will likely revel the problem in less than 3 minutes. 
Barry > On Nov 6, 2015, at 12:39 PM, Bay, Yong Yi wrote: > > Hi, > > I noticed that DMCompositeCreate() does not exist in fortran format. What would be the best way to call this function if my code is written in Fortran? I tried using iso_c_binding as follows and it compiles but crashes on execution. > > interface > subroutine DMCompositeCreate(comm, dm) bind(C,name="DMCompositeCreate") > import > integer(c_int), value :: comm > integer(c_long), value :: dm > end subroutine DMCompositeCreate > end interface From ybay2 at illinois.edu Fri Nov 6 15:51:58 2015 From: ybay2 at illinois.edu (Bay, Yong Yi) Date: Fri, 6 Nov 2015 21:51:58 +0000 Subject: [petsc-users] Calling DMCompositeCreate() in Fortran In-Reply-To: <8B7ADE77-759F-4D26-892D-D448E2A7BDAA@mcs.anl.gov> References: , <8B7ADE77-759F-4D26-892D-D448E2A7BDAA@mcs.anl.gov> Message-ID: Thanks for the advice. I tried running in gdb, and the crash occurred in DMCreate. However I didn't manage to figure out the reason. After looking at what DMCompositeCreate actually does, I simply called DMCreate and DMSetType instead (binding DMSetType instead) and this worked. Yong Yi ________________________________________ From: Barry Smith [bsmith at mcs.anl.gov] Sent: Friday, November 06, 2015 1:33 PM To: Bay, Yong Yi Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Calling DMCompositeCreate() in Fortran When something crashes it is time to run in the debugger and determine what is causing the crash. The code looks ok and we could only speculate wildly why it crashes. Running in the debugger will likely revel the problem in less than 3 minutes. Barry > On Nov 6, 2015, at 12:39 PM, Bay, Yong Yi wrote: > > Hi, > > I noticed that DMCompositeCreate() does not exist in fortran format. What would be the best way to call this function if my code is written in Fortran? I tried using iso_c_binding as follows and it compiles but crashes on execution. > > interface > subroutine DMCompositeCreate(comm, dm) bind(C,name="DMCompositeCreate") > import > integer(c_int), value :: comm > integer(c_long), value :: dm > end subroutine DMCompositeCreate > end interface From mkury at berkeley.edu Fri Nov 6 18:35:36 2015 From: mkury at berkeley.edu (Matthew Kury) Date: Fri, 6 Nov 2015 16:35:36 -0800 Subject: [petsc-users] Matrix indexing for distributed DMPlex Message-ID: Dear All, I have been trying to figure out how to appropriately index a matrix that was created from a DMPlex with a section defined for it. I created the matrix with DMCreateMatrix() and I tried to index the entries by using the ISLocalToGlobalMapping obtained from DMGetLocalToGlobalMapping, with the local indices being those obtained from PetscSectionGetOffset() with the appropriate points from the D.A.G. However this does not seem to work. In particular, with the ISLocalToGlobalMapping, what I understand is that it gives the relationship of the local indexing to the global indexing, however there are negative numbers in this mapping which I fail to find an explanation of in the documentation. In addition, there do not appear to be redundant indexes to the global matrix as one would expect in a distributed DMPlex because there can be some of the same points on different processors. Any help is greatly appreciated. Thank you for your time, Matthew W. Kury Ph.D. Candidate CMRL UC Berkeley M.S. Mechanical Engineering UC Berkeley, 2014 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Fri Nov 6 20:16:49 2015 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 6 Nov 2015 20:16:49 -0600 Subject: [petsc-users] Matrix indexing for distributed DMPlex In-Reply-To: References: Message-ID: On Fri, Nov 6, 2015 at 6:35 PM, Matthew Kury wrote: > Dear All, > > I have been trying to figure out how to appropriately index a matrix that > was created from a DMPlex with a section defined for it. > > I created the matrix with DMCreateMatrix() and I tried to index the > entries by using the ISLocalToGlobalMapping obtained from > DMGetLocalToGlobalMapping, with the local indices being those obtained from > PetscSectionGetOffset() with the appropriate points from the D.A.G. However > this does not seem to work. > It should work. However, you can always get the global section using DMGetDefaultGlobalSection() which directly gives global offsets, although nonlocal offsets are stored as -(off+1). > In particular, with the ISLocalToGlobalMapping, what I understand is that > it gives the relationship of the local indexing to the global indexing, > however there are negative numbers in this mapping which I fail to find an > explanation of in the documentation. In addition, there do not appear to be > redundant indexes to the global matrix as one would expect in a distributed > DMPlex because there can be some of the same points on different > processors. > Yes, the negative numbers are -(off+1) for nonlocal offsets. I am not sure I understand your point about redundant offsets. Thanks, Matt > Any help is greatly appreciated. > > Thank you for your time, > > > Matthew W. Kury > Ph.D. Candidate > CMRL UC Berkeley > M.S. Mechanical Engineering UC Berkeley, 2014 > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From zonexo at gmail.com Sat Nov 7 23:27:12 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Sun, 8 Nov 2015 13:27:12 +0800 Subject: [petsc-users] DM structures of PetscInt Message-ID: <563EDD30.5090100@gmail.com> Hi, I need to use DM structures of type PETScInt. I tried to use: DM da_cu_types Vec cu_types_local,cu_types_global PetscInt,pointer :: cu_types_array(:,:,:) Is this allowed? The cu_types_array is to be of integer type. Because I got into problems compiling. -- Thank you. Yours sincerely, TAY wee-beng From knepley at gmail.com Sun Nov 8 05:41:09 2015 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 8 Nov 2015 05:41:09 -0600 Subject: [petsc-users] DM structures of PetscInt In-Reply-To: <563EDD30.5090100@gmail.com> References: <563EDD30.5090100@gmail.com> Message-ID: On Sat, Nov 7, 2015 at 11:27 PM, TAY wee-beng wrote: > Hi, > > I need to use DM structures of type PETScInt. > > I tried to use: > > DM da_cu_types > > Vec cu_types_local,cu_types_global > > PetscInt,pointer :: cu_types_array(:,:,:) > > Is this allowed? The cu_types_array is to be of integer type. Because I > got into problems compiling. We do not support integer Vecs. You should use IS. What are you trying to do? Matt > > -- > Thank you. > > Yours sincerely, > > TAY wee-beng > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
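Returning to the DMPlex indexing question answered above: a hypothetical fragment showing the -(off+1) convention in practice (dm and point stand for the application's DMPlex and one point of the DAG; this is only a sketch of the approach described in that reply).

  #include <petscdmplex.h>

  DM             dm;        /* assumed: the distributed DMPlex with its section already set      */
  PetscInt       point;     /* assumed: a point of the DAG (cell, face, edge or vertex)          */
  PetscSection   gsection;
  PetscInt       goff;
  PetscErrorCode ierr;

  ierr = DMGetDefaultGlobalSection(dm, &gsection);CHKERRQ(ierr);
  ierr = PetscSectionGetOffset(gsection, point, &goff);CHKERRQ(ierr);
  if (goff < 0) goff = -(goff + 1);  /* point is owned by another process; recover its true global offset */
  /* goff is now the first global row/column associated with this point and can be passed to MatSetValues() */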
URL: From zonexo at gmail.com Sun Nov 8 07:37:26 2015 From: zonexo at gmail.com (Wee Beng Tay) Date: Sun, 08 Nov 2015 21:37:26 +0800 Subject: [petsc-users] DM structures of PetscInt In-Reply-To: References: <563EDD30.5090100@gmail.com> Message-ID: <1446989848432-6e66229e-e63cdbce-6964db05@gmail.com> Sent using CloudMagic Email [https://cloudmagic.com/k/d/mailapp?ct=pa&cv=7.4.10&pv=5.0.2&source=email_footer_2] On Sun, Nov 08, 2015 at 7:41 PM, Matthew Knepley < knepley at gmail.com [knepley at gmail.com] > wrote: On Sat, Nov 7, 2015 at 11:27 PM, TAY wee-beng < zonexo at gmail.com [zonexo at gmail.com] > wrote: Hi, I need to use DM structures of type PETScInt. I tried to use: DM da_cu_types Vec cu_types_local,cu_types_global PetscInt,pointer :: cu_types_array(:,:,:) Is this allowed? The cu_types_array is to be of integer type. Because I got into problems compiling. We do not support integer Vecs. You should use IS. What are you trying to do? Matt Hi, I've a cfd code and I need to separate cells inside or outside a body. I denote the inside /boundary /outside cells as type =2/1/0. I've a subroutine which determines the type in parallel and hence I need a Vec with integer type. Based on my description, can I use Is? Is there an example I can follow? Thanks -- Thank you. Yours sincerely, TAY wee-beng -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Nov 8 08:57:14 2015 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 8 Nov 2015 08:57:14 -0600 Subject: [petsc-users] DM structures of PetscInt In-Reply-To: <1446989848432-6e66229e-e63cdbce-6964db05@gmail.com> References: <563EDD30.5090100@gmail.com> <1446989848432-6e66229e-e63cdbce-6964db05@gmail.com> Message-ID: On Sun, Nov 8, 2015 at 7:37 AM, Wee Beng Tay wrote: > > Sent using CloudMagic Email > > On Sun, Nov 08, 2015 at 7:41 PM, Matthew Knepley > wrote: > > On Sat, Nov 7, 2015 at 11:27 PM, TAY wee-beng wrote: > >> Hi, >> >> I need to use DM structures of type PETScInt. >> >> I tried to use: >> >> DM da_cu_types >> >> Vec cu_types_local,cu_types_global >> >> PetscInt,pointer :: cu_types_array(:,:,:) >> >> Is this allowed? The cu_types_array is to be of integer type. Because I >> got into problems compiling. > > > We do not support integer Vecs. You should use IS. What are you trying to > do? > > Matt > > > Hi, > > I've a cfd code and I need to separate cells inside or outside a body. I > denote the inside /boundary /outside cells as type =2/1/0. I've a > subroutine which determines the type in parallel and hence I need a Vec > with integer type. Based on my description, can I use Is? Is there an > example I can follow? > Yes, use an IS. If you need unstructured communication, you will need to use the PetscSF structure. Matt > Thanks > > > >> -- >> Thank you. >> >> Yours sincerely, >> >> TAY wee-beng >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
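As a small illustration of the IS suggestion just above (shown in C; the Fortran calls mirror it): gather the local indices of one cell category into an index set, one IS per category (inside, boundary, outside). The names nInside and insideCells are hypothetical outputs of the existing classification routine.

  #include <petscis.h>

  PetscInt       nInside;       /* assumed: number of locally owned cells classified as "inside"   */
  PetscInt       *insideCells;  /* assumed: their local indices, filled by the classification code */
  IS             isInside;
  PetscErrorCode ierr;

  ierr = ISCreateGeneral(PETSC_COMM_WORLD, nInside, insideCells, PETSC_COPY_VALUES, &isInside);CHKERRQ(ierr);
  ierr = ISView(isInside, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);  /* optional sanity check */
  ierr = ISDestroy(&isInside);CHKERRQ(ierr);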
URL: From bsmith at mcs.anl.gov Sun Nov 8 12:08:38 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 8 Nov 2015 12:08:38 -0600 Subject: [petsc-users] DM structures of PetscInt In-Reply-To: <1446989848432-6e66229e-e63cdbce-6964db05@gmail.com> References: <563EDD30.5090100@gmail.com> <1446989848432-6e66229e-e63cdbce-6964db05@gmail.com> Message-ID: <2E8B4BB0-1FE8-4AA3-86FE-53374B8FE5AD@mcs.anl.gov> > On Nov 8, 2015, at 7:37 AM, Wee Beng Tay wrote: > > > > Sent using CloudMagic Email > On Sun, Nov 08, 2015 at 7:41 PM, Matthew Knepley wrote: > > On Sat, Nov 7, 2015 at 11:27 PM, TAY wee-beng wrote: > Hi, > > I need to use DM structures of type PETScInt. > > I tried to use: > > DM da_cu_types > > Vec cu_types_local,cu_types_global > > PetscInt,pointer :: cu_types_array(:,:,:) > > Is this allowed? The cu_types_array is to be of integer type. Because I got into problems compiling. > > We do not support integer Vecs. You should use IS. What are you trying to do? > > Matt > > Hi, > > I've a cfd code and I need to separate cells inside or outside a body. I denote the inside /boundary /outside cells as type =2/1/0. I've a subroutine which determines the type in parallel and hence I need a Vec with integer type. The simplest thing is just to use a real value of 2. 1. or 0. Memoywise it is not a big deal since you will have dozens of vectors plus memory heavy matrix so you won't even notice this extra memory. > Based on my description, can I use Is? Is there an example I can follow? > > Thanks > > > -- > Thank you. > > Yours sincerely, > > TAY wee-beng > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener From bsmith at mcs.anl.gov Sun Nov 8 18:20:50 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 8 Nov 2015 18:20:50 -0600 Subject: [petsc-users] For users of PETSc master branch, API change Message-ID: For users of the PETSc master branch. I have pushed into master some API changes for the PetscOptionsGetXXX() and related routines. The first argument is now a PetscOptions object, which is optional, if you pass a NULL in for the first argument (or a PETSC_NULL_OBJECT in Fortran) you will retain the same functionality as you had previously. Barry From Massimiliano.Leoni at Rolls-Royce.com Mon Nov 9 08:23:31 2015 From: Massimiliano.Leoni at Rolls-Royce.com (Leoni, Massimiliano) Date: Mon, 9 Nov 2015 14:23:31 +0000 Subject: [petsc-users] [petsc-dev] [SLEPc] For users of PETSc master branch, API change Message-ID: <8CFD9053FDA44C4C82D83C288A3763F62F7B43F9@GBDOXPR-MBX003.Rolls-Royce.Local> Is there a branch in the SLEPc repo that supports this? Massimiliano > -----Original Message----- > From: petsc-dev-bounces at mcs.anl.gov [mailto:petsc-dev- > bounces at mcs.anl.gov] On Behalf Of Barry Smith > Sent: 09 November 2015 00:21 > To: PETSc; petsc-dev > Subject: [petsc-dev] For users of PETSc master branch, API change > > > For users of the PETSc master branch. > > I have pushed into master some API changes for the PetscOptionsGetXXX() > and related routines. The first argument is now a PetscOptions object, which > is optional, if you pass a NULL in for the first argument (or a > PETSC_NULL_OBJECT in Fortran) you will retain the same functionality as you > had previously. > > Barry The data contained in, or attached to, this e-mail, may contain confidential information. 
If you have received it in error you should notify the sender immediately by reply e-mail, delete the message from your system and contact +44 (0) 3301235850 (Security Operations Centre) if you need assistance. Please do not copy it for any purpose, or disclose its contents to any other person. An e-mail response to this address may be subject to interception or monitoring for operational reasons or for lawful business practices. (c) 2015 Rolls-Royce plc Registered office: 62 Buckingham Gate, London SW1E 6AT Company number: 1003142. Registered in England. From jroman at dsic.upv.es Mon Nov 9 08:25:42 2015 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 9 Nov 2015 15:25:42 +0100 Subject: [petsc-users] [petsc-dev] [SLEPc] For users of PETSc master branch, API change In-Reply-To: <8CFD9053FDA44C4C82D83C288A3763F62F7B43F9@GBDOXPR-MBX003.Rolls-Royce.Local> References: <8CFD9053FDA44C4C82D83C288A3763F62F7B43F9@GBDOXPR-MBX003.Rolls-Royce.Local> Message-ID: <4A1E3EBE-2253-40D9-95F6-B2F396C3A7F4@dsic.upv.es> Working on it. Be patient. Should be available on master tomorrow. Jose > El 9/11/2015, a las 15:23, Leoni, Massimiliano escribi?: > > Is there a branch in the SLEPc repo that supports this? > > Massimiliano > >> -----Original Message----- >> From: petsc-dev-bounces at mcs.anl.gov [mailto:petsc-dev- >> bounces at mcs.anl.gov] On Behalf Of Barry Smith >> Sent: 09 November 2015 00:21 >> To: PETSc; petsc-dev >> Subject: [petsc-dev] For users of PETSc master branch, API change >> >> >> For users of the PETSc master branch. >> >> I have pushed into master some API changes for the PetscOptionsGetXXX() >> and related routines. The first argument is now a PetscOptions object, which >> is optional, if you pass a NULL in for the first argument (or a >> PETSC_NULL_OBJECT in Fortran) you will retain the same functionality as you >> had previously. >> >> Barry > > The data contained in, or attached to, this e-mail, may contain confidential information. If you have received it in error you should notify the sender immediately by reply e-mail, delete the message from your system and contact +44 (0) 3301235850 (Security Operations Centre) if you need assistance. Please do not copy it for any purpose, or disclose its contents to any other person. > > An e-mail response to this address may be subject to interception or monitoring for operational reasons or for lawful business practices. > > (c) 2015 Rolls-Royce plc > > Registered office: 62 Buckingham Gate, London SW1E 6AT Company number: 1003142. Registered in England. > From Massimiliano.Leoni at Rolls-Royce.com Mon Nov 9 08:31:00 2015 From: Massimiliano.Leoni at Rolls-Royce.com (Leoni, Massimiliano) Date: Mon, 9 Nov 2015 14:31:00 +0000 Subject: [petsc-users] [petsc-dev] [SLEPc] For users of PETSc master branch, API change In-Reply-To: <4A1E3EBE-2253-40D9-95F6-B2F396C3A7F4@dsic.upv.es> References: <8CFD9053FDA44C4C82D83C288A3763F62F7B43F9@GBDOXPR-MBX003.Rolls-Royce.Local> <4A1E3EBE-2253-40D9-95F6-B2F396C3A7F4@dsic.upv.es> Message-ID: <8CFD9053FDA44C4C82D83C288A3763F62F7B4413@GBDOXPR-MBX003.Rolls-Royce.Local> Ok, sorry! It looks like I chose the worst possible day to update :D Best, Massimiliano > -----Original Message----- > From: Jose E. Roman [mailto:jroman at dsic.upv.es] > Sent: 09 November 2015 14:26 > To: Leoni, Massimiliano > Cc: Barry Smith; PETSc; petsc-dev > Subject: Re: [petsc-dev] [SLEPc] For users of PETSc master branch, API > change > > Working on it. Be patient. Should be available on master tomorrow. 
> Jose > > > > > El 9/11/2015, a las 15:23, Leoni, Massimiliano Royce.com> escribi?: > > > > Is there a branch in the SLEPc repo that supports this? > > > > Massimiliano > > > >> -----Original Message----- > >> From: petsc-dev-bounces at mcs.anl.gov [mailto:petsc-dev- > >> bounces at mcs.anl.gov] On Behalf Of Barry Smith > >> Sent: 09 November 2015 00:21 > >> To: PETSc; petsc-dev > >> Subject: [petsc-dev] For users of PETSc master branch, API change > >> > >> > >> For users of the PETSc master branch. > >> > >> I have pushed into master some API changes for the > >> PetscOptionsGetXXX() and related routines. The first argument is now > >> a PetscOptions object, which is optional, if you pass a NULL in for > >> the first argument (or a PETSC_NULL_OBJECT in Fortran) you will > >> retain the same functionality as you had previously. > >> > >> Barry > > > > The data contained in, or attached to, this e-mail, may contain confidential > information. If you have received it in error you should notify the sender > immediately by reply e-mail, delete the message from your system and > contact +44 (0) 3301235850 (Security Operations Centre) if you need > assistance. Please do not copy it for any purpose, or disclose its contents to > any other person. > > > > An e-mail response to this address may be subject to interception or > monitoring for operational reasons or for lawful business practices. > > > > (c) 2015 Rolls-Royce plc > > > > Registered office: 62 Buckingham Gate, London SW1E 6AT Company > number: 1003142. Registered in England. > > The data contained in, or attached to, this e-mail, may contain confidential information. If you have received it in error you should notify the sender immediately by reply e-mail, delete the message from your system and contact +44 (0) 3301235850 (Security Operations Centre) if you need assistance. Please do not copy it for any purpose, or disclose its contents to any other person. An e-mail response to this address may be subject to interception or monitoring for operational reasons or for lawful business practices. (c) 2015 Rolls-Royce plc Registered office: 62 Buckingham Gate, London SW1E 6AT Company number: 1003142. Registered in England. From zonexo at gmail.com Mon Nov 9 08:33:39 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Mon, 9 Nov 2015 22:33:39 +0800 Subject: [petsc-users] DM structures of PetscInt In-Reply-To: <2E8B4BB0-1FE8-4AA3-86FE-53374B8FE5AD@mcs.anl.gov> References: <563EDD30.5090100@gmail.com> <1446989848432-6e66229e-e63cdbce-6964db05@gmail.com> <2E8B4BB0-1FE8-4AA3-86FE-53374B8FE5AD@mcs.anl.gov> Message-ID: <5640AEC3.5030602@gmail.com> On 9/11/2015 2:08 AM, Barry Smith wrote: >> On Nov 8, 2015, at 7:37 AM, Wee Beng Tay wrote: >> >> >> >> Sent using CloudMagic Email >> On Sun, Nov 08, 2015 at 7:41 PM, Matthew Knepley wrote: >> >> On Sat, Nov 7, 2015 at 11:27 PM, TAY wee-beng wrote: >> Hi, >> >> I need to use DM structures of type PETScInt. >> >> I tried to use: >> >> DM da_cu_types >> >> Vec cu_types_local,cu_types_global >> >> PetscInt,pointer :: cu_types_array(:,:,:) >> >> Is this allowed? The cu_types_array is to be of integer type. Because I got into problems compiling. >> >> We do not support integer Vecs. You should use IS. What are you trying to do? >> >> Matt >> >> Hi, >> >> I've a cfd code and I need to separate cells inside or outside a body. I denote the inside /boundary /outside cells as type =2/1/0. I've a subroutine which determines the type in parallel and hence I need a Vec with integer type. 
> The simplest thing is just to use a real value of 2. 1. or 0. Memoywise it is not a big deal since you will have dozens of vectors plus memory heavy matrix so you won't even notice this extra memory. > Hi, Ya, thought that this is simpler. It's working now. Thanks. >> Based on my description, can I use Is? Is there an example I can follow? >> >> Thanks >> >> >> -- >> Thank you. >> >> Yours sincerely, >> >> TAY wee-beng >> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener From balay at mcs.anl.gov Mon Nov 9 08:44:30 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 9 Nov 2015 08:44:30 -0600 Subject: [petsc-users] [petsc-dev] [SLEPc] For users of PETSc master branch, API change In-Reply-To: <8CFD9053FDA44C4C82D83C288A3763F62F7B4413@GBDOXPR-MBX003.Rolls-Royce.Local> References: <8CFD9053FDA44C4C82D83C288A3763F62F7B43F9@GBDOXPR-MBX003.Rolls-Royce.Local> <4A1E3EBE-2253-40D9-95F6-B2F396C3A7F4@dsic.upv.es> <8CFD9053FDA44C4C82D83C288A3763F62F7B4413@GBDOXPR-MBX003.Rolls-Royce.Local> Message-ID: you can try using a slightly older 'master' snapshot' [until you get the slpec fix] For eg: git checkout d916695f21d798ebdf80dc439ef54c5223c9183c And once the slepc fix is available - you can do: git checkout master git pull Satish On Mon, 9 Nov 2015, Leoni, Massimiliano wrote: > Ok, sorry! > It looks like I chose the worst possible day to update :D > > Best, > > Massimiliano > > > -----Original Message----- > > From: Jose E. Roman [mailto:jroman at dsic.upv.es] > > Sent: 09 November 2015 14:26 > > To: Leoni, Massimiliano > > Cc: Barry Smith; PETSc; petsc-dev > > Subject: Re: [petsc-dev] [SLEPc] For users of PETSc master branch, API > > change > > > > Working on it. Be patient. Should be available on master tomorrow. > > Jose > > > > > > > > > El 9/11/2015, a las 15:23, Leoni, Massimiliano > Royce.com> escribi?: > > > > > > Is there a branch in the SLEPc repo that supports this? > > > > > > Massimiliano > > > > > >> -----Original Message----- > > >> From: petsc-dev-bounces at mcs.anl.gov [mailto:petsc-dev- > > >> bounces at mcs.anl.gov] On Behalf Of Barry Smith > > >> Sent: 09 November 2015 00:21 > > >> To: PETSc; petsc-dev > > >> Subject: [petsc-dev] For users of PETSc master branch, API change > > >> > > >> > > >> For users of the PETSc master branch. > > >> > > >> I have pushed into master some API changes for the > > >> PetscOptionsGetXXX() and related routines. The first argument is now > > >> a PetscOptions object, which is optional, if you pass a NULL in for > > >> the first argument (or a PETSC_NULL_OBJECT in Fortran) you will > > >> retain the same functionality as you had previously. > > >> > > >> Barry > > > > > > The data contained in, or attached to, this e-mail, may contain confidential > > information. If you have received it in error you should notify the sender > > immediately by reply e-mail, delete the message from your system and > > contact +44 (0) 3301235850 (Security Operations Centre) if you need > > assistance. Please do not copy it for any purpose, or disclose its contents to > > any other person. > > > > > > An e-mail response to this address may be subject to interception or > > monitoring for operational reasons or for lawful business practices. > > > > > > (c) 2015 Rolls-Royce plc > > > > > > Registered office: 62 Buckingham Gate, London SW1E 6AT Company > > number: 1003142. Registered in England. 
> > > > > The data contained in, or attached to, this e-mail, may contain confidential information. If you have received it in error you should notify the sender immediately by reply e-mail, delete the message from your system and contact +44 (0) 3301235850 (Security Operations Centre) if you need assistance. Please do not copy it for any purpose, or disclose its contents to any other person. > > An e-mail response to this address may be subject to interception or monitoring for operational reasons or for lawful business practices. > > (c) 2015 Rolls-Royce plc > > Registered office: 62 Buckingham Gate, London SW1E 6AT Company number: 1003142. Registered in England. > From jroman at dsic.upv.es Mon Nov 9 08:48:16 2015 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 9 Nov 2015 15:48:16 +0100 Subject: [petsc-users] [petsc-dev] [SLEPc] For users of PETSc master branch, API change In-Reply-To: References: <8CFD9053FDA44C4C82D83C288A3763F62F7B43F9@GBDOXPR-MBX003.Rolls-Royce.Local> <4A1E3EBE-2253-40D9-95F6-B2F396C3A7F4@dsic.upv.es> <8CFD9053FDA44C4C82D83C288A3763F62F7B4413@GBDOXPR-MBX003.Rolls-Royce.Local> Message-ID: The fix is already in SLEPc's branches 'jose/sync-with-petsc' and 'next'. Will merge into 'master' tomorrow. Jose > El 9/11/2015, a las 15:44, Satish Balay escribi?: > > you can try using a slightly older 'master' snapshot' [until you get > the slpec fix] > > For eg: > git checkout d916695f21d798ebdf80dc439ef54c5223c9183c > > And once the slepc fix is available - you can do: > git checkout master > git pull > > Satish > > On Mon, 9 Nov 2015, Leoni, Massimiliano wrote: > >> Ok, sorry! >> It looks like I chose the worst possible day to update :D >> >> Best, >> >> Massimiliano >> >>> -----Original Message----- >>> From: Jose E. Roman [mailto:jroman at dsic.upv.es] >>> Sent: 09 November 2015 14:26 >>> To: Leoni, Massimiliano >>> Cc: Barry Smith; PETSc; petsc-dev >>> Subject: Re: [petsc-dev] [SLEPc] For users of PETSc master branch, API >>> change >>> >>> Working on it. Be patient. Should be available on master tomorrow. >>> Jose >>> >>> >>> >>>> El 9/11/2015, a las 15:23, Leoni, Massimiliano >> Royce.com> escribi?: >>>> >>>> Is there a branch in the SLEPc repo that supports this? >>>> >>>> Massimiliano >>>> >>>>> -----Original Message----- >>>>> From: petsc-dev-bounces at mcs.anl.gov [mailto:petsc-dev- >>>>> bounces at mcs.anl.gov] On Behalf Of Barry Smith >>>>> Sent: 09 November 2015 00:21 >>>>> To: PETSc; petsc-dev >>>>> Subject: [petsc-dev] For users of PETSc master branch, API change >>>>> >>>>> >>>>> For users of the PETSc master branch. >>>>> >>>>> I have pushed into master some API changes for the >>>>> PetscOptionsGetXXX() and related routines. The first argument is now >>>>> a PetscOptions object, which is optional, if you pass a NULL in for >>>>> the first argument (or a PETSC_NULL_OBJECT in Fortran) you will >>>>> retain the same functionality as you had previously. >>>>> >>>>> Barry >>>> >>>> The data contained in, or attached to, this e-mail, may contain confidential >>> information. If you have received it in error you should notify the sender >>> immediately by reply e-mail, delete the message from your system and >>> contact +44 (0) 3301235850 (Security Operations Centre) if you need >>> assistance. Please do not copy it for any purpose, or disclose its contents to >>> any other person. >>>> >>>> An e-mail response to this address may be subject to interception or >>> monitoring for operational reasons or for lawful business practices. 
>>>> >>>> (c) 2015 Rolls-Royce plc >>>> >>>> Registered office: 62 Buckingham Gate, London SW1E 6AT Company >>> number: 1003142. Registered in England. >>>> >> >> The data contained in, or attached to, this e-mail, may contain confidential information. If you have received it in error you should notify the sender immediately by reply e-mail, delete the message from your system and contact +44 (0) 3301235850 (Security Operations Centre) if you need assistance. Please do not copy it for any purpose, or disclose its contents to any other person. >> >> An e-mail response to this address may be subject to interception or monitoring for operational reasons or for lawful business practices. >> >> (c) 2015 Rolls-Royce plc >> >> Registered office: 62 Buckingham Gate, London SW1E 6AT Company number: 1003142. Registered in England. >> From zonexo at gmail.com Tue Nov 10 01:33:34 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Tue, 10 Nov 2015 15:33:34 +0800 Subject: [petsc-users] Memory usage with DMDACreate3d and DMDAGetCorners Message-ID: <56419DCE.4090106@gmail.com> Hi, I need a subroutine in Fortran to partition a subset of my grid in the 3 x,y,z directions for MPI. I thought of using DMDACreate3d and DMDAGetCorners to get the starting and width of the partitioned grid. Because I need to partition at every time step and the subset grid changes dimension and index at every time step, so I will also need to use DMDestroy after each time step Will that use alot of memory? Will the grid actually be created? So I wonder if this DMDACreate3d and DMDestroy calls will take a lot of time. -- Thank you. Yours sincerely, TAY wee-beng From zonexo at gmail.com Tue Nov 10 03:27:21 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Tue, 10 Nov 2015 17:27:21 +0800 Subject: [petsc-users] Use of DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 Message-ID: <5641B879.4020504@gmail.com> Hi, Inside my subroutine, I need to access the DA variable cu_types_array frequently. So I need to call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 before and after frequently. Is this necessary? Can I call DMDAVecGetArrayF90 at the start and only call DMDAVecRestoreArrayF90 towards the end, where I don't need to modify the values of cu_types_array anymore? Will this cause memory corruption? Also, must the array be restored using DMDAVecRestoreArrayF90 before calling DMLocalToLocalBegin,DMLocalToLocalEnd? -- Thank you. Yours sincerely, TAY wee-beng From knepley at gmail.com Tue Nov 10 06:25:49 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 10 Nov 2015 06:25:49 -0600 Subject: [petsc-users] Memory usage with DMDACreate3d and DMDAGetCorners In-Reply-To: <56419DCE.4090106@gmail.com> References: <56419DCE.4090106@gmail.com> Message-ID: On Tue, Nov 10, 2015 at 1:33 AM, TAY wee-beng wrote: > Hi, > > I need a subroutine in Fortran to partition a subset of my grid in the 3 > x,y,z directions for MPI. I thought of using DMDACreate3d and > DMDAGetCorners to get the starting and width of the partitioned grid. > > Because I need to partition at every time step and the subset grid changes > dimension and index at every time step, so I will also need to use > DMDestroy after each time step > > Will that use alot of memory? Will the grid actually be created? So I > wonder if this DMDACreate3d and DMDestroy calls will take a lot of time. DMDA just does 1D partitioning in each dimension, so its not that sophisticated. Is that all you want? Matt > > -- > Thank you. 
> > Yours sincerely, > > TAY wee-beng > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Nov 10 06:27:10 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 10 Nov 2015 06:27:10 -0600 Subject: [petsc-users] Use of DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 In-Reply-To: <5641B879.4020504@gmail.com> References: <5641B879.4020504@gmail.com> Message-ID: On Tue, Nov 10, 2015 at 3:27 AM, TAY wee-beng wrote: > Hi, > > Inside my subroutine, I need to access the DA variable cu_types_array > frequently. > > So I need to call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 before and > after frequently. > > Is this necessary? Can I call DMDAVecGetArrayF90 at the start and only > call DMDAVecRestoreArrayF90 towards the end, where I don't need to modify > the values of cu_types_array anymore? > > Will this cause memory corruption? > You cannot use any other vector operations before you have called Restore. > Also, must the array be restored using DMDAVecRestoreArrayF90 before > calling DMLocalToLocalBegin,DMLocalToLocalEnd? Yes. Matt > > -- > Thank you. > > Yours sincerely, > > TAY wee-beng > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From zonexo at gmail.com Tue Nov 10 06:30:54 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Tue, 10 Nov 2015 20:30:54 +0800 Subject: [petsc-users] Use of DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 In-Reply-To: References: <5641B879.4020504@gmail.com> Message-ID: <5641E37E.5070401@gmail.com> On 10/11/2015 8:27 PM, Matthew Knepley wrote: > On Tue, Nov 10, 2015 at 3:27 AM, TAY wee-beng > wrote: > > Hi, > > Inside my subroutine, I need to access the DA variable > cu_types_array frequently. > > So I need to call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 > before and after frequently. > > Is this necessary? Can I call DMDAVecGetArrayF90 at the start and > only call DMDAVecRestoreArrayF90 towards the end, where I don't > need to modify the values of cu_types_array anymore? > > Will this cause memory corruption? > > > You cannot use any other vector operations before you have called Restore. Hi, What do you mean by vector operations? I will just be doing some maths operation to change the values in cu_types_array. Is that fine? > > Also, must the array be restored using DMDAVecRestoreArrayF90 > before calling DMLocalToLocalBegin,DMLocalToLocalEnd? > > > Yes. > > Matt > > > -- > Thank you. > > Yours sincerely, > > TAY wee-beng > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
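To make the ordering described above concrete, here is a minimal Fortran sketch of the intended call sequence (the da_cu_types / cu_types_local / cu_types_array names follow the earlier messages; error checking is omitted, and the ghost update is shown as an in-place refresh of the same local vector, which is assumed to be what is wanted here):

    PetscScalar, pointer :: cu_types_array(:,:,:)
    PetscErrorCode       ierr

    ! Gain direct access to the local (ghosted) array
    call DMDAVecGetArrayF90(da_cu_types, cu_types_local, cu_types_array, ierr)

    ! ... ordinary arithmetic on cu_types_array is fine while the array is held ...

    ! Restore BEFORE any other operation on the vector, including the ghost update
    call DMDAVecRestoreArrayF90(da_cu_types, cu_types_local, cu_types_array, ierr)

    ! Only now update the ghost values
    call DMLocalToLocalBegin(da_cu_types, cu_types_local, INSERT_VALUES, cu_types_local, ierr)
    call DMLocalToLocalEnd(da_cu_types, cu_types_local, INSERT_VALUES, cu_types_local, ierr)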
URL: From zonexo at gmail.com Tue Nov 10 06:34:55 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Tue, 10 Nov 2015 20:34:55 +0800 Subject: [petsc-users] Memory usage with DMDACreate3d and DMDAGetCorners In-Reply-To: References: <56419DCE.4090106@gmail.com> Message-ID: <5641E46F.1020200@gmail.com> On 10/11/2015 8:25 PM, Matthew Knepley wrote: > On Tue, Nov 10, 2015 at 1:33 AM, TAY wee-beng > wrote: > > Hi, > > I need a subroutine in Fortran to partition a subset of my grid in > the 3 x,y,z directions for MPI. I thought of using DMDACreate3d > and DMDAGetCorners to get the starting and width of the > partitioned grid. > > Because I need to partition at every time step and the subset grid > changes dimension and index at every time step, so I will also > need to use DMDestroy after each time step > > Will that use alot of memory? Will the grid actually be created? > So I wonder if this DMDACreate3d and DMDestroy calls will take a > lot of time. > > > DMDA just does 1D partitioning in each dimension, so its not that > sophisticated. Is that all you want? > > Matt Hi, Ya, that's all I want. Btw, how does DMDACreate3d partition the grids in x,y,z? What is the algorithm behind it? Supposed I have 14 x 17 x 20 and 12 cores. How does DMDACreate3d partition it? Thanks. > > > -- > Thank you. > > Yours sincerely, > > TAY wee-beng > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Nov 10 06:47:37 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 10 Nov 2015 06:47:37 -0600 Subject: [petsc-users] Use of DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 In-Reply-To: <5641E37E.5070401@gmail.com> References: <5641B879.4020504@gmail.com> <5641E37E.5070401@gmail.com> Message-ID: On Tue, Nov 10, 2015 at 6:30 AM, TAY wee-beng wrote: > > On 10/11/2015 8:27 PM, Matthew Knepley wrote: > > On Tue, Nov 10, 2015 at 3:27 AM, TAY wee-beng wrote: > >> Hi, >> >> Inside my subroutine, I need to access the DA variable cu_types_array >> frequently. >> >> So I need to call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 before >> and after frequently. >> >> Is this necessary? Can I call DMDAVecGetArrayF90 at the start and only >> call DMDAVecRestoreArrayF90 towards the end, where I don't need to modify >> the values of cu_types_array anymore? >> >> Will this cause memory corruption? >> > > You cannot use any other vector operations before you have called Restore. > > > Hi, > > What do you mean by vector operations? I will just be doing some maths > operation to change the values in cu_types_array. Is that fine? > While you have the array, no other operation can change the values. Matt > > > Also, must the array be restored using DMDAVecRestoreArrayF90 before >> calling DMLocalToLocalBegin,DMLocalToLocalEnd? > > > Yes. > > Matt > > >> >> -- >> Thank you. >> >> Yours sincerely, >> >> TAY wee-beng >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
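Returning to the DMDACreate3d partitioning question above: besides reading the source (linked in the reply that follows), the simplest way to see how a given grid is split over a given number of ranks is to create the DMDA with PETSC_DECIDE for the process grid and ask each rank what it owns. A minimal Fortran sketch using the 14 x 17 x 20 grid from the question (PETSc 3.6-style constants, error checking omitted):

    DM             da
    PetscInt       xs, ys, zs, xm, ym, zm
    PetscErrorCode ierr

    call DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,       &
                      DM_BOUNDARY_NONE, DMDA_STENCIL_STAR, 14, 17, 20,            &
                      PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 1, 1,             &
                      PETSC_NULL_INTEGER, PETSC_NULL_INTEGER, PETSC_NULL_INTEGER, &
                      da, ierr)
    ! Each rank reports the lower corner (xs,ys,zs) and widths (xm,ym,zm) of its block
    call DMDAGetCorners(da, xs, ys, zs, xm, ym, zm, ierr)
    print *, 'owns i =', xs, '..', xs+xm-1, ', j =', ys, '..', ys+ym-1, ', k =', zs, '..', zs+zm-1
    call DMDestroy(da, ierr)

Roughly speaking, PETSc factors the number of ranks into an m x n x p process grid chosen so the local blocks stay close to the global aspect ratio, then splits each dimension as evenly as possible; the exact algorithm is in the da3.c source linked in the reply below.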
URL: From knepley at gmail.com Tue Nov 10 06:48:38 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 10 Nov 2015 06:48:38 -0600 Subject: [petsc-users] Memory usage with DMDACreate3d and DMDAGetCorners In-Reply-To: <5641E46F.1020200@gmail.com> References: <56419DCE.4090106@gmail.com> <5641E46F.1020200@gmail.com> Message-ID: On Tue, Nov 10, 2015 at 6:34 AM, TAY wee-beng wrote: > > On 10/11/2015 8:25 PM, Matthew Knepley wrote: > > On Tue, Nov 10, 2015 at 1:33 AM, TAY wee-beng wrote: > >> Hi, >> >> I need a subroutine in Fortran to partition a subset of my grid in the 3 >> x,y,z directions for MPI. I thought of using DMDACreate3d and >> DMDAGetCorners to get the starting and width of the partitioned grid. >> >> Because I need to partition at every time step and the subset grid >> changes dimension and index at every time step, so I will also need to use >> DMDestroy after each time step >> >> Will that use alot of memory? Will the grid actually be created? So I >> wonder if this DMDACreate3d and DMDestroy calls will take a lot of time. > > > DMDA just does 1D partitioning in each dimension, so its not that > sophisticated. Is that all you want? > > Matt > > Hi, > > Ya, that's all I want. Btw, how does DMDACreate3d partition the grids in > x,y,z? What is the algorithm behind it? > > Supposed I have 14 x 17 x 20 and 12 cores. How does DMDACreate3d partition > it? > https://bitbucket.org/petsc/petsc/src/b0bc92c60ab2e8c65b1792a9a4080bf92080e52f/src/dm/impls/da/da3.c?at=master&fileviewer=file-view-default#da3.c-231 Thanks, Matt > Thanks. > > > >> >> -- >> Thank you. >> >> Yours sincerely, >> >> TAY wee-beng >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From asmund.ervik at ntnu.no Tue Nov 10 06:51:30 2015 From: asmund.ervik at ntnu.no (=?UTF-8?Q?=c3=85smund_Ervik?=) Date: Tue, 10 Nov 2015 13:51:30 +0100 Subject: [petsc-users] petsc-users Digest, Vol 83, Issue 34 In-Reply-To: References: Message-ID: <5641E852.4060605@ntnu.no> > From: TAY wee-beng > To: PETSc users list > Subject: [petsc-users] Memory usage with DMDACreate3d and > DMDAGetCorners > Message-ID: <56419DCE.4090106 at gmail.com> > Content-Type: text/plain; charset=utf-8; format=flowed > > Hi, > > I need a subroutine in Fortran to partition a subset of my grid in the 3 > x,y,z directions for MPI. I thought of using DMDACreate3d and > DMDAGetCorners to get the starting and width of the partitioned grid. You might want to look at dm/examples/ex13f90.F90 (and ex13f90aux.F90) which does a detailed walkthrough of the DMDA routines for a very simple case in 3D. http://www.mcs.anl.gov/petsc/petsc-current/src/dm/examples/tutorials/ex13f90.F90.html > > Because I need to partition at every time step and the subset grid > changes dimension and index at every time step, so I will also need to > use DMDestroy after each time step > > Will that use alot of memory? Will the grid actually be created? So I > wonder if this DMDACreate3d and DMDestroy calls will take a lot of time. > > > -- > Thank you. > > Yours sincerely, > > TAY wee-beng -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 473 bytes Desc: OpenPGP digital signature URL: From mfadams at lbl.gov Tue Nov 10 07:51:48 2015 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 10 Nov 2015 08:51:48 -0500 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: <5ED4A81C-002B-4A6B-BC6D-F93BE200F0A5@mcs.anl.gov> References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> <4E4A4E38-B9C3-4F41-8D52-1AFF3E4E8C8C@gmail.com> <5ED4A81C-002B-4A6B-BC6D-F93BE200F0A5@mcs.anl.gov> Message-ID: I ran an 8 processor job on Edison of a small code for a short run (just a linear solve) and got 37 Mb of output! Here is a 'Petsc' grep. Perhaps we should build an ignore file for things that we believe is a false positive. On Tue, Nov 3, 2015 at 11:55 AM, Barry Smith wrote: > > I am more optimistic about valgrind than Mark. I first try valgrind and > if that fails to be helpful then use the debugger. valgrind has the > advantage that it finds the FIRST place that something is wrong, while in > the debugger it is kind of late at the crash. > > Valgrind should not be noisy, if it is then the applications/libraries > should be cleaned up so that they are valgrind clean and then valgrind is > useful. > > Barry > > > > > On Nov 3, 2015, at 7:47 AM, Mark Adams wrote: > > > > BTW, I think that our advice for segv is use a debugger. DDT or > Totalview, and gdb if need be, will get you right to the source code and > will get 90% of bugs diagnosed. Valgrind is noisy and cumbersome to use > but can diagnose 90% of the other 10%. > > > > On Tue, Nov 3, 2015 at 7:32 AM, Denis Davydov > wrote: > > Hi Jose, > > > > > On 3 Nov 2015, at 12:20, Jose E. Roman wrote: > > > > > > I am answering the SLEPc-related questions: > > > - Having different number of iterations when changing the number of > processes is normal. > > the change in iterations i mentioned are for different preconditioners, > but the same number of MPI processes. > > > > > > > - Yes, if you do not destroy the EPS solver, then the preconditioner > would be reused. > > > > > > Regarding the segmentation fault, I have no clue. Not sure if this is > related to GAMG or not. Maybe running under valgrind could provide more > information. > > will try that. > > > > Denis. > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: petsc_val.gz Type: application/x-gzip Size: 430214 bytes Desc: not available URL: From mfadams at lbl.gov Tue Nov 10 08:20:34 2015 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 10 Nov 2015 09:20:34 -0500 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> <4E4A4E38-B9C3-4F41-8D52-1AFF3E4E8C8C@gmail.com> <5ED4A81C-002B-4A6B-BC6D-F93BE200F0A5@mcs.anl.gov> Message-ID: valgrind on Edison does seem to give a lot of false positives. The line number are accurate (not always the case). "assert" triggers it, as does SETERRQ. On Tue, Nov 10, 2015 at 8:51 AM, Mark Adams wrote: > I ran an 8 processor job on Edison of a small code for a short run (just a > linear solve) and got 37 Mb of output! > > Here is a 'Petsc' grep. > > Perhaps we should build an ignore file for things that we believe is a > false positive. 
> > On Tue, Nov 3, 2015 at 11:55 AM, Barry Smith wrote: > >> >> I am more optimistic about valgrind than Mark. I first try valgrind and >> if that fails to be helpful then use the debugger. valgrind has the >> advantage that it finds the FIRST place that something is wrong, while in >> the debugger it is kind of late at the crash. >> >> Valgrind should not be noisy, if it is then the applications/libraries >> should be cleaned up so that they are valgrind clean and then valgrind is >> useful. >> >> Barry >> >> >> >> > On Nov 3, 2015, at 7:47 AM, Mark Adams wrote: >> > >> > BTW, I think that our advice for segv is use a debugger. DDT or >> Totalview, and gdb if need be, will get you right to the source code and >> will get 90% of bugs diagnosed. Valgrind is noisy and cumbersome to use >> but can diagnose 90% of the other 10%. >> > >> > On Tue, Nov 3, 2015 at 7:32 AM, Denis Davydov >> wrote: >> > Hi Jose, >> > >> > > On 3 Nov 2015, at 12:20, Jose E. Roman wrote: >> > > >> > > I am answering the SLEPc-related questions: >> > > - Having different number of iterations when changing the number of >> processes is normal. >> > the change in iterations i mentioned are for different preconditioners, >> but the same number of MPI processes. >> > >> > >> > > - Yes, if you do not destroy the EPS solver, then the preconditioner >> would be reused. >> > > >> > > Regarding the segmentation fault, I have no clue. Not sure if this is >> related to GAMG or not. Maybe running under valgrind could provide more >> information. >> > will try that. >> > >> > Denis. >> > >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Nov 10 10:15:10 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 10 Nov 2015 10:15:10 -0600 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> <4E4A4E38-B9C3-4F41-8D52-1AFF3E4E8C8C@gmail.com> <5ED4A81C-002B-4A6B-BC6D-F93BE200F0A5@mcs.anl.gov> Message-ID: Please send me the full output. This is nuts and should be reported once we understand it better to NERSc as something to be fixed. When I pay $60 million in taxes to a computing center I expect something that works fine for free on my laptop to work also there. Barry > On Nov 10, 2015, at 7:51 AM, Mark Adams wrote: > > I ran an 8 processor job on Edison of a small code for a short run (just a linear solve) and got 37 Mb of output! > > Here is a 'Petsc' grep. > > Perhaps we should build an ignore file for things that we believe is a false positive. > > On Tue, Nov 3, 2015 at 11:55 AM, Barry Smith wrote: > > I am more optimistic about valgrind than Mark. I first try valgrind and if that fails to be helpful then use the debugger. valgrind has the advantage that it finds the FIRST place that something is wrong, while in the debugger it is kind of late at the crash. > > Valgrind should not be noisy, if it is then the applications/libraries should be cleaned up so that they are valgrind clean and then valgrind is useful. > > Barry > > > > > On Nov 3, 2015, at 7:47 AM, Mark Adams wrote: > > > > BTW, I think that our advice for segv is use a debugger. DDT or Totalview, and gdb if need be, will get you right to the source code and will get 90% of bugs diagnosed. Valgrind is noisy and cumbersome to use but can diagnose 90% of the other 10%. 
> > > > On Tue, Nov 3, 2015 at 7:32 AM, Denis Davydov wrote: > > Hi Jose, > > > > > On 3 Nov 2015, at 12:20, Jose E. Roman wrote: > > > > > > I am answering the SLEPc-related questions: > > > - Having different number of iterations when changing the number of processes is normal. > > the change in iterations i mentioned are for different preconditioners, but the same number of MPI processes. > > > > > > > - Yes, if you do not destroy the EPS solver, then the preconditioner would be reused. > > > > > > Regarding the segmentation fault, I have no clue. Not sure if this is related to GAMG or not. Maybe running under valgrind could provide more information. > > will try that. > > > > Denis. > > > > > From david.knezevic at akselos.com Tue Nov 10 20:39:44 2015 From: david.knezevic at akselos.com (David Knezevic) Date: Tue, 10 Nov 2015 21:39:44 -0500 Subject: [petsc-users] GAMG and zero pivots follow up Message-ID: I'm looking into using GAMG, so I wanted to start with a simple 3D elasticity problem. When I first tried this, I got the following "zero pivot" error: ----------------------------------------------------------------------- [0]PETSC ERROR: Zero pivot in LU factorization: http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot [0]PETSC ERROR: Zero pivot, row 3 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 [0]PETSC ERROR: /home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a arch-linux2-c-opt named david-Lenovo by dknez Tue Nov 10 21:26:39 2015 [0]PETSC ERROR: Configure options --with-shared-libraries=1 --with-debugging=0 --download-suitesparse --download-parmetis --download-blacs --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps --download-metis --download-superlu_dist --prefix=/home/dknez/software/libmesh_install/opt_real/petsc --download-hypre --download-ml [0]PETSC ERROR: #1 PetscKernel_A_gets_inverse_A_5() line 48 in /home/dknez/software/petsc-3.6.1/src/mat/impls/baij/seq/dgefa5.c [0]PETSC ERROR: #2 MatSOR_SeqAIJ_Inode() line 2808 in /home/dknez/software/petsc-3.6.1/src/mat/impls/aij/seq/inode.c [0]PETSC ERROR: #3 MatSOR() line 3697 in /home/dknez/software/petsc-3.6.1/src/mat/interface/matrix.c [0]PETSC ERROR: #4 PCApply_SOR() line 37 in /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/sor/sor.c [0]PETSC ERROR: #5 PCApply() line 482 in /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #6 KSP_PCApply() line 242 in /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h [0]PETSC ERROR: #7 KSPInitialResidual() line 63 in /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itres.c [0]PETSC ERROR: #8 KSPSolve_GMRES() line 235 in /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/gmres/gmres.c [0]PETSC ERROR: #9 KSPSolve() line 604 in /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #10 KSPSolve_Chebyshev() line 381 in /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cheby/cheby.c [0]PETSC ERROR: #11 KSPSolve() line 604 in /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #12 PCMGMCycle_Private() line 19 in /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c [0]PETSC ERROR: #13 PCMGMCycle_Private() line 48 in /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c [0]PETSC ERROR: #14 PCApply_MG() line 338 in /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c [0]PETSC 
ERROR: #15 PCApply() line 482 in /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #16 KSP_PCApply() line 242 in /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h [0]PETSC ERROR: #17 KSPSolve_CG() line 139 in /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cg/cg.c [0]PETSC ERROR: #18 KSPSolve() line 604 in /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c ----------------------------------------------------------------------- I saw that there was a thread about this in September (subject: "gamg and zero pivots"), and that the fix is to use "-mg_levels_pc_type jacobi." When I do that, the solve succeeds (I pasted the -ksp_view at the end of this email). So I have two questions about this: 1. Is it surprising that I hit this issue for a 3D elasticity problem? Note that matrix assembly was done in libMesh, I can look into the structure of the assembled matrix more carefully, if needed. Also, note that I can solve this problem with direct solvers just fine. 2. Is there a way to set "-mg_levels_pc_type jacobi" programmatically, rather than via the command line? Thanks, David ----------------------------------------------------------------------- ksp_view output: KSP Object: 1 MPI processes type: cg maximum iterations=5000 tolerances: relative=1e-12, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: gamg MG: type is MULTIPLICATIVE, levels=5 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices GAMG specific options Threshold for dropping small values from graph 0 AGG specific options Symmetric graph false Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 1 MPI processes type: bjacobi block Jacobi: number of blocks = 1 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=30, cols=30, bs=6 package used to perform factorization: petsc total: nonzeros=540, allocated nonzeros=540 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 9 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=30, cols=30, bs=6 total: nonzeros=540, allocated nonzeros=540 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 9 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=30, cols=30, bs=6 total: nonzeros=540, allocated nonzeros=540 total number of mallocs used during MatSetValues calls =0 using I-node 
routines: found 9 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 1 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.335276, max = 3.68804 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_1_esteig_) 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 1 MPI processes type: jacobi linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=72, cols=72, bs=6 total: nonzeros=1728, allocated nonzeros=1728 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 23 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 1 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.260121, max = 2.86133 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_2_esteig_) 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 1 MPI processes type: jacobi linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=174, cols=174, bs=6 total: nonzeros=5796, allocated nonzeros=5796 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 57 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 1 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.267401, max = 2.94141 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_3_esteig_) 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 1 MPI processes type: jacobi linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=828, cols=828, bs=6 total: nonzeros=44496, allocated nonzeros=44496 total number of mallocs used during MatSetValues calls =0 using I-node routines: 
found 276 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 4 ------------------------------- KSP Object: (mg_levels_4_) 1 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.224361, max = 2.46797 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_4_esteig_) 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_4_) 1 MPI processes type: jacobi linear system matrix = precond matrix: Mat Object: () 1 MPI processes type: seqaij rows=2676, cols=2676, bs=3 total: nonzeros=94014, allocated nonzeros=94014 total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node routines: found 892 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: () 1 MPI processes type: seqaij rows=2676, cols=2676, bs=3 total: nonzeros=94014, allocated nonzeros=94014 total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node routines: found 892 nodes, limit used is 5 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Nov 10 21:00:17 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 10 Nov 2015 21:00:17 -0600 Subject: [petsc-users] GAMG and zero pivots follow up In-Reply-To: References: Message-ID: On Tue, Nov 10, 2015 at 8:39 PM, David Knezevic wrote: > I'm looking into using GAMG, so I wanted to start with a simple 3D > elasticity problem. When I first tried this, I got the following "zero > pivot" error: > > ----------------------------------------------------------------------- > > [0]PETSC ERROR: Zero pivot in LU factorization: > http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot > [0]PETSC ERROR: Zero pivot, row 3 > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 > [0]PETSC ERROR: /home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real > on a arch-linux2-c-opt named david-Lenovo by dknez Tue Nov 10 21:26:39 2015 > [0]PETSC ERROR: Configure options --with-shared-libraries=1 > --with-debugging=0 --download-suitesparse --download-parmetis > --download-blacs > --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl > --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps > --download-metis --download-superlu_dist > --prefix=/home/dknez/software/libmesh_install/opt_real/petsc > --download-hypre --download-ml > [0]PETSC ERROR: #1 PetscKernel_A_gets_inverse_A_5() line 48 in > /home/dknez/software/petsc-3.6.1/src/mat/impls/baij/seq/dgefa5.c > [0]PETSC ERROR: #2 MatSOR_SeqAIJ_Inode() line 2808 in > /home/dknez/software/petsc-3.6.1/src/mat/impls/aij/seq/inode.c > [0]PETSC ERROR: #3 MatSOR() line 3697 in > /home/dknez/software/petsc-3.6.1/src/mat/interface/matrix.c > [0]PETSC ERROR: #4 PCApply_SOR() line 37 in > /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/sor/sor.c > [0]PETSC ERROR: #5 PCApply() line 482 in > /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #6 KSP_PCApply() line 242 in > /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h > [0]PETSC ERROR: #7 KSPInitialResidual() line 63 in > /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itres.c > [0]PETSC ERROR: #8 KSPSolve_GMRES() line 235 in > /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/gmres/gmres.c > [0]PETSC ERROR: #9 KSPSolve() line 604 in > /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #10 KSPSolve_Chebyshev() line 381 in > /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cheby/cheby.c > [0]PETSC ERROR: #11 KSPSolve() line 604 in > /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #12 PCMGMCycle_Private() line 19 in > /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c > [0]PETSC ERROR: #13 PCMGMCycle_Private() line 48 in > /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c > [0]PETSC ERROR: #14 PCApply_MG() line 338 in > /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c > [0]PETSC ERROR: #15 PCApply() line 482 in > /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #16 KSP_PCApply() line 242 in > /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h > [0]PETSC ERROR: #17 KSPSolve_CG() line 139 in > /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cg/cg.c > [0]PETSC ERROR: #18 KSPSolve() line 604 in > /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > > ----------------------------------------------------------------------- > > I saw that there was a thread about this in September (subject: "gamg and > zero pivots"), and that the fix is to use "-mg_levels_pc_type jacobi." > When I do that, the solve succeeds (I pasted the -ksp_view at the end of > this email). > > So I have two questions about this: > > 1. Is it surprising that I hit this issue for a 3D elasticity problem? > Note that matrix assembly was done in libMesh, I can look into the > structure of the assembled matrix more carefully, if needed. Also, note > that I can solve this problem with direct solvers just fine. > Yes, this seems like a bug, but it could be some strange BC thing I do not understand. Naively, the elastic element matrix has a nonzero diagonal. I see that you are doing LU of size 5. That seems strange for 3D elasticity. 
Am I missing something? I would expect block size 3. > 2. Is there a way to set "-mg_levels_pc_type jacobi" programmatically, > rather than via the command line? > I would really discourage you from doing this. It makes your code fragile and inflexible. Thanks, Matt > Thanks, > David > > ----------------------------------------------------------------------- > > ksp_view output: > > > KSP Object: 1 MPI processes type: cg maximum iterations=5000 tolerances: > relative=1e-12, absolute=1e-50, divergence=10000 left preconditioning using > nonzero initial guess using PRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes type: gamg MG: type is MULTIPLICATIVE, levels=5 > cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices > GAMG specific options Threshold for dropping small values from graph 0 AGG > specific options Symmetric graph false Coarse grid solver -- level > ------------------------------- KSP Object: (mg_coarse_) 1 MPI processes > type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement GMRES: happy breakdown > tolerance 1e-30 maximum iterations=1, initial guess is zero tolerances: > relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using > NONE norm type for convergence test PC Object: (mg_coarse_) 1 MPI processes > type: bjacobi block Jacobi: number of blocks = 1 Local solve is same for > all blocks, in the following KSP and PC objects: KSP Object: > (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, > initial guess is zero tolerances: relative=1e-05, absolute=1e-50, > divergence=10000 left preconditioning using NONE norm type for convergence > test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu LU: out-of-place > factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on > blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill > ratio given 5, needed 1 Factored matrix follows: Mat Object: 1 MPI > processes type: seqaij rows=30, cols=30, bs=6 package used to perform > factorization: petsc total: nonzeros=540, allocated nonzeros=540 total > number of mallocs used during MatSetValues calls =0 using I-node routines: > found 9 nodes, limit used is 5 linear system matrix = precond matrix: Mat > Object: 1 MPI processes type: seqaij rows=30, cols=30, bs=6 total: > nonzeros=540, allocated nonzeros=540 total number of mallocs used during > MatSetValues calls =0 using I-node routines: found 9 nodes, limit used is 5 > linear system matrix = precond matrix: Mat Object: 1 MPI processes type: > seqaij rows=30, cols=30, bs=6 total: nonzeros=540, allocated nonzeros=540 > total number of mallocs used during MatSetValues calls =0 using I-node > routines: found 9 nodes, limit used is 5 Down solver (pre-smoother) on > level 1 ------------------------------- KSP Object: (mg_levels_1_) 1 MPI > processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.335276, > max = 3.68804 Chebyshev: eigenvalues estimated using gmres with > translations [0 0.1; 0 1.1] KSP Object: (mg_levels_1_esteig_) 1 MPI > processes type: gmres GMRES: restart=30, using Classical (unmodified) > Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy > breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left > preconditioning using NONE norm type for convergence test maximum > iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 
> left preconditioning using nonzero initial guess using NONE norm type for > convergence test PC Object: (mg_levels_1_) 1 MPI processes type: jacobi > linear system matrix = precond matrix: Mat Object: 1 MPI processes type: > seqaij rows=72, cols=72, bs=6 total: nonzeros=1728, allocated nonzeros=1728 > total number of mallocs used during MatSetValues calls =0 using I-node > routines: found 23 nodes, limit used is 5 Up solver (post-smoother) same as > down solver (pre-smoother) Down solver (pre-smoother) on level 2 > ------------------------------- KSP Object: (mg_levels_2_) 1 MPI processes > type: chebyshev Chebyshev: eigenvalue estimates: min = 0.260121, max = > 2.86133 Chebyshev: eigenvalues estimated using gmres with translations [0 > 0.1; 0 1.1] KSP Object: (mg_levels_2_esteig_) 1 MPI processes type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement GMRES: happy breakdown > tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: > relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using > NONE norm type for convergence test maximum iterations=2 tolerances: > relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using > nonzero initial guess using NONE norm type for convergence test PC Object: > (mg_levels_2_) 1 MPI processes type: jacobi linear system matrix = precond > matrix: Mat Object: 1 MPI processes type: seqaij rows=174, cols=174, bs=6 > total: nonzeros=5796, allocated nonzeros=5796 total number of mallocs used > during MatSetValues calls =0 using I-node routines: found 57 nodes, limit > used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down > solver (pre-smoother) on level 3 ------------------------------- KSP > Object: (mg_levels_3_) 1 MPI processes type: chebyshev Chebyshev: > eigenvalue estimates: min = 0.267401, max = 2.94141 Chebyshev: eigenvalues > estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: > (mg_levels_3_esteig_) 1 MPI processes type: gmres GMRES: restart=30, using > Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative > refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, > initial guess is zero tolerances: relative=1e-05, absolute=1e-50, > divergence=10000 left preconditioning using NONE norm type for convergence > test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, > divergence=10000 left preconditioning using nonzero initial guess using > NONE norm type for convergence test PC Object: (mg_levels_3_) 1 MPI > processes type: jacobi linear system matrix = precond matrix: Mat Object: 1 > MPI processes type: seqaij rows=828, cols=828, bs=6 total: nonzeros=44496, > allocated nonzeros=44496 total number of mallocs used during MatSetValues > calls =0 using I-node routines: found 276 nodes, limit used is 5 Up solver > (post-smoother) same as down solver (pre-smoother) Down solver > (pre-smoother) on level 4 ------------------------------- KSP Object: > (mg_levels_4_) 1 MPI processes type: chebyshev Chebyshev: eigenvalue > estimates: min = 0.224361, max = 2.46797 Chebyshev: eigenvalues estimated > using gmres with translations [0 0.1; 0 1.1] KSP Object: > (mg_levels_4_esteig_) 1 MPI processes type: gmres GMRES: restart=30, using > Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative > refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, > initial guess is zero tolerances: relative=1e-05, absolute=1e-50, > 
divergence=10000 left preconditioning using NONE norm type for convergence > test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, > divergence=10000 left preconditioning using nonzero initial guess using > NONE norm type for convergence test PC Object: (mg_levels_4_) 1 MPI > processes type: jacobi linear system matrix = precond matrix: Mat Object: > () 1 MPI processes type: seqaij rows=2676, cols=2676, bs=3 total: > nonzeros=94014, allocated nonzeros=94014 total number of mallocs used > during MatSetValues calls =0 has attached near null space using I-node > routines: found 892 nodes, limit used is 5 Up solver (post-smoother) same > as down solver (pre-smoother) linear system matrix = precond matrix: Mat > Object: () 1 MPI processes type: seqaij rows=2676, cols=2676, bs=3 total: > nonzeros=94014, allocated nonzeros=94014 total number of mallocs used > during MatSetValues calls =0 has attached near null space using I-node > routines: found 892 nodes, limit used is 5 > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.knezevic at akselos.com Tue Nov 10 21:21:54 2015 From: david.knezevic at akselos.com (David Knezevic) Date: Tue, 10 Nov 2015 22:21:54 -0500 Subject: [petsc-users] GAMG and zero pivots follow up In-Reply-To: References: Message-ID: On Tue, Nov 10, 2015 at 10:00 PM, Matthew Knepley wrote: > On Tue, Nov 10, 2015 at 8:39 PM, David Knezevic < > david.knezevic at akselos.com> wrote: > >> I'm looking into using GAMG, so I wanted to start with a simple 3D >> elasticity problem. When I first tried this, I got the following "zero >> pivot" error: >> >> ----------------------------------------------------------------------- >> >> [0]PETSC ERROR: Zero pivot in LU factorization: >> http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot >> [0]PETSC ERROR: Zero pivot, row 3 >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >> for trouble shooting. 
>> [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 >> [0]PETSC ERROR: >> /home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a >> arch-linux2-c-opt named david-Lenovo by dknez Tue Nov 10 21:26:39 2015 >> [0]PETSC ERROR: Configure options --with-shared-libraries=1 >> --with-debugging=0 --download-suitesparse --download-parmetis >> --download-blacs >> --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl >> --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps >> --download-metis --download-superlu_dist >> --prefix=/home/dknez/software/libmesh_install/opt_real/petsc >> --download-hypre --download-ml >> [0]PETSC ERROR: #1 PetscKernel_A_gets_inverse_A_5() line 48 in >> /home/dknez/software/petsc-3.6.1/src/mat/impls/baij/seq/dgefa5.c >> [0]PETSC ERROR: #2 MatSOR_SeqAIJ_Inode() line 2808 in >> /home/dknez/software/petsc-3.6.1/src/mat/impls/aij/seq/inode.c >> [0]PETSC ERROR: #3 MatSOR() line 3697 in >> /home/dknez/software/petsc-3.6.1/src/mat/interface/matrix.c >> [0]PETSC ERROR: #4 PCApply_SOR() line 37 in >> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/sor/sor.c >> [0]PETSC ERROR: #5 PCApply() line 482 in >> /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c >> [0]PETSC ERROR: #6 KSP_PCApply() line 242 in >> /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h >> [0]PETSC ERROR: #7 KSPInitialResidual() line 63 in >> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itres.c >> [0]PETSC ERROR: #8 KSPSolve_GMRES() line 235 in >> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/gmres/gmres.c >> [0]PETSC ERROR: #9 KSPSolve() line 604 in >> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >> [0]PETSC ERROR: #10 KSPSolve_Chebyshev() line 381 in >> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cheby/cheby.c >> [0]PETSC ERROR: #11 KSPSolve() line 604 in >> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >> [0]PETSC ERROR: #12 PCMGMCycle_Private() line 19 in >> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >> [0]PETSC ERROR: #13 PCMGMCycle_Private() line 48 in >> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >> [0]PETSC ERROR: #14 PCApply_MG() line 338 in >> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >> [0]PETSC ERROR: #15 PCApply() line 482 in >> /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c >> [0]PETSC ERROR: #16 KSP_PCApply() line 242 in >> /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h >> [0]PETSC ERROR: #17 KSPSolve_CG() line 139 in >> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cg/cg.c >> [0]PETSC ERROR: #18 KSPSolve() line 604 in >> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >> >> ----------------------------------------------------------------------- >> >> I saw that there was a thread about this in September (subject: "gamg and >> zero pivots"), and that the fix is to use "-mg_levels_pc_type jacobi." >> When I do that, the solve succeeds (I pasted the -ksp_view at the end of >> this email). >> >> So I have two questions about this: >> >> 1. Is it surprising that I hit this issue for a 3D elasticity problem? >> Note that matrix assembly was done in libMesh, I can look into the >> structure of the assembled matrix more carefully, if needed. Also, note >> that I can solve this problem with direct solvers just fine. >> > > Yes, this seems like a bug, but it could be some strange BC thing I do not > understand. > OK, I can look into the matrix in more detail. 
I agree that it should have a non-zero diagonal, so I'll have a look at what's happening with that. > Naively, the elastic element matrix has a nonzero diagonal. I see that you > are doing LU > of size 5. That seems strange for 3D elasticity. Am I missing something? I > would expect > block size 3. > I'm not sure what is causing the LU of size 5. Is there a setting to control that? Regarding the block size: I set the vector and matrix block size to 3 via VecSetBlockSize and MatSetBlockSize. I also used MatNullSpaceCreateRigidBody on a vector with block size of 3, and set the matrix's near nullspace using that. > > >> 2. Is there a way to set "-mg_levels_pc_type jacobi" programmatically, >> rather than via the command line? >> > > I would really discourage you from doing this. It makes your code fragile > and inflexible. > OK. The reason I asked is that in this case I have to write a bunch of code to set block sizes and the near nullspace, so I figured it'd be good to also set the corresponding solver options required to make this work... anyway, if I can fix the zero diagonal issue, I guess this will be moot. David > ----------------------------------------------------------------------- >> >> ksp_view output: >> >> >> KSP Object: 1 MPI processes type: cg maximum iterations=5000 tolerances: >> relative=1e-12, absolute=1e-50, divergence=10000 left preconditioning using >> nonzero initial guess using PRECONDITIONED norm type for convergence test >> PC Object: 1 MPI processes type: gamg MG: type is MULTIPLICATIVE, levels=5 >> cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices >> GAMG specific options Threshold for dropping small values from graph 0 AGG >> specific options Symmetric graph false Coarse grid solver -- level >> ------------------------------- KSP Object: (mg_coarse_) 1 MPI processes >> type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >> Orthogonalization with no iterative refinement GMRES: happy breakdown >> tolerance 1e-30 maximum iterations=1, initial guess is zero tolerances: >> relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using >> NONE norm type for convergence test PC Object: (mg_coarse_) 1 MPI processes >> type: bjacobi block Jacobi: number of blocks = 1 Local solve is same for >> all blocks, in the following KSP and PC objects: KSP Object: >> (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, >> initial guess is zero tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000 left preconditioning using NONE norm type for convergence >> test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu LU: out-of-place >> factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on >> blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill >> ratio given 5, needed 1 Factored matrix follows: Mat Object: 1 MPI >> processes type: seqaij rows=30, cols=30, bs=6 package used to perform >> factorization: petsc total: nonzeros=540, allocated nonzeros=540 total >> number of mallocs used during MatSetValues calls =0 using I-node routines: >> found 9 nodes, limit used is 5 linear system matrix = precond matrix: Mat >> Object: 1 MPI processes type: seqaij rows=30, cols=30, bs=6 total: >> nonzeros=540, allocated nonzeros=540 total number of mallocs used during >> MatSetValues calls =0 using I-node routines: found 9 nodes, limit used is 5 >> linear system matrix = precond matrix: Mat Object: 1 MPI processes type: >> seqaij rows=30, cols=30, bs=6 total: nonzeros=540, 
allocated nonzeros=540 >> total number of mallocs used during MatSetValues calls =0 using I-node >> routines: found 9 nodes, limit used is 5 Down solver (pre-smoother) on >> level 1 ------------------------------- KSP Object: (mg_levels_1_) 1 MPI >> processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.335276, >> max = 3.68804 Chebyshev: eigenvalues estimated using gmres with >> translations [0 0.1; 0 1.1] KSP Object: (mg_levels_1_esteig_) 1 MPI >> processes type: gmres GMRES: restart=30, using Classical (unmodified) >> Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy >> breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left >> preconditioning using NONE norm type for convergence test maximum >> iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 >> left preconditioning using nonzero initial guess using NONE norm type for >> convergence test PC Object: (mg_levels_1_) 1 MPI processes type: jacobi >> linear system matrix = precond matrix: Mat Object: 1 MPI processes type: >> seqaij rows=72, cols=72, bs=6 total: nonzeros=1728, allocated nonzeros=1728 >> total number of mallocs used during MatSetValues calls =0 using I-node >> routines: found 23 nodes, limit used is 5 Up solver (post-smoother) same as >> down solver (pre-smoother) Down solver (pre-smoother) on level 2 >> ------------------------------- KSP Object: (mg_levels_2_) 1 MPI processes >> type: chebyshev Chebyshev: eigenvalue estimates: min = 0.260121, max = >> 2.86133 Chebyshev: eigenvalues estimated using gmres with translations [0 >> 0.1; 0 1.1] KSP Object: (mg_levels_2_esteig_) 1 MPI processes type: gmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >> Orthogonalization with no iterative refinement GMRES: happy breakdown >> tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: >> relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using >> NONE norm type for convergence test maximum iterations=2 tolerances: >> relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using >> nonzero initial guess using NONE norm type for convergence test PC Object: >> (mg_levels_2_) 1 MPI processes type: jacobi linear system matrix = precond >> matrix: Mat Object: 1 MPI processes type: seqaij rows=174, cols=174, bs=6 >> total: nonzeros=5796, allocated nonzeros=5796 total number of mallocs used >> during MatSetValues calls =0 using I-node routines: found 57 nodes, limit >> used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down >> solver (pre-smoother) on level 3 ------------------------------- KSP >> Object: (mg_levels_3_) 1 MPI processes type: chebyshev Chebyshev: >> eigenvalue estimates: min = 0.267401, max = 2.94141 Chebyshev: eigenvalues >> estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: >> (mg_levels_3_esteig_) 1 MPI processes type: gmres GMRES: restart=30, using >> Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative >> refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, >> initial guess is zero tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000 left preconditioning using NONE norm type for convergence >> test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000 left preconditioning using nonzero initial guess using >> NONE norm type for convergence test PC Object: (mg_levels_3_) 1 MPI >> processes type: jacobi 
linear system matrix = precond matrix: Mat Object: 1 >> MPI processes type: seqaij rows=828, cols=828, bs=6 total: nonzeros=44496, >> allocated nonzeros=44496 total number of mallocs used during MatSetValues >> calls =0 using I-node routines: found 276 nodes, limit used is 5 Up solver >> (post-smoother) same as down solver (pre-smoother) Down solver >> (pre-smoother) on level 4 ------------------------------- KSP Object: >> (mg_levels_4_) 1 MPI processes type: chebyshev Chebyshev: eigenvalue >> estimates: min = 0.224361, max = 2.46797 Chebyshev: eigenvalues estimated >> using gmres with translations [0 0.1; 0 1.1] KSP Object: >> (mg_levels_4_esteig_) 1 MPI processes type: gmres GMRES: restart=30, using >> Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative >> refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, >> initial guess is zero tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000 left preconditioning using NONE norm type for convergence >> test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000 left preconditioning using nonzero initial guess using >> NONE norm type for convergence test PC Object: (mg_levels_4_) 1 MPI >> processes type: jacobi linear system matrix = precond matrix: Mat Object: >> () 1 MPI processes type: seqaij rows=2676, cols=2676, bs=3 total: >> nonzeros=94014, allocated nonzeros=94014 total number of mallocs used >> during MatSetValues calls =0 has attached near null space using I-node >> routines: found 892 nodes, limit used is 5 Up solver (post-smoother) same >> as down solver (pre-smoother) linear system matrix = precond matrix: Mat >> Object: () 1 MPI processes type: seqaij rows=2676, cols=2676, bs=3 total: >> nonzeros=94014, allocated nonzeros=94014 total number of mallocs used >> during MatSetValues calls =0 has attached near null space using I-node >> routines: found 892 nodes, limit used is 5 >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Nov 10 21:24:31 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 10 Nov 2015 21:24:31 -0600 Subject: [petsc-users] GAMG and zero pivots follow up In-Reply-To: References: Message-ID: On Tue, Nov 10, 2015 at 9:21 PM, David Knezevic wrote: > On Tue, Nov 10, 2015 at 10:00 PM, Matthew Knepley > wrote: > >> On Tue, Nov 10, 2015 at 8:39 PM, David Knezevic < >> david.knezevic at akselos.com> wrote: >> >>> I'm looking into using GAMG, so I wanted to start with a simple 3D >>> elasticity problem. When I first tried this, I got the following "zero >>> pivot" error: >>> >>> ----------------------------------------------------------------------- >>> >>> [0]PETSC ERROR: Zero pivot in LU factorization: >>> http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot >>> [0]PETSC ERROR: Zero pivot, row 3 >>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >>> for trouble shooting. 
>>> [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 >>> [0]PETSC ERROR: >>> /home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a >>> arch-linux2-c-opt named david-Lenovo by dknez Tue Nov 10 21:26:39 2015 >>> [0]PETSC ERROR: Configure options --with-shared-libraries=1 >>> --with-debugging=0 --download-suitesparse --download-parmetis >>> --download-blacs >>> --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl >>> --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps >>> --download-metis --download-superlu_dist >>> --prefix=/home/dknez/software/libmesh_install/opt_real/petsc >>> --download-hypre --download-ml >>> [0]PETSC ERROR: #1 PetscKernel_A_gets_inverse_A_5() line 48 in >>> /home/dknez/software/petsc-3.6.1/src/mat/impls/baij/seq/dgefa5.c >>> [0]PETSC ERROR: #2 MatSOR_SeqAIJ_Inode() line 2808 in >>> /home/dknez/software/petsc-3.6.1/src/mat/impls/aij/seq/inode.c >>> [0]PETSC ERROR: #3 MatSOR() line 3697 in >>> /home/dknez/software/petsc-3.6.1/src/mat/interface/matrix.c >>> [0]PETSC ERROR: #4 PCApply_SOR() line 37 in >>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/sor/sor.c >>> [0]PETSC ERROR: #5 PCApply() line 482 in >>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c >>> [0]PETSC ERROR: #6 KSP_PCApply() line 242 in >>> /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h >>> [0]PETSC ERROR: #7 KSPInitialResidual() line 63 in >>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itres.c >>> [0]PETSC ERROR: #8 KSPSolve_GMRES() line 235 in >>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/gmres/gmres.c >>> [0]PETSC ERROR: #9 KSPSolve() line 604 in >>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>> [0]PETSC ERROR: #10 KSPSolve_Chebyshev() line 381 in >>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cheby/cheby.c >>> [0]PETSC ERROR: #11 KSPSolve() line 604 in >>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>> [0]PETSC ERROR: #12 PCMGMCycle_Private() line 19 in >>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>> [0]PETSC ERROR: #13 PCMGMCycle_Private() line 48 in >>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>> [0]PETSC ERROR: #14 PCApply_MG() line 338 in >>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>> [0]PETSC ERROR: #15 PCApply() line 482 in >>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c >>> [0]PETSC ERROR: #16 KSP_PCApply() line 242 in >>> /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h >>> [0]PETSC ERROR: #17 KSPSolve_CG() line 139 in >>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cg/cg.c >>> [0]PETSC ERROR: #18 KSPSolve() line 604 in >>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>> >>> ----------------------------------------------------------------------- >>> >>> I saw that there was a thread about this in September (subject: "gamg >>> and zero pivots"), and that the fix is to use "-mg_levels_pc_type >>> jacobi." When I do that, the solve succeeds (I pasted the -ksp_view at the >>> end of this email). >>> >>> So I have two questions about this: >>> >>> 1. Is it surprising that I hit this issue for a 3D elasticity problem? >>> Note that matrix assembly was done in libMesh, I can look into the >>> structure of the assembled matrix more carefully, if needed. Also, note >>> that I can solve this problem with direct solvers just fine. >>> >> >> Yes, this seems like a bug, but it could be some strange BC thing I do >> not understand. 
>> > > > OK, I can look into the matrix in more detail. I agree that it should have > a non-zero diagonal, so I'll have a look at what's happening with that. > > > > >> Naively, the elastic element matrix has a nonzero diagonal. I see that >> you are doing LU >> of size 5. That seems strange for 3D elasticity. Am I missing something? >> I would expect >> block size 3. >> > > > I'm not sure what is causing the LU of size 5. Is there a setting to > control that? > > Regarding the block size: I set the vector and matrix block size to 3 > via VecSetBlockSize and MatSetBlockSize. I also > used MatNullSpaceCreateRigidBody on a vector with block size of 3, and set > the matrix's near nullspace using that. > Can you run this same example with -mat_no_inode? I think it may be a strange blocking that is causing this. Thanks, Matt > >> >>> 2. Is there a way to set "-mg_levels_pc_type jacobi" programmatically, >>> rather than via the command line? >>> >> >> I would really discourage you from doing this. It makes your code fragile >> and inflexible. >> > > OK. The reason I asked is that in this case I have to write a bunch of > code to set block sizes and the near nullspace, so I figured it'd be good > to also set the corresponding solver options required to make this work... > anyway, if I can fix the zero diagonal issue, I guess this will be moot. > > David > > > > >> ----------------------------------------------------------------------- >>> >>> ksp_view output: >>> >>> >>> KSP Object: 1 MPI processes type: cg maximum iterations=5000 tolerances: >>> relative=1e-12, absolute=1e-50, divergence=10000 left preconditioning using >>> nonzero initial guess using PRECONDITIONED norm type for convergence test >>> PC Object: 1 MPI processes type: gamg MG: type is MULTIPLICATIVE, levels=5 >>> cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices >>> GAMG specific options Threshold for dropping small values from graph 0 AGG >>> specific options Symmetric graph false Coarse grid solver -- level >>> ------------------------------- KSP Object: (mg_coarse_) 1 MPI processes >>> type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >>> Orthogonalization with no iterative refinement GMRES: happy breakdown >>> tolerance 1e-30 maximum iterations=1, initial guess is zero tolerances: >>> relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using >>> NONE norm type for convergence test PC Object: (mg_coarse_) 1 MPI processes >>> type: bjacobi block Jacobi: number of blocks = 1 Local solve is same for >>> all blocks, in the following KSP and PC objects: KSP Object: >>> (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, >>> initial guess is zero tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000 left preconditioning using NONE norm type for convergence >>> test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu LU: out-of-place >>> factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on >>> blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill >>> ratio given 5, needed 1 Factored matrix follows: Mat Object: 1 MPI >>> processes type: seqaij rows=30, cols=30, bs=6 package used to perform >>> factorization: petsc total: nonzeros=540, allocated nonzeros=540 total >>> number of mallocs used during MatSetValues calls =0 using I-node routines: >>> found 9 nodes, limit used is 5 linear system matrix = precond matrix: Mat >>> Object: 1 MPI processes type: seqaij rows=30, cols=30, bs=6 total: >>> 
nonzeros=540, allocated nonzeros=540 total number of mallocs used during >>> MatSetValues calls =0 using I-node routines: found 9 nodes, limit used is 5 >>> linear system matrix = precond matrix: Mat Object: 1 MPI processes type: >>> seqaij rows=30, cols=30, bs=6 total: nonzeros=540, allocated nonzeros=540 >>> total number of mallocs used during MatSetValues calls =0 using I-node >>> routines: found 9 nodes, limit used is 5 Down solver (pre-smoother) on >>> level 1 ------------------------------- KSP Object: (mg_levels_1_) 1 MPI >>> processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.335276, >>> max = 3.68804 Chebyshev: eigenvalues estimated using gmres with >>> translations [0 0.1; 0 1.1] KSP Object: (mg_levels_1_esteig_) 1 MPI >>> processes type: gmres GMRES: restart=30, using Classical (unmodified) >>> Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy >>> breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left >>> preconditioning using NONE norm type for convergence test maximum >>> iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 >>> left preconditioning using nonzero initial guess using NONE norm type for >>> convergence test PC Object: (mg_levels_1_) 1 MPI processes type: jacobi >>> linear system matrix = precond matrix: Mat Object: 1 MPI processes type: >>> seqaij rows=72, cols=72, bs=6 total: nonzeros=1728, allocated nonzeros=1728 >>> total number of mallocs used during MatSetValues calls =0 using I-node >>> routines: found 23 nodes, limit used is 5 Up solver (post-smoother) same as >>> down solver (pre-smoother) Down solver (pre-smoother) on level 2 >>> ------------------------------- KSP Object: (mg_levels_2_) 1 MPI processes >>> type: chebyshev Chebyshev: eigenvalue estimates: min = 0.260121, max = >>> 2.86133 Chebyshev: eigenvalues estimated using gmres with translations [0 >>> 0.1; 0 1.1] KSP Object: (mg_levels_2_esteig_) 1 MPI processes type: gmres >>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >>> Orthogonalization with no iterative refinement GMRES: happy breakdown >>> tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: >>> relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using >>> NONE norm type for convergence test maximum iterations=2 tolerances: >>> relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using >>> nonzero initial guess using NONE norm type for convergence test PC Object: >>> (mg_levels_2_) 1 MPI processes type: jacobi linear system matrix = precond >>> matrix: Mat Object: 1 MPI processes type: seqaij rows=174, cols=174, bs=6 >>> total: nonzeros=5796, allocated nonzeros=5796 total number of mallocs used >>> during MatSetValues calls =0 using I-node routines: found 57 nodes, limit >>> used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down >>> solver (pre-smoother) on level 3 ------------------------------- KSP >>> Object: (mg_levels_3_) 1 MPI processes type: chebyshev Chebyshev: >>> eigenvalue estimates: min = 0.267401, max = 2.94141 Chebyshev: eigenvalues >>> estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: >>> (mg_levels_3_esteig_) 1 MPI processes type: gmres GMRES: restart=30, using >>> Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative >>> refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, >>> initial guess is zero tolerances: relative=1e-05, 
absolute=1e-50, >>> divergence=10000 left preconditioning using NONE norm type for convergence >>> test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000 left preconditioning using nonzero initial guess using >>> NONE norm type for convergence test PC Object: (mg_levels_3_) 1 MPI >>> processes type: jacobi linear system matrix = precond matrix: Mat Object: 1 >>> MPI processes type: seqaij rows=828, cols=828, bs=6 total: nonzeros=44496, >>> allocated nonzeros=44496 total number of mallocs used during MatSetValues >>> calls =0 using I-node routines: found 276 nodes, limit used is 5 Up solver >>> (post-smoother) same as down solver (pre-smoother) Down solver >>> (pre-smoother) on level 4 ------------------------------- KSP Object: >>> (mg_levels_4_) 1 MPI processes type: chebyshev Chebyshev: eigenvalue >>> estimates: min = 0.224361, max = 2.46797 Chebyshev: eigenvalues estimated >>> using gmres with translations [0 0.1; 0 1.1] KSP Object: >>> (mg_levels_4_esteig_) 1 MPI processes type: gmres GMRES: restart=30, using >>> Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative >>> refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, >>> initial guess is zero tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000 left preconditioning using NONE norm type for convergence >>> test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000 left preconditioning using nonzero initial guess using >>> NONE norm type for convergence test PC Object: (mg_levels_4_) 1 MPI >>> processes type: jacobi linear system matrix = precond matrix: Mat Object: >>> () 1 MPI processes type: seqaij rows=2676, cols=2676, bs=3 total: >>> nonzeros=94014, allocated nonzeros=94014 total number of mallocs used >>> during MatSetValues calls =0 has attached near null space using I-node >>> routines: found 892 nodes, limit used is 5 Up solver (post-smoother) same >>> as down solver (pre-smoother) linear system matrix = precond matrix: Mat >>> Object: () 1 MPI processes type: seqaij rows=2676, cols=2676, bs=3 total: >>> nonzeros=94014, allocated nonzeros=94014 total number of mallocs used >>> during MatSetValues calls =0 has attached near null space using I-node >>> routines: found 892 nodes, limit used is 5 >>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.knezevic at akselos.com Tue Nov 10 21:28:10 2015 From: david.knezevic at akselos.com (David Knezevic) Date: Tue, 10 Nov 2015 22:28:10 -0500 Subject: [petsc-users] GAMG and zero pivots follow up In-Reply-To: References: Message-ID: On Tue, Nov 10, 2015 at 10:24 PM, Matthew Knepley wrote: > On Tue, Nov 10, 2015 at 9:21 PM, David Knezevic < > david.knezevic at akselos.com> wrote: > >> On Tue, Nov 10, 2015 at 10:00 PM, Matthew Knepley >> wrote: >> >>> On Tue, Nov 10, 2015 at 8:39 PM, David Knezevic < >>> david.knezevic at akselos.com> wrote: >>> >>>> I'm looking into using GAMG, so I wanted to start with a simple 3D >>>> elasticity problem. 
When I first tried this, I got the following "zero >>>> pivot" error: >>>> >>>> ----------------------------------------------------------------------- >>>> >>>> [0]PETSC ERROR: Zero pivot in LU factorization: >>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot >>>> [0]PETSC ERROR: Zero pivot, row 3 >>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >>>> for trouble shooting. >>>> [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 >>>> [0]PETSC ERROR: >>>> /home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a >>>> arch-linux2-c-opt named david-Lenovo by dknez Tue Nov 10 21:26:39 2015 >>>> [0]PETSC ERROR: Configure options --with-shared-libraries=1 >>>> --with-debugging=0 --download-suitesparse --download-parmetis >>>> --download-blacs >>>> --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl >>>> --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps >>>> --download-metis --download-superlu_dist >>>> --prefix=/home/dknez/software/libmesh_install/opt_real/petsc >>>> --download-hypre --download-ml >>>> [0]PETSC ERROR: #1 PetscKernel_A_gets_inverse_A_5() line 48 in >>>> /home/dknez/software/petsc-3.6.1/src/mat/impls/baij/seq/dgefa5.c >>>> [0]PETSC ERROR: #2 MatSOR_SeqAIJ_Inode() line 2808 in >>>> /home/dknez/software/petsc-3.6.1/src/mat/impls/aij/seq/inode.c >>>> [0]PETSC ERROR: #3 MatSOR() line 3697 in >>>> /home/dknez/software/petsc-3.6.1/src/mat/interface/matrix.c >>>> [0]PETSC ERROR: #4 PCApply_SOR() line 37 in >>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/sor/sor.c >>>> [0]PETSC ERROR: #5 PCApply() line 482 in >>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c >>>> [0]PETSC ERROR: #6 KSP_PCApply() line 242 in >>>> /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h >>>> [0]PETSC ERROR: #7 KSPInitialResidual() line 63 in >>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itres.c >>>> [0]PETSC ERROR: #8 KSPSolve_GMRES() line 235 in >>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/gmres/gmres.c >>>> [0]PETSC ERROR: #9 KSPSolve() line 604 in >>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>> [0]PETSC ERROR: #10 KSPSolve_Chebyshev() line 381 in >>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cheby/cheby.c >>>> [0]PETSC ERROR: #11 KSPSolve() line 604 in >>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>> [0]PETSC ERROR: #12 PCMGMCycle_Private() line 19 in >>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>> [0]PETSC ERROR: #13 PCMGMCycle_Private() line 48 in >>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>> [0]PETSC ERROR: #14 PCApply_MG() line 338 in >>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>> [0]PETSC ERROR: #15 PCApply() line 482 in >>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c >>>> [0]PETSC ERROR: #16 KSP_PCApply() line 242 in >>>> /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h >>>> [0]PETSC ERROR: #17 KSPSolve_CG() line 139 in >>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cg/cg.c >>>> [0]PETSC ERROR: #18 KSPSolve() line 604 in >>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>> >>>> ----------------------------------------------------------------------- >>>> >>>> I saw that there was a thread about this in September (subject: "gamg >>>> and zero pivots"), and that the fix is to use "-mg_levels_pc_type >>>> jacobi." 
When I do that, the solve succeeds (I pasted the -ksp_view at the >>>> end of this email). >>>> >>>> So I have two questions about this: >>>> >>>> 1. Is it surprising that I hit this issue for a 3D elasticity problem? >>>> Note that matrix assembly was done in libMesh, I can look into the >>>> structure of the assembled matrix more carefully, if needed. Also, note >>>> that I can solve this problem with direct solvers just fine. >>>> >>> >>> Yes, this seems like a bug, but it could be some strange BC thing I do >>> not understand. >>> >> >> >> OK, I can look into the matrix in more detail. I agree that it should >> have a non-zero diagonal, so I'll have a look at what's happening with that. >> >> >> >> >>> Naively, the elastic element matrix has a nonzero diagonal. I see that >>> you are doing LU >>> of size 5. That seems strange for 3D elasticity. Am I missing something? >>> I would expect >>> block size 3. >>> >> >> >> I'm not sure what is causing the LU of size 5. Is there a setting to >> control that? >> >> Regarding the block size: I set the vector and matrix block size to 3 >> via VecSetBlockSize and MatSetBlockSize. I also >> used MatNullSpaceCreateRigidBody on a vector with block size of 3, and set >> the matrix's near nullspace using that. >> > > Can you run this same example with -mat_no_inode? I think it may be a > strange blocking that is causing this. > That works. The -ksp_view output is below. Thanks, David KSP Object: 1 MPI processes type: cg maximum iterations=5000 tolerances: relative=1e-12, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: gamg MG: type is MULTIPLICATIVE, levels=5 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices GAMG specific options Threshold for dropping small values from graph 0 AGG specific options Symmetric graph false Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 1 MPI processes type: bjacobi block Jacobi: number of blocks = 1 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=30, cols=30, bs=6 package used to perform factorization: petsc total: nonzeros=540, allocated nonzeros=540 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=30, cols=30, bs=6 total: nonzeros=540, allocated nonzeros=540 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes 
type: seqaij rows=30, cols=30, bs=6 total: nonzeros=540, allocated nonzeros=540 total number of mallocs used during MatSetValues calls =0 not using I-node routines Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 1 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = -1.79769e+307, max = -inf Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_1_esteig_) 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 1 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=72, cols=72, bs=6 total: nonzeros=1728, allocated nonzeros=1728 total number of mallocs used during MatSetValues calls =0 not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 1 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = -1.79769e+307, max = -inf Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_2_esteig_) 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 1 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=174, cols=174, bs=6 total: nonzeros=5796, allocated nonzeros=5796 total number of mallocs used during MatSetValues calls =0 not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 1 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = -1.79769e+307, max = -inf Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_3_esteig_) 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 1 MPI processes type: sor SOR: type = 
local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=828, cols=828, bs=6 total: nonzeros=44496, allocated nonzeros=44496 total number of mallocs used during MatSetValues calls =0 not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 4 ------------------------------- KSP Object: (mg_levels_4_) 1 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.0998367, max = 1.0982 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_4_esteig_) 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_4_) 1 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: () 1 MPI processes type: seqaij rows=2676, cols=2676, bs=3 total: nonzeros=94014, allocated nonzeros=94014 total number of mallocs used during MatSetValues calls =0 has attached near null space not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: () 1 MPI processes type: seqaij rows=2676, cols=2676, bs=3 total: nonzeros=94014, allocated nonzeros=94014 total number of mallocs used during MatSetValues calls =0 has attached near null space not using I-node routines -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Nov 11 07:36:46 2015 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 11 Nov 2015 08:36:46 -0500 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> <4E4A4E38-B9C3-4F41-8D52-1AFF3E4E8C8C@gmail.com> <5ED4A81C-002B-4A6B-BC6D-F93BE200F0A5@mcs.anl.gov> Message-ID: On Tue, Nov 10, 2015 at 11:15 AM, Barry Smith wrote: > > Please send me the full output. This is nuts and should be reported once > we understand it better to NERSc as something to be fixed. When I pay $60 > million in taxes to a computing center I expect something that works fine > for free on my laptop to work also there. > > Barry > > > On Nov 10, 2015, at 7:51 AM, Mark Adams wrote: > > > > I ran an 8 processor job on Edison of a small code for a short run (just > a linear solve) and got 37 Mb of output! > > > > Here is a 'Petsc' grep. > > > > Perhaps we should build an ignore file for things that we believe is a > false positive. > > > > On Tue, Nov 3, 2015 at 11:55 AM, Barry Smith wrote: > > > > I am more optimistic about valgrind than Mark. I first try valgrind > and if that fails to be helpful then use the debugger. valgrind has the > advantage that it finds the FIRST place that something is wrong, while in > the debugger it is kind of late at the crash. 
> > > > Valgrind should not be noisy, if it is then the applications/libraries > should be cleaned up so that they are valgrind clean and then valgrind is > useful. > > > > Barry > > > > > > > > > On Nov 3, 2015, at 7:47 AM, Mark Adams wrote: > > > > > > BTW, I think that our advice for segv is use a debugger. DDT or > Totalview, and gdb if need be, will get you right to the source code and > will get 90% of bugs diagnosed. Valgrind is noisy and cumbersome to use > but can diagnose 90% of the other 10%. > > > > > > On Tue, Nov 3, 2015 at 7:32 AM, Denis Davydov > wrote: > > > Hi Jose, > > > > > > > On 3 Nov 2015, at 12:20, Jose E. Roman wrote: > > > > > > > > I am answering the SLEPc-related questions: > > > > - Having different number of iterations when changing the number of > processes is normal. > > > the change in iterations i mentioned are for different > preconditioners, but the same number of MPI processes. > > > > > > > > > > - Yes, if you do not destroy the EPS solver, then the preconditioner > would be reused. > > > > > > > > Regarding the segmentation fault, I have no clue. Not sure if this > is related to GAMG or not. Maybe running under valgrind could provide more > information. > > > will try that. > > > > > > Denis. > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: outval.gz Type: application/x-gzip Size: 57974 bytes Desc: not available URL: From bsmith at mcs.anl.gov Wed Nov 11 10:21:54 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 11 Nov 2015 10:21:54 -0600 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> <4E4A4E38-B9C3-4F41-8D52-1AFF3E4E8C8C@gmail.com> <5ED4A81C-002B-4A6B-BC6D-F93BE200F0A5@mcs.anl.gov> Message-ID: Thanks do you use a petscrc file or any file with PETSc options in it for the run? Thanks please send me the exact PETSc commit you are built off so I can see the line numbers in our source when things go bad. Barry > On Nov 11, 2015, at 7:36 AM, Mark Adams wrote: > > > > On Tue, Nov 10, 2015 at 11:15 AM, Barry Smith wrote: > > Please send me the full output. This is nuts and should be reported once we understand it better to NERSc as something to be fixed. When I pay $60 million in taxes to a computing center I expect something that works fine for free on my laptop to work also there. > > Barry > > > On Nov 10, 2015, at 7:51 AM, Mark Adams wrote: > > > > I ran an 8 processor job on Edison of a small code for a short run (just a linear solve) and got 37 Mb of output! > > > > Here is a 'Petsc' grep. > > > > Perhaps we should build an ignore file for things that we believe is a false positive. > > > > On Tue, Nov 3, 2015 at 11:55 AM, Barry Smith wrote: > > > > I am more optimistic about valgrind than Mark. I first try valgrind and if that fails to be helpful then use the debugger. valgrind has the advantage that it finds the FIRST place that something is wrong, while in the debugger it is kind of late at the crash. > > > > Valgrind should not be noisy, if it is then the applications/libraries should be cleaned up so that they are valgrind clean and then valgrind is useful. > > > > Barry > > > > > > > > > On Nov 3, 2015, at 7:47 AM, Mark Adams wrote: > > > > > > BTW, I think that our advice for segv is use a debugger. 
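(The valgrind invocation and the "ignore file" being discussed here usually look something like the lines below; the executable name, process count, and suppression-file name are placeholders, not the actual Edison job.)

-----------------------------------------------------------------------
# one memcheck instance per MPI rank, one log file per process (%p = pid)
mpiexec -n 8 valgrind --tool=memcheck -q --num-callers=20 \
    --suppressions=petsc-valgrind.supp \
    --log-file=valgrind.log.%p ./my_solver -ksp_type cg -pc_type gamg

# have valgrind emit suppression entries that can be pasted into the
# shared "ignore" file for reports judged to be false positives
valgrind --gen-suppressions=all ./my_solver 2> candidate.supp
-----------------------------------------------------------------------

Per-process log files keep each rank's reports separate, which matters when a single parallel run produces tens of megabytes of output.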
DDT or Totalview, and gdb if need be, will get you right to the source code and will get 90% of bugs diagnosed. Valgrind is noisy and cumbersome to use but can diagnose 90% of the other 10%. > > > > > > On Tue, Nov 3, 2015 at 7:32 AM, Denis Davydov wrote: > > > Hi Jose, > > > > > > > On 3 Nov 2015, at 12:20, Jose E. Roman wrote: > > > > > > > > I am answering the SLEPc-related questions: > > > > - Having different number of iterations when changing the number of processes is normal. > > > the change in iterations i mentioned are for different preconditioners, but the same number of MPI processes. > > > > > > > > > > - Yes, if you do not destroy the EPS solver, then the preconditioner would be reused. > > > > > > > > Regarding the segmentation fault, I have no clue. Not sure if this is related to GAMG or not. Maybe running under valgrind could provide more information. > > > will try that. > > > > > > Denis. > > > > > > > > > > > > From david.knezevic at akselos.com Wed Nov 11 11:24:32 2015 From: david.knezevic at akselos.com (David Knezevic) Date: Wed, 11 Nov 2015 12:24:32 -0500 Subject: [petsc-users] GAMG and zero pivots follow up In-Reply-To: References: Message-ID: On Tue, Nov 10, 2015 at 10:28 PM, David Knezevic wrote: > On Tue, Nov 10, 2015 at 10:24 PM, Matthew Knepley > wrote: > >> On Tue, Nov 10, 2015 at 9:21 PM, David Knezevic < >> david.knezevic at akselos.com> wrote: >> >>> On Tue, Nov 10, 2015 at 10:00 PM, Matthew Knepley >>> wrote: >>> >>>> On Tue, Nov 10, 2015 at 8:39 PM, David Knezevic < >>>> david.knezevic at akselos.com> wrote: >>>> >>>>> I'm looking into using GAMG, so I wanted to start with a simple 3D >>>>> elasticity problem. When I first tried this, I got the following "zero >>>>> pivot" error: >>>>> >>>>> ----------------------------------------------------------------------- >>>>> >>>>> [0]PETSC ERROR: Zero pivot in LU factorization: >>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot >>>>> [0]PETSC ERROR: Zero pivot, row 3 >>>>> [0]PETSC ERROR: See >>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>> shooting. 
>>>>> [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 >>>>> [0]PETSC ERROR: >>>>> /home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a >>>>> arch-linux2-c-opt named david-Lenovo by dknez Tue Nov 10 21:26:39 2015 >>>>> [0]PETSC ERROR: Configure options --with-shared-libraries=1 >>>>> --with-debugging=0 --download-suitesparse --download-parmetis >>>>> --download-blacs >>>>> --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl >>>>> --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps >>>>> --download-metis --download-superlu_dist >>>>> --prefix=/home/dknez/software/libmesh_install/opt_real/petsc >>>>> --download-hypre --download-ml >>>>> [0]PETSC ERROR: #1 PetscKernel_A_gets_inverse_A_5() line 48 in >>>>> /home/dknez/software/petsc-3.6.1/src/mat/impls/baij/seq/dgefa5.c >>>>> [0]PETSC ERROR: #2 MatSOR_SeqAIJ_Inode() line 2808 in >>>>> /home/dknez/software/petsc-3.6.1/src/mat/impls/aij/seq/inode.c >>>>> [0]PETSC ERROR: #3 MatSOR() line 3697 in >>>>> /home/dknez/software/petsc-3.6.1/src/mat/interface/matrix.c >>>>> [0]PETSC ERROR: #4 PCApply_SOR() line 37 in >>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/sor/sor.c >>>>> [0]PETSC ERROR: #5 PCApply() line 482 in >>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c >>>>> [0]PETSC ERROR: #6 KSP_PCApply() line 242 in >>>>> /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h >>>>> [0]PETSC ERROR: #7 KSPInitialResidual() line 63 in >>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itres.c >>>>> [0]PETSC ERROR: #8 KSPSolve_GMRES() line 235 in >>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/gmres/gmres.c >>>>> [0]PETSC ERROR: #9 KSPSolve() line 604 in >>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>> [0]PETSC ERROR: #10 KSPSolve_Chebyshev() line 381 in >>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cheby/cheby.c >>>>> [0]PETSC ERROR: #11 KSPSolve() line 604 in >>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>> [0]PETSC ERROR: #12 PCMGMCycle_Private() line 19 in >>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>> [0]PETSC ERROR: #13 PCMGMCycle_Private() line 48 in >>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>> [0]PETSC ERROR: #14 PCApply_MG() line 338 in >>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>> [0]PETSC ERROR: #15 PCApply() line 482 in >>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c >>>>> [0]PETSC ERROR: #16 KSP_PCApply() line 242 in >>>>> /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h >>>>> [0]PETSC ERROR: #17 KSPSolve_CG() line 139 in >>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cg/cg.c >>>>> [0]PETSC ERROR: #18 KSPSolve() line 604 in >>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>> >>>>> ----------------------------------------------------------------------- >>>>> >>>>> I saw that there was a thread about this in September (subject: "gamg >>>>> and zero pivots"), and that the fix is to use "-mg_levels_pc_type >>>>> jacobi." When I do that, the solve succeeds (I pasted the -ksp_view at the >>>>> end of this email). >>>>> >>>>> So I have two questions about this: >>>>> >>>>> 1. Is it surprising that I hit this issue for a 3D elasticity problem? >>>>> Note that matrix assembly was done in libMesh, I can look into the >>>>> structure of the assembled matrix more carefully, if needed. 
Also, note >>>>> that I can solve this problem with direct solvers just fine. >>>>> >>>> >>>> Yes, this seems like a bug, but it could be some strange BC thing I do >>>> not understand. >>>> >>> >>> >>> OK, I can look into the matrix in more detail. I agree that it should >>> have a non-zero diagonal, so I'll have a look at what's happening with that. >>> >>> >>> >>> >>>> Naively, the elastic element matrix has a nonzero diagonal. I see that >>>> you are doing LU >>>> of size 5. That seems strange for 3D elasticity. Am I missing >>>> something? I would expect >>>> block size 3. >>>> >>> >>> >>> I'm not sure what is causing the LU of size 5. Is there a setting to >>> control that? >>> >>> Regarding the block size: I set the vector and matrix block size to 3 >>> via VecSetBlockSize and MatSetBlockSize. I also >>> used MatNullSpaceCreateRigidBody on a vector with block size of 3, and set >>> the matrix's near nullspace using that. >>> >> >> Can you run this same example with -mat_no_inode? I think it may be a >> strange blocking that is causing this. >> > > > That works. The -ksp_view output is below. > I just wanted to follow up on this. I had a more careful look at the matrix, and confirmed that there are no zero entries on the diagonal (as expected for elasticity). The matrix is from one of libMesh's example problems: a simple cantilever model using HEX8 elements. Do you have any further thoughts about what might cause the "strange blocking" that you referred to? If there's something non-standard that libMesh is doing with the blocks, I'd be interested to look into that. I can send over the matrix if that would be helpful. Thanks, David -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.knezevic at akselos.com Wed Nov 11 11:57:45 2015 From: david.knezevic at akselos.com (David Knezevic) Date: Wed, 11 Nov 2015 12:57:45 -0500 Subject: [petsc-users] GAMG and zero pivots follow up In-Reply-To: References: Message-ID: On Wed, Nov 11, 2015 at 12:24 PM, David Knezevic wrote: > On Tue, Nov 10, 2015 at 10:28 PM, David Knezevic < > david.knezevic at akselos.com> wrote: > >> On Tue, Nov 10, 2015 at 10:24 PM, Matthew Knepley >> wrote: >> >>> On Tue, Nov 10, 2015 at 9:21 PM, David Knezevic < >>> david.knezevic at akselos.com> wrote: >>> >>>> On Tue, Nov 10, 2015 at 10:00 PM, Matthew Knepley >>>> wrote: >>>> >>>>> On Tue, Nov 10, 2015 at 8:39 PM, David Knezevic < >>>>> david.knezevic at akselos.com> wrote: >>>>> >>>>>> I'm looking into using GAMG, so I wanted to start with a simple 3D >>>>>> elasticity problem. When I first tried this, I got the following "zero >>>>>> pivot" error: >>>>>> >>>>>> >>>>>> ----------------------------------------------------------------------- >>>>>> >>>>>> [0]PETSC ERROR: Zero pivot in LU factorization: >>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot >>>>>> [0]PETSC ERROR: Zero pivot, row 3 >>>>>> [0]PETSC ERROR: See >>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>>> shooting. 
>>>>>> [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 >>>>>> [0]PETSC ERROR: >>>>>> /home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a >>>>>> arch-linux2-c-opt named david-Lenovo by dknez Tue Nov 10 21:26:39 2015 >>>>>> [0]PETSC ERROR: Configure options --with-shared-libraries=1 >>>>>> --with-debugging=0 --download-suitesparse --download-parmetis >>>>>> --download-blacs >>>>>> --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl >>>>>> --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps >>>>>> --download-metis --download-superlu_dist >>>>>> --prefix=/home/dknez/software/libmesh_install/opt_real/petsc >>>>>> --download-hypre --download-ml >>>>>> [0]PETSC ERROR: #1 PetscKernel_A_gets_inverse_A_5() line 48 in >>>>>> /home/dknez/software/petsc-3.6.1/src/mat/impls/baij/seq/dgefa5.c >>>>>> [0]PETSC ERROR: #2 MatSOR_SeqAIJ_Inode() line 2808 in >>>>>> /home/dknez/software/petsc-3.6.1/src/mat/impls/aij/seq/inode.c >>>>>> [0]PETSC ERROR: #3 MatSOR() line 3697 in >>>>>> /home/dknez/software/petsc-3.6.1/src/mat/interface/matrix.c >>>>>> [0]PETSC ERROR: #4 PCApply_SOR() line 37 in >>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/sor/sor.c >>>>>> [0]PETSC ERROR: #5 PCApply() line 482 in >>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c >>>>>> [0]PETSC ERROR: #6 KSP_PCApply() line 242 in >>>>>> /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h >>>>>> [0]PETSC ERROR: #7 KSPInitialResidual() line 63 in >>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itres.c >>>>>> [0]PETSC ERROR: #8 KSPSolve_GMRES() line 235 in >>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/gmres/gmres.c >>>>>> [0]PETSC ERROR: #9 KSPSolve() line 604 in >>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>>> [0]PETSC ERROR: #10 KSPSolve_Chebyshev() line 381 in >>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cheby/cheby.c >>>>>> [0]PETSC ERROR: #11 KSPSolve() line 604 in >>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>>> [0]PETSC ERROR: #12 PCMGMCycle_Private() line 19 in >>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>>> [0]PETSC ERROR: #13 PCMGMCycle_Private() line 48 in >>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>>> [0]PETSC ERROR: #14 PCApply_MG() line 338 in >>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>>> [0]PETSC ERROR: #15 PCApply() line 482 in >>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c >>>>>> [0]PETSC ERROR: #16 KSP_PCApply() line 242 in >>>>>> /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h >>>>>> [0]PETSC ERROR: #17 KSPSolve_CG() line 139 in >>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cg/cg.c >>>>>> [0]PETSC ERROR: #18 KSPSolve() line 604 in >>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>>> >>>>>> >>>>>> ----------------------------------------------------------------------- >>>>>> >>>>>> I saw that there was a thread about this in September (subject: "gamg >>>>>> and zero pivots"), and that the fix is to use "-mg_levels_pc_type >>>>>> jacobi." When I do that, the solve succeeds (I pasted the -ksp_view at the >>>>>> end of this email). >>>>>> >>>>>> So I have two questions about this: >>>>>> >>>>>> 1. Is it surprising that I hit this issue for a 3D elasticity >>>>>> problem? 
Note that matrix assembly was done in libMesh, I can look into the >>>>>> structure of the assembled matrix more carefully, if needed. Also, note >>>>>> that I can solve this problem with direct solvers just fine. >>>>>> >>>>> >>>>> Yes, this seems like a bug, but it could be some strange BC thing I do >>>>> not understand. >>>>> >>>> >>>> >>>> OK, I can look into the matrix in more detail. I agree that it should >>>> have a non-zero diagonal, so I'll have a look at what's happening with that. >>>> >>>> >>>> >>>> >>>>> Naively, the elastic element matrix has a nonzero diagonal. I see that >>>>> you are doing LU >>>>> of size 5. That seems strange for 3D elasticity. Am I missing >>>>> something? I would expect >>>>> block size 3. >>>>> >>>> >>>> >>>> I'm not sure what is causing the LU of size 5. Is there a setting to >>>> control that? >>>> >>>> Regarding the block size: I set the vector and matrix block size to 3 >>>> via VecSetBlockSize and MatSetBlockSize. I also >>>> used MatNullSpaceCreateRigidBody on a vector with block size of 3, and set >>>> the matrix's near nullspace using that. >>>> >>> >>> Can you run this same example with -mat_no_inode? I think it may be a >>> strange blocking that is causing this. >>> >> >> >> That works. The -ksp_view output is below. >> > > > I just wanted to follow up on this. I had a more careful look at the > matrix, and confirmed that there are no zero entries on the diagonal (as > expected for elasticity). The matrix is from one of libMesh's example > problems: a simple cantilever model using HEX8 elements. > > Do you have any further thoughts about what might cause the "strange > blocking" that you referred to? If there's something non-standard that > libMesh is doing with the blocks, I'd be interested to look into that. I > can send over the matrix if that would be helpful. > > Thanks, > David > > P.S. I was previously calling VecSetBlockSize and MatSetBlockSize to set the block size to 3. When I don't do that, I no longer need to call -mat_no_inodes. I've pasted the -ksp_view output below. Does it look like that's working OK? 
---------------------------------------------------------- KSP Object: 1 MPI processes type: cg maximum iterations=5000 tolerances: relative=1e-12, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: gamg MG: type is MULTIPLICATIVE, levels=6 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices GAMG specific options Threshold for dropping small values from graph 0 AGG specific options Symmetric graph false Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 1 MPI processes type: bjacobi block Jacobi: number of blocks = 1 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5, needed 1.03941 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=47, cols=47 package used to perform factorization: petsc total: nonzeros=211, allocated nonzeros=211 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=47, cols=47 total: nonzeros=203, allocated nonzeros=203 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=47, cols=47 total: nonzeros=203, allocated nonzeros=203 total number of mallocs used during MatSetValues calls =0 not using I-node routines Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 1 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.0998481, max = 1.09833 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_1_esteig_) 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 1 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=67, cols=67 total: nonzeros=373, allocated nonzeros=373 total number of mallocs used during MatSetValues calls =0 not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) 
Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 1 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.0997389, max = 1.09713 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_2_esteig_) 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 1 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=129, cols=129 total: nonzeros=1029, allocated nonzeros=1029 total number of mallocs used during MatSetValues calls =0 not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 1 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.0997179, max = 1.0969 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_3_esteig_) 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 1 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=372, cols=372 total: nonzeros=4116, allocated nonzeros=4116 total number of mallocs used during MatSetValues calls =0 not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 4 ------------------------------- KSP Object: (mg_levels_4_) 1 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.0995012, max = 1.09451 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_4_esteig_) 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_4_) 1 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=1816, cols=1816 total: 
nonzeros=26636, allocated nonzeros=26636 total number of mallocs used during MatSetValues calls =0 not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 5 ------------------------------- KSP Object: (mg_levels_5_) 1 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.0994721, max = 1.09419 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_5_esteig_) 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_5_) 1 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: () 1 MPI processes type: seqaij rows=55473, cols=55473 total: nonzeros=4.08484e+06, allocated nonzeros=4.08484e+06 total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node routines: found 18491 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: () 1 MPI processes type: seqaij rows=55473, cols=55473 total: nonzeros=4.08484e+06, allocated nonzeros=4.08484e+06 total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node routines: found 18491 nodes, limit used is 5 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Nov 11 12:01:39 2015 From: jed at jedbrown.org (Jed Brown) Date: Wed, 11 Nov 2015 11:01:39 -0700 Subject: [petsc-users] GAMG and zero pivots follow up In-Reply-To: References: Message-ID: <87egfwbekc.fsf@jedbrown.org> David Knezevic writes: > Do you have any further thoughts about what might cause the "strange > blocking" that you referred to? If there's something non-standard that > libMesh is doing with the blocks, I'd be interested to look into that. I > can send over the matrix if that would be helpful. Are you running Libmesh with --node_major_dofs? -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From david.knezevic at akselos.com Wed Nov 11 12:36:19 2015 From: david.knezevic at akselos.com (David Knezevic) Date: Wed, 11 Nov 2015 13:36:19 -0500 Subject: [petsc-users] GAMG and zero pivots follow up In-Reply-To: References: Message-ID: On Wed, Nov 11, 2015 at 12:57 PM, David Knezevic wrote: > On Wed, Nov 11, 2015 at 12:24 PM, David Knezevic < > david.knezevic at akselos.com> wrote: > >> On Tue, Nov 10, 2015 at 10:28 PM, David Knezevic < >> david.knezevic at akselos.com> wrote: >> >>> On Tue, Nov 10, 2015 at 10:24 PM, Matthew Knepley >>> wrote: >>> >>>> On Tue, Nov 10, 2015 at 9:21 PM, David Knezevic < >>>> david.knezevic at akselos.com> wrote: >>>> >>>>> On Tue, Nov 10, 2015 at 10:00 PM, Matthew Knepley >>>>> wrote: >>>>> >>>>>> On Tue, Nov 10, 2015 at 8:39 PM, David Knezevic < >>>>>> david.knezevic at akselos.com> wrote: >>>>>> >>>>>>> I'm looking into using GAMG, so I wanted to start with a simple 3D >>>>>>> elasticity problem. When I first tried this, I got the following "zero >>>>>>> pivot" error: >>>>>>> >>>>>>> >>>>>>> ----------------------------------------------------------------------- >>>>>>> >>>>>>> [0]PETSC ERROR: Zero pivot in LU factorization: >>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot >>>>>>> [0]PETSC ERROR: Zero pivot, row 3 >>>>>>> [0]PETSC ERROR: See >>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>>>> shooting. >>>>>>> [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 >>>>>>> [0]PETSC ERROR: >>>>>>> /home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a >>>>>>> arch-linux2-c-opt named david-Lenovo by dknez Tue Nov 10 21:26:39 2015 >>>>>>> [0]PETSC ERROR: Configure options --with-shared-libraries=1 >>>>>>> --with-debugging=0 --download-suitesparse --download-parmetis >>>>>>> --download-blacs >>>>>>> --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl >>>>>>> --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps >>>>>>> --download-metis --download-superlu_dist >>>>>>> --prefix=/home/dknez/software/libmesh_install/opt_real/petsc >>>>>>> --download-hypre --download-ml >>>>>>> [0]PETSC ERROR: #1 PetscKernel_A_gets_inverse_A_5() line 48 in >>>>>>> /home/dknez/software/petsc-3.6.1/src/mat/impls/baij/seq/dgefa5.c >>>>>>> [0]PETSC ERROR: #2 MatSOR_SeqAIJ_Inode() line 2808 in >>>>>>> /home/dknez/software/petsc-3.6.1/src/mat/impls/aij/seq/inode.c >>>>>>> [0]PETSC ERROR: #3 MatSOR() line 3697 in >>>>>>> /home/dknez/software/petsc-3.6.1/src/mat/interface/matrix.c >>>>>>> [0]PETSC ERROR: #4 PCApply_SOR() line 37 in >>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/sor/sor.c >>>>>>> [0]PETSC ERROR: #5 PCApply() line 482 in >>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c >>>>>>> [0]PETSC ERROR: #6 KSP_PCApply() line 242 in >>>>>>> /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h >>>>>>> [0]PETSC ERROR: #7 KSPInitialResidual() line 63 in >>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itres.c >>>>>>> [0]PETSC ERROR: #8 KSPSolve_GMRES() line 235 in >>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/gmres/gmres.c >>>>>>> [0]PETSC ERROR: #9 KSPSolve() line 604 in >>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>>>> [0]PETSC ERROR: #10 KSPSolve_Chebyshev() line 381 in >>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cheby/cheby.c >>>>>>> [0]PETSC ERROR: #11 KSPSolve() line 604 in >>>>>>> 
/home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>>>> [0]PETSC ERROR: #12 PCMGMCycle_Private() line 19 in >>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>>>> [0]PETSC ERROR: #13 PCMGMCycle_Private() line 48 in >>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>>>> [0]PETSC ERROR: #14 PCApply_MG() line 338 in >>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>>>> [0]PETSC ERROR: #15 PCApply() line 482 in >>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c >>>>>>> [0]PETSC ERROR: #16 KSP_PCApply() line 242 in >>>>>>> /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h >>>>>>> [0]PETSC ERROR: #17 KSPSolve_CG() line 139 in >>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cg/cg.c >>>>>>> [0]PETSC ERROR: #18 KSPSolve() line 604 in >>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>>>> >>>>>>> >>>>>>> ----------------------------------------------------------------------- >>>>>>> >>>>>>> I saw that there was a thread about this in September (subject: >>>>>>> "gamg and zero pivots"), and that the fix is to use "-mg_levels_pc_type >>>>>>> jacobi." When I do that, the solve succeeds (I pasted the -ksp_view at the >>>>>>> end of this email). >>>>>>> >>>>>>> So I have two questions about this: >>>>>>> >>>>>>> 1. Is it surprising that I hit this issue for a 3D elasticity >>>>>>> problem? Note that matrix assembly was done in libMesh, I can look into the >>>>>>> structure of the assembled matrix more carefully, if needed. Also, note >>>>>>> that I can solve this problem with direct solvers just fine. >>>>>>> >>>>>> >>>>>> Yes, this seems like a bug, but it could be some strange BC thing I >>>>>> do not understand. >>>>>> >>>>> >>>>> >>>>> OK, I can look into the matrix in more detail. I agree that it should >>>>> have a non-zero diagonal, so I'll have a look at what's happening with that. >>>>> >>>>> >>>>> >>>>> >>>>>> Naively, the elastic element matrix has a nonzero diagonal. I see >>>>>> that you are doing LU >>>>>> of size 5. That seems strange for 3D elasticity. Am I missing >>>>>> something? I would expect >>>>>> block size 3. >>>>>> >>>>> >>>>> >>>>> I'm not sure what is causing the LU of size 5. Is there a setting to >>>>> control that? >>>>> >>>>> Regarding the block size: I set the vector and matrix block size to 3 >>>>> via VecSetBlockSize and MatSetBlockSize. I also >>>>> used MatNullSpaceCreateRigidBody on a vector with block size of 3, and set >>>>> the matrix's near nullspace using that. >>>>> >>>> >>>> Can you run this same example with -mat_no_inode? I think it may be a >>>> strange blocking that is causing this. >>>> >>> >>> >>> That works. The -ksp_view output is below. >>> >> >> >> I just wanted to follow up on this. I had a more careful look at the >> matrix, and confirmed that there are no zero entries on the diagonal (as >> expected for elasticity). The matrix is from one of libMesh's example >> problems: a simple cantilever model using HEX8 elements. >> >> Do you have any further thoughts about what might cause the "strange >> blocking" that you referred to? If there's something non-standard that >> libMesh is doing with the blocks, I'd be interested to look into that. I >> can send over the matrix if that would be helpful. >> >> Thanks, >> David >> >> > P.S. I was previously calling VecSetBlockSize and MatSetBlockSize to set > the block size to 3. When I don't do that, I no longer need to call > -mat_no_inodes. 
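For reference, a minimal sketch of the block-size / near-nullspace setup being described (the function name is made up; coords is assumed to hold the interlaced x,y,z coordinates of the locally owned nodes):

#include <petscmat.h>

/* Attach rigid-body modes to a 3D elasticity matrix so GAMG can use them. */
static PetscErrorCode AttachRigidBodyModes(Mat A, Vec coords)
{
  MatNullSpace   nearnull;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = VecSetBlockSize(coords,3);CHKERRQ(ierr);        /* 3 dofs per node */
  ierr = MatNullSpaceCreateRigidBody(coords,&nearnull);CHKERRQ(ierr);
  ierr = MatSetNearNullSpace(A,nearnull);CHKERRQ(ierr);  /* picked up by GAMG */
  ierr = MatNullSpaceDestroy(&nearnull);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

MatSetBlockSize(A,3) itself belongs with the matrix creation and preallocation calls; the block size needs to be set before the matrix is preallocated.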
I've pasted the -ksp_view output below. Does it look like > that's working OK? > Sorry for the multiple messages, but I think I found the issue. libMesh internally sets the block size to 1 earlier on (in PetscMatrix::init()). I guess it'll work fine if I get it to set the block size to 3 instead, so I'll look into that. (libMesh has an enable-blocked-storage configure option that should take care of this automatically.) David -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Nov 11 15:38:53 2015 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 11 Nov 2015 16:38:53 -0500 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> <4E4A4E38-B9C3-4F41-8D52-1AFF3E4E8C8C@gmail.com> <5ED4A81C-002B-4A6B-BC6D-F93BE200F0A5@mcs.anl.gov> Message-ID: These are the only PETSc params that I used: -log_summary -options_left false -fp_trap I last update about 3 weeks ago and I am on a branch. I can redo this with a current master. My repo seems to have been polluted: 13:35 edison12 master> ~/petsc$ git status # On branch master # Your branch is ahead of 'origin/master' by 262 commits. # nothing to commit (working directory clean) I trust this is OK but let me know if you would like me to clone a fresh repo. Mark On Wed, Nov 11, 2015 at 11:21 AM, Barry Smith wrote: > > Thanks > > do you use a petscrc file or any file with PETSc options in it for the > run? > > Thanks please send me the exact PETSc commit you are built off so I can > see the line numbers in our source when things go bad. > > Barry > > > On Nov 11, 2015, at 7:36 AM, Mark Adams wrote: > > > > > > > > On Tue, Nov 10, 2015 at 11:15 AM, Barry Smith > wrote: > > > > Please send me the full output. This is nuts and should be reported > once we understand it better to NERSc as something to be fixed. When I pay > $60 million in taxes to a computing center I expect something that works > fine for free on my laptop to work also there. > > > > Barry > > > > > On Nov 10, 2015, at 7:51 AM, Mark Adams wrote: > > > > > > I ran an 8 processor job on Edison of a small code for a short run > (just a linear solve) and got 37 Mb of output! > > > > > > Here is a 'Petsc' grep. > > > > > > Perhaps we should build an ignore file for things that we believe is a > false positive. > > > > > > On Tue, Nov 3, 2015 at 11:55 AM, Barry Smith > wrote: > > > > > > I am more optimistic about valgrind than Mark. I first try valgrind > and if that fails to be helpful then use the debugger. valgrind has the > advantage that it finds the FIRST place that something is wrong, while in > the debugger it is kind of late at the crash. > > > > > > Valgrind should not be noisy, if it is then the > applications/libraries should be cleaned up so that they are valgrind clean > and then valgrind is useful. > > > > > > Barry > > > > > > > > > > > > > On Nov 3, 2015, at 7:47 AM, Mark Adams wrote: > > > > > > > > BTW, I think that our advice for segv is use a debugger. DDT or > Totalview, and gdb if need be, will get you right to the source code and > will get 90% of bugs diagnosed. Valgrind is noisy and cumbersome to use > but can diagnose 90% of the other 10%. > > > > > > > > On Tue, Nov 3, 2015 at 7:32 AM, Denis Davydov > wrote: > > > > Hi Jose, > > > > > > > > > On 3 Nov 2015, at 12:20, Jose E. 
Roman wrote: > > > > > > > > > > I am answering the SLEPc-related questions: > > > > > - Having different number of iterations when changing the number > of processes is normal. > > > > the change in iterations i mentioned are for different > preconditioners, but the same number of MPI processes. > > > > > > > > > > > > > - Yes, if you do not destroy the EPS solver, then the > preconditioner would be reused. > > > > > > > > > > Regarding the segmentation fault, I have no clue. Not sure if this > is related to GAMG or not. Maybe running under valgrind could provide more > information. > > > > will try that. > > > > > > > > Denis. > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Nov 11 15:53:37 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 11 Nov 2015 15:53:37 -0600 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> <4E4A4E38-B9C3-4F41-8D52-1AFF3E4E8C8C@gmail.com> <5ED4A81C-002B-4A6B-BC6D-F93BE200F0A5@mcs.anl.gov> Message-ID: <533477BC-026D-41D4-B835-5174830A0A7E@mcs.anl.gov> send the output from git log > On Nov 11, 2015, at 3:38 PM, Mark Adams wrote: > > These are the only PETSc params that I used: > > -log_summary > -options_left false > -fp_trap > > I last update about 3 weeks ago and I am on a branch. I can redo this with a current master. My repo seems to have been polluted: > > 13:35 edison12 master> ~/petsc$ git status > # On branch master > # Your branch is ahead of 'origin/master' by 262 commits. > # > nothing to commit (working directory clean) > > I trust this is OK but let me know if you would like me to clone a fresh repo. > > Mark > > > > On Wed, Nov 11, 2015 at 11:21 AM, Barry Smith wrote: > > Thanks > > do you use a petscrc file or any file with PETSc options in it for the run? > > Thanks please send me the exact PETSc commit you are built off so I can see the line numbers in our source when things go bad. > > Barry > > > On Nov 11, 2015, at 7:36 AM, Mark Adams wrote: > > > > > > > > On Tue, Nov 10, 2015 at 11:15 AM, Barry Smith wrote: > > > > Please send me the full output. This is nuts and should be reported once we understand it better to NERSc as something to be fixed. When I pay $60 million in taxes to a computing center I expect something that works fine for free on my laptop to work also there. > > > > Barry > > > > > On Nov 10, 2015, at 7:51 AM, Mark Adams wrote: > > > > > > I ran an 8 processor job on Edison of a small code for a short run (just a linear solve) and got 37 Mb of output! > > > > > > Here is a 'Petsc' grep. > > > > > > Perhaps we should build an ignore file for things that we believe is a false positive. > > > > > > On Tue, Nov 3, 2015 at 11:55 AM, Barry Smith wrote: > > > > > > I am more optimistic about valgrind than Mark. I first try valgrind and if that fails to be helpful then use the debugger. valgrind has the advantage that it finds the FIRST place that something is wrong, while in the debugger it is kind of late at the crash. > > > > > > Valgrind should not be noisy, if it is then the applications/libraries should be cleaned up so that they are valgrind clean and then valgrind is useful. > > > > > > Barry > > > > > > > > > > > > > On Nov 3, 2015, at 7:47 AM, Mark Adams wrote: > > > > > > > > BTW, I think that our advice for segv is use a debugger. 
DDT or Totalview, and gdb if need be, will get you right to the source code and will get 90% of bugs diagnosed. Valgrind is noisy and cumbersome to use but can diagnose 90% of the other 10%. > > > > > > > > On Tue, Nov 3, 2015 at 7:32 AM, Denis Davydov wrote: > > > > Hi Jose, > > > > > > > > > On 3 Nov 2015, at 12:20, Jose E. Roman wrote: > > > > > > > > > > I am answering the SLEPc-related questions: > > > > > - Having different number of iterations when changing the number of processes is normal. > > > > the change in iterations i mentioned are for different preconditioners, but the same number of MPI processes. > > > > > > > > > > > > > - Yes, if you do not destroy the EPS solver, then the preconditioner would be reused. > > > > > > > > > > Regarding the segmentation fault, I have no clue. Not sure if this is related to GAMG or not. Maybe running under valgrind could provide more information. > > > > will try that. > > > > > > > > Denis. > > > > > > > > > > > > > > > > > > > > > From mfadams at lbl.gov Wed Nov 11 16:37:05 2015 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 11 Nov 2015 17:37:05 -0500 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: <533477BC-026D-41D4-B835-5174830A0A7E@mcs.anl.gov> References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> <4E4A4E38-B9C3-4F41-8D52-1AFF3E4E8C8C@gmail.com> <5ED4A81C-002B-4A6B-BC6D-F93BE200F0A5@mcs.anl.gov> <533477BC-026D-41D4-B835-5174830A0A7E@mcs.anl.gov> Message-ID: OK, here is an updated output: commit 4bf4127a5802dd8df0d302f4b0a83b52c238cccf Merge: 0d8bfb1 40b2df9 Author: Jason Sarich Date: Wed Nov 11 09:47:21 2015 -0600 Merge branch 'sarich/jenkins' On Wed, Nov 11, 2015 at 4:53 PM, Barry Smith wrote: > > send the output from > > git log > > > > On Nov 11, 2015, at 3:38 PM, Mark Adams wrote: > > > > These are the only PETSc params that I used: > > > > -log_summary > > -options_left false > > -fp_trap > > > > I last update about 3 weeks ago and I am on a branch. I can redo this > with a current master. My repo seems to have been polluted: > > > > 13:35 edison12 master> ~/petsc$ git status > > # On branch master > > # Your branch is ahead of 'origin/master' by 262 commits. > > # > > nothing to commit (working directory clean) > > > > I trust this is OK but let me know if you would like me to clone a fresh > repo. > > > > Mark > > > > > > > > On Wed, Nov 11, 2015 at 11:21 AM, Barry Smith > wrote: > > > > Thanks > > > > do you use a petscrc file or any file with PETSc options in it for > the run? > > > > Thanks please send me the exact PETSc commit you are built off so I > can see the line numbers in our source when things go bad. > > > > Barry > > > > > On Nov 11, 2015, at 7:36 AM, Mark Adams wrote: > > > > > > > > > > > > On Tue, Nov 10, 2015 at 11:15 AM, Barry Smith > wrote: > > > > > > Please send me the full output. This is nuts and should be reported > once we understand it better to NERSc as something to be fixed. When I pay > $60 million in taxes to a computing center I expect something that works > fine for free on my laptop to work also there. > > > > > > Barry > > > > > > > On Nov 10, 2015, at 7:51 AM, Mark Adams wrote: > > > > > > > > I ran an 8 processor job on Edison of a small code for a short run > (just a linear solve) and got 37 Mb of output! > > > > > > > > Here is a 'Petsc' grep. > > > > > > > > Perhaps we should build an ignore file for things that we believe is > a false positive. 
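For what it is worth, valgrind's mechanism for such an ignore file is a suppressions file passed with --suppressions=<file>, and running once with --gen-suppressions=all makes valgrind print a ready-made entry after each report. A sketch of what one entry could look like (the frame names are only illustrative; deciding which reports really are false positives is the hard part):

{
   suspected-vendor-false-positive
   Memcheck:Cond
   fun:PetscTokenFind
   fun:PetscOptionsInsertString
}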
> > > > > > > > On Tue, Nov 3, 2015 at 11:55 AM, Barry Smith > wrote: > > > > > > > > I am more optimistic about valgrind than Mark. I first try > valgrind and if that fails to be helpful then use the debugger. valgrind > has the advantage that it finds the FIRST place that something is wrong, > while in the debugger it is kind of late at the crash. > > > > > > > > Valgrind should not be noisy, if it is then the > applications/libraries should be cleaned up so that they are valgrind clean > and then valgrind is useful. > > > > > > > > Barry > > > > > > > > > > > > > > > > > On Nov 3, 2015, at 7:47 AM, Mark Adams wrote: > > > > > > > > > > BTW, I think that our advice for segv is use a debugger. DDT or > Totalview, and gdb if need be, will get you right to the source code and > will get 90% of bugs diagnosed. Valgrind is noisy and cumbersome to use > but can diagnose 90% of the other 10%. > > > > > > > > > > On Tue, Nov 3, 2015 at 7:32 AM, Denis Davydov > wrote: > > > > > Hi Jose, > > > > > > > > > > > On 3 Nov 2015, at 12:20, Jose E. Roman > wrote: > > > > > > > > > > > > I am answering the SLEPc-related questions: > > > > > > - Having different number of iterations when changing the number > of processes is normal. > > > > > the change in iterations i mentioned are for different > preconditioners, but the same number of MPI processes. > > > > > > > > > > > > > > > > - Yes, if you do not destroy the EPS solver, then the > preconditioner would be reused. > > > > > > > > > > > > Regarding the segmentation fault, I have no clue. Not sure if > this is related to GAMG or not. Maybe running under valgrind could provide > more information. > > > > > will try that. > > > > > > > > > > Denis. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: outval.gz Type: application/x-gzip Size: 56803 bytes Desc: not available URL: From bsmith at mcs.anl.gov Wed Nov 11 17:14:53 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 11 Nov 2015 17:14:53 -0600 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> <4E4A4E38-B9C3-4F41-8D52-1AFF3E4E8C8C@gmail.com> <5ED4A81C-002B-4A6B-BC6D-F93BE200F0A5@mcs.anl.gov> Message-ID: <11E7B1E9-D812-4717-A9F2-929A218573E0@mcs.anl.gov> Hmm, you absolutely must be using an options file otherwise it would never be doing all the stuff it is doing inside PetscOptionsInsertFile()! Please send me the options file. Barry Most of the reports are doing to vendor crimes but it possible that the PetscTokenFind() code has a memory issue though I don't see how. Seriously the NERSc people should be pressuring Cray to have valgrind clean code, this is disgraceful. 
Conditional jump or move depends on uninitialised value(s) ==2948== at 0x542EC7: PetscTokenFind (str.c:965) ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) ==2948== by 0x51A629: PetscInitialize (pinit.c:859) ==2948== by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) ==2948== ==2948== Use of uninitialised value of size 8 ==2948== at 0x542ECD: PetscTokenFind (str.c:965) ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) ==2948== by 0x51A629: PetscInitialize (pinit.c:859) ==2948== by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) ==2948== ==2948== Conditional jump or move depends on uninitialised value(s) ==2948== at 0x542F04: PetscTokenFind (str.c:966) ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) ==2948== by 0x51A629: PetscInitialize (pinit.c:859) ==2948== by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) ==2948== ==2948== Use of uninitialised value of size 8 ==2948== at 0x542F0E: PetscTokenFind (str.c:967) ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) ==2948== by 0x51A629: PetscInitialize (pinit.c:859) ==2948== by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) ==2948== ==2948== Use of uninitialised value of size 8 ==2948== at 0x542F77: PetscTokenFind (str.c:973) ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) ==2948== by 0x51A629: PetscInitialize (pinit.c:859) ==2948== by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) ==2948== ==2948== Use of uninitialised value of size 8 ==2948== at 0x542F2D: PetscTokenFind (str.c:968) ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) ==2948== by 0x51A629: PetscInitialize (pinit.c:859) ==2948== by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) > On Nov 11, 2015, at 3:38 PM, Mark Adams wrote: > > These are the only PETSc params that I used: > > -log_summary > -options_left false > -fp_trap > > I last update about 3 weeks ago and I am on a branch. I can redo this with a current master. My repo seems to have been polluted: > > 13:35 edison12 master> ~/petsc$ git status > # On branch master > # Your branch is ahead of 'origin/master' by 262 commits. > # > nothing to commit (working directory clean) > > I trust this is OK but let me know if you would like me to clone a fresh repo. > > Mark > > > > On Wed, Nov 11, 2015 at 11:21 AM, Barry Smith wrote: > > Thanks > > do you use a petscrc file or any file with PETSc options in it for the run? > > Thanks please send me the exact PETSc commit you are built off so I can see the line numbers in our source when things go bad. 
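As a sketch of what is being asked about: PetscInitialize() picks up options not only from the command line but also, if I remember the lookup right, from ~/.petscrc and from .petscrc or petscrc in the working directory, plus the PETSC_OPTIONS environment variable and anything given with -options_file; that is presumably where the PetscOptionsInsertFile/PetscOptionsInsertString calls in the log come from. A file holding just the options Mark listed would look like:

# petscrc: one option per line, '#' starts a comment
-log_summary
-options_left false
-fp_trap

A stale .petscrc sitting in the home or run directory is easy to overlook.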
> > Barry > > > On Nov 11, 2015, at 7:36 AM, Mark Adams wrote: > > > > > > > > On Tue, Nov 10, 2015 at 11:15 AM, Barry Smith wrote: > > > > Please send me the full output. This is nuts and should be reported once we understand it better to NERSc as something to be fixed. When I pay $60 million in taxes to a computing center I expect something that works fine for free on my laptop to work also there. > > > > Barry > > > > > On Nov 10, 2015, at 7:51 AM, Mark Adams wrote: > > > > > > I ran an 8 processor job on Edison of a small code for a short run (just a linear solve) and got 37 Mb of output! > > > > > > Here is a 'Petsc' grep. > > > > > > Perhaps we should build an ignore file for things that we believe is a false positive. > > > > > > On Tue, Nov 3, 2015 at 11:55 AM, Barry Smith wrote: > > > > > > I am more optimistic about valgrind than Mark. I first try valgrind and if that fails to be helpful then use the debugger. valgrind has the advantage that it finds the FIRST place that something is wrong, while in the debugger it is kind of late at the crash. > > > > > > Valgrind should not be noisy, if it is then the applications/libraries should be cleaned up so that they are valgrind clean and then valgrind is useful. > > > > > > Barry > > > > > > > > > > > > > On Nov 3, 2015, at 7:47 AM, Mark Adams wrote: > > > > > > > > BTW, I think that our advice for segv is use a debugger. DDT or Totalview, and gdb if need be, will get you right to the source code and will get 90% of bugs diagnosed. Valgrind is noisy and cumbersome to use but can diagnose 90% of the other 10%. > > > > > > > > On Tue, Nov 3, 2015 at 7:32 AM, Denis Davydov wrote: > > > > Hi Jose, > > > > > > > > > On 3 Nov 2015, at 12:20, Jose E. Roman wrote: > > > > > > > > > > I am answering the SLEPc-related questions: > > > > > - Having different number of iterations when changing the number of processes is normal. > > > > the change in iterations i mentioned are for different preconditioners, but the same number of MPI processes. > > > > > > > > > > > > > - Yes, if you do not destroy the EPS solver, then the preconditioner would be reused. > > > > > > > > > > Regarding the segmentation fault, I have no clue. Not sure if this is related to GAMG or not. Maybe running under valgrind could provide more information. > > > > will try that. > > > > > > > > Denis. > > > > > > > > > > > > > > > > > > > > > From gianmail at gmail.com Wed Nov 11 22:05:49 2015 From: gianmail at gmail.com (Gianluca Meneghello) Date: Wed, 11 Nov 2015 20:05:49 -0800 Subject: [petsc-users] Parallel matrix-vector multiplication Message-ID: Hi, I am trying to do something apparently really simple but with no success. I need to perform a matrix-vector multiplication x = B f , where the length of x is bigger than the length of f (or viceversa). Thus, B cannot be created using DMCreateMatrix. Both x and f are obtained from different DMs, the smaller covering only a subdomain of the larger. The application is to apply a control f to a system, e.g. \dot{x} = A x + B f. The problem is, when running on more than one core, the vector x is not organized as I would expect (everything works fine on a single core). I attach a short example where B is intended to map f to the interior of x. mpirun -n 1 ./test -draw_pause -1 works fine while mpirun -n 2 ./test -draw_pause -1 shows the problem I have not found any example with non square matrices in the src folder, any help is very welcome. 
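One general way to set up such a rectangular matrix is to let the two global vectors dictate its parallel layout: the local row size of B matches the local size of x (the output) and the local column size matches the local size of f (the input). A sketch with a made-up function name and no preallocation tuning:

#include <petscdmda.h>

/* Create B so that MatMult(B,f,x) is well defined for global vectors of dax and daf. */
static PetscErrorCode CreateMappingMatrix(DM dax, DM daf, Mat *B)
{
  Vec            x, f;
  PetscInt       nx, nf;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = DMCreateGlobalVector(dax,&x);CHKERRQ(ierr);
  ierr = DMCreateGlobalVector(daf,&f);CHKERRQ(ierr);
  ierr = VecGetLocalSize(x,&nx);CHKERRQ(ierr);   /* local rows    = local size of x */
  ierr = VecGetLocalSize(f,&nf);CHKERRQ(ierr);   /* local columns = local size of f */
  ierr = MatCreate(PetscObjectComm((PetscObject)dax),B);CHKERRQ(ierr);
  ierr = MatSetSizes(*B,nx,nf,PETSC_DETERMINE,PETSC_DETERMINE);CHKERRQ(ierr);
  ierr = MatSetType(*B,MATAIJ);CHKERRQ(ierr);
  ierr = MatSetUp(*B);CHKERRQ(ierr);             /* preallocate properly in real code */
  /* fill with MatSetValues() in the global orderings of the two DMDAs
     (e.g. via DMDAGetAO or the local-to-global mappings), then assemble */
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&f);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The values still have to be set against the DMDA global numberings, and whether the nonzeros stay on-process depends on how the ownership of the two DMs lines up, which is what the reply below is about.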
Thanks for your time, Gianluca -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test.cpp Type: text/x-c++src Size: 2227 bytes Desc: not available URL: From bsmith at mcs.anl.gov Wed Nov 11 22:12:05 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 11 Nov 2015 22:12:05 -0600 Subject: [petsc-users] Parallel matrix-vector multiplication In-Reply-To: References: Message-ID: <85ADC6EB-7A56-4A00-9DEB-DBC997BB6970@mcs.anl.gov> When you create the 2 DM you must be set the lx, ly arguments (the ones you set to 0) in your code carefully to insure that the vectors for the 2 DM you create have compatible layout to do the matrix vector product. You can run a very small problem with 2 processors and printing out the vectors to see the layout to make sure you get it correct. The 2 DM don't have any magically way of knowing that you created another DMDA and want it to be compatible automatically. Barry DMDACreate2d(PETSC_COMM_WORLD , DM_BOUNDARY_GHOSTED , DM_BOUNDARY_GHOSTED , DMDA_STENCIL_BOX , Mx , Nx , PETSC_DECIDE , PETSC_DECIDE , 1 , 0 , 0 , 0 , &dax); DMDACreate2d(PETSC_COMM_WORLD , DM_BOUNDARY_NONE , DM_BOUNDARY_NONE , DMDA_STENCIL_BOX , Mx-2*bs , Nx-2*bs , PETSC_DECIDE , PETSC_DECIDE , 1 , 0 , 0 , 0 , &daf); > On Nov 11, 2015, at 10:05 PM, Gianluca Meneghello wrote: > > Hi, > > I am trying to do something apparently really simple but with no success. > > I need to perform a matrix-vector multiplication x = B f , where the length of x is bigger than the length of f (or viceversa). Thus, B cannot be created using DMCreateMatrix. > > Both x and f are obtained from different DMs, the smaller covering only a subdomain of the larger. The application is to apply a control f to a system, e.g. \dot{x} = A x + B f. > > The problem is, when running on more than one core, the vector x is not organized as I would expect (everything works fine on a single core). > > I attach a short example where B is intended to map f to the interior of x. > > mpirun -n 1 ./test -draw_pause -1 works fine while > mpirun -n 2 ./test -draw_pause -1 shows the problem > > I have not found any example with non square matrices in the src folder, any help is very welcome. > > Thanks for your time, > > Gianluca > From gianmail at gmail.com Wed Nov 11 22:47:11 2015 From: gianmail at gmail.com (Gianluca Meneghello) Date: Wed, 11 Nov 2015 20:47:11 -0800 Subject: [petsc-users] Parallel matrix-vector multiplication In-Reply-To: <85ADC6EB-7A56-4A00-9DEB-DBC997BB6970@mcs.anl.gov> References: <85ADC6EB-7A56-4A00-9DEB-DBC997BB6970@mcs.anl.gov> Message-ID: Hi, thanks for the very quick reply. One more question: is there a way to get the lx and ly from the first dm and use them (modified) for the second dm? DMDAGetInfo does not seem to provide this information. Thanks again for your help Gianluca On Wed, Nov 11, 2015 at 8:12 PM, Barry Smith wrote: > > When you create the 2 DM you must be set the lx, ly arguments (the ones > you set to 0) in your code carefully to insure that the vectors for the 2 > DM you create have compatible layout to do the matrix vector product. > > You can run a very small problem with 2 processors and printing out the > vectors to see the layout to make sure you get it correct. > > The 2 DM don't have any magically way of knowing that you created > another DMDA and want it to be compatible automatically. 
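A sketch of how the second (interior) DMDA can be given an ownership layout that follows the first one; the helper name is made up, bs is the number of boundary points stripped from each side as in the sizes above, and it assumes the first and last process rows/columns each own more than bs points:

#include <petscdmda.h>

static PetscErrorCode CreateInteriorDMDA(DM dax, PetscInt bs, DM *daf)
{
  PetscInt        M, N, m, n, i;
  const PetscInt *lx, *ly;
  PetscInt       *lxf, *lyf;
  PetscErrorCode  ierr;

  PetscFunctionBeginUser;
  ierr = DMDAGetInfo(dax,NULL,&M,&N,NULL,&m,&n,NULL,NULL,NULL,NULL,NULL,NULL,NULL);CHKERRQ(ierr);
  ierr = DMDAGetOwnershipRanges(dax,&lx,&ly,NULL);CHKERRQ(ierr);
  ierr = PetscMalloc2(m,&lxf,n,&lyf);CHKERRQ(ierr);
  for (i=0; i<m; i++) lxf[i] = lx[i];
  for (i=0; i<n; i++) lyf[i] = ly[i];
  lxf[0] -= bs; lxf[m-1] -= bs;   /* the boundary layers live on the end ranks */
  lyf[0] -= bs; lyf[n-1] -= bs;
  /* dof=1 and stencil width 0, as in the calls quoted above */
  ierr = DMDACreate2d(PetscObjectComm((PetscObject)dax),DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,
                      DMDA_STENCIL_BOX,M-2*bs,N-2*bs,m,n,1,0,lxf,lyf,daf);CHKERRQ(ierr);
  ierr = PetscFree2(lxf,lyf);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Each rank then owns in daf exactly the interior points it owns in dax, so the corresponding rows and columns of the mapping matrix end up on the same rank.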
> > Barry > > DMDACreate2d(PETSC_COMM_WORLD , DM_BOUNDARY_GHOSTED , DM_BOUNDARY_GHOSTED > , DMDA_STENCIL_BOX , > Mx , Nx , PETSC_DECIDE , PETSC_DECIDE , 1 , 0 , 0 , 0 , > &dax); > DMDACreate2d(PETSC_COMM_WORLD , DM_BOUNDARY_NONE , DM_BOUNDARY_NONE > , DMDA_STENCIL_BOX , > Mx-2*bs , Nx-2*bs , PETSC_DECIDE , PETSC_DECIDE , 1 , 0 , 0 , 0 , > &daf); > > > On Nov 11, 2015, at 10:05 PM, Gianluca Meneghello > wrote: > > > > Hi, > > > > I am trying to do something apparently really simple but with no success. > > > > I need to perform a matrix-vector multiplication x = B f , where the > length of x is bigger than the length of f (or viceversa). Thus, B cannot > be created using DMCreateMatrix. > > > > Both x and f are obtained from different DMs, the smaller covering only > a subdomain of the larger. The application is to apply a control f to a > system, e.g. \dot{x} = A x + B f. > > > > The problem is, when running on more than one core, the vector x is not > organized as I would expect (everything works fine on a single core). > > > > I attach a short example where B is intended to map f to the interior of > x. > > > > mpirun -n 1 ./test -draw_pause -1 works fine while > > mpirun -n 2 ./test -draw_pause -1 shows the problem > > > > I have not found any example with non square matrices in the src folder, > any help is very welcome. > > > > Thanks for your time, > > > > Gianluca > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Nov 12 00:19:37 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 12 Nov 2015 00:19:37 -0600 Subject: [petsc-users] Parallel matrix-vector multiplication In-Reply-To: References: <85ADC6EB-7A56-4A00-9DEB-DBC997BB6970@mcs.anl.gov> Message-ID: <7BD83C8D-141C-4B89-AE89-5BDB40BCA2B4@mcs.anl.gov> DMDAGetOwnershipRanges > On Nov 11, 2015, at 10:47 PM, Gianluca Meneghello wrote: > > Hi, > > thanks for the very quick reply. > > One more question: is there a way to get the lx and ly from the first dm and use them (modified) for the second dm? DMDAGetInfo does not seem to provide this information. > > Thanks again for your help > > Gianluca > > On Wed, Nov 11, 2015 at 8:12 PM, Barry Smith wrote: > > When you create the 2 DM you must be set the lx, ly arguments (the ones you set to 0) in your code carefully to insure that the vectors for the 2 DM you create have compatible layout to do the matrix vector product. > > You can run a very small problem with 2 processors and printing out the vectors to see the layout to make sure you get it correct. > > The 2 DM don't have any magically way of knowing that you created another DMDA and want it to be compatible automatically. > > Barry > > DMDACreate2d(PETSC_COMM_WORLD , DM_BOUNDARY_GHOSTED , DM_BOUNDARY_GHOSTED , DMDA_STENCIL_BOX , > Mx , Nx , PETSC_DECIDE , PETSC_DECIDE , 1 , 0 , 0 , 0 , &dax); > DMDACreate2d(PETSC_COMM_WORLD , DM_BOUNDARY_NONE , DM_BOUNDARY_NONE , DMDA_STENCIL_BOX , > Mx-2*bs , Nx-2*bs , PETSC_DECIDE , PETSC_DECIDE , 1 , 0 , 0 , 0 , &daf); > > > On Nov 11, 2015, at 10:05 PM, Gianluca Meneghello wrote: > > > > Hi, > > > > I am trying to do something apparently really simple but with no success. > > > > I need to perform a matrix-vector multiplication x = B f , where the length of x is bigger than the length of f (or viceversa). Thus, B cannot be created using DMCreateMatrix. > > > > Both x and f are obtained from different DMs, the smaller covering only a subdomain of the larger. The application is to apply a control f to a system, e.g. 
\dot{x} = A x + B f. > > > > The problem is, when running on more than one core, the vector x is not organized as I would expect (everything works fine on a single core). > > > > I attach a short example where B is intended to map f to the interior of x. > > > > mpirun -n 1 ./test -draw_pause -1 works fine while > > mpirun -n 2 ./test -draw_pause -1 shows the problem > > > > I have not found any example with non square matrices in the src folder, any help is very welcome. > > > > Thanks for your time, > > > > Gianluca > > > > From mfadams at lbl.gov Thu Nov 12 08:50:07 2015 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 12 Nov 2015 09:50:07 -0500 Subject: [petsc-users] GAMG and zero pivots follow up In-Reply-To: References: Message-ID: Note, I suspect the zero pivot is coming from a coarse grid. I don't know why no-inode fixed it. N.B., this might not be deterministic. If you run with -info and grep on 'GAMG' you will see the block sizes. They should be 3 on the fine grid and the 6 on the coarse grids if you set everything up correctly. Mark On Wed, Nov 11, 2015 at 1:36 PM, David Knezevic wrote: > On Wed, Nov 11, 2015 at 12:57 PM, David Knezevic < > david.knezevic at akselos.com> wrote: > >> On Wed, Nov 11, 2015 at 12:24 PM, David Knezevic < >> david.knezevic at akselos.com> wrote: >> >>> On Tue, Nov 10, 2015 at 10:28 PM, David Knezevic < >>> david.knezevic at akselos.com> wrote: >>> >>>> On Tue, Nov 10, 2015 at 10:24 PM, Matthew Knepley >>>> wrote: >>>> >>>>> On Tue, Nov 10, 2015 at 9:21 PM, David Knezevic < >>>>> david.knezevic at akselos.com> wrote: >>>>> >>>>>> On Tue, Nov 10, 2015 at 10:00 PM, Matthew Knepley >>>>>> wrote: >>>>>> >>>>>>> On Tue, Nov 10, 2015 at 8:39 PM, David Knezevic < >>>>>>> david.knezevic at akselos.com> wrote: >>>>>>> >>>>>>>> I'm looking into using GAMG, so I wanted to start with a simple 3D >>>>>>>> elasticity problem. When I first tried this, I got the following "zero >>>>>>>> pivot" error: >>>>>>>> >>>>>>>> >>>>>>>> ----------------------------------------------------------------------- >>>>>>>> >>>>>>>> [0]PETSC ERROR: Zero pivot in LU factorization: >>>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot >>>>>>>> [0]PETSC ERROR: Zero pivot, row 3 >>>>>>>> [0]PETSC ERROR: See >>>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>>>>> shooting. 
>>>>>>>> [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 >>>>>>>> [0]PETSC ERROR: >>>>>>>> /home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a >>>>>>>> arch-linux2-c-opt named david-Lenovo by dknez Tue Nov 10 21:26:39 2015 >>>>>>>> [0]PETSC ERROR: Configure options --with-shared-libraries=1 >>>>>>>> --with-debugging=0 --download-suitesparse --download-parmetis >>>>>>>> --download-blacs >>>>>>>> --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl >>>>>>>> --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps >>>>>>>> --download-metis --download-superlu_dist >>>>>>>> --prefix=/home/dknez/software/libmesh_install/opt_real/petsc >>>>>>>> --download-hypre --download-ml >>>>>>>> [0]PETSC ERROR: #1 PetscKernel_A_gets_inverse_A_5() line 48 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/mat/impls/baij/seq/dgefa5.c >>>>>>>> [0]PETSC ERROR: #2 MatSOR_SeqAIJ_Inode() line 2808 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/mat/impls/aij/seq/inode.c >>>>>>>> [0]PETSC ERROR: #3 MatSOR() line 3697 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/mat/interface/matrix.c >>>>>>>> [0]PETSC ERROR: #4 PCApply_SOR() line 37 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/sor/sor.c >>>>>>>> [0]PETSC ERROR: #5 PCApply() line 482 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c >>>>>>>> [0]PETSC ERROR: #6 KSP_PCApply() line 242 in >>>>>>>> /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h >>>>>>>> [0]PETSC ERROR: #7 KSPInitialResidual() line 63 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itres.c >>>>>>>> [0]PETSC ERROR: #8 KSPSolve_GMRES() line 235 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/gmres/gmres.c >>>>>>>> [0]PETSC ERROR: #9 KSPSolve() line 604 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>>>>> [0]PETSC ERROR: #10 KSPSolve_Chebyshev() line 381 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cheby/cheby.c >>>>>>>> [0]PETSC ERROR: #11 KSPSolve() line 604 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>>>>> [0]PETSC ERROR: #12 PCMGMCycle_Private() line 19 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>>>>> [0]PETSC ERROR: #13 PCMGMCycle_Private() line 48 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>>>>> [0]PETSC ERROR: #14 PCApply_MG() line 338 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>>>>> [0]PETSC ERROR: #15 PCApply() line 482 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c >>>>>>>> [0]PETSC ERROR: #16 KSP_PCApply() line 242 in >>>>>>>> /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h >>>>>>>> [0]PETSC ERROR: #17 KSPSolve_CG() line 139 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cg/cg.c >>>>>>>> [0]PETSC ERROR: #18 KSPSolve() line 604 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>>>>> >>>>>>>> >>>>>>>> ----------------------------------------------------------------------- >>>>>>>> >>>>>>>> I saw that there was a thread about this in September (subject: >>>>>>>> "gamg and zero pivots"), and that the fix is to use "-mg_levels_pc_type >>>>>>>> jacobi." When I do that, the solve succeeds (I pasted the -ksp_view at the >>>>>>>> end of this email). >>>>>>>> >>>>>>>> So I have two questions about this: >>>>>>>> >>>>>>>> 1. Is it surprising that I hit this issue for a 3D elasticity >>>>>>>> problem? 
Note that matrix assembly was done in libMesh, I can look into the >>>>>>>> structure of the assembled matrix more carefully, if needed. Also, note >>>>>>>> that I can solve this problem with direct solvers just fine. >>>>>>>> >>>>>>> >>>>>>> Yes, this seems like a bug, but it could be some strange BC thing I >>>>>>> do not understand. >>>>>>> >>>>>> >>>>>> >>>>>> OK, I can look into the matrix in more detail. I agree that it should >>>>>> have a non-zero diagonal, so I'll have a look at what's happening with that. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> Naively, the elastic element matrix has a nonzero diagonal. I see >>>>>>> that you are doing LU >>>>>>> of size 5. That seems strange for 3D elasticity. Am I missing >>>>>>> something? I would expect >>>>>>> block size 3. >>>>>>> >>>>>> >>>>>> >>>>>> I'm not sure what is causing the LU of size 5. Is there a setting to >>>>>> control that? >>>>>> >>>>>> Regarding the block size: I set the vector and matrix block size to 3 >>>>>> via VecSetBlockSize and MatSetBlockSize. I also >>>>>> used MatNullSpaceCreateRigidBody on a vector with block size of 3, and set >>>>>> the matrix's near nullspace using that. >>>>>> >>>>> >>>>> Can you run this same example with -mat_no_inode? I think it may be a >>>>> strange blocking that is causing this. >>>>> >>>> >>>> >>>> That works. The -ksp_view output is below. >>>> >>> >>> >>> I just wanted to follow up on this. I had a more careful look at the >>> matrix, and confirmed that there are no zero entries on the diagonal (as >>> expected for elasticity). The matrix is from one of libMesh's example >>> problems: a simple cantilever model using HEX8 elements. >>> >>> Do you have any further thoughts about what might cause the "strange >>> blocking" that you referred to? If there's something non-standard that >>> libMesh is doing with the blocks, I'd be interested to look into that. I >>> can send over the matrix if that would be helpful. >>> >>> Thanks, >>> David >>> >>> >> P.S. I was previously calling VecSetBlockSize and MatSetBlockSize to set >> the block size to 3. When I don't do that, I no longer need to call >> -mat_no_inodes. I've pasted the -ksp_view output below. Does it look like >> that's working OK? >> > > > Sorry for the multiple messages, but I think I found the issue. libMesh > internally sets the block size to 1 earlier on (in PetscMatrix::init()). I > guess it'll work fine if I get it to set the block size to 3 instead, so > I'll look into that. (libMesh has an enable-blocked-storage configure > option that should take care of this automatically.) > > David > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.knezevic at akselos.com Thu Nov 12 08:58:48 2015 From: david.knezevic at akselos.com (David Knezevic) Date: Thu, 12 Nov 2015 09:58:48 -0500 Subject: [petsc-users] GAMG and zero pivots follow up In-Reply-To: References: Message-ID: On Thu, Nov 12, 2015 at 9:50 AM, Mark Adams wrote: > Note, I suspect the zero pivot is coming from a coarse grid. I don't know > why no-inode fixed it. N.B., this might not be deterministic. > > If you run with -info and grep on 'GAMG' you will see the block sizes. > They should be 3 on the fine grid and the 6 on the coarse grids if you set > everything up correctly. > OK, thanks for the info, I'll look into this further. Though note that I got it to work now without needing no-inode now. The change I made was to make sure that I matched the order of function calls from the PETSc GAMG examples. 
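For anyone who hits the same thing, the order of calls in question is roughly the following sketch (illustrative only, not a copy of any particular example; nlocal is the number of locally owned dofs and a multiple of 3, coords holds the interlaced nodal coordinates, and the element loop and preallocation numbers are placeholders):

#include <petscksp.h>

static PetscErrorCode SolveElasticitySketch(MPI_Comm comm, PetscInt nlocal, Vec coords, Vec b, Vec x)
{
  Mat            A;
  KSP            ksp;
  MatNullSpace   ns;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatCreate(comm,&A);CHKERRQ(ierr);
  ierr = MatSetSizes(A,nlocal,nlocal,PETSC_DETERMINE,PETSC_DETERMINE);CHKERRQ(ierr);
  ierr = MatSetBlockSize(A,3);CHKERRQ(ierr);                   /* before preallocation */
  ierr = MatSetType(A,MATAIJ);CHKERRQ(ierr);
  ierr = MatSeqAIJSetPreallocation(A,81,NULL);CHKERRQ(ierr);   /* rough estimates only */
  ierr = MatMPIAIJSetPreallocation(A,81,NULL,27,NULL);CHKERRQ(ierr);
  /* ... element loop with MatSetValues / MatSetValuesBlocked ... */
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = VecSetBlockSize(coords,3);CHKERRQ(ierr);
  ierr = MatNullSpaceCreateRigidBody(coords,&ns);CHKERRQ(ierr);
  ierr = MatSetNearNullSpace(A,ns);CHKERRQ(ierr);
  ierr = MatNullSpaceDestroy(&ns);CHKERRQ(ierr);
  ierr = KSPCreate(comm,&ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);                 /* -ksp_type cg -pc_type gamg ... */
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}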
The libMesh code I was using was doing things somewhat out of order, apparently. David On Wed, Nov 11, 2015 at 1:36 PM, David Knezevic wrote: > On Wed, Nov 11, 2015 at 12:57 PM, David Knezevic < > david.knezevic at akselos.com> wrote: > >> On Wed, Nov 11, 2015 at 12:24 PM, David Knezevic < >> david.knezevic at akselos.com> wrote: >> >>> On Tue, Nov 10, 2015 at 10:28 PM, David Knezevic < >>> david.knezevic at akselos.com> wrote: >>> >>>> On Tue, Nov 10, 2015 at 10:24 PM, Matthew Knepley >>>> wrote: >>>> >>>>> On Tue, Nov 10, 2015 at 9:21 PM, David Knezevic < >>>>> david.knezevic at akselos.com> wrote: >>>>> >>>>>> On Tue, Nov 10, 2015 at 10:00 PM, Matthew Knepley >>>>>> wrote: >>>>>> >>>>>>> On Tue, Nov 10, 2015 at 8:39 PM, David Knezevic < >>>>>>> david.knezevic at akselos.com> wrote: >>>>>>> >>>>>>>> I'm looking into using GAMG, so I wanted to start with a simple 3D >>>>>>>> elasticity problem. When I first tried this, I got the following "zero >>>>>>>> pivot" error: >>>>>>>> >>>>>>>> >>>>>>>> ----------------------------------------------------------------------- >>>>>>>> >>>>>>>> [0]PETSC ERROR: Zero pivot in LU factorization: >>>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot >>>>>>>> [0]PETSC ERROR: Zero pivot, row 3 >>>>>>>> [0]PETSC ERROR: See >>>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>>>>> shooting. >>>>>>>> [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 >>>>>>>> [0]PETSC ERROR: >>>>>>>> /home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a >>>>>>>> arch-linux2-c-opt named david-Lenovo by dknez Tue Nov 10 21:26:39 2015 >>>>>>>> [0]PETSC ERROR: Configure options --with-shared-libraries=1 >>>>>>>> --with-debugging=0 --download-suitesparse --download-parmetis >>>>>>>> --download-blacs >>>>>>>> --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl >>>>>>>> --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps >>>>>>>> --download-metis --download-superlu_dist >>>>>>>> --prefix=/home/dknez/software/libmesh_install/opt_real/petsc >>>>>>>> --download-hypre --download-ml >>>>>>>> [0]PETSC ERROR: #1 PetscKernel_A_gets_inverse_A_5() line 48 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/mat/impls/baij/seq/dgefa5.c >>>>>>>> [0]PETSC ERROR: #2 MatSOR_SeqAIJ_Inode() line 2808 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/mat/impls/aij/seq/inode.c >>>>>>>> [0]PETSC ERROR: #3 MatSOR() line 3697 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/mat/interface/matrix.c >>>>>>>> [0]PETSC ERROR: #4 PCApply_SOR() line 37 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/sor/sor.c >>>>>>>> [0]PETSC ERROR: #5 PCApply() line 482 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c >>>>>>>> [0]PETSC ERROR: #6 KSP_PCApply() line 242 in >>>>>>>> /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h >>>>>>>> [0]PETSC ERROR: #7 KSPInitialResidual() line 63 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itres.c >>>>>>>> [0]PETSC ERROR: #8 KSPSolve_GMRES() line 235 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/gmres/gmres.c >>>>>>>> [0]PETSC ERROR: #9 KSPSolve() line 604 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>>>>> [0]PETSC ERROR: #10 KSPSolve_Chebyshev() line 381 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cheby/cheby.c >>>>>>>> [0]PETSC ERROR: #11 KSPSolve() line 604 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>>>>> [0]PETSC ERROR: #12 
PCMGMCycle_Private() line 19 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>>>>> [0]PETSC ERROR: #13 PCMGMCycle_Private() line 48 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>>>>> [0]PETSC ERROR: #14 PCApply_MG() line 338 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>>>>> [0]PETSC ERROR: #15 PCApply() line 482 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c >>>>>>>> [0]PETSC ERROR: #16 KSP_PCApply() line 242 in >>>>>>>> /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h >>>>>>>> [0]PETSC ERROR: #17 KSPSolve_CG() line 139 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cg/cg.c >>>>>>>> [0]PETSC ERROR: #18 KSPSolve() line 604 in >>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>>>>> >>>>>>>> >>>>>>>> ----------------------------------------------------------------------- >>>>>>>> >>>>>>>> I saw that there was a thread about this in September (subject: >>>>>>>> "gamg and zero pivots"), and that the fix is to use "-mg_levels_pc_type >>>>>>>> jacobi." When I do that, the solve succeeds (I pasted the -ksp_view at the >>>>>>>> end of this email). >>>>>>>> >>>>>>>> So I have two questions about this: >>>>>>>> >>>>>>>> 1. Is it surprising that I hit this issue for a 3D elasticity >>>>>>>> problem? Note that matrix assembly was done in libMesh, I can look into the >>>>>>>> structure of the assembled matrix more carefully, if needed. Also, note >>>>>>>> that I can solve this problem with direct solvers just fine. >>>>>>>> >>>>>>> >>>>>>> Yes, this seems like a bug, but it could be some strange BC thing I >>>>>>> do not understand. >>>>>>> >>>>>> >>>>>> >>>>>> OK, I can look into the matrix in more detail. I agree that it should >>>>>> have a non-zero diagonal, so I'll have a look at what's happening with that. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> Naively, the elastic element matrix has a nonzero diagonal. I see >>>>>>> that you are doing LU >>>>>>> of size 5. That seems strange for 3D elasticity. Am I missing >>>>>>> something? I would expect >>>>>>> block size 3. >>>>>>> >>>>>> >>>>>> >>>>>> I'm not sure what is causing the LU of size 5. Is there a setting to >>>>>> control that? >>>>>> >>>>>> Regarding the block size: I set the vector and matrix block size to 3 >>>>>> via VecSetBlockSize and MatSetBlockSize. I also >>>>>> used MatNullSpaceCreateRigidBody on a vector with block size of 3, and set >>>>>> the matrix's near nullspace using that. >>>>>> >>>>> >>>>> Can you run this same example with -mat_no_inode? I think it may be a >>>>> strange blocking that is causing this. >>>>> >>>> >>>> >>>> That works. The -ksp_view output is below. >>>> >>> >>> >>> I just wanted to follow up on this. I had a more careful look at the >>> matrix, and confirmed that there are no zero entries on the diagonal (as >>> expected for elasticity). The matrix is from one of libMesh's example >>> problems: a simple cantilever model using HEX8 elements. >>> >>> Do you have any further thoughts about what might cause the "strange >>> blocking" that you referred to? If there's something non-standard that >>> libMesh is doing with the blocks, I'd be interested to look into that. I >>> can send over the matrix if that would be helpful. >>> >>> Thanks, >>> David >>> >>> >> P.S. I was previously calling VecSetBlockSize and MatSetBlockSize to set >> the block size to 3. When I don't do that, I no longer need to call >> -mat_no_inodes. 
I've pasted the -ksp_view output below. Does it look like >> that's working OK? >> > > > Sorry for the multiple messages, but I think I found the issue. libMesh > internally sets the block size to 1 earlier on (in PetscMatrix::init()). I > guess it'll work fine if I get it to set the block size to 3 instead, so > I'll look into that. (libMesh has an enable-blocked-storage configure > option that should take care of this automatically.) > > David > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Nov 12 09:35:41 2015 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 12 Nov 2015 10:35:41 -0500 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: <11E7B1E9-D812-4717-A9F2-929A218573E0@mcs.anl.gov> References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> <4E4A4E38-B9C3-4F41-8D52-1AFF3E4E8C8C@gmail.com> <5ED4A81C-002B-4A6B-BC6D-F93BE200F0A5@mcs.anl.gov> <11E7B1E9-D812-4717-A9F2-929A218573E0@mcs.anl.gov> Message-ID: On Wed, Nov 11, 2015 at 6:14 PM, Barry Smith wrote: > > Hmm, you absolutely must be using an options file otherwise it would > never be doing all the stuff it is doing inside PetscOptionsInsertFile()! > > Yes, here it is: -log_summary #-help -options_left false -damping 1.15 -fp_trap #-on_error_attach_debugger /usr/local/bin/gdb #-on_error_attach_debugger /Users/markadams/homebrew/bin/gdb #-start_in_debugger /Users/markadams/homebrew/bin/gdb -debugger_nodes 1 #-malloc_debug #-malloc_dump > Please send me the options file. > > Barry > > Most of the reports are doing to vendor crimes but it possible that the > PetscTokenFind() code has a memory issue though I don't see how. > > Seriously the NERSc people should be pressuring Cray to have valgrind > clean code, this is disgraceful. 
> > > Conditional jump or move depends on uninitialised value(s) > ==2948== at 0x542EC7: PetscTokenFind (str.c:965) > ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) > ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) > ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) > ==2948== by 0x51A629: PetscInitialize (pinit.c:859) > ==2948== by 0x47B98D: main (in > /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) > ==2948== > ==2948== Use of uninitialised value of size 8 > ==2948== at 0x542ECD: PetscTokenFind (str.c:965) > ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) > ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) > ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) > ==2948== by 0x51A629: PetscInitialize (pinit.c:859) > ==2948== by 0x47B98D: main (in > /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) > ==2948== > ==2948== Conditional jump or move depends on uninitialised value(s) > ==2948== at 0x542F04: PetscTokenFind (str.c:966) > ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) > ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) > ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) > ==2948== by 0x51A629: PetscInitialize (pinit.c:859) > ==2948== by 0x47B98D: main (in > /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) > ==2948== > ==2948== Use of uninitialised value of size 8 > ==2948== at 0x542F0E: PetscTokenFind (str.c:967) > ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) > ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) > ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) > ==2948== by 0x51A629: PetscInitialize (pinit.c:859) > ==2948== by 0x47B98D: main (in > /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) > ==2948== > ==2948== Use of uninitialised value of size 8 > ==2948== at 0x542F77: PetscTokenFind (str.c:973) > ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) > ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) > ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) > ==2948== by 0x51A629: PetscInitialize (pinit.c:859) > ==2948== by 0x47B98D: main (in > /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) > ==2948== > ==2948== Use of uninitialised value of size 8 > ==2948== at 0x542F2D: PetscTokenFind (str.c:968) > ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) > ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) > ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) > ==2948== by 0x51A629: PetscInitialize (pinit.c:859) > ==2948== by 0x47B98D: main (in > /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) > > > On Nov 11, 2015, at 3:38 PM, Mark Adams wrote: > > > > These are the only PETSc params that I used: > > > > -log_summary > > -options_left false > > -fp_trap > > > > I last update about 3 weeks ago and I am on a branch. I can redo this > with a current master. My repo seems to have been polluted: > > > > 13:35 edison12 master> ~/petsc$ git status > > # On branch master > > # Your branch is ahead of 'origin/master' by 262 commits. > > # > > nothing to commit (working directory clean) > > > > I trust this is OK but let me know if you would like me to clone a fresh > repo. > > > > Mark > > > > > > > > On Wed, Nov 11, 2015 at 11:21 AM, Barry Smith > wrote: > > > > Thanks > > > > do you use a petscrc file or any file with PETSc options in it for > the run? 
> > > > Thanks please send me the exact PETSc commit you are built off so I > can see the line numbers in our source when things go bad. > > > > Barry > > > > > On Nov 11, 2015, at 7:36 AM, Mark Adams wrote: > > > > > > > > > > > > On Tue, Nov 10, 2015 at 11:15 AM, Barry Smith > wrote: > > > > > > Please send me the full output. This is nuts and should be reported > once we understand it better to NERSc as something to be fixed. When I pay > $60 million in taxes to a computing center I expect something that works > fine for free on my laptop to work also there. > > > > > > Barry > > > > > > > On Nov 10, 2015, at 7:51 AM, Mark Adams wrote: > > > > > > > > I ran an 8 processor job on Edison of a small code for a short run > (just a linear solve) and got 37 Mb of output! > > > > > > > > Here is a 'Petsc' grep. > > > > > > > > Perhaps we should build an ignore file for things that we believe is > a false positive. > > > > > > > > On Tue, Nov 3, 2015 at 11:55 AM, Barry Smith > wrote: > > > > > > > > I am more optimistic about valgrind than Mark. I first try > valgrind and if that fails to be helpful then use the debugger. valgrind > has the advantage that it finds the FIRST place that something is wrong, > while in the debugger it is kind of late at the crash. > > > > > > > > Valgrind should not be noisy, if it is then the > applications/libraries should be cleaned up so that they are valgrind clean > and then valgrind is useful. > > > > > > > > Barry > > > > > > > > > > > > > > > > > On Nov 3, 2015, at 7:47 AM, Mark Adams wrote: > > > > > > > > > > BTW, I think that our advice for segv is use a debugger. DDT or > Totalview, and gdb if need be, will get you right to the source code and > will get 90% of bugs diagnosed. Valgrind is noisy and cumbersome to use > but can diagnose 90% of the other 10%. > > > > > > > > > > On Tue, Nov 3, 2015 at 7:32 AM, Denis Davydov > wrote: > > > > > Hi Jose, > > > > > > > > > > > On 3 Nov 2015, at 12:20, Jose E. Roman > wrote: > > > > > > > > > > > > I am answering the SLEPc-related questions: > > > > > > - Having different number of iterations when changing the number > of processes is normal. > > > > > the change in iterations i mentioned are for different > preconditioners, but the same number of MPI processes. > > > > > > > > > > > > > > > > - Yes, if you do not destroy the EPS solver, then the > preconditioner would be reused. > > > > > > > > > > > > Regarding the segmentation fault, I have no clue. Not sure if > this is related to GAMG or not. Maybe running under valgrind could provide > more information. > > > > > will try that. > > > > > > > > > > Denis. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Nov 12 10:44:19 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 12 Nov 2015 10:44:19 -0600 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> <4E4A4E38-B9C3-4F41-8D52-1AFF3E4E8C8C@gmail.com> <5ED4A81C-002B-4A6B-BC6D-F93BE200F0A5@mcs.anl.gov> <11E7B1E9-D812-4717-A9F2-929A218573E0@mcs.anl.gov> Message-ID: Thanks, I don't get any valgrind issues with this file so I have to conclude the valgrind issues all come from that damn Nersc machine. 
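(As an aside, a typical workstation valgrind invocation for a PETSc code -- the executable name and process count below are placeholders -- is something like

    mpiexec -n 2 valgrind --tool=memcheck -q --num-callers=20 --log-file=valgrind.log.%p ./myapp -malloc off [petsc options]

so that each rank writes its own log file and PETSc's own instrumented malloc does not hide allocations from valgrind.)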
I highly recommend running the application code on some linux machine that is suitably valgrind clean to determine if the are any memory corruption issues with the application code. It is insane to try to debug application codes on damn Nersc machines directly. Barry > On Nov 12, 2015, at 9:35 AM, Mark Adams wrote: > > > > On Wed, Nov 11, 2015 at 6:14 PM, Barry Smith wrote: > > Hmm, you absolutely must be using an options file otherwise it would never be doing all the stuff it is doing inside PetscOptionsInsertFile()! > > > Yes, here it is: > > -log_summary > #-help > -options_left false > -damping 1.15 > -fp_trap > #-on_error_attach_debugger /usr/local/bin/gdb > #-on_error_attach_debugger /Users/markadams/homebrew/bin/gdb > #-start_in_debugger /Users/markadams/homebrew/bin/gdb > -debugger_nodes 1 > #-malloc_debug > #-malloc_dump > > > Please send me the options file. > > Barry > > Most of the reports are doing to vendor crimes but it possible that the PetscTokenFind() code has a memory issue though I don't see how. > > Seriously the NERSc people should be pressuring Cray to have valgrind clean code, this is disgraceful. > > > Conditional jump or move depends on uninitialised value(s) > ==2948== at 0x542EC7: PetscTokenFind (str.c:965) > ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) > ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) > ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) > ==2948== by 0x51A629: PetscInitialize (pinit.c:859) > ==2948== by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) > ==2948== > ==2948== Use of uninitialised value of size 8 > ==2948== at 0x542ECD: PetscTokenFind (str.c:965) > ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) > ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) > ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) > ==2948== by 0x51A629: PetscInitialize (pinit.c:859) > ==2948== by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) > ==2948== > ==2948== Conditional jump or move depends on uninitialised value(s) > ==2948== at 0x542F04: PetscTokenFind (str.c:966) > ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) > ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) > ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) > ==2948== by 0x51A629: PetscInitialize (pinit.c:859) > ==2948== by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) > ==2948== > ==2948== Use of uninitialised value of size 8 > ==2948== at 0x542F0E: PetscTokenFind (str.c:967) > ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) > ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) > ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) > ==2948== by 0x51A629: PetscInitialize (pinit.c:859) > ==2948== by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) > ==2948== > ==2948== Use of uninitialised value of size 8 > ==2948== at 0x542F77: PetscTokenFind (str.c:973) > ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) > ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) > ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) > ==2948== by 0x51A629: PetscInitialize (pinit.c:859) > ==2948== by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) > ==2948== > ==2948== Use of uninitialised value of size 8 > ==2948== at 0x542F2D: PetscTokenFind (str.c:968) > ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) > 
==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) > ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) > ==2948== by 0x51A629: PetscInitialize (pinit.c:859) > ==2948== by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) > > > On Nov 11, 2015, at 3:38 PM, Mark Adams wrote: > > > > These are the only PETSc params that I used: > > > > -log_summary > > -options_left false > > -fp_trap > > > > I last update about 3 weeks ago and I am on a branch. I can redo this with a current master. My repo seems to have been polluted: > > > > 13:35 edison12 master> ~/petsc$ git status > > # On branch master > > # Your branch is ahead of 'origin/master' by 262 commits. > > # > > nothing to commit (working directory clean) > > > > I trust this is OK but let me know if you would like me to clone a fresh repo. > > > > Mark > > > > > > > > On Wed, Nov 11, 2015 at 11:21 AM, Barry Smith wrote: > > > > Thanks > > > > do you use a petscrc file or any file with PETSc options in it for the run? > > > > Thanks please send me the exact PETSc commit you are built off so I can see the line numbers in our source when things go bad. > > > > Barry > > > > > On Nov 11, 2015, at 7:36 AM, Mark Adams wrote: > > > > > > > > > > > > On Tue, Nov 10, 2015 at 11:15 AM, Barry Smith wrote: > > > > > > Please send me the full output. This is nuts and should be reported once we understand it better to NERSc as something to be fixed. When I pay $60 million in taxes to a computing center I expect something that works fine for free on my laptop to work also there. > > > > > > Barry > > > > > > > On Nov 10, 2015, at 7:51 AM, Mark Adams wrote: > > > > > > > > I ran an 8 processor job on Edison of a small code for a short run (just a linear solve) and got 37 Mb of output! > > > > > > > > Here is a 'Petsc' grep. > > > > > > > > Perhaps we should build an ignore file for things that we believe is a false positive. > > > > > > > > On Tue, Nov 3, 2015 at 11:55 AM, Barry Smith wrote: > > > > > > > > I am more optimistic about valgrind than Mark. I first try valgrind and if that fails to be helpful then use the debugger. valgrind has the advantage that it finds the FIRST place that something is wrong, while in the debugger it is kind of late at the crash. > > > > > > > > Valgrind should not be noisy, if it is then the applications/libraries should be cleaned up so that they are valgrind clean and then valgrind is useful. > > > > > > > > Barry > > > > > > > > > > > > > > > > > On Nov 3, 2015, at 7:47 AM, Mark Adams wrote: > > > > > > > > > > BTW, I think that our advice for segv is use a debugger. DDT or Totalview, and gdb if need be, will get you right to the source code and will get 90% of bugs diagnosed. Valgrind is noisy and cumbersome to use but can diagnose 90% of the other 10%. > > > > > > > > > > On Tue, Nov 3, 2015 at 7:32 AM, Denis Davydov wrote: > > > > > Hi Jose, > > > > > > > > > > > On 3 Nov 2015, at 12:20, Jose E. Roman wrote: > > > > > > > > > > > > I am answering the SLEPc-related questions: > > > > > > - Having different number of iterations when changing the number of processes is normal. > > > > > the change in iterations i mentioned are for different preconditioners, but the same number of MPI processes. > > > > > > > > > > > > > > > > - Yes, if you do not destroy the EPS solver, then the preconditioner would be reused. > > > > > > > > > > > > Regarding the segmentation fault, I have no clue. Not sure if this is related to GAMG or not. 
Maybe running under valgrind could provide more information. > > > > > will try that. > > > > > > > > > > Denis. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From gianmail at gmail.com Thu Nov 12 12:23:20 2015 From: gianmail at gmail.com (Gianluca Meneghello) Date: Thu, 12 Nov 2015 10:23:20 -0800 Subject: [petsc-users] Parallel matrix-vector multiplication In-Reply-To: <7BD83C8D-141C-4B89-AE89-5BDB40BCA2B4@mcs.anl.gov> References: <85ADC6EB-7A56-4A00-9DEB-DBC997BB6970@mcs.anl.gov> <7BD83C8D-141C-4B89-AE89-5BDB40BCA2B4@mcs.anl.gov> Message-ID: Hi Barry, sorry, but I still cannot make it. I guess what I need is something similar to MatRestrict/MatInterpolate (and B is something similar to what is created from DMCreateInterpolation, except for the fact that the nonzero entries are distributed differently). Am I mistaken? Is there any example I could start from? Thanks again, Gianluca On Wed, Nov 11, 2015 at 10:19 PM, Barry Smith wrote: > DMDAGetOwnershipRanges > > > On Nov 11, 2015, at 10:47 PM, Gianluca Meneghello > wrote: > > > > Hi, > > > > thanks for the very quick reply. > > > > One more question: is there a way to get the lx and ly from the first dm > and use them (modified) for the second dm? DMDAGetInfo does not seem to > provide this information. > > > > Thanks again for your help > > > > Gianluca > > > > On Wed, Nov 11, 2015 at 8:12 PM, Barry Smith wrote: > > > > When you create the 2 DM you must be set the lx, ly arguments (the > ones you set to 0) in your code carefully to insure that the vectors for > the 2 DM you create have compatible layout to do the matrix vector product. > > > > You can run a very small problem with 2 processors and printing out > the vectors to see the layout to make sure you get it correct. > > > > The 2 DM don't have any magically way of knowing that you created > another DMDA and want it to be compatible automatically. > > > > Barry > > > > DMDACreate2d(PETSC_COMM_WORLD , DM_BOUNDARY_GHOSTED , > DM_BOUNDARY_GHOSTED , DMDA_STENCIL_BOX , > > Mx , Nx , PETSC_DECIDE , PETSC_DECIDE , 1 , 0 , 0 , 0 , > &dax); > > DMDACreate2d(PETSC_COMM_WORLD , DM_BOUNDARY_NONE , > DM_BOUNDARY_NONE , DMDA_STENCIL_BOX , > > Mx-2*bs , Nx-2*bs , PETSC_DECIDE , PETSC_DECIDE , 1 , 0 , 0 , 0 , > &daf); > > > > > On Nov 11, 2015, at 10:05 PM, Gianluca Meneghello > wrote: > > > > > > Hi, > > > > > > I am trying to do something apparently really simple but with no > success. > > > > > > I need to perform a matrix-vector multiplication x = B f , where the > length of x is bigger than the length of f (or viceversa). Thus, B cannot > be created using DMCreateMatrix. > > > > > > Both x and f are obtained from different DMs, the smaller covering > only a subdomain of the larger. The application is to apply a control f to > a system, e.g. \dot{x} = A x + B f. > > > > > > The problem is, when running on more than one core, the vector x is > not organized as I would expect (everything works fine on a single core). > > > > > > I attach a short example where B is intended to map f to the interior > of x. > > > > > > mpirun -n 1 ./test -draw_pause -1 works fine while > > > mpirun -n 2 ./test -draw_pause -1 shows the problem > > > > > > I have not found any example with non square matrices in the src > folder, any help is very welcome. > > > > > > Thanks for your time, > > > > > > Gianluca > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mfadams at lbl.gov Thu Nov 12 14:16:46 2015 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 12 Nov 2015 15:16:46 -0500 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> <4E4A4E38-B9C3-4F41-8D52-1AFF3E4E8C8C@gmail.com> <5ED4A81C-002B-4A6B-BC6D-F93BE200F0A5@mcs.anl.gov> <11E7B1E9-D812-4717-A9F2-929A218573E0@mcs.anl.gov> Message-ID: There is a valgrind for El Capitan now and I have it. It runs perfectly clean. Thanks, Mark On Thu, Nov 12, 2015 at 11:44 AM, Barry Smith wrote: > > Thanks, I don't get any valgrind issues with this file so I have to > conclude the valgrind issues all come from that damn Nersc machine. > > I highly recommend running the application code on some linux machine > that is suitably valgrind clean to determine if the are any memory > corruption issues with the application code. It is insane to try to debug > application codes on damn Nersc machines directly. > > Barry > > > On Nov 12, 2015, at 9:35 AM, Mark Adams wrote: > > > > > > > > On Wed, Nov 11, 2015 at 6:14 PM, Barry Smith wrote: > > > > Hmm, you absolutely must be using an options file otherwise it would > never be doing all the stuff it is doing inside PetscOptionsInsertFile()! > > > > > > Yes, here it is: > > > > -log_summary > > #-help > > -options_left false > > -damping 1.15 > > -fp_trap > > #-on_error_attach_debugger /usr/local/bin/gdb > > #-on_error_attach_debugger /Users/markadams/homebrew/bin/gdb > > #-start_in_debugger /Users/markadams/homebrew/bin/gdb > > -debugger_nodes 1 > > #-malloc_debug > > #-malloc_dump > > > > > > Please send me the options file. > > > > Barry > > > > Most of the reports are doing to vendor crimes but it possible that the > PetscTokenFind() code has a memory issue though I don't see how. > > > > Seriously the NERSc people should be pressuring Cray to have valgrind > clean code, this is disgraceful. 
> > > > > > Conditional jump or move depends on uninitialised value(s) > > ==2948== at 0x542EC7: PetscTokenFind (str.c:965) > > ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) > > ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) > > ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) > > ==2948== by 0x51A629: PetscInitialize (pinit.c:859) > > ==2948== by 0x47B98D: main (in > /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) > > ==2948== > > ==2948== Use of uninitialised value of size 8 > > ==2948== at 0x542ECD: PetscTokenFind (str.c:965) > > ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) > > ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) > > ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) > > ==2948== by 0x51A629: PetscInitialize (pinit.c:859) > > ==2948== by 0x47B98D: main (in > /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) > > ==2948== > > ==2948== Conditional jump or move depends on uninitialised value(s) > > ==2948== at 0x542F04: PetscTokenFind (str.c:966) > > ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) > > ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) > > ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) > > ==2948== by 0x51A629: PetscInitialize (pinit.c:859) > > ==2948== by 0x47B98D: main (in > /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) > > ==2948== > > ==2948== Use of uninitialised value of size 8 > > ==2948== at 0x542F0E: PetscTokenFind (str.c:967) > > ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) > > ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) > > ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) > > ==2948== by 0x51A629: PetscInitialize (pinit.c:859) > > ==2948== by 0x47B98D: main (in > /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) > > ==2948== > > ==2948== Use of uninitialised value of size 8 > > ==2948== at 0x542F77: PetscTokenFind (str.c:973) > > ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) > > ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) > > ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) > > ==2948== by 0x51A629: PetscInitialize (pinit.c:859) > > ==2948== by 0x47B98D: main (in > /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) > > ==2948== > > ==2948== Use of uninitialised value of size 8 > > ==2948== at 0x542F2D: PetscTokenFind (str.c:968) > > ==2948== by 0x4F00B9: PetscOptionsInsertString (options.c:390) > > ==2948== by 0x4F2F7B: PetscOptionsInsertFile (options.c:590) > > ==2948== by 0x4F4ED7: PetscOptionsInsert (options.c:721) > > ==2948== by 0x51A629: PetscInitialize (pinit.c:859) > > ==2948== by 0x47B98D: main (in > /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex) > > > > > On Nov 11, 2015, at 3:38 PM, Mark Adams wrote: > > > > > > These are the only PETSc params that I used: > > > > > > -log_summary > > > -options_left false > > > -fp_trap > > > > > > I last update about 3 weeks ago and I am on a branch. I can redo this > with a current master. My repo seems to have been polluted: > > > > > > 13:35 edison12 master> ~/petsc$ git status > > > # On branch master > > > # Your branch is ahead of 'origin/master' by 262 commits. > > > # > > > nothing to commit (working directory clean) > > > > > > I trust this is OK but let me know if you would like me to clone a > fresh repo. 
> > > > > > Mark > > > > > > > > > > > > On Wed, Nov 11, 2015 at 11:21 AM, Barry Smith > wrote: > > > > > > Thanks > > > > > > do you use a petscrc file or any file with PETSc options in it for > the run? > > > > > > Thanks please send me the exact PETSc commit you are built off so I > can see the line numbers in our source when things go bad. > > > > > > Barry > > > > > > > On Nov 11, 2015, at 7:36 AM, Mark Adams wrote: > > > > > > > > > > > > > > > > On Tue, Nov 10, 2015 at 11:15 AM, Barry Smith > wrote: > > > > > > > > Please send me the full output. This is nuts and should be > reported once we understand it better to NERSc as something to be fixed. > When I pay $60 million in taxes to a computing center I expect something > that works fine for free on my laptop to work also there. > > > > > > > > Barry > > > > > > > > > On Nov 10, 2015, at 7:51 AM, Mark Adams wrote: > > > > > > > > > > I ran an 8 processor job on Edison of a small code for a short run > (just a linear solve) and got 37 Mb of output! > > > > > > > > > > Here is a 'Petsc' grep. > > > > > > > > > > Perhaps we should build an ignore file for things that we believe > is a false positive. > > > > > > > > > > On Tue, Nov 3, 2015 at 11:55 AM, Barry Smith > wrote: > > > > > > > > > > I am more optimistic about valgrind than Mark. I first try > valgrind and if that fails to be helpful then use the debugger. valgrind > has the advantage that it finds the FIRST place that something is wrong, > while in the debugger it is kind of late at the crash. > > > > > > > > > > Valgrind should not be noisy, if it is then the > applications/libraries should be cleaned up so that they are valgrind clean > and then valgrind is useful. > > > > > > > > > > Barry > > > > > > > > > > > > > > > > > > > > > On Nov 3, 2015, at 7:47 AM, Mark Adams wrote: > > > > > > > > > > > > BTW, I think that our advice for segv is use a debugger. DDT or > Totalview, and gdb if need be, will get you right to the source code and > will get 90% of bugs diagnosed. Valgrind is noisy and cumbersome to use > but can diagnose 90% of the other 10%. > > > > > > > > > > > > On Tue, Nov 3, 2015 at 7:32 AM, Denis Davydov < > davydden at gmail.com> wrote: > > > > > > Hi Jose, > > > > > > > > > > > > > On 3 Nov 2015, at 12:20, Jose E. Roman > wrote: > > > > > > > > > > > > > > I am answering the SLEPc-related questions: > > > > > > > - Having different number of iterations when changing the > number of processes is normal. > > > > > > the change in iterations i mentioned are for different > preconditioners, but the same number of MPI processes. > > > > > > > > > > > > > > > > > > > - Yes, if you do not destroy the EPS solver, then the > preconditioner would be reused. > > > > > > > > > > > > > > Regarding the segmentation fault, I have no clue. Not sure if > this is related to GAMG or not. Maybe running under valgrind could provide > more information. > > > > > > will try that. > > > > > > > > > > > > Denis. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Nov 12 14:24:13 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 12 Nov 2015 14:24:13 -0600 Subject: [petsc-users] Parallel matrix-vector multiplication In-Reply-To: References: <85ADC6EB-7A56-4A00-9DEB-DBC997BB6970@mcs.anl.gov> <7BD83C8D-141C-4B89-AE89-5BDB40BCA2B4@mcs.anl.gov> Message-ID: This is kind of tedious to get right. 
Plus it unfortunately will not work for all possible partitionings Issues: 1) The MatSetValues() works with global number but PETSc global numbering is per process so with DMDA does not normally match the "natural" ordering. You are trying to use global numbering in the natural ordering. To get it to work you need to use local numbering and MatSetValuesLocal() 2) it is next to impossible to debug as written. So what I did to help debug is to put values using the local numbering into the two vectors f and x and print those vectors. This makes it easy to see when values are put in the wrong place. 3) VecView() for DMDA prints out the vectors in the global natural ordering so it is easy to see if the vectors have the correct values in the correct locations. MatView() for DMDA however prints it only in the PETSc ordering by process so one needs to manual translate to make sure the matrix is correct. Anyways I've attached your code with small Mx=5 and My=4 it runs correctly with 1,2 and 4 processes here is the output $ petscmpiexec -n 1 ./ex5 Vec Object: 1 MPI processes type: seq Vec Object:Vec_0x84000000_0 1 MPI processes type: mpi Process [0] 0. 0. 0. 0. 0. 0. 1. 1. 1. 0. 0. 1. 1. 1. 0. 0. 0. 0. 0. 0. Vec Object: 1 MPI processes type: seq Vec Object:Vec_0x84000000_1 1 MPI processes type: mpi Process [0] 1. 1. 1. 1. 1. 1. Mat Object: 1 MPI processes type: seqaij row 0: row 1: row 2: row 3: row 4: row 5: row 6: (0, 1.) row 7: (1, 1.) row 8: (2, 1.) row 9: row 10: row 11: (3, 1.) row 12: (4, 1.) row 13: (5, 1.) row 14: row 15: row 16: row 17: row 18: row 19: ~/Src/petsc/test-dir (master=) arch-double $ petscmpiexec -n 2 ./ex5 Vec Object: 2 MPI processes type: mpi Vec Object:Vec_0x84000004_0 2 MPI processes type: mpi Process [0] 0. 0. 0. 0. 0. 0. 1. 1. 2. 0. 0. 1. Process [1] 1. 2. 0. 0. 0. 0. 0. 0. Vec Object: 2 MPI processes type: mpi Vec Object:Vec_0x84000004_1 2 MPI processes type: mpi Process [0] 1. 1. 2. 1. Process [1] 1. 2. Mat Object: 2 MPI processes type: mpiaij row 0: row 1: row 2: row 3: row 4: (0, 1.) row 5: (1, 1.) row 6: row 7: (2, 1.) row 8: (3, 1.) row 9: row 10: row 11: row 12: row 13: row 14: (4, 2.) row 15: row 16: (5, 2.) row 17: row 18: row 19: ~/Src/petsc/test-dir (master=) arch-double $ petscmpiexec -n 4 ./ex5 Vec Object: 4 MPI processes type: mpi Vec Object:Vec_0x84000004_0 4 MPI processes type: mpi Process [0] 0. 0. 0. 0. 0. 0. Process [1] 1. 1. 2. 0. Process [2] 0. 3. 3. 4. 0. 0. Process [3] 0. 0. 0. 0. Vec Object: 4 MPI processes type: mpi Vec Object:Vec_0x84000004_1 4 MPI processes type: mpi Process [0] 1. 1. Process [1] 2. Process [2] 3. 3. Process [3] 4. Mat Object: 4 MPI processes type: mpiaij row 0: row 1: row 2: row 3: row 4: (0, 1.) row 5: (1, 1.) row 6: row 7: row 8: (2, 2.) row 9: row 10: row 11: (3, 3.) row 12: (4, 3.) row 13: row 14: row 15: row 16: (5, 4.) row 17: row 18: row 19: It does NOT run correctly with 3 processes $ petscmpiexec -n 3 ./ex5 [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: Argument out of range [1]PETSC ERROR: Local index 2 too large 1 (max) at 0 [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[1]PETSC ERROR: Petsc Development GIT revision: v3.6.2-1539-gf82f7f7 GIT Date: 2015-11-09 20:26:06 -0600 [1]PETSC ERROR: ./ex5 on a arch-double named visitor098-088.wl.anl-external.org by barrysmith Thu Nov 12 14:18:56 2015 [1]PETSC ERROR: Configure options --download-hwloc --download-hypre --download-mpich --with-afterimage PETSC_ARCH=arch-double [1]PETSC ERROR: #1 ISLocalToGlobalMappingApply() line 423 in /Users/barrysmith/Src/PETSc/src/vec/is/utils/isltog.c [1]PETSC ERROR: #2 MatSetValuesLocal() line 2020 in /Users/barrysmith/Src/PETSc/src/mat/interface/matrix.c [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: Argument out of range [1]PETSC ERROR: Local index 2 too large 1 (max) at 0 [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [1]PETSC ERROR: Petsc Development GIT revision: v3.6.2-1539-gf82f7f7 GIT Date: 2015-11-09 20:26:06 -0600 [1]PETSC ERROR: ./ex5 on a arch-double named visitor098-088.wl.anl-external.org by barrysmith Thu Nov 12 14:18:56 2015 [1]PETSC ERROR: Configure options --download-hwloc --download-hypre --download-mpich --with-afterimage PETSC_ARCH=arch-double [1]PETSC ERROR: #3 ISLocalToGlobalMappingApply() line 423 in /Users/barrysmith/Src/PETSc/src/vec/is/utils/isltog.c [1]PETSC ERROR: #4 VecSetValuesLocal() line 1058 in /Users/barrysmith/Src/PETSc/src/vec/vec/interface/rvector.c Vec Object: 3 MPI processes type: mpi Vec Object:Vec_0x84000004_0 3 MPI processes type: mpi Process [0] 0. 0. 0. 0. 0. 0. 1. 2. Process [1] 2. 0. 0. 1. 2. 2. 0. 0. Process [2] 0. 0. 0. 0. Vec Object: 3 MPI processes type: mpi Vec Object:Vec_0x84000004_1 3 MPI processes type: mpi Process [0] 1. 2. Process [1] 0. 1. Process [2] 4. 0. Mat Object: 3 MPI processes type: mpiaij row 0: row 1: row 2: row 3: (0, 1.) row 4: row 5: (1, 1.) row 6: row 7: row 8: row 9: row 10: (2, 2.) row 11: (3, 2.) row 12: (3, 2.) row 13: row 14: row 15: row 16: row 17: row 18: row 19: The reason is that in this case the DMDAs are decomposed into three strips, for x the strips are xi = 0,1 then xi = 2,3 then xi= 4 for f the strips are fi=0, fi=1, fi=2 so there is no way to get a consistent local numbering for both x and f at the same time with 3 strips. So like the DMDA interpolation routines your code can only work for certain decompositions. Forget what I said about lx and ly I don't think that is relevant for what you are trying to do. Barry > On Nov 12, 2015, at 12:23 PM, Gianluca Meneghello wrote: > > Hi Barry, > > sorry, but I still cannot make it. I guess what I need is something similar to MatRestrict/MatInterpolate (and B is something similar to what is created from DMCreateInterpolation, except for the fact that the nonzero entries are distributed differently). > > Am I mistaken? Is there any example I could start from? > > Thanks again, > > Gianluca > > On Wed, Nov 11, 2015 at 10:19 PM, Barry Smith wrote: > DMDAGetOwnershipRanges > > > On Nov 11, 2015, at 10:47 PM, Gianluca Meneghello wrote: > > > > Hi, > > > > thanks for the very quick reply. > > > > One more question: is there a way to get the lx and ly from the first dm and use them (modified) for the second dm? DMDAGetInfo does not seem to provide this information. 
> > > > Thanks again for your help > > > > Gianluca > > > > On Wed, Nov 11, 2015 at 8:12 PM, Barry Smith wrote: > > > > When you create the 2 DM you must be set the lx, ly arguments (the ones you set to 0) in your code carefully to insure that the vectors for the 2 DM you create have compatible layout to do the matrix vector product. > > > > You can run a very small problem with 2 processors and printing out the vectors to see the layout to make sure you get it correct. > > > > The 2 DM don't have any magically way of knowing that you created another DMDA and want it to be compatible automatically. > > > > Barry > > > > DMDACreate2d(PETSC_COMM_WORLD , DM_BOUNDARY_GHOSTED , DM_BOUNDARY_GHOSTED , DMDA_STENCIL_BOX , > > Mx , Nx , PETSC_DECIDE , PETSC_DECIDE , 1 , 0 , 0 , 0 , &dax); > > DMDACreate2d(PETSC_COMM_WORLD , DM_BOUNDARY_NONE , DM_BOUNDARY_NONE , DMDA_STENCIL_BOX , > > Mx-2*bs , Nx-2*bs , PETSC_DECIDE , PETSC_DECIDE , 1 , 0 , 0 , 0 , &daf); > > > > > On Nov 11, 2015, at 10:05 PM, Gianluca Meneghello wrote: > > > > > > Hi, > > > > > > I am trying to do something apparently really simple but with no success. > > > > > > I need to perform a matrix-vector multiplication x = B f , where the length of x is bigger than the length of f (or viceversa). Thus, B cannot be created using DMCreateMatrix. > > > > > > Both x and f are obtained from different DMs, the smaller covering only a subdomain of the larger. The application is to apply a control f to a system, e.g. \dot{x} = A x + B f. > > > > > > The problem is, when running on more than one core, the vector x is not organized as I would expect (everything works fine on a single core). > > > > > > I attach a short example where B is intended to map f to the interior of x. > > > > > > mpirun -n 1 ./test -draw_pause -1 works fine while > > > mpirun -n 2 ./test -draw_pause -1 shows the problem > > > > > > I have not found any example with non square matrices in the src folder, any help is very welcome. > > > > > > Thanks for your time, > > > > > > Gianluca > > > > > > > > > From bsmith at mcs.anl.gov Thu Nov 12 14:25:39 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 12 Nov 2015 14:25:39 -0600 Subject: [petsc-users] Parallel matrix-vector multiplication In-Reply-To: References: <85ADC6EB-7A56-4A00-9DEB-DBC997BB6970@mcs.anl.gov> <7BD83C8D-141C-4B89-AE89-5BDB40BCA2B4@mcs.anl.gov> Message-ID: A non-text attachment was scrubbed... Name: ex5.cxx Type: application/octet-stream Size: 2726 bytes Desc: not available URL: -------------- next part -------------- I forgot the attached code > On Nov 12, 2015, at 2:24 PM, Barry Smith wrote: > > > This is kind of tedious to get right. Plus it unfortunately will not work for all possible partitionings > > Issues: > > 1) The MatSetValues() works with global number but PETSc global numbering is per process so with DMDA does not normally match the "natural" ordering. You are trying to use global numbering in the natural ordering. To get it to work you need to use local numbering and MatSetValuesLocal() > > 2) it is next to impossible to debug as written. So what I did to help debug is to put values using the local numbering into the two vectors f and x and print those vectors. This makes it easy to see when values are put in the wrong place. > > 3) VecView() for DMDA prints out the vectors in the global natural ordering so it is easy to see if the vectors have the correct values in the correct locations. 
MatView() for DMDA however prints it only in the PETSc ordering by process so one needs to manual translate to make sure the matrix is correct. > > Anyways I've attached your code with small Mx=5 and My=4 it runs correctly with 1,2 and 4 processes here is the output > > > $ petscmpiexec -n 1 ./ex5 > Vec Object: 1 MPI processes > type: seq > Vec Object:Vec_0x84000000_0 1 MPI processes > type: mpi > Process [0] > 0. > 0. > 0. > 0. > 0. > 0. > 1. > 1. > 1. > 0. > 0. > 1. > 1. > 1. > 0. > 0. > 0. > 0. > 0. > 0. > Vec Object: 1 MPI processes > type: seq > Vec Object:Vec_0x84000000_1 1 MPI processes > type: mpi > Process [0] > 1. > 1. > 1. > 1. > 1. > 1. > Mat Object: 1 MPI processes > type: seqaij > row 0: > row 1: > row 2: > row 3: > row 4: > row 5: > row 6: (0, 1.) > row 7: (1, 1.) > row 8: (2, 1.) > row 9: > row 10: > row 11: (3, 1.) > row 12: (4, 1.) > row 13: (5, 1.) > row 14: > row 15: > row 16: > row 17: > row 18: > row 19: > ~/Src/petsc/test-dir (master=) arch-double > $ petscmpiexec -n 2 ./ex5 > Vec Object: 2 MPI processes > type: mpi > Vec Object:Vec_0x84000004_0 2 MPI processes > type: mpi > Process [0] > 0. > 0. > 0. > 0. > 0. > 0. > 1. > 1. > 2. > 0. > 0. > 1. > Process [1] > 1. > 2. > 0. > 0. > 0. > 0. > 0. > 0. > Vec Object: 2 MPI processes > type: mpi > Vec Object:Vec_0x84000004_1 2 MPI processes > type: mpi > Process [0] > 1. > 1. > 2. > 1. > Process [1] > 1. > 2. > Mat Object: 2 MPI processes > type: mpiaij > row 0: > row 1: > row 2: > row 3: > row 4: (0, 1.) > row 5: (1, 1.) > row 6: > row 7: (2, 1.) > row 8: (3, 1.) > row 9: > row 10: > row 11: > row 12: > row 13: > row 14: (4, 2.) > row 15: > row 16: (5, 2.) > row 17: > row 18: > row 19: > ~/Src/petsc/test-dir (master=) arch-double > $ petscmpiexec -n 4 ./ex5 > Vec Object: 4 MPI processes > type: mpi > Vec Object:Vec_0x84000004_0 4 MPI processes > type: mpi > Process [0] > 0. > 0. > 0. > 0. > 0. > 0. > Process [1] > 1. > 1. > 2. > 0. > Process [2] > 0. > 3. > 3. > 4. > 0. > 0. > Process [3] > 0. > 0. > 0. > 0. > Vec Object: 4 MPI processes > type: mpi > Vec Object:Vec_0x84000004_1 4 MPI processes > type: mpi > Process [0] > 1. > 1. > Process [1] > 2. > Process [2] > 3. > 3. > Process [3] > 4. > Mat Object: 4 MPI processes > type: mpiaij > row 0: > row 1: > row 2: > row 3: > row 4: (0, 1.) > row 5: (1, 1.) > row 6: > row 7: > row 8: (2, 2.) > row 9: > row 10: > row 11: (3, 3.) > row 12: (4, 3.) > row 13: > row 14: > row 15: > row 16: (5, 4.) > row 17: > row 18: > row 19: > > > It does NOT run correctly with 3 processes > > $ petscmpiexec -n 3 ./ex5 > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: Argument out of range > [1]PETSC ERROR: Local index 2 too large 1 (max) at 0 > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [1]PETSC ERROR: Petsc Development GIT revision: v3.6.2-1539-gf82f7f7 GIT Date: 2015-11-09 20:26:06 -0600 > [1]PETSC ERROR: ./ex5 on a arch-double named visitor098-088.wl.anl-external.org by barrysmith Thu Nov 12 14:18:56 2015 > [1]PETSC ERROR: Configure options --download-hwloc --download-hypre --download-mpich --with-afterimage PETSC_ARCH=arch-double > [1]PETSC ERROR: #1 ISLocalToGlobalMappingApply() line 423 in /Users/barrysmith/Src/PETSc/src/vec/is/utils/isltog.c > [1]PETSC ERROR: #2 MatSetValuesLocal() line 2020 in /Users/barrysmith/Src/PETSc/src/mat/interface/matrix.c > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: Argument out of range > [1]PETSC ERROR: Local index 2 too large 1 (max) at 0 > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [1]PETSC ERROR: Petsc Development GIT revision: v3.6.2-1539-gf82f7f7 GIT Date: 2015-11-09 20:26:06 -0600 > [1]PETSC ERROR: ./ex5 on a arch-double named visitor098-088.wl.anl-external.org by barrysmith Thu Nov 12 14:18:56 2015 > [1]PETSC ERROR: Configure options --download-hwloc --download-hypre --download-mpich --with-afterimage PETSC_ARCH=arch-double > [1]PETSC ERROR: #3 ISLocalToGlobalMappingApply() line 423 in /Users/barrysmith/Src/PETSc/src/vec/is/utils/isltog.c > [1]PETSC ERROR: #4 VecSetValuesLocal() line 1058 in /Users/barrysmith/Src/PETSc/src/vec/vec/interface/rvector.c > Vec Object: 3 MPI processes > type: mpi > Vec Object:Vec_0x84000004_0 3 MPI processes > type: mpi > Process [0] > 0. > 0. > 0. > 0. > 0. > 0. > 1. > 2. > Process [1] > 2. > 0. > 0. > 1. > 2. > 2. > 0. > 0. > Process [2] > 0. > 0. > 0. > 0. > Vec Object: 3 MPI processes > type: mpi > Vec Object:Vec_0x84000004_1 3 MPI processes > type: mpi > Process [0] > 1. > 2. > Process [1] > 0. > 1. > Process [2] > 4. > 0. > Mat Object: 3 MPI processes > type: mpiaij > row 0: > row 1: > row 2: > row 3: (0, 1.) > row 4: > row 5: (1, 1.) > row 6: > row 7: > row 8: > row 9: > row 10: (2, 2.) > row 11: (3, 2.) > row 12: (3, 2.) > row 13: > row 14: > row 15: > row 16: > row 17: > row 18: > row 19: > > The reason is that in this case the DMDAs are decomposed into three strips, for x the strips are xi = 0,1 then xi = 2,3 then xi= 4 > for f the strips are fi=0, fi=1, fi=2 so there is no way to get a consistent local numbering for both x and f at the same time with 3 strips. So like the DMDA interpolation routines your code can only work for certain decompositions. Forget what I said about lx and ly I don't think that is relevant for what you are trying to do. > > Barry > > >> On Nov 12, 2015, at 12:23 PM, Gianluca Meneghello wrote: >> >> Hi Barry, >> >> sorry, but I still cannot make it. I guess what I need is something similar to MatRestrict/MatInterpolate (and B is something similar to what is created from DMCreateInterpolation, except for the fact that the nonzero entries are distributed differently). >> >> Am I mistaken? Is there any example I could start from? >> >> Thanks again, >> >> Gianluca >> >> On Wed, Nov 11, 2015 at 10:19 PM, Barry Smith wrote: >> DMDAGetOwnershipRanges >> >>> On Nov 11, 2015, at 10:47 PM, Gianluca Meneghello wrote: >>> >>> Hi, >>> >>> thanks for the very quick reply. >>> >>> One more question: is there a way to get the lx and ly from the first dm and use them (modified) for the second dm? DMDAGetInfo does not seem to provide this information. 
>>> >>> Thanks again for your help >>> >>> Gianluca >>> >>> On Wed, Nov 11, 2015 at 8:12 PM, Barry Smith wrote: >>> >>> When you create the 2 DM you must be set the lx, ly arguments (the ones you set to 0) in your code carefully to insure that the vectors for the 2 DM you create have compatible layout to do the matrix vector product. >>> >>> You can run a very small problem with 2 processors and printing out the vectors to see the layout to make sure you get it correct. >>> >>> The 2 DM don't have any magically way of knowing that you created another DMDA and want it to be compatible automatically. >>> >>> Barry >>> >>> DMDACreate2d(PETSC_COMM_WORLD , DM_BOUNDARY_GHOSTED , DM_BOUNDARY_GHOSTED , DMDA_STENCIL_BOX , >>> Mx , Nx , PETSC_DECIDE , PETSC_DECIDE , 1 , 0 , 0 , 0 , &dax); >>> DMDACreate2d(PETSC_COMM_WORLD , DM_BOUNDARY_NONE , DM_BOUNDARY_NONE , DMDA_STENCIL_BOX , >>> Mx-2*bs , Nx-2*bs , PETSC_DECIDE , PETSC_DECIDE , 1 , 0 , 0 , 0 , &daf); >>> >>>> On Nov 11, 2015, at 10:05 PM, Gianluca Meneghello wrote: >>>> >>>> Hi, >>>> >>>> I am trying to do something apparently really simple but with no success. >>>> >>>> I need to perform a matrix-vector multiplication x = B f , where the length of x is bigger than the length of f (or viceversa). Thus, B cannot be created using DMCreateMatrix. >>>> >>>> Both x and f are obtained from different DMs, the smaller covering only a subdomain of the larger. The application is to apply a control f to a system, e.g. \dot{x} = A x + B f. >>>> >>>> The problem is, when running on more than one core, the vector x is not organized as I would expect (everything works fine on a single core). >>>> >>>> I attach a short example where B is intended to map f to the interior of x. >>>> >>>> mpirun -n 1 ./test -draw_pause -1 works fine while >>>> mpirun -n 2 ./test -draw_pause -1 shows the problem >>>> >>>> I have not found any example with non square matrices in the src folder, any help is very welcome. >>>> >>>> Thanks for your time, >>>> >>>> Gianluca >>>> >>> >>> >> >> > From gianmail at gmail.com Thu Nov 12 16:38:09 2015 From: gianmail at gmail.com (Gianluca Meneghello) Date: Thu, 12 Nov 2015 14:38:09 -0800 Subject: [petsc-users] Parallel matrix-vector multiplication In-Reply-To: References: <85ADC6EB-7A56-4A00-9DEB-DBC997BB6970@mcs.anl.gov> <7BD83C8D-141C-4B89-AE89-5BDB40BCA2B4@mcs.anl.gov> Message-ID: Hi Barry, thanks for your help. I am going to have a look at the code right away, I am curious to understand better how PETSc works. I have in the mean time tried a different approach, modifying DMCreateInterpolation_DA_2D_Q0 to provide the mapping (B) between two dms which are not one the refinement of the other. It looks like it is working. If that can be useful, I will clean it up and mail it to you. Thanks again for your help, Gianluca 2015-11-12 12:25 GMT-08:00 Barry Smith : > I forgot the attached code > > > On Nov 12, 2015, at 2:24 PM, Barry Smith wrote: > > > > > > This is kind of tedious to get right. Plus it unfortunately will not > work for all possible partitionings > > > > Issues: > > > > 1) The MatSetValues() works with global number but PETSc global > numbering is per process so with DMDA does not normally match the "natural" > ordering. You are trying to use global numbering in the natural ordering. > To get it to work you need to use local numbering and MatSetValuesLocal() > > > > 2) it is next to impossible to debug as written. 
So what I did to help > debug is to put values using the local numbering into the two vectors f and > x and print those vectors. This makes it easy to see when values are put in > the wrong place. > > > > 3) VecView() for DMDA prints out the vectors in the global natural > ordering so it is easy to see if the vectors have the correct values in the > correct locations. MatView() for DMDA however prints it only in the PETSc > ordering by process so one needs to manual translate to make sure the > matrix is correct. > > > > Anyways I've attached your code with small Mx=5 and My=4 it runs > correctly with 1,2 and 4 processes here is the output > > > > > > $ petscmpiexec -n 1 ./ex5 > > Vec Object: 1 MPI processes > > type: seq > > Vec Object:Vec_0x84000000_0 1 MPI processes > > type: mpi > > Process [0] > > 0. > > 0. > > 0. > > 0. > > 0. > > 0. > > 1. > > 1. > > 1. > > 0. > > 0. > > 1. > > 1. > > 1. > > 0. > > 0. > > 0. > > 0. > > 0. > > 0. > > Vec Object: 1 MPI processes > > type: seq > > Vec Object:Vec_0x84000000_1 1 MPI processes > > type: mpi > > Process [0] > > 1. > > 1. > > 1. > > 1. > > 1. > > 1. > > Mat Object: 1 MPI processes > > type: seqaij > > row 0: > > row 1: > > row 2: > > row 3: > > row 4: > > row 5: > > row 6: (0, 1.) > > row 7: (1, 1.) > > row 8: (2, 1.) > > row 9: > > row 10: > > row 11: (3, 1.) > > row 12: (4, 1.) > > row 13: (5, 1.) > > row 14: > > row 15: > > row 16: > > row 17: > > row 18: > > row 19: > > ~/Src/petsc/test-dir (master=) arch-double > > $ petscmpiexec -n 2 ./ex5 > > Vec Object: 2 MPI processes > > type: mpi > > Vec Object:Vec_0x84000004_0 2 MPI processes > > type: mpi > > Process [0] > > 0. > > 0. > > 0. > > 0. > > 0. > > 0. > > 1. > > 1. > > 2. > > 0. > > 0. > > 1. > > Process [1] > > 1. > > 2. > > 0. > > 0. > > 0. > > 0. > > 0. > > 0. > > Vec Object: 2 MPI processes > > type: mpi > > Vec Object:Vec_0x84000004_1 2 MPI processes > > type: mpi > > Process [0] > > 1. > > 1. > > 2. > > 1. > > Process [1] > > 1. > > 2. > > Mat Object: 2 MPI processes > > type: mpiaij > > row 0: > > row 1: > > row 2: > > row 3: > > row 4: (0, 1.) > > row 5: (1, 1.) > > row 6: > > row 7: (2, 1.) > > row 8: (3, 1.) > > row 9: > > row 10: > > row 11: > > row 12: > > row 13: > > row 14: (4, 2.) > > row 15: > > row 16: (5, 2.) > > row 17: > > row 18: > > row 19: > > ~/Src/petsc/test-dir (master=) arch-double > > $ petscmpiexec -n 4 ./ex5 > > Vec Object: 4 MPI processes > > type: mpi > > Vec Object:Vec_0x84000004_0 4 MPI processes > > type: mpi > > Process [0] > > 0. > > 0. > > 0. > > 0. > > 0. > > 0. > > Process [1] > > 1. > > 1. > > 2. > > 0. > > Process [2] > > 0. > > 3. > > 3. > > 4. > > 0. > > 0. > > Process [3] > > 0. > > 0. > > 0. > > 0. > > Vec Object: 4 MPI processes > > type: mpi > > Vec Object:Vec_0x84000004_1 4 MPI processes > > type: mpi > > Process [0] > > 1. > > 1. > > Process [1] > > 2. > > Process [2] > > 3. > > 3. > > Process [3] > > 4. > > Mat Object: 4 MPI processes > > type: mpiaij > > row 0: > > row 1: > > row 2: > > row 3: > > row 4: (0, 1.) > > row 5: (1, 1.) > > row 6: > > row 7: > > row 8: (2, 2.) > > row 9: > > row 10: > > row 11: (3, 3.) > > row 12: (4, 3.) > > row 13: > > row 14: > > row 15: > > row 16: (5, 4.) 
> > row 17: > > row 18: > > row 19: > > > > > > It does NOT run correctly with 3 processes > > > > $ petscmpiexec -n 3 ./ex5 > > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [1]PETSC ERROR: Argument out of range > > [1]PETSC ERROR: Local index 2 too large 1 (max) at 0 > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [1]PETSC ERROR: Petsc Development GIT revision: v3.6.2-1539-gf82f7f7 > GIT Date: 2015-11-09 20:26:06 -0600 > > [1]PETSC ERROR: ./ex5 on a arch-double named > visitor098-088.wl.anl-external.org by barrysmith Thu Nov 12 14:18:56 2015 > > [1]PETSC ERROR: Configure options --download-hwloc --download-hypre > --download-mpich --with-afterimage PETSC_ARCH=arch-double > > [1]PETSC ERROR: #1 ISLocalToGlobalMappingApply() line 423 in > /Users/barrysmith/Src/PETSc/src/vec/is/utils/isltog.c > > [1]PETSC ERROR: #2 MatSetValuesLocal() line 2020 in > /Users/barrysmith/Src/PETSc/src/mat/interface/matrix.c > > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [1]PETSC ERROR: Argument out of range > > [1]PETSC ERROR: Local index 2 too large 1 (max) at 0 > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [1]PETSC ERROR: Petsc Development GIT revision: v3.6.2-1539-gf82f7f7 > GIT Date: 2015-11-09 20:26:06 -0600 > > [1]PETSC ERROR: ./ex5 on a arch-double named > visitor098-088.wl.anl-external.org by barrysmith Thu Nov 12 14:18:56 2015 > > [1]PETSC ERROR: Configure options --download-hwloc --download-hypre > --download-mpich --with-afterimage PETSC_ARCH=arch-double > > [1]PETSC ERROR: #3 ISLocalToGlobalMappingApply() line 423 in > /Users/barrysmith/Src/PETSc/src/vec/is/utils/isltog.c > > [1]PETSC ERROR: #4 VecSetValuesLocal() line 1058 in > /Users/barrysmith/Src/PETSc/src/vec/vec/interface/rvector.c > > Vec Object: 3 MPI processes > > type: mpi > > Vec Object:Vec_0x84000004_0 3 MPI processes > > type: mpi > > Process [0] > > 0. > > 0. > > 0. > > 0. > > 0. > > 0. > > 1. > > 2. > > Process [1] > > 2. > > 0. > > 0. > > 1. > > 2. > > 2. > > 0. > > 0. > > Process [2] > > 0. > > 0. > > 0. > > 0. > > Vec Object: 3 MPI processes > > type: mpi > > Vec Object:Vec_0x84000004_1 3 MPI processes > > type: mpi > > Process [0] > > 1. > > 2. > > Process [1] > > 0. > > 1. > > Process [2] > > 4. > > 0. > > Mat Object: 3 MPI processes > > type: mpiaij > > row 0: > > row 1: > > row 2: > > row 3: (0, 1.) > > row 4: > > row 5: (1, 1.) > > row 6: > > row 7: > > row 8: > > row 9: > > row 10: (2, 2.) > > row 11: (3, 2.) > > row 12: (3, 2.) > > row 13: > > row 14: > > row 15: > > row 16: > > row 17: > > row 18: > > row 19: > > > > The reason is that in this case the DMDAs are decomposed into three > strips, for x the strips are xi = 0,1 then xi = 2,3 then xi= 4 > > for f the strips are fi=0, fi=1, fi=2 so there is no way to get a > consistent local numbering for both x and f at the same time with 3 > strips. So like the DMDA interpolation routines your code can only work > for certain decompositions. Forget what I said about lx and ly I don't > think that is relevant for what you are trying to do. > > > > Barry > > > > > >> On Nov 12, 2015, at 12:23 PM, Gianluca Meneghello > wrote: > >> > >> Hi Barry, > >> > >> sorry, but I still cannot make it. 
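For reference, a minimal sketch of the assembly strategy Barry describes (fill the rectangular matrix with local indices and let MatSetValuesLocal() translate them through each DMDA's own local-to-global mapping) is given below. It is not code from this thread: the grid sizes, the border width bs and the identity entries are illustrative, error checking is omitted, and dof = 1 is assumed on both DMDAs.

#include <petscdmda.h>

int main(int argc, char **argv)
{
  DM                     dax, daf;
  Mat                    B;
  ISLocalToGlobalMapping rmap, cmap;
  Vec                    x, f;
  PetscInt               Mx = 5, My = 4, bs = 1;   /* illustrative sizes           */
  PetscInt               xs, ys, xm, ym, i, j;     /* owned corners of daf         */
  PetscInt               gxs, gys, gxm;            /* ghost corners of dax         */
  PetscInt               fgxs, fgys, fgxm;         /* ghost corners of daf         */
  PetscInt               nx, nf;
  PetscScalar            one = 1.0;

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* large DMDA (x) and small DMDA (f), as in the thread */
  DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_GHOSTED, DM_BOUNDARY_GHOSTED, DMDA_STENCIL_BOX,
               Mx, My, PETSC_DECIDE, PETSC_DECIDE, 1, 1, NULL, NULL, &dax);
  DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DMDA_STENCIL_BOX,
               Mx - 2*bs, My - 2*bs, PETSC_DECIDE, PETSC_DECIDE, 1, 1, NULL, NULL, &daf);

  DMCreateGlobalVector(dax, &x);
  DMCreateGlobalVector(daf, &f);

  /* rectangular matrix: row layout follows x, column layout follows f */
  VecGetLocalSize(x, &nx);
  VecGetLocalSize(f, &nf);
  MatCreate(PETSC_COMM_WORLD, &B);
  MatSetSizes(B, nx, nf, PETSC_DETERMINE, PETSC_DETERMINE);
  MatSetType(B, MATAIJ);
  MatSeqAIJSetPreallocation(B, 1, NULL);
  MatMPIAIJSetPreallocation(B, 1, NULL, 1, NULL);

  /* insert with local (ghosted) indices of the two different DMDAs */
  DMGetLocalToGlobalMapping(dax, &rmap);
  DMGetLocalToGlobalMapping(daf, &cmap);
  MatSetLocalToGlobalMapping(B, rmap, cmap);

  DMDAGetGhostCorners(dax, &gxs, &gys, NULL, &gxm, NULL, NULL);
  DMDAGetGhostCorners(daf, &fgxs, &fgys, NULL, &fgxm, NULL, NULL);
  DMDAGetCorners(daf, &xs, &ys, NULL, &xm, &ym, NULL);

  for (j = ys; j < ys + ym; j++) {
    for (i = xs; i < xs + xm; i++) {
      /* point (i,j) of daf corresponds to point (i+bs,j+bs) of dax */
      PetscInt row = (i + bs - gxs) + (j + bs - gys) * gxm;    /* local index on dax */
      PetscInt col = (i - fgxs)     + (j - fgys)     * fgxm;   /* local index on daf */
      MatSetValuesLocal(B, 1, &row, 1, &col, &one, INSERT_VALUES);
    }
  }
  MatAssemblyBegin(B, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(B, MAT_FINAL_ASSEMBLY);

  MatMult(B, f, x);   /* x = B f */

  MatDestroy(&B); VecDestroy(&x); VecDestroy(&f);
  DMDestroy(&dax); DMDestroy(&daf);
  PetscFinalize();
  return 0;
}

Note that the row index is only legal when the point (i+bs, j+bs) lies inside this process's ghosted patch of dax, which is exactly why, as Barry explains above for the 3-process run, this kind of assembly only works for decompositions where the two DMDAs line up.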
I guess what I need is something > similar to MatRestrict/MatInterpolate (and B is something similar to what > is created from DMCreateInterpolation, except for the fact that the nonzero > entries are distributed differently). > >> > >> Am I mistaken? Is there any example I could start from? > >> > >> Thanks again, > >> > >> Gianluca > >> > >> On Wed, Nov 11, 2015 at 10:19 PM, Barry Smith > wrote: > >> DMDAGetOwnershipRanges > >> > >>> On Nov 11, 2015, at 10:47 PM, Gianluca Meneghello > wrote: > >>> > >>> Hi, > >>> > >>> thanks for the very quick reply. > >>> > >>> One more question: is there a way to get the lx and ly from the first > dm and use them (modified) for the second dm? DMDAGetInfo does not seem to > provide this information. > >>> > >>> Thanks again for your help > >>> > >>> Gianluca > >>> > >>> On Wed, Nov 11, 2015 at 8:12 PM, Barry Smith > wrote: > >>> > >>> When you create the 2 DM you must be set the lx, ly arguments (the > ones you set to 0) in your code carefully to insure that the vectors for > the 2 DM you create have compatible layout to do the matrix vector product. > >>> > >>> You can run a very small problem with 2 processors and printing out > the vectors to see the layout to make sure you get it correct. > >>> > >>> The 2 DM don't have any magically way of knowing that you created > another DMDA and want it to be compatible automatically. > >>> > >>> Barry > >>> > >>> DMDACreate2d(PETSC_COMM_WORLD , DM_BOUNDARY_GHOSTED , > DM_BOUNDARY_GHOSTED , DMDA_STENCIL_BOX , > >>> Mx , Nx , PETSC_DECIDE , PETSC_DECIDE , 1 , 0 , 0 , 0 , > &dax); > >>> DMDACreate2d(PETSC_COMM_WORLD , DM_BOUNDARY_NONE , > DM_BOUNDARY_NONE , DMDA_STENCIL_BOX , > >>> Mx-2*bs , Nx-2*bs , PETSC_DECIDE , PETSC_DECIDE , 1 , 0 , 0 , 0 , > &daf); > >>> > >>>> On Nov 11, 2015, at 10:05 PM, Gianluca Meneghello > wrote: > >>>> > >>>> Hi, > >>>> > >>>> I am trying to do something apparently really simple but with no > success. > >>>> > >>>> I need to perform a matrix-vector multiplication x = B f , where the > length of x is bigger than the length of f (or viceversa). Thus, B cannot > be created using DMCreateMatrix. > >>>> > >>>> Both x and f are obtained from different DMs, the smaller covering > only a subdomain of the larger. The application is to apply a control f to > a system, e.g. \dot{x} = A x + B f. > >>>> > >>>> The problem is, when running on more than one core, the vector x is > not organized as I would expect (everything works fine on a single core). > >>>> > >>>> I attach a short example where B is intended to map f to the interior > of x. > >>>> > >>>> mpirun -n 1 ./test -draw_pause -1 works fine while > >>>> mpirun -n 2 ./test -draw_pause -1 shows the problem > >>>> > >>>> I have not found any example with non square matrices in the src > folder, any help is very welcome. > >>>> > >>>> Thanks for your time, > >>>> > >>>> Gianluca > >>>> > >>> > >>> > >> > >> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From timothee.nicolas at gmail.com Thu Nov 12 18:47:40 2015 From: timothee.nicolas at gmail.com (=?UTF-8?Q?Timoth=C3=A9e_Nicolas?=) Date: Fri, 13 Nov 2015 09:47:40 +0900 Subject: [petsc-users] syntax for routine in PCMGSetResidual Message-ID: Hi all, In the manual and the documentation, the syntax for the routine to be given as argument of PCMGSetResidual: PCMGSetResidual (PC pc,PetscInt l,PetscErrorCode (*residual)(Mat ,Vec ,Vec ,Vec ),Mat mat) is not specified. I mean that the order of the vectors is not specified. 
I suppose it is something like residual(Mat,b,x,r) with r = b - Mat*x, but it could as well be any combination like residual(Mat,r,x,b). There is no example in the documentation of the usage so I am confused. Does it absolutely need to be set ? I find the manual a bit confusing on this point. Is it only if matrix-free matrices are used ? In the present situation, I use matrix-free operators in a multigrid preconditioner (but the interpolation and restriction are not matrix free) and have not set this residual function yet. I get the following error: [13]PETSC ERROR: Preconditioner number of local rows -1 does not equal resulting vector number of rows 67584 Could this be related ? By the way, I don't understand what is meant by the "preconditioner number of local rows". I have separately tested the operators at each level and they are fine. Best Timothee -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Nov 12 19:38:00 2015 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 12 Nov 2015 19:38:00 -0600 Subject: [petsc-users] syntax for routine in PCMGSetResidual In-Reply-To: References: Message-ID: On Thu, Nov 12, 2015 at 6:47 PM, Timoth?e Nicolas < timothee.nicolas at gmail.com> wrote: > Hi all, > > In the manual and the documentation, the syntax for the routine to be > given as argument of PCMGSetResidual: > > PCMGSetResidual (PC pc,PetscInt l,PetscErrorCode (*residual)(Mat ,Vec ,Vec ,Vec ),Mat mat) > > > is not specified. I mean that the order of the vectors is not specified. I > suppose it is something like > residual(Mat,b,x,r) with r = b - Mat*x, but it could as well be any > combination like residual(Mat,r,x,b). There is no example in the > documentation of the usage so I am confused. Does it absolutely need to be > set ? I find the manual a bit confusing on this point. Is it only if > matrix-free matrices are used ? > > In the present situation, I use matrix-free operators in a multigrid > preconditioner (but the interpolation and restriction are not matrix free) > and have not set this residual function yet. I get the following error: > Always always always give the entire error message. We want the stack. The problem here looks like the preconditioner is reporting -1 rows for process 13. Matt > [13]PETSC ERROR: Preconditioner number of local rows -1 does not equal > resulting vector number of rows 67584 > > Could this be related ? By the way, I don't understand what is meant by > the "preconditioner number of local rows". I have separately tested the > operators at each level and they are fine. > > Best > > Timothee > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From timothee.nicolas at gmail.com Thu Nov 12 19:39:38 2015 From: timothee.nicolas at gmail.com (=?UTF-8?Q?Timoth=C3=A9e_Nicolas?=) Date: Fri, 13 Nov 2015 10:39:38 +0900 Subject: [petsc-users] syntax for routine in PCMGSetResidual In-Reply-To: References: Message-ID: Sorry, here is the full error message [0]PETSC ERROR: Nonconforming object sizes [0]PETSC ERROR: Preconditioner number of local rows -1 does not equal resulting vector number of rows 71808 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 [0]PETSC ERROR: ./mips_implicit on a arch-linux2-c-opt named helios90 by tnicolas Fri Nov 13 10:39:14 2015 [0]PETSC ERROR: Configure options --prefix=/csc/softs/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real --with-debugging=0 --with-x=0 --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpicxx --with-fortran --known-mpi-shared-libraries=1 --with-scalar-type=real --with-precision=double --CFLAGS="-g -O3 -mavx -mkl" --CXXFLAGS="-g -O3 -mavx -mkl" --FFLAGS="-g -O3 -mavx -mkl" [0]PETSC ERROR: #1 PCApply() line 472 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #2 KSP_PCApply() line 242 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/include/petsc/private/kspimpl.h [0]PETSC ERROR: #3 KSPInitialResidual() line 63 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itres.c [0]PETSC ERROR: #4 KSPSolve_GMRES() line 235 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/impls/gmres/gmres.c [0]PETSC ERROR: #5 KSPSolve() line 604 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itfunc.c 2015-11-13 10:38 GMT+09:00 Matthew Knepley : > On Thu, Nov 12, 2015 at 6:47 PM, Timoth?e Nicolas < > timothee.nicolas at gmail.com> wrote: > >> Hi all, >> >> In the manual and the documentation, the syntax for the routine to be >> given as argument of PCMGSetResidual: >> >> PCMGSetResidual (PC pc,PetscInt l,PetscErrorCode (*residual)(Mat ,Vec ,Vec ,Vec ),Mat mat) >> >> >> is not specified. I mean that the order of the vectors is not specified. >> I suppose it is something like >> residual(Mat,b,x,r) with r = b - Mat*x, but it could as well be any >> combination like residual(Mat,r,x,b). There is no example in the >> documentation of the usage so I am confused. Does it absolutely need to be >> set ? I find the manual a bit confusing on this point. Is it only if >> matrix-free matrices are used ? >> >> In the present situation, I use matrix-free operators in a multigrid >> preconditioner (but the interpolation and restriction are not matrix free) >> and have not set this residual function yet. I get the following error: >> > > Always always always give the entire error message. We want the stack. > > The problem here looks like the preconditioner is reporting -1 rows for > process 13. > > Matt > > >> [13]PETSC ERROR: Preconditioner number of local rows -1 does not equal >> resulting vector number of rows 67584 >> >> Could this be related ? By the way, I don't understand what is meant by >> the "preconditioner number of local rows". I have separately tested the >> operators at each level and they are fine. >> >> Best >> >> Timothee >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From timothee.nicolas at gmail.com Thu Nov 12 19:40:30 2015 From: timothee.nicolas at gmail.com (=?UTF-8?Q?Timoth=C3=A9e_Nicolas?=) Date: Fri, 13 Nov 2015 10:40:30 +0900 Subject: [petsc-users] syntax for routine in PCMGSetResidual In-Reply-To: References: Message-ID: More precisely, on all 16 processes, same error message: [13]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [13]PETSC ERROR: Nonconforming object sizes [13]PETSC ERROR: Preconditioner number of local rows -1 does not equal resulting vector number of rows 67584 [13]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [13]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 [13]PETSC ERROR: ./mips_implicit on a arch-linux2-c-opt named helios90 by tnicolas Fri Nov 13 10:39:14 2015 [13]PETSC ERROR: Configure options --prefix=/csc/softs/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real --with-debugging=0 --with-x=0 --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpicxx --with-fortran --known-mpi-shared-libraries=1 --with-scalar-type=real --with-precision=double --CFLAGS="-g -O3 -mavx -mkl" --CXXFLAGS="-g -O3 -mavx -mkl" --FFLAGS="-g -O3 -mavx -mkl" [13]PETSC ERROR: #1 PCApply() line 472 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/pc/interface/precon.c [14]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [14]PETSC ERROR: Nonconforming object sizes [14]PETSC ERROR: Preconditioner number of local rows -1 does not equal resulting vector number of rows 71808 [14]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [14]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 [14]PETSC ERROR: ./mips_implicit on a arch-linux2-c-opt named helios90 by tnicolas Fri Nov 13 10:39:14 2015 [14]PETSC ERROR: Configure options --prefix=/csc/softs/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real --with-debugging=0 --with-x=0 --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpicxx --with-fortran --known-mpi-shared-libraries=1 --with-scalar-type=real --with-precision=double --CFLAGS="-g -O3 -mavx -mkl" --CXXFLAGS="-g -O3 -mavx -mkl" --FFLAGS="-g -O3 -mavx -mkl" [14]PETSC ERROR: #1 PCApply() line 472 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/pc/interface/precon.c [14]PETSC ERROR: #2 KSP_PCApply() line 242 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/include/petsc/private/kspimpl.h [14]PETSC ERROR: #3 KSPInitialResidual() line 63 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itres.c [14]PETSC ERROR: #4 KSPSolve_GMRES() line 235 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/impls/gmres/gmres.c [14]PETSC ERROR: [0]PETSC ERROR: [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: Nonconforming object sizes [1]PETSC ERROR: Preconditioner number of local rows -1 does not equal resulting vector number of rows 67584 [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[1]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015
[1]PETSC ERROR: ./mips_implicit on a arch-linux2-c-opt named helios90 by tnicolas Fri Nov 13 10:39:14 2015
[... the same Configure options and the same stack (#1 PCApply, #2 KSP_PCApply, #3 KSPInitialResidual, #4 KSPSolve_GMRES, #5 KSPSolve) are then repeated verbatim by rank 1 and by ranks 6 through 12, each preceded by the identical "Nonconforming object sizes / Preconditioner number of local rows -1" message, with the reported vector row count alternating between 67584 and 71808 ...]
[13]PETSC ERROR: #2 KSP_PCApply() line 242 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/include/petsc/private/kspimpl.h
[13]PETSC ERROR: #3 KSPInitialResidual() line 63 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itres.c
[13]PETSC ERROR: #4 KSPSolve_GMRES() line 235 in
/csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/impls/gmres/gmres.c [13]PETSC ERROR: #5 KSPSolve() line 604 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itfunc.c [15]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [15]PETSC ERROR: Nonconforming object sizes [15]PETSC ERROR: Preconditioner number of local rows -1 does not equal resulting vector number of rows 67584 [15]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [15]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 [15]PETSC ERROR: ./mips_implicit on a arch-linux2-c-opt named helios90 by tnicolas Fri Nov 13 10:39:14 2015 [15]PETSC ERROR: Configure options --prefix=/csc/softs/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real --with-debugging=0 --with-x=0 --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpicxx --with-fortran --known-mpi-shared-libraries=1 --with-scalar-type=real --with-precision=double --CFLAGS="-g -O3 -mavx -mkl" --CXXFLAGS="-g -O3 -mavx -mkl" --FFLAGS="-g -O3 -mavx -mkl" [15]PETSC ERROR: #1 PCApply() line 472 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/pc/interface/precon.c [15]PETSC ERROR: #2 KSP_PCApply() line 242 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/include/petsc/private/kspimpl.h [15]PETSC ERROR: #3 KSPInitialResidual() line 63 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itres.c [15]PETSC ERROR: #4 KSPSolve_GMRES() line 235 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/impls/gmres/gmres.c [15]PETSC ERROR: #5 KSPSolve() line 604 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itfunc.c #5 KSPSolve() line 604 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itfunc.c [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [2]PETSC ERROR: Nonconforming object sizes [2]PETSC ERROR: Preconditioner number of local rows -1 does not equal resulting vector number of rows 71808 [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[2]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 [2]PETSC ERROR: ./mips_implicit on a arch-linux2-c-opt named helios90 by tnicolas Fri Nov 13 10:39:14 2015 [2]PETSC ERROR: Configure options --prefix=/csc/softs/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real --with-debugging=0 --with-x=0 --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpicxx --with-fortran --known-mpi-shared-libraries=1 --with-scalar-type=real --with-precision=double --CFLAGS="-g -O3 -mavx -mkl" --CXXFLAGS="-g -O3 -mavx -mkl" --FFLAGS="-g -O3 -mavx -mkl" [2]PETSC ERROR: #1 PCApply() line 472 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/pc/interface/precon.c [2]PETSC ERROR: #2 KSP_PCApply() line 242 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/include/petsc/private/kspimpl.h [2]PETSC ERROR: #3 KSPInitialResidual() line 63 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itres.c [2]PETSC ERROR: #4 KSPSolve_GMRES() line 235 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/impls/gmres/gmres.c [2]PETSC ERROR: #5 KSPSolve() line 604 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itfunc.c [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [3]PETSC ERROR: Nonconforming object sizes [3]PETSC ERROR: Preconditioner number of local rows -1 does not equal resulting vector number of rows 67584 [3]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [3]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 [3]PETSC ERROR: ./mips_implicit on a arch-linux2-c-opt named helios90 by tnicolas Fri Nov 13 10:39:14 2015 [3]PETSC ERROR: Configure options --prefix=/csc/softs/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real --with-debugging=0 --with-x=0 --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpicxx --with-fortran --known-mpi-shared-libraries=1 --with-scalar-type=real --with-precision=double --CFLAGS="-g -O3 -mavx -mkl" --CXXFLAGS="-g -O3 -mavx -mkl" --FFLAGS="-g -O3 -mavx -mkl" [3]PETSC ERROR: #1 PCApply() line 472 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/pc/interface/precon.c [3]PETSC ERROR: #2 KSP_PCApply() line 242 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/include/petsc/private/kspimpl.h [3]PETSC ERROR: #3 KSPInitialResidual() line 63 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itres.c [3]PETSC ERROR: #4 KSPSolve_GMRES() line 235 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/impls/gmres/gmres.c [3]PETSC ERROR: #5 KSPSolve() line 604 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itfunc.c [4]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [4]PETSC ERROR: Nonconforming object sizes [4]PETSC ERROR: Preconditioner number of local rows -1 does not equal resulting vector number of rows 71808 [4]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[4]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 [4]PETSC ERROR: ./mips_implicit on a arch-linux2-c-opt named helios90 by tnicolas Fri Nov 13 10:39:14 2015 [4]PETSC ERROR: Configure options --prefix=/csc/softs/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real --with-debugging=0 --with-x=0 --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpicxx --with-fortran --known-mpi-shared-libraries=1 --with-scalar-type=real --with-precision=double --CFLAGS="-g -O3 -mavx -mkl" --CXXFLAGS="-g -O3 -mavx -mkl" --FFLAGS="-g -O3 -mavx -mkl" [4]PETSC ERROR: #1 PCApply() line 472 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/pc/interface/precon.c [4]PETSC ERROR: #2 KSP_PCApply() line 242 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/include/petsc/private/kspimpl.h [4]PETSC ERROR: #3 KSPInitialResidual() line 63 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itres.c [4]PETSC ERROR: #4 KSPSolve_GMRES() line 235 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/impls/gmres/gmres.c [4]PETSC ERROR: #5 KSPSolve() line 604 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itfunc.c [5]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [5]PETSC ERROR: Nonconforming object sizes [5]PETSC ERROR: Preconditioner number of local rows -1 does not equal resulting vector number of rows 67584 [5]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [5]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 [5]PETSC ERROR: ./mips_implicit on a arch-linux2-c-opt named helios90 by tnicolas Fri Nov 13 10:39:14 2015 [5]PETSC ERROR: Configure options --prefix=/csc/softs/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real --with-debugging=0 --with-x=0 --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpicxx --with-fortran --known-mpi-shared-libraries=1 --with-scalar-type=real --with-precision=double --CFLAGS="-g -O3 -mavx -mkl" --CXXFLAGS="-g -O3 -mavx -mkl" --FFLAGS="-g -O3 -mavx -mkl" [5]PETSC ERROR: #1 PCApply() line 472 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/pc/interface/precon.c [5]PETSC ERROR: #2 KSP_PCApply() line 242 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/include/petsc/private/kspimpl.h [5]PETSC ERROR: #3 KSPInitialResidual() line 63 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itres.c [5]PETSC ERROR: #4 KSPSolve_GMRES() line 235 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/impls/gmres/gmres.c [5]PETSC ERROR: #5 KSPSolve() line 604 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itfunc.c --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Nonconforming object sizes [0]PETSC ERROR: Preconditioner number of local rows -1 does not equal resulting vector number of rows 71808 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 [0]PETSC ERROR: ./mips_implicit on a arch-linux2-c-opt named helios90 by tnicolas Fri Nov 13 10:39:14 2015 [0]PETSC ERROR: Configure options --prefix=/csc/softs/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real --with-debugging=0 --with-x=0 --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpicxx --with-fortran --known-mpi-shared-libraries=1 --with-scalar-type=real --with-precision=double --CFLAGS="-g -O3 -mavx -mkl" --CXXFLAGS="-g -O3 -mavx -mkl" --FFLAGS="-g -O3 -mavx -mkl" [0]PETSC ERROR: #1 PCApply() line 472 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #2 KSP_PCApply() line 242 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/include/petsc/private/kspimpl.h [0]PETSC ERROR: #3 KSPInitialResidual() line 63 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itres.c [0]PETSC ERROR: #4 KSPSolve_GMRES() line 235 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/impls/gmres/gmres.c [0]PETSC ERROR: #5 KSPSolve() line 604 in /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itfunc.c 2015-11-13 10:39 GMT+09:00 Timoth?e Nicolas : > Sorry, here is the full error message > > [0]PETSC ERROR: Nonconforming object sizes > [0]PETSC ERROR: Preconditioner number of local rows -1 does not equal > resulting vector number of rows 71808 > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 > [0]PETSC ERROR: ./mips_implicit on a arch-linux2-c-opt named helios90 by > tnicolas Fri Nov 13 10:39:14 2015 > [0]PETSC ERROR: Configure options > --prefix=/csc/softs/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real > --with-debugging=0 --with-x=0 --with-cc=mpicc --with-fc=mpif90 > --with-cxx=mpicxx --with-fortran --known-mpi-shared-libraries=1 > --with-scalar-type=real --with-precision=double --CFLAGS="-g -O3 -mavx > -mkl" --CXXFLAGS="-g -O3 -mavx -mkl" --FFLAGS="-g -O3 -mavx -mkl" > [0]PETSC ERROR: #1 PCApply() line 472 in > /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #2 KSP_PCApply() line 242 in > /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/include/petsc/private/kspimpl.h > [0]PETSC ERROR: #3 KSPInitialResidual() line 63 in > /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itres.c > [0]PETSC ERROR: #4 KSPSolve_GMRES() line 235 in > /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/impls/gmres/gmres.c > [0]PETSC ERROR: #5 KSPSolve() line 604 in > /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itfunc.c > > 2015-11-13 10:38 GMT+09:00 Matthew Knepley : > >> On Thu, Nov 12, 2015 at 6:47 PM, Timoth?e Nicolas < >> timothee.nicolas at gmail.com> wrote: >> >>> Hi all, >>> >>> In the manual and the documentation, the syntax for the routine to be >>> given as argument of PCMGSetResidual: >>> >>> PCMGSetResidual (PC pc,PetscInt l,PetscErrorCode (*residual)(Mat ,Vec ,Vec ,Vec ),Mat mat) >>> >>> >>> is not specified. 
I mean that the order of the vectors is not specified. >>> I suppose it is something like >>> residual(Mat,b,x,r) with r = b - Mat*x, but it could as well be any >>> combination like residual(Mat,r,x,b). There is no example in the >>> documentation of the usage so I am confused. Does it absolutely need to be >>> set ? I find the manual a bit confusing on this point. Is it only if >>> matrix-free matrices are used ? >>> >>> In the present situation, I use matrix-free operators in a multigrid >>> preconditioner (but the interpolation and restriction are not matrix free) >>> and have not set this residual function yet. I get the following error: >>> >> >> Always always always give the entire error message. We want the stack. >> >> The problem here looks like the preconditioner is reporting -1 rows for >> process 13. >> >> Matt >> >> >>> [13]PETSC ERROR: Preconditioner number of local rows -1 does not equal >>> resulting vector number of rows 67584 >>> >>> Could this be related ? By the way, I don't understand what is meant by >>> the "preconditioner number of local rows". I have separately tested the >>> operators at each level and they are fine. >>> >>> Best >>> >>> Timothee >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Nov 12 19:53:10 2015 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 12 Nov 2015 19:53:10 -0600 Subject: [petsc-users] syntax for routine in PCMGSetResidual In-Reply-To: References: Message-ID: On Thu, Nov 12, 2015 at 7:39 PM, Timoth?e Nicolas < timothee.nicolas at gmail.com> wrote: > Sorry, here is the full error message > > [0]PETSC ERROR: Nonconforming object sizes > [0]PETSC ERROR: Preconditioner number of local rows -1 does not equal > resulting vector number of rows 71808 > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 > [0]PETSC ERROR: ./mips_implicit on a arch-linux2-c-opt named helios90 by > tnicolas Fri Nov 13 10:39:14 2015 > [0]PETSC ERROR: Configure options > --prefix=/csc/softs/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real > --with-debugging=0 --with-x=0 --with-cc=mpicc --with-fc=mpif90 > --with-cxx=mpicxx --with-fortran --known-mpi-shared-libraries=1 > --with-scalar-type=real --with-precision=double --CFLAGS="-g -O3 -mavx > -mkl" --CXXFLAGS="-g -O3 -mavx -mkl" --FFLAGS="-g -O3 -mavx -mkl" > [0]PETSC ERROR: #1 PCApply() line 472 in > /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #2 KSP_PCApply() line 242 in > /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/include/petsc/private/kspimpl.h > [0]PETSC ERROR: #3 KSPInitialResidual() line 63 in > /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itres.c > [0]PETSC ERROR: #4 KSPSolve_GMRES() line 235 in > /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/impls/gmres/gmres.c > [0]PETSC ERROR: #5 KSPSolve() line 604 in > /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itfunc.c > The PC uses the matrix it gets to determine sizes, and compare to the input vectors it gets for PCApply(). The preconditioner matrix is not setup or is not reporting sizes, for example if its a MATSHELL it does not have any sizes. Matt > 2015-11-13 10:38 GMT+09:00 Matthew Knepley : > >> On Thu, Nov 12, 2015 at 6:47 PM, Timoth?e Nicolas < >> timothee.nicolas at gmail.com> wrote: >> >>> Hi all, >>> >>> In the manual and the documentation, the syntax for the routine to be >>> given as argument of PCMGSetResidual: >>> >>> PCMGSetResidual (PC pc,PetscInt l,PetscErrorCode (*residual)(Mat ,Vec ,Vec ,Vec ),Mat mat) >>> >>> >>> is not specified. I mean that the order of the vectors is not specified. >>> I suppose it is something like >>> residual(Mat,b,x,r) with r = b - Mat*x, but it could as well be any >>> combination like residual(Mat,r,x,b). There is no example in the >>> documentation of the usage so I am confused. Does it absolutely need to be >>> set ? I find the manual a bit confusing on this point. Is it only if >>> matrix-free matrices are used ? >>> >>> In the present situation, I use matrix-free operators in a multigrid >>> preconditioner (but the interpolation and restriction are not matrix free) >>> and have not set this residual function yet. I get the following error: >>> >> >> Always always always give the entire error message. We want the stack. >> >> The problem here looks like the preconditioner is reporting -1 rows for >> process 13. >> >> Matt >> >> >>> [13]PETSC ERROR: Preconditioner number of local rows -1 does not equal >>> resulting vector number of rows 67584 >>> >>> Could this be related ? By the way, I don't understand what is meant by >>> the "preconditioner number of local rows". I have separately tested the >>> operators at each level and they are fine. >>> >>> Best >>> >>> Timothee >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. 
>> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From timothee.nicolas at gmail.com Thu Nov 12 19:56:48 2015 From: timothee.nicolas at gmail.com (=?UTF-8?Q?Timoth=C3=A9e_Nicolas?=) Date: Fri, 13 Nov 2015 10:56:48 +0900 Subject: [petsc-users] syntax for routine in PCMGSetResidual In-Reply-To: References: Message-ID: Mmmh, that's strange because I define my matrices with the command call MatCreateShell(PETSC_COMM_WORLD,lctx(level)%localsize,lctx(level)%localsize, & & lctx(level)%ngpdof,lctx(level)%ngpdof,lctx(level), & lctx(level)%Mmat,ierr) and at each level I checked that the sizes "localsize" and "ngpdof" are well set. Timothee 2015-11-13 10:53 GMT+09:00 Matthew Knepley : > On Thu, Nov 12, 2015 at 7:39 PM, Timoth?e Nicolas < > timothee.nicolas at gmail.com> wrote: > >> Sorry, here is the full error message >> >> [0]PETSC ERROR: Nonconforming object sizes >> [0]PETSC ERROR: Preconditioner number of local rows -1 does not equal >> resulting vector number of rows 71808 >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >> for trouble shooting. >> [0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 >> [0]PETSC ERROR: ./mips_implicit on a arch-linux2-c-opt named helios90 by >> tnicolas Fri Nov 13 10:39:14 2015 >> [0]PETSC ERROR: Configure options >> --prefix=/csc/softs/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real >> --with-debugging=0 --with-x=0 --with-cc=mpicc --with-fc=mpif90 >> --with-cxx=mpicxx --with-fortran --known-mpi-shared-libraries=1 >> --with-scalar-type=real --with-precision=double --CFLAGS="-g -O3 -mavx >> -mkl" --CXXFLAGS="-g -O3 -mavx -mkl" --FFLAGS="-g -O3 -mavx -mkl" >> [0]PETSC ERROR: #1 PCApply() line 472 in >> /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/pc/interface/precon.c >> [0]PETSC ERROR: #2 KSP_PCApply() line 242 in >> /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/include/petsc/private/kspimpl.h >> [0]PETSC ERROR: #3 KSPInitialResidual() line 63 in >> /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itres.c >> [0]PETSC ERROR: #4 KSPSolve_GMRES() line 235 in >> /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/impls/gmres/gmres.c >> [0]PETSC ERROR: #5 KSPSolve() line 604 in >> /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itfunc.c >> > > The PC uses the matrix it gets to determine sizes, and compare to the > input vectors it gets for PCApply(). The > preconditioner matrix is not setup or is not reporting sizes, for example > if its a MATSHELL it does not have any sizes. > > Matt > > >> 2015-11-13 10:38 GMT+09:00 Matthew Knepley : >> >>> On Thu, Nov 12, 2015 at 6:47 PM, Timoth?e Nicolas < >>> timothee.nicolas at gmail.com> wrote: >>> >>>> Hi all, >>>> >>>> In the manual and the documentation, the syntax for the routine to be >>>> given as argument of PCMGSetResidual: >>>> >>>> PCMGSetResidual (PC pc,PetscInt l,PetscErrorCode (*residual)(Mat ,Vec ,Vec ,Vec ),Mat mat) >>>> >>>> >>>> is not specified. I mean that the order of the vectors is not >>>> specified. 
I suppose it is something like >>>> residual(Mat,b,x,r) with r = b - Mat*x, but it could as well be any >>>> combination like residual(Mat,r,x,b). There is no example in the >>>> documentation of the usage so I am confused. Does it absolutely need to be >>>> set ? I find the manual a bit confusing on this point. Is it only if >>>> matrix-free matrices are used ? >>>> >>>> In the present situation, I use matrix-free operators in a multigrid >>>> preconditioner (but the interpolation and restriction are not matrix free) >>>> and have not set this residual function yet. I get the following error: >>>> >>> >>> Always always always give the entire error message. We want the stack. >>> >>> The problem here looks like the preconditioner is reporting -1 rows for >>> process 13. >>> >>> Matt >>> >>> >>>> [13]PETSC ERROR: Preconditioner number of local rows -1 does not equal >>>> resulting vector number of rows 67584 >>>> >>>> Could this be related ? By the way, I don't understand what is meant by >>>> the "preconditioner number of local rows". I have separately tested the >>>> operators at each level and they are fine. >>>> >>>> Best >>>> >>>> Timothee >>>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Nov 12 21:00:58 2015 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 12 Nov 2015 21:00:58 -0600 Subject: [petsc-users] syntax for routine in PCMGSetResidual In-Reply-To: References: Message-ID: On Thu, Nov 12, 2015 at 7:56 PM, Timoth?e Nicolas < timothee.nicolas at gmail.com> wrote: > Mmmh, that's strange because I define my matrices with the command > > call > MatCreateShell(PETSC_COMM_WORLD,lctx(level)%localsize,lctx(level)%localsize, > & > & lctx(level)%ngpdof,lctx(level)%ngpdof,lctx(level), > & lctx(level)%Mmat,ierr) > > and at each level I checked that the sizes "localsize" and "ngpdof" are > well set. > You should be able to trace back in the debugger to see what is sat as pc->mat. Matt > Timothee > > 2015-11-13 10:53 GMT+09:00 Matthew Knepley : > >> On Thu, Nov 12, 2015 at 7:39 PM, Timoth?e Nicolas < >> timothee.nicolas at gmail.com> wrote: >> >>> Sorry, here is the full error message >>> >>> [0]PETSC ERROR: Nonconforming object sizes >>> [0]PETSC ERROR: Preconditioner number of local rows -1 does not equal >>> resulting vector number of rows 71808 >>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >>> for trouble shooting. 
>>> [0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 >>> [0]PETSC ERROR: ./mips_implicit on a arch-linux2-c-opt named helios90 by >>> tnicolas Fri Nov 13 10:39:14 2015 >>> [0]PETSC ERROR: Configure options >>> --prefix=/csc/softs/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real >>> --with-debugging=0 --with-x=0 --with-cc=mpicc --with-fc=mpif90 >>> --with-cxx=mpicxx --with-fortran --known-mpi-shared-libraries=1 >>> --with-scalar-type=real --with-precision=double --CFLAGS="-g -O3 -mavx >>> -mkl" --CXXFLAGS="-g -O3 -mavx -mkl" --FFLAGS="-g -O3 -mavx -mkl" >>> [0]PETSC ERROR: #1 PCApply() line 472 in >>> /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/pc/interface/precon.c >>> [0]PETSC ERROR: #2 KSP_PCApply() line 242 in >>> /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/include/petsc/private/kspimpl.h >>> [0]PETSC ERROR: #3 KSPInitialResidual() line 63 in >>> /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itres.c >>> [0]PETSC ERROR: #4 KSPSolve_GMRES() line 235 in >>> /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/impls/gmres/gmres.c >>> [0]PETSC ERROR: #5 KSPSolve() line 604 in >>> /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itfunc.c >>> >> >> The PC uses the matrix it gets to determine sizes, and compare to the >> input vectors it gets for PCApply(). The >> preconditioner matrix is not setup or is not reporting sizes, for example >> if its a MATSHELL it does not have any sizes. >> >> Matt >> >> >>> 2015-11-13 10:38 GMT+09:00 Matthew Knepley : >>> >>>> On Thu, Nov 12, 2015 at 6:47 PM, Timoth?e Nicolas < >>>> timothee.nicolas at gmail.com> wrote: >>>> >>>>> Hi all, >>>>> >>>>> In the manual and the documentation, the syntax for the routine to be >>>>> given as argument of PCMGSetResidual: >>>>> >>>>> PCMGSetResidual (PC pc,PetscInt l,PetscErrorCode (*residual)(Mat ,Vec ,Vec ,Vec ),Mat mat) >>>>> >>>>> >>>>> is not specified. I mean that the order of the vectors is not >>>>> specified. I suppose it is something like >>>>> residual(Mat,b,x,r) with r = b - Mat*x, but it could as well be any >>>>> combination like residual(Mat,r,x,b). There is no example in the >>>>> documentation of the usage so I am confused. Does it absolutely need to be >>>>> set ? I find the manual a bit confusing on this point. Is it only if >>>>> matrix-free matrices are used ? >>>>> >>>>> In the present situation, I use matrix-free operators in a multigrid >>>>> preconditioner (but the interpolation and restriction are not matrix free) >>>>> and have not set this residual function yet. I get the following error: >>>>> >>>> >>>> Always always always give the entire error message. We want the stack. >>>> >>>> The problem here looks like the preconditioner is reporting -1 rows for >>>> process 13. >>>> >>>> Matt >>>> >>>> >>>>> [13]PETSC ERROR: Preconditioner number of local rows -1 does not equal >>>>> resulting vector number of rows 67584 >>>>> >>>>> Could this be related ? By the way, I don't understand what is meant >>>>> by the "preconditioner number of local rows". I have separately tested the >>>>> operators at each level and they are fine. 
>>>>> >>>>> Best >>>>> >>>>> Timothee >>>>> >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.knezevic at akselos.com Thu Nov 12 22:02:17 2015 From: david.knezevic at akselos.com (David Knezevic) Date: Thu, 12 Nov 2015 23:02:17 -0500 Subject: [petsc-users] GAMG and zero pivots follow up In-Reply-To: References: Message-ID: On Thu, Nov 12, 2015 at 9:58 AM, David Knezevic wrote: > On Thu, Nov 12, 2015 at 9:50 AM, Mark Adams wrote: > >> Note, I suspect the zero pivot is coming from a coarse grid. I don't >> know why no-inode fixed it. N.B., this might not be deterministic. >> >> If you run with -info and grep on 'GAMG' you will see the block sizes. >> They should be 3 on the fine grid and the 6 on the coarse grids if you set >> everything up correctly. >> > > OK, thanks for the info, I'll look into this further. > > Though note that I got it to work now without needing no-inode now. The > change I made was to make sure that I matched the order of function calls > from the PETSc GAMG examples. The libMesh code I was using was doing things > somewhat out of order, apparently. > As I mentioned above, GAMG seems to be working fine for me now. Thanks for the help on this. However, I wanted to ask about another 3D elasticity test case I ran in which GAMG didn't converge. The "-ksp_view -ksp_monitor" output is below. Any insights into what options I should set in order to get it to converge in this case would be appreciated. (By the way, ML worked in this case with the default options.) 
Thanks, David -------------------------------------------- 0 KSP Residual norm 2.475892877233e-03 1 KSP Residual norm 6.181785698923e-05 2 KSP Residual norm 9.439050821656e-04 KSP Object: 8 MPI processes type: cg maximum iterations=5000 tolerances: relative=1e-12, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 8 MPI processes type: gamg MG: type is MULTIPLICATIVE, levels=5 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices GAMG specific options Threshold for dropping small values from graph 0 AGG specific options Symmetric graph false Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 8 MPI processes type: bjacobi block Jacobi: number of blocks = 8 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=6, cols=6, bs=6 package used to perform factorization: petsc total: nonzeros=36, allocated nonzeros=36 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 2 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=6, cols=6, bs=6 total: nonzeros=36, allocated nonzeros=36 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 2 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=6, cols=6, bs=6 total: nonzeros=36, allocated nonzeros=36 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 2 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 8 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.146922, max = 1.61615 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_1_esteig_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 8 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 8 MPI processes type: 
mpiaij rows=162, cols=162, bs=6 total: nonzeros=26244, allocated nonzeros=26244 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 5 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 8 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.143941, max = 1.58335 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_2_esteig_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 8 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=4668, cols=4668, bs=6 total: nonzeros=3.00254e+06, allocated nonzeros=3.00254e+06 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 158 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 8 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.128239, max = 1.41063 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_3_esteig_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 8 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=66390, cols=66390, bs=6 total: nonzeros=1.65982e+07, allocated nonzeros=1.65982e+07 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 2668 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 4 ------------------------------- KSP Object: (mg_levels_4_) 8 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.1, max = 1.1 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_4_esteig_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 
tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_4_) 8 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: () 8 MPI processes type: mpiaij rows=1532187, cols=1532187, bs=3 total: nonzeros=1.21304e+08, allocated nonzeros=1.21304e+08 total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node (on process 0) routines: found 66403 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: () 8 MPI processes type: mpiaij rows=1532187, cols=1532187, bs=3 total: nonzeros=1.21304e+08, allocated nonzeros=1.21304e+08 total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node (on process 0) routines: found 66403 nodes, limit used is 5 Error, conv_flag < 0! > > > > > > On Wed, Nov 11, 2015 at 1:36 PM, David Knezevic < > david.knezevic at akselos.com> wrote: > >> On Wed, Nov 11, 2015 at 12:57 PM, David Knezevic < >> david.knezevic at akselos.com> wrote: >> >>> On Wed, Nov 11, 2015 at 12:24 PM, David Knezevic < >>> david.knezevic at akselos.com> wrote: >>> >>>> On Tue, Nov 10, 2015 at 10:28 PM, David Knezevic < >>>> david.knezevic at akselos.com> wrote: >>>> >>>>> On Tue, Nov 10, 2015 at 10:24 PM, Matthew Knepley >>>>> wrote: >>>>> >>>>>> On Tue, Nov 10, 2015 at 9:21 PM, David Knezevic < >>>>>> david.knezevic at akselos.com> wrote: >>>>>> >>>>>>> On Tue, Nov 10, 2015 at 10:00 PM, Matthew Knepley >>>>>> > wrote: >>>>>>> >>>>>>>> On Tue, Nov 10, 2015 at 8:39 PM, David Knezevic < >>>>>>>> david.knezevic at akselos.com> wrote: >>>>>>>> >>>>>>>>> I'm looking into using GAMG, so I wanted to start with a simple 3D >>>>>>>>> elasticity problem. When I first tried this, I got the following "zero >>>>>>>>> pivot" error: >>>>>>>>> >>>>>>>>> >>>>>>>>> ----------------------------------------------------------------------- >>>>>>>>> >>>>>>>>> [0]PETSC ERROR: Zero pivot in LU factorization: >>>>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot >>>>>>>>> [0]PETSC ERROR: Zero pivot, row 3 >>>>>>>>> [0]PETSC ERROR: See >>>>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>>>>>> shooting. 
>>>>>>>>> [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 >>>>>>>>> [0]PETSC ERROR: >>>>>>>>> /home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a >>>>>>>>> arch-linux2-c-opt named david-Lenovo by dknez Tue Nov 10 21:26:39 2015 >>>>>>>>> [0]PETSC ERROR: Configure options --with-shared-libraries=1 >>>>>>>>> --with-debugging=0 --download-suitesparse --download-parmetis >>>>>>>>> --download-blacs >>>>>>>>> --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl >>>>>>>>> --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps >>>>>>>>> --download-metis --download-superlu_dist >>>>>>>>> --prefix=/home/dknez/software/libmesh_install/opt_real/petsc >>>>>>>>> --download-hypre --download-ml >>>>>>>>> [0]PETSC ERROR: #1 PetscKernel_A_gets_inverse_A_5() line 48 in >>>>>>>>> /home/dknez/software/petsc-3.6.1/src/mat/impls/baij/seq/dgefa5.c >>>>>>>>> [0]PETSC ERROR: #2 MatSOR_SeqAIJ_Inode() line 2808 in >>>>>>>>> /home/dknez/software/petsc-3.6.1/src/mat/impls/aij/seq/inode.c >>>>>>>>> [0]PETSC ERROR: #3 MatSOR() line 3697 in >>>>>>>>> /home/dknez/software/petsc-3.6.1/src/mat/interface/matrix.c >>>>>>>>> [0]PETSC ERROR: #4 PCApply_SOR() line 37 in >>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/sor/sor.c >>>>>>>>> [0]PETSC ERROR: #5 PCApply() line 482 in >>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c >>>>>>>>> [0]PETSC ERROR: #6 KSP_PCApply() line 242 in >>>>>>>>> /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h >>>>>>>>> [0]PETSC ERROR: #7 KSPInitialResidual() line 63 in >>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itres.c >>>>>>>>> [0]PETSC ERROR: #8 KSPSolve_GMRES() line 235 in >>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/gmres/gmres.c >>>>>>>>> [0]PETSC ERROR: #9 KSPSolve() line 604 in >>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>>>>>> [0]PETSC ERROR: #10 KSPSolve_Chebyshev() line 381 in >>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cheby/cheby.c >>>>>>>>> [0]PETSC ERROR: #11 KSPSolve() line 604 in >>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>>>>>> [0]PETSC ERROR: #12 PCMGMCycle_Private() line 19 in >>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>>>>>> [0]PETSC ERROR: #13 PCMGMCycle_Private() line 48 in >>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>>>>>> [0]PETSC ERROR: #14 PCApply_MG() line 338 in >>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>>>>>> [0]PETSC ERROR: #15 PCApply() line 482 in >>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c >>>>>>>>> [0]PETSC ERROR: #16 KSP_PCApply() line 242 in >>>>>>>>> /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h >>>>>>>>> [0]PETSC ERROR: #17 KSPSolve_CG() line 139 in >>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cg/cg.c >>>>>>>>> [0]PETSC ERROR: #18 KSPSolve() line 604 in >>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>>>>>> >>>>>>>>> >>>>>>>>> ----------------------------------------------------------------------- >>>>>>>>> >>>>>>>>> I saw that there was a thread about this in September (subject: >>>>>>>>> "gamg and zero pivots"), and that the fix is to use "-mg_levels_pc_type >>>>>>>>> jacobi." When I do that, the solve succeeds (I pasted the -ksp_view at the >>>>>>>>> end of this email). >>>>>>>>> >>>>>>>>> So I have two questions about this: >>>>>>>>> >>>>>>>>> 1. 
Is it surprising that I hit this issue for a 3D elasticity >>>>>>>>> problem? Note that matrix assembly was done in libMesh, I can look into the >>>>>>>>> structure of the assembled matrix more carefully, if needed. Also, note >>>>>>>>> that I can solve this problem with direct solvers just fine. >>>>>>>>> >>>>>>>> >>>>>>>> Yes, this seems like a bug, but it could be some strange BC thing I >>>>>>>> do not understand. >>>>>>>> >>>>>>> >>>>>>> >>>>>>> OK, I can look into the matrix in more detail. I agree that it >>>>>>> should have a non-zero diagonal, so I'll have a look at what's happening >>>>>>> with that. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Naively, the elastic element matrix has a nonzero diagonal. I see >>>>>>>> that you are doing LU >>>>>>>> of size 5. That seems strange for 3D elasticity. Am I missing >>>>>>>> something? I would expect >>>>>>>> block size 3. >>>>>>>> >>>>>>> >>>>>>> >>>>>>> I'm not sure what is causing the LU of size 5. Is there a setting to >>>>>>> control that? >>>>>>> >>>>>>> Regarding the block size: I set the vector and matrix block size to >>>>>>> 3 via VecSetBlockSize and MatSetBlockSize. I also >>>>>>> used MatNullSpaceCreateRigidBody on a vector with block size of 3, and set >>>>>>> the matrix's near nullspace using that. >>>>>>> >>>>>> >>>>>> Can you run this same example with -mat_no_inode? I think it may be a >>>>>> strange blocking that is causing this. >>>>>> >>>>> >>>>> >>>>> That works. The -ksp_view output is below. >>>>> >>>> >>>> >>>> I just wanted to follow up on this. I had a more careful look at the >>>> matrix, and confirmed that there are no zero entries on the diagonal (as >>>> expected for elasticity). The matrix is from one of libMesh's example >>>> problems: a simple cantilever model using HEX8 elements. >>>> >>>> Do you have any further thoughts about what might cause the "strange >>>> blocking" that you referred to? If there's something non-standard that >>>> libMesh is doing with the blocks, I'd be interested to look into that. I >>>> can send over the matrix if that would be helpful. >>>> >>>> Thanks, >>>> David >>>> >>>> >>> P.S. I was previously calling VecSetBlockSize and MatSetBlockSize to set >>> the block size to 3. When I don't do that, I no longer need to call >>> -mat_no_inodes. I've pasted the -ksp_view output below. Does it look like >>> that's working OK? >>> >> >> >> Sorry for the multiple messages, but I think I found the issue. libMesh >> internally sets the block size to 1 earlier on (in PetscMatrix::init()). I >> guess it'll work fine if I get it to set the block size to 3 instead, so >> I'll look into that. (libMesh has an enable-blocked-storage configure >> option that should take care of this automatically.) >> >> David >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Nov 13 08:45:24 2015 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 13 Nov 2015 08:45:24 -0600 Subject: [petsc-users] GAMG and zero pivots follow up In-Reply-To: References: Message-ID: On Thu, Nov 12, 2015 at 10:02 PM, David Knezevic wrote: > On Thu, Nov 12, 2015 at 9:58 AM, David Knezevic < > david.knezevic at akselos.com> wrote: > >> On Thu, Nov 12, 2015 at 9:50 AM, Mark Adams wrote: >> >>> Note, I suspect the zero pivot is coming from a coarse grid. I don't >>> know why no-inode fixed it. N.B., this might not be deterministic. >>> >>> If you run with -info and grep on 'GAMG' you will see the block sizes. 
>>> They should be 3 on the fine grid and the 6 on the coarse grids if you set >>> everything up correctly. >>> >> >> OK, thanks for the info, I'll look into this further. >> >> Though note that I got it to work now without needing no-inode now. The >> change I made was to make sure that I matched the order of function calls >> from the PETSc GAMG examples. The libMesh code I was using was doing things >> somewhat out of order, apparently. >> > > > As I mentioned above, GAMG seems to be working fine for me now. Thanks for > the help on this. > > However, I wanted to ask about another 3D elasticity test case I ran in > which GAMG didn't converge. The "-ksp_view -ksp_monitor" output is below. > Any insights into what options I should set in order to get it to converge > in this case would be appreciated. (By the way, ML worked in this case with > the default options.) > > Thanks, > David > > -------------------------------------------- > > 0 KSP Residual norm 2.475892877233e-03 > 1 KSP Residual norm 6.181785698923e-05 > 2 KSP Residual norm 9.439050821656e-04 > Something very strange is happening here. CG should converge monotonically, but above it does not. What could be happening? a) We could have lost orthogonality This seems really unlikely in 2 iterates. b) The preconditioner could be nonlinear In order to test this, could you run with both GMRES and FGMRES? If the convergence is different, then something is wrong with the preconditioner. I have looked below, and it seems like this should be linear. Mark, could something be happening in GAMG? Also, you should not be using CG/V-cycle. You should be using 1 iterate of FMG, -pc_mg_type full. I have just been going over this in class. It is normal for people to solve way past discretization error, but that does not make much sense. Its not hard to get a sense of discretization error using a manufactured solution. 
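A minimal sketch of a driver fragment for this kind of comparison, assuming A, b and x are the assembled elasticity system; everything solver-specific is deferred to the command line so that GMRES vs FGMRES and V-cycle vs full MG can be switched without recompiling:

  #include <petscksp.h>

  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCGAMG);CHKERRQ(ierr);
  /* run-time switches, e.g.
       -ksp_type gmres | fgmres | cg
       -pc_mg_type full             (one full-MG sweep instead of V-cycles)
       -ksp_monitor_true_residual   (monitor a norm independent of the PC) */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

If GMRES and FGMRES then produce noticeably different convergence histories, that points at the nonlinear-preconditioner case described above.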
Thanks, Matt > KSP Object: 8 MPI processes > type: cg > maximum iterations=5000 > tolerances: relative=1e-12, absolute=1e-50, divergence=10000 > left preconditioning > using nonzero initial guess > using PRECONDITIONED norm type for convergence test > PC Object: 8 MPI processes > type: gamg > MG: type is MULTIPLICATIVE, levels=5 cycles=v > Cycles per PCApply=1 > Using Galerkin computed coarse grid matrices > GAMG specific options > Threshold for dropping small values from graph 0 > AGG specific options > Symmetric graph false > Coarse grid solver -- level ------------------------------- > KSP Object: (mg_coarse_) 8 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=1, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_) 8 MPI processes > type: bjacobi > block Jacobi: number of blocks = 8 > Local solve is same for all blocks, in the following KSP and PC > objects: > KSP Object: (mg_coarse_sub_) 1 MPI processes > type: preonly > maximum iterations=1, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_sub_) 1 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > using diagonal shift on blocks to prevent zero pivot [INBLOCKS] > matrix ordering: nd > factor fill ratio given 5, needed 1 > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=6, cols=6, bs=6 > package used to perform factorization: petsc > total: nonzeros=36, allocated nonzeros=36 > total number of mallocs used during MatSetValues calls =0 > using I-node routines: found 2 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=6, cols=6, bs=6 > total: nonzeros=36, allocated nonzeros=36 > total number of mallocs used during MatSetValues calls =0 > using I-node routines: found 2 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: 8 MPI processes > type: mpiaij > rows=6, cols=6, bs=6 > total: nonzeros=36, allocated nonzeros=36 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 2 nodes, limit used > is 5 > Down solver (pre-smoother) on level 1 ------------------------------- > KSP Object: (mg_levels_1_) 8 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.146922, max = 1.61615 > Chebyshev: eigenvalues estimated using gmres with translations [0 > 0.1; 0 1.1] > KSP Object: (mg_levels_1_esteig_) 8 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > left preconditioning > using NONE norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_1_) 8 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1 > linear system matrix = precond matrix: > Mat 
Object: 8 MPI processes > type: mpiaij > rows=162, cols=162, bs=6 > total: nonzeros=26244, allocated nonzeros=26244 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 5 nodes, limit used > is 5 > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 2 ------------------------------- > KSP Object: (mg_levels_2_) 8 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.143941, max = 1.58335 > Chebyshev: eigenvalues estimated using gmres with translations [0 > 0.1; 0 1.1] > KSP Object: (mg_levels_2_esteig_) 8 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > left preconditioning > using NONE norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_2_) 8 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1 > linear system matrix = precond matrix: > Mat Object: 8 MPI processes > type: mpiaij > rows=4668, cols=4668, bs=6 > total: nonzeros=3.00254e+06, allocated nonzeros=3.00254e+06 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 158 nodes, limit > used is 5 > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 3 ------------------------------- > KSP Object: (mg_levels_3_) 8 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.128239, max = 1.41063 > Chebyshev: eigenvalues estimated using gmres with translations [0 > 0.1; 0 1.1] > KSP Object: (mg_levels_3_esteig_) 8 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > left preconditioning > using NONE norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_3_) 8 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1 > linear system matrix = precond matrix: > Mat Object: 8 MPI processes > type: mpiaij > rows=66390, cols=66390, bs=6 > total: nonzeros=1.65982e+07, allocated nonzeros=1.65982e+07 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 2668 nodes, limit > used is 5 > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 4 ------------------------------- > KSP Object: (mg_levels_4_) 8 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.1, max = 1.1 > Chebyshev: eigenvalues estimated using gmres with translations [0 > 0.1; 0 1.1] > KSP Object: (mg_levels_4_esteig_) 8 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > 
maximum iterations=10, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > left preconditioning > using NONE norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_4_) 8 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1 > linear system matrix = precond matrix: > Mat Object: () 8 MPI processes > type: mpiaij > rows=1532187, cols=1532187, bs=3 > total: nonzeros=1.21304e+08, allocated nonzeros=1.21304e+08 > total number of mallocs used during MatSetValues calls =0 > has attached near null space > using I-node (on process 0) routines: found 66403 nodes, limit > used is 5 > Up solver (post-smoother) same as down solver (pre-smoother) > linear system matrix = precond matrix: > Mat Object: () 8 MPI processes > type: mpiaij > rows=1532187, cols=1532187, bs=3 > total: nonzeros=1.21304e+08, allocated nonzeros=1.21304e+08 > total number of mallocs used during MatSetValues calls =0 > has attached near null space > using I-node (on process 0) routines: found 66403 nodes, limit used > is 5 > Error, conv_flag < 0! > > > > > > > > >> >> >> >> >> >> On Wed, Nov 11, 2015 at 1:36 PM, David Knezevic < >> david.knezevic at akselos.com> wrote: >> >>> On Wed, Nov 11, 2015 at 12:57 PM, David Knezevic < >>> david.knezevic at akselos.com> wrote: >>> >>>> On Wed, Nov 11, 2015 at 12:24 PM, David Knezevic < >>>> david.knezevic at akselos.com> wrote: >>>> >>>>> On Tue, Nov 10, 2015 at 10:28 PM, David Knezevic < >>>>> david.knezevic at akselos.com> wrote: >>>>> >>>>>> On Tue, Nov 10, 2015 at 10:24 PM, Matthew Knepley >>>>>> wrote: >>>>>> >>>>>>> On Tue, Nov 10, 2015 at 9:21 PM, David Knezevic < >>>>>>> david.knezevic at akselos.com> wrote: >>>>>>> >>>>>>>> On Tue, Nov 10, 2015 at 10:00 PM, Matthew Knepley < >>>>>>>> knepley at gmail.com> wrote: >>>>>>>> >>>>>>>>> On Tue, Nov 10, 2015 at 8:39 PM, David Knezevic < >>>>>>>>> david.knezevic at akselos.com> wrote: >>>>>>>>> >>>>>>>>>> I'm looking into using GAMG, so I wanted to start with a simple >>>>>>>>>> 3D elasticity problem. When I first tried this, I got the following "zero >>>>>>>>>> pivot" error: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ----------------------------------------------------------------------- >>>>>>>>>> >>>>>>>>>> [0]PETSC ERROR: Zero pivot in LU factorization: >>>>>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot >>>>>>>>>> [0]PETSC ERROR: Zero pivot, row 3 >>>>>>>>>> [0]PETSC ERROR: See >>>>>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>>>>>>> shooting. 
>>>>>>>>>> [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 >>>>>>>>>> [0]PETSC ERROR: >>>>>>>>>> /home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a >>>>>>>>>> arch-linux2-c-opt named david-Lenovo by dknez Tue Nov 10 21:26:39 2015 >>>>>>>>>> [0]PETSC ERROR: Configure options --with-shared-libraries=1 >>>>>>>>>> --with-debugging=0 --download-suitesparse --download-parmetis >>>>>>>>>> --download-blacs >>>>>>>>>> --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl >>>>>>>>>> --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps >>>>>>>>>> --download-metis --download-superlu_dist >>>>>>>>>> --prefix=/home/dknez/software/libmesh_install/opt_real/petsc >>>>>>>>>> --download-hypre --download-ml >>>>>>>>>> [0]PETSC ERROR: #1 PetscKernel_A_gets_inverse_A_5() line 48 in >>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/mat/impls/baij/seq/dgefa5.c >>>>>>>>>> [0]PETSC ERROR: #2 MatSOR_SeqAIJ_Inode() line 2808 in >>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/mat/impls/aij/seq/inode.c >>>>>>>>>> [0]PETSC ERROR: #3 MatSOR() line 3697 in >>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/mat/interface/matrix.c >>>>>>>>>> [0]PETSC ERROR: #4 PCApply_SOR() line 37 in >>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/sor/sor.c >>>>>>>>>> [0]PETSC ERROR: #5 PCApply() line 482 in >>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c >>>>>>>>>> [0]PETSC ERROR: #6 KSP_PCApply() line 242 in >>>>>>>>>> /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h >>>>>>>>>> [0]PETSC ERROR: #7 KSPInitialResidual() line 63 in >>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itres.c >>>>>>>>>> [0]PETSC ERROR: #8 KSPSolve_GMRES() line 235 in >>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/gmres/gmres.c >>>>>>>>>> [0]PETSC ERROR: #9 KSPSolve() line 604 in >>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>>>>>>> [0]PETSC ERROR: #10 KSPSolve_Chebyshev() line 381 in >>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cheby/cheby.c >>>>>>>>>> [0]PETSC ERROR: #11 KSPSolve() line 604 in >>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>>>>>>> [0]PETSC ERROR: #12 PCMGMCycle_Private() line 19 in >>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>>>>>>> [0]PETSC ERROR: #13 PCMGMCycle_Private() line 48 in >>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>>>>>>> [0]PETSC ERROR: #14 PCApply_MG() line 338 in >>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>>>>>>> [0]PETSC ERROR: #15 PCApply() line 482 in >>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c >>>>>>>>>> [0]PETSC ERROR: #16 KSP_PCApply() line 242 in >>>>>>>>>> /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h >>>>>>>>>> [0]PETSC ERROR: #17 KSPSolve_CG() line 139 in >>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cg/cg.c >>>>>>>>>> [0]PETSC ERROR: #18 KSPSolve() line 604 in >>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ----------------------------------------------------------------------- >>>>>>>>>> >>>>>>>>>> I saw that there was a thread about this in September (subject: >>>>>>>>>> "gamg and zero pivots"), and that the fix is to use "-mg_levels_pc_type >>>>>>>>>> jacobi." When I do that, the solve succeeds (I pasted the -ksp_view at the >>>>>>>>>> end of this email). 
>>>>>>>>>> >>>>>>>>>> So I have two questions about this: >>>>>>>>>> >>>>>>>>>> 1. Is it surprising that I hit this issue for a 3D elasticity >>>>>>>>>> problem? Note that matrix assembly was done in libMesh, I can look into the >>>>>>>>>> structure of the assembled matrix more carefully, if needed. Also, note >>>>>>>>>> that I can solve this problem with direct solvers just fine. >>>>>>>>>> >>>>>>>>> >>>>>>>>> Yes, this seems like a bug, but it could be some strange BC thing >>>>>>>>> I do not understand. >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> OK, I can look into the matrix in more detail. I agree that it >>>>>>>> should have a non-zero diagonal, so I'll have a look at what's happening >>>>>>>> with that. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Naively, the elastic element matrix has a nonzero diagonal. I see >>>>>>>>> that you are doing LU >>>>>>>>> of size 5. That seems strange for 3D elasticity. Am I missing >>>>>>>>> something? I would expect >>>>>>>>> block size 3. >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I'm not sure what is causing the LU of size 5. Is there a setting >>>>>>>> to control that? >>>>>>>> >>>>>>>> Regarding the block size: I set the vector and matrix block size to >>>>>>>> 3 via VecSetBlockSize and MatSetBlockSize. I also >>>>>>>> used MatNullSpaceCreateRigidBody on a vector with block size of 3, and set >>>>>>>> the matrix's near nullspace using that. >>>>>>>> >>>>>>> >>>>>>> Can you run this same example with -mat_no_inode? I think it may be >>>>>>> a strange blocking that is causing this. >>>>>>> >>>>>> >>>>>> >>>>>> That works. The -ksp_view output is below. >>>>>> >>>>> >>>>> >>>>> I just wanted to follow up on this. I had a more careful look at the >>>>> matrix, and confirmed that there are no zero entries on the diagonal (as >>>>> expected for elasticity). The matrix is from one of libMesh's example >>>>> problems: a simple cantilever model using HEX8 elements. >>>>> >>>>> Do you have any further thoughts about what might cause the "strange >>>>> blocking" that you referred to? If there's something non-standard that >>>>> libMesh is doing with the blocks, I'd be interested to look into that. I >>>>> can send over the matrix if that would be helpful. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>> >>>> P.S. I was previously calling VecSetBlockSize and MatSetBlockSize to >>>> set the block size to 3. When I don't do that, I no longer need to call >>>> -mat_no_inodes. I've pasted the -ksp_view output below. Does it look like >>>> that's working OK? >>>> >>> >>> >>> Sorry for the multiple messages, but I think I found the issue. libMesh >>> internally sets the block size to 1 earlier on (in PetscMatrix::init()). I >>> guess it'll work fine if I get it to set the block size to 3 instead, so >>> I'll look into that. (libMesh has an enable-blocked-storage configure >>> option that should take care of this automatically.) >>> >>> David >>> >>> >>> >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From david.knezevic at akselos.com Fri Nov 13 08:52:31 2015 From: david.knezevic at akselos.com (David Knezevic) Date: Fri, 13 Nov 2015 09:52:31 -0500 Subject: [petsc-users] GAMG and zero pivots follow up In-Reply-To: References: Message-ID: On Fri, Nov 13, 2015 at 9:45 AM, Matthew Knepley wrote: > On Thu, Nov 12, 2015 at 10:02 PM, David Knezevic < > david.knezevic at akselos.com> wrote: > >> On Thu, Nov 12, 2015 at 9:58 AM, David Knezevic < >> david.knezevic at akselos.com> wrote: >> >>> On Thu, Nov 12, 2015 at 9:50 AM, Mark Adams wrote: >>> >>>> Note, I suspect the zero pivot is coming from a coarse grid. I don't >>>> know why no-inode fixed it. N.B., this might not be deterministic. >>>> >>>> If you run with -info and grep on 'GAMG' you will see the block sizes. >>>> They should be 3 on the fine grid and the 6 on the coarse grids if you set >>>> everything up correctly. >>>> >>> >>> OK, thanks for the info, I'll look into this further. >>> >>> Though note that I got it to work now without needing no-inode now. The >>> change I made was to make sure that I matched the order of function calls >>> from the PETSc GAMG examples. The libMesh code I was using was doing things >>> somewhat out of order, apparently. >>> >> >> >> As I mentioned above, GAMG seems to be working fine for me now. Thanks >> for the help on this. >> >> However, I wanted to ask about another 3D elasticity test case I ran in >> which GAMG didn't converge. The "-ksp_view -ksp_monitor" output is below. >> Any insights into what options I should set in order to get it to converge >> in this case would be appreciated. (By the way, ML worked in this case with >> the default options.) >> >> Thanks, >> David >> >> -------------------------------------------- >> >> 0 KSP Residual norm 2.475892877233e-03 >> 1 KSP Residual norm 6.181785698923e-05 >> 2 KSP Residual norm 9.439050821656e-04 >> > > Something very strange is happening here. CG should converge > monotonically, but above it does not. What could be happening? > > a) We could have lost orthogonality > > This seems really unlikely in 2 iterates. > > b) The preconditioner could be nonlinear > > In order to test this, could you run with both GMRES and FGMRES? If > the convergence is different, then something is wrong > with the preconditioner. > > I have looked below, and it seems like this should be linear. Mark, > could something be happening in GAMG? > > Also, you should not be using CG/V-cycle. You should be using 1 iterate of > FMG, -pc_mg_type full. I have just been going over > this in class. It is normal for people to solve way past discretization > error, but that does not make much sense. Its not hard to > get a sense of discretization error using a manufactured solution. > OK, thanks for this info. I'll do some tests based on your comments above when I get some time and I'll let you know what happens. 
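For reference, the block-size and near-null-space setup that the quoted discussion further down keeps coming back to amounts to something like the following sketch, assuming coords is a Vec of nodal coordinates that already has block size 3 and is laid out to match the solution vector:

  MatNullSpace nullsp;
  /* the block size has to be in place before preallocation/assembly,
     which is where the libMesh ordering issue mentioned below comes in */
  ierr = MatSetBlockSize(A, 3);CHKERRQ(ierr);
  ierr = MatNullSpaceCreateRigidBody(coords, &nullsp);CHKERRQ(ierr);
  ierr = MatSetNearNullSpace(A, nullsp);CHKERRQ(ierr);
  ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);

With that in place, running with -info and grepping for GAMG should report block size 3 on the fine level and 6 on the coarser levels, as noted earlier in the thread.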
Thanks, David > KSP Object: 8 MPI processes >> type: cg >> maximum iterations=5000 >> tolerances: relative=1e-12, absolute=1e-50, divergence=10000 >> left preconditioning >> using nonzero initial guess >> using PRECONDITIONED norm type for convergence test >> PC Object: 8 MPI processes >> type: gamg >> MG: type is MULTIPLICATIVE, levels=5 cycles=v >> Cycles per PCApply=1 >> Using Galerkin computed coarse grid matrices >> GAMG specific options >> Threshold for dropping small values from graph 0 >> AGG specific options >> Symmetric graph false >> Coarse grid solver -- level ------------------------------- >> KSP Object: (mg_coarse_) 8 MPI processes >> type: gmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >> Orthogonalization with no iterative refinement >> GMRES: happy breakdown tolerance 1e-30 >> maximum iterations=1, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000 >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (mg_coarse_) 8 MPI processes >> type: bjacobi >> block Jacobi: number of blocks = 8 >> Local solve is same for all blocks, in the following KSP and PC >> objects: >> KSP Object: (mg_coarse_sub_) 1 MPI processes >> type: preonly >> maximum iterations=1, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000 >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (mg_coarse_sub_) 1 MPI processes >> type: lu >> LU: out-of-place factorization >> tolerance for zero pivot 2.22045e-14 >> using diagonal shift on blocks to prevent zero pivot [INBLOCKS] >> matrix ordering: nd >> factor fill ratio given 5, needed 1 >> Factored matrix follows: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=6, cols=6, bs=6 >> package used to perform factorization: petsc >> total: nonzeros=36, allocated nonzeros=36 >> total number of mallocs used during MatSetValues calls =0 >> using I-node routines: found 2 nodes, limit used is 5 >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=6, cols=6, bs=6 >> total: nonzeros=36, allocated nonzeros=36 >> total number of mallocs used during MatSetValues calls =0 >> using I-node routines: found 2 nodes, limit used is 5 >> linear system matrix = precond matrix: >> Mat Object: 8 MPI processes >> type: mpiaij >> rows=6, cols=6, bs=6 >> total: nonzeros=36, allocated nonzeros=36 >> total number of mallocs used during MatSetValues calls =0 >> using I-node (on process 0) routines: found 2 nodes, limit used >> is 5 >> Down solver (pre-smoother) on level 1 ------------------------------- >> KSP Object: (mg_levels_1_) 8 MPI processes >> type: chebyshev >> Chebyshev: eigenvalue estimates: min = 0.146922, max = 1.61615 >> Chebyshev: eigenvalues estimated using gmres with translations >> [0 0.1; 0 1.1] >> KSP Object: (mg_levels_1_esteig_) 8 MPI processes >> type: gmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >> Orthogonalization with no iterative refinement >> GMRES: happy breakdown tolerance 1e-30 >> maximum iterations=10, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000 >> left preconditioning >> using NONE norm type for convergence test >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000 >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_1_) 8 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 
1, local iterations = >> 1, omega = 1 >> linear system matrix = precond matrix: >> Mat Object: 8 MPI processes >> type: mpiaij >> rows=162, cols=162, bs=6 >> total: nonzeros=26244, allocated nonzeros=26244 >> total number of mallocs used during MatSetValues calls =0 >> using I-node (on process 0) routines: found 5 nodes, limit used >> is 5 >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 2 ------------------------------- >> KSP Object: (mg_levels_2_) 8 MPI processes >> type: chebyshev >> Chebyshev: eigenvalue estimates: min = 0.143941, max = 1.58335 >> Chebyshev: eigenvalues estimated using gmres with translations >> [0 0.1; 0 1.1] >> KSP Object: (mg_levels_2_esteig_) 8 MPI processes >> type: gmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >> Orthogonalization with no iterative refinement >> GMRES: happy breakdown tolerance 1e-30 >> maximum iterations=10, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000 >> left preconditioning >> using NONE norm type for convergence test >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000 >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_2_) 8 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = >> 1, omega = 1 >> linear system matrix = precond matrix: >> Mat Object: 8 MPI processes >> type: mpiaij >> rows=4668, cols=4668, bs=6 >> total: nonzeros=3.00254e+06, allocated nonzeros=3.00254e+06 >> total number of mallocs used during MatSetValues calls =0 >> using I-node (on process 0) routines: found 158 nodes, limit >> used is 5 >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 3 ------------------------------- >> KSP Object: (mg_levels_3_) 8 MPI processes >> type: chebyshev >> Chebyshev: eigenvalue estimates: min = 0.128239, max = 1.41063 >> Chebyshev: eigenvalues estimated using gmres with translations >> [0 0.1; 0 1.1] >> KSP Object: (mg_levels_3_esteig_) 8 MPI processes >> type: gmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >> Orthogonalization with no iterative refinement >> GMRES: happy breakdown tolerance 1e-30 >> maximum iterations=10, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000 >> left preconditioning >> using NONE norm type for convergence test >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000 >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_3_) 8 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = >> 1, omega = 1 >> linear system matrix = precond matrix: >> Mat Object: 8 MPI processes >> type: mpiaij >> rows=66390, cols=66390, bs=6 >> total: nonzeros=1.65982e+07, allocated nonzeros=1.65982e+07 >> total number of mallocs used during MatSetValues calls =0 >> using I-node (on process 0) routines: found 2668 nodes, limit >> used is 5 >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 4 ------------------------------- >> KSP Object: (mg_levels_4_) 8 MPI processes >> type: chebyshev >> Chebyshev: eigenvalue estimates: min = 0.1, max = 1.1 >> Chebyshev: eigenvalues estimated using gmres with translations >> [0 0.1; 0 1.1] >> KSP Object: (mg_levels_4_esteig_) 8 MPI processes 
>> type: gmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >> Orthogonalization with no iterative refinement >> GMRES: happy breakdown tolerance 1e-30 >> maximum iterations=10, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000 >> left preconditioning >> using NONE norm type for convergence test >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000 >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_4_) 8 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = >> 1, omega = 1 >> linear system matrix = precond matrix: >> Mat Object: () 8 MPI processes >> type: mpiaij >> rows=1532187, cols=1532187, bs=3 >> total: nonzeros=1.21304e+08, allocated nonzeros=1.21304e+08 >> total number of mallocs used during MatSetValues calls =0 >> has attached near null space >> using I-node (on process 0) routines: found 66403 nodes, limit >> used is 5 >> Up solver (post-smoother) same as down solver (pre-smoother) >> linear system matrix = precond matrix: >> Mat Object: () 8 MPI processes >> type: mpiaij >> rows=1532187, cols=1532187, bs=3 >> total: nonzeros=1.21304e+08, allocated nonzeros=1.21304e+08 >> total number of mallocs used during MatSetValues calls =0 >> has attached near null space >> using I-node (on process 0) routines: found 66403 nodes, limit used >> is 5 >> Error, conv_flag < 0! >> >> >> >> >> >> >> >> >>> >>> >>> >>> >>> >>> On Wed, Nov 11, 2015 at 1:36 PM, David Knezevic < >>> david.knezevic at akselos.com> wrote: >>> >>>> On Wed, Nov 11, 2015 at 12:57 PM, David Knezevic < >>>> david.knezevic at akselos.com> wrote: >>>> >>>>> On Wed, Nov 11, 2015 at 12:24 PM, David Knezevic < >>>>> david.knezevic at akselos.com> wrote: >>>>> >>>>>> On Tue, Nov 10, 2015 at 10:28 PM, David Knezevic < >>>>>> david.knezevic at akselos.com> wrote: >>>>>> >>>>>>> On Tue, Nov 10, 2015 at 10:24 PM, Matthew Knepley >>>>>> > wrote: >>>>>>> >>>>>>>> On Tue, Nov 10, 2015 at 9:21 PM, David Knezevic < >>>>>>>> david.knezevic at akselos.com> wrote: >>>>>>>> >>>>>>>>> On Tue, Nov 10, 2015 at 10:00 PM, Matthew Knepley < >>>>>>>>> knepley at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> On Tue, Nov 10, 2015 at 8:39 PM, David Knezevic < >>>>>>>>>> david.knezevic at akselos.com> wrote: >>>>>>>>>> >>>>>>>>>>> I'm looking into using GAMG, so I wanted to start with a simple >>>>>>>>>>> 3D elasticity problem. When I first tried this, I got the following "zero >>>>>>>>>>> pivot" error: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> ----------------------------------------------------------------------- >>>>>>>>>>> >>>>>>>>>>> [0]PETSC ERROR: Zero pivot in LU factorization: >>>>>>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot >>>>>>>>>>> [0]PETSC ERROR: Zero pivot, row 3 >>>>>>>>>>> [0]PETSC ERROR: See >>>>>>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>>>>>>>> shooting. 
>>>>>>>>>>> [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 >>>>>>>>>>> [0]PETSC ERROR: >>>>>>>>>>> /home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a >>>>>>>>>>> arch-linux2-c-opt named david-Lenovo by dknez Tue Nov 10 21:26:39 2015 >>>>>>>>>>> [0]PETSC ERROR: Configure options --with-shared-libraries=1 >>>>>>>>>>> --with-debugging=0 --download-suitesparse --download-parmetis >>>>>>>>>>> --download-blacs >>>>>>>>>>> --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl >>>>>>>>>>> --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps >>>>>>>>>>> --download-metis --download-superlu_dist >>>>>>>>>>> --prefix=/home/dknez/software/libmesh_install/opt_real/petsc >>>>>>>>>>> --download-hypre --download-ml >>>>>>>>>>> [0]PETSC ERROR: #1 PetscKernel_A_gets_inverse_A_5() line 48 in >>>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/mat/impls/baij/seq/dgefa5.c >>>>>>>>>>> [0]PETSC ERROR: #2 MatSOR_SeqAIJ_Inode() line 2808 in >>>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/mat/impls/aij/seq/inode.c >>>>>>>>>>> [0]PETSC ERROR: #3 MatSOR() line 3697 in >>>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/mat/interface/matrix.c >>>>>>>>>>> [0]PETSC ERROR: #4 PCApply_SOR() line 37 in >>>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/sor/sor.c >>>>>>>>>>> [0]PETSC ERROR: #5 PCApply() line 482 in >>>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c >>>>>>>>>>> [0]PETSC ERROR: #6 KSP_PCApply() line 242 in >>>>>>>>>>> /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h >>>>>>>>>>> [0]PETSC ERROR: #7 KSPInitialResidual() line 63 in >>>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itres.c >>>>>>>>>>> [0]PETSC ERROR: #8 KSPSolve_GMRES() line 235 in >>>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/gmres/gmres.c >>>>>>>>>>> [0]PETSC ERROR: #9 KSPSolve() line 604 in >>>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>>>>>>>> [0]PETSC ERROR: #10 KSPSolve_Chebyshev() line 381 in >>>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cheby/cheby.c >>>>>>>>>>> [0]PETSC ERROR: #11 KSPSolve() line 604 in >>>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>>>>>>>> [0]PETSC ERROR: #12 PCMGMCycle_Private() line 19 in >>>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>>>>>>>> [0]PETSC ERROR: #13 PCMGMCycle_Private() line 48 in >>>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>>>>>>>> [0]PETSC ERROR: #14 PCApply_MG() line 338 in >>>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/impls/mg/mg.c >>>>>>>>>>> [0]PETSC ERROR: #15 PCApply() line 482 in >>>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/pc/interface/precon.c >>>>>>>>>>> [0]PETSC ERROR: #16 KSP_PCApply() line 242 in >>>>>>>>>>> /home/dknez/software/petsc-3.6.1/include/petsc/private/kspimpl.h >>>>>>>>>>> [0]PETSC ERROR: #17 KSPSolve_CG() line 139 in >>>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/impls/cg/cg.c >>>>>>>>>>> [0]PETSC ERROR: #18 KSPSolve() line 604 in >>>>>>>>>>> /home/dknez/software/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> ----------------------------------------------------------------------- >>>>>>>>>>> >>>>>>>>>>> I saw that there was a thread about this in September (subject: >>>>>>>>>>> "gamg and zero pivots"), and that the fix is to use "-mg_levels_pc_type >>>>>>>>>>> jacobi." 
When I do that, the solve succeeds (I pasted the -ksp_view at the >>>>>>>>>>> end of this email). >>>>>>>>>>> >>>>>>>>>>> So I have two questions about this: >>>>>>>>>>> >>>>>>>>>>> 1. Is it surprising that I hit this issue for a 3D elasticity >>>>>>>>>>> problem? Note that matrix assembly was done in libMesh, I can look into the >>>>>>>>>>> structure of the assembled matrix more carefully, if needed. Also, note >>>>>>>>>>> that I can solve this problem with direct solvers just fine. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Yes, this seems like a bug, but it could be some strange BC thing >>>>>>>>>> I do not understand. >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> OK, I can look into the matrix in more detail. I agree that it >>>>>>>>> should have a non-zero diagonal, so I'll have a look at what's happening >>>>>>>>> with that. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> Naively, the elastic element matrix has a nonzero diagonal. I see >>>>>>>>>> that you are doing LU >>>>>>>>>> of size 5. That seems strange for 3D elasticity. Am I missing >>>>>>>>>> something? I would expect >>>>>>>>>> block size 3. >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> I'm not sure what is causing the LU of size 5. Is there a setting >>>>>>>>> to control that? >>>>>>>>> >>>>>>>>> Regarding the block size: I set the vector and matrix block size >>>>>>>>> to 3 via VecSetBlockSize and MatSetBlockSize. I also >>>>>>>>> used MatNullSpaceCreateRigidBody on a vector with block size of 3, and set >>>>>>>>> the matrix's near nullspace using that. >>>>>>>>> >>>>>>>> >>>>>>>> Can you run this same example with -mat_no_inode? I think it may be >>>>>>>> a strange blocking that is causing this. >>>>>>>> >>>>>>> >>>>>>> >>>>>>> That works. The -ksp_view output is below. >>>>>>> >>>>>> >>>>>> >>>>>> I just wanted to follow up on this. I had a more careful look at the >>>>>> matrix, and confirmed that there are no zero entries on the diagonal (as >>>>>> expected for elasticity). The matrix is from one of libMesh's example >>>>>> problems: a simple cantilever model using HEX8 elements. >>>>>> >>>>>> Do you have any further thoughts about what might cause the "strange >>>>>> blocking" that you referred to? If there's something non-standard that >>>>>> libMesh is doing with the blocks, I'd be interested to look into that. I >>>>>> can send over the matrix if that would be helpful. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> >>>>> P.S. I was previously calling VecSetBlockSize and MatSetBlockSize to >>>>> set the block size to 3. When I don't do that, I no longer need to call >>>>> -mat_no_inodes. I've pasted the -ksp_view output below. Does it look like >>>>> that's working OK? >>>>> >>>> >>>> >>>> Sorry for the multiple messages, but I think I found the issue. libMesh >>>> internally sets the block size to 1 earlier on (in PetscMatrix::init()). I >>>> guess it'll work fine if I get it to set the block size to 3 instead, so >>>> I'll look into that. (libMesh has an enable-blocked-storage configure >>>> option that should take care of this automatically.) >>>> >>>> David >>>> >>>> >>>> >>> >>> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... 
From jed at jedbrown.org Fri Nov 13 12:28:59 2015 From: jed at jedbrown.org (Jed Brown) Date: Fri, 13 Nov 2015 11:28:59 -0700 Subject: [petsc-users] GAMG and zero pivots follow up In-Reply-To: References: Message-ID: <87si4992j8.fsf@jedbrown.org> Matthew Knepley writes: > Something very strange is happening here. CG should converge monotonically, > but above it does not. What could be happening? Are you using -ksp_norm_type natural? CG is not monotone in other norms. Also, if boundary conditions are enforced using a nonsymmetric formulation (for example), then you can get a lack of monotonicity with CG that may not be catastrophic. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From knepley at gmail.com Fri Nov 13 13:48:50 2015 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 13 Nov 2015 13:48:50 -0600 Subject: [petsc-users] GAMG and zero pivots follow up In-Reply-To: <87si4992j8.fsf@jedbrown.org> References: <87si4992j8.fsf@jedbrown.org> Message-ID: On Fri, Nov 13, 2015 at 12:28 PM, Jed Brown wrote: > Matthew Knepley writes: > > Something very strange is happening here. CG should converge > monotonically, > > but above it does not. What could be happening? > > Are you use -ksp_norm_type natural? CG is not monotone in other norms. > Yikes! I did not check that. Why do we have PRECONDITIONED as the default for CG? Matt Also, if boundary conditions are enforced using a nonsymmetric > formulation (for example), then you can get lack of monotonicity with CG > that may not be catastrophic. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Fri Nov 13 13:59:55 2015 From: jed at jedbrown.org (Jed Brown) Date: Fri, 13 Nov 2015 12:59:55 -0700 Subject: [petsc-users] GAMG and zero pivots follow up In-Reply-To: References: <87si4992j8.fsf@jedbrown.org> Message-ID: <87k2pl8ybo.fsf@jedbrown.org> Matthew Knepley writes: > Yikes! I did not check that. Why do we have PRECONDITIONED as the default > for CG? So we can do an extra reduction on each iteration? I mean, who doesn't want that? ;-) From david.knezevic at akselos.com Fri Nov 13 15:45:50 2015 From: david.knezevic at akselos.com (David Knezevic) Date: Fri, 13 Nov 2015 16:45:50 -0500 Subject: [petsc-users] GAMG and zero pivots follow up In-Reply-To: References: <87si4992j8.fsf@jedbrown.org> Message-ID: To follow up on this, I have run some tests on a smaller test case (same model, coarser mesh). The options I tried are: 8 MPI processes with: "-ksp_type gmres -pc_type gamg" "-ksp_type fgmres -pc_type gamg" "-ksp_type cg -pc_type gamg" "-ksp_type cg -pc_type gamg -ksp_norm_type natural" "-ksp_type cg -pc_type gamg -ksp_norm_type natural -pc_mg_type full" 4 MPI processes with: "-ksp_type cg -pc_type gamg -ksp_norm_type natural -pc_mg_type full" "-ksp_type cg -pc_type gamg" "-ksp_type gmres -pc_type gamg" "-ksp_type fgmres -pc_type gamg" Let me know if you'd like me to try any other cases. The "-ksp_monitor -ksp_view" output is attached. As you can see, all of the "4 MPI processes" cases worked well, and none of the "8 MPI processes" cases converged.
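For reference, here is a minimal sketch (an assumed helper, not code from this thread) of how the natural-norm convergence test raised above can be requested from source code rather than the command line; it is the programmatic counterpart of the "-ksp_type cg -pc_type gamg -ksp_norm_type natural" options used in the attached runs. The KSP is assumed to have been created already with its operators set.

    #include <petscksp.h>

    /* Sketch only: CG + GAMG with the natural norm for the convergence test.
       Per the discussion above, the default preconditioned residual norm need
       not decrease monotonically for CG; the natural norm is the suggested
       alternative. */
    PetscErrorCode UseCGWithNaturalNorm(KSP ksp)
    {
      PC             pc;
      PetscErrorCode ierr;

      ierr = KSPSetType(ksp, KSPCG);CHKERRQ(ierr);
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCGAMG);CHKERRQ(ierr);
      ierr = KSPSetNormType(ksp, KSP_NORM_NATURAL);CHKERRQ(ierr);
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);  /* still allow command-line overrides */
      return 0;
    }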
Best regards, David On Fri, Nov 13, 2015 at 2:48 PM, Matthew Knepley wrote: > On Fri, Nov 13, 2015 at 12:28 PM, Jed Brown wrote: > >> Matthew Knepley writes: >> > Something very strange is happening here. CG should converge >> monotonically, >> > but above it does not. What could be happening? >> >> Are you use -ksp_norm_type natural? CG is not monotone in other norms. >> > > Yikes! I did not check that. Why do we have PRECONDITIONED as the default > for CG? > > Matt > > Also, if boundary conditions are enforced using a nonsymmetric >> formulation (for example), then you can get lack of monotonicity with CG >> that may not be catastrophic. >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- ---------------------------------------------------------------------------------------------------------- 8 MPI processes with "-ksp_type gmres -pc_type gamg -ksp_max_it 200": 0 KSP Residual norm 1.243143122667e-03 1 KSP Residual norm 6.790564176716e-04 2 KSP Residual norm 2.863642167759e-04 3 KSP Residual norm 2.169460015654e-04 4 KSP Residual norm 1.623823395363e-04 5 KSP Residual norm 1.115831456626e-04 6 KSP Residual norm 7.905550178072e-05 7 KSP Residual norm 6.676506348708e-05 8 KSP Residual norm 5.411318965562e-05 9 KSP Residual norm 4.946393535811e-05 10 KSP Residual norm 4.873373148542e-05 11 KSP Residual norm 4.873372878673e-05 12 KSP Residual norm 4.832393150132e-05 13 KSP Residual norm 4.682644869071e-05 14 KSP Residual norm 4.177965048741e-05 15 KSP Residual norm 3.561738393315e-05 16 KSP Residual norm 3.183178450210e-05 17 KSP Residual norm 2.820849441834e-05 18 KSP Residual norm 2.411305833934e-05 19 KSP Residual norm 2.073531106031e-05 20 KSP Residual norm 1.832253875945e-05 21 KSP Residual norm 1.613725457732e-05 22 KSP Residual norm 1.447115239529e-05 23 KSP Residual norm 1.332661204650e-05 24 KSP Residual norm 1.248919278483e-05 25 KSP Residual norm 1.196188151016e-05 26 KSP Residual norm 1.161737695363e-05 27 KSP Residual norm 1.153632298642e-05 28 KSP Residual norm 1.153553211867e-05 29 KSP Residual norm 1.150729646659e-05 30 KSP Residual norm 1.134640588584e-05 31 KSP Residual norm 1.125651483355e-05 32 KSP Residual norm 1.121985823782e-05 33 KSP Residual norm 1.121682797994e-05 34 KSP Residual norm 1.120526536096e-05 35 KSP Residual norm 1.112698441144e-05 36 KSP Residual norm 1.099161361515e-05 37 KSP Residual norm 1.077160664786e-05 38 KSP Residual norm 1.046319447066e-05 39 KSP Residual norm 1.002732866634e-05 40 KSP Residual norm 9.687406818053e-06 41 KSP Residual norm 9.291736292845e-06 42 KSP Residual norm 8.787280517217e-06 43 KSP Residual norm 8.323595238657e-06 44 KSP Residual norm 7.891080867185e-06 45 KSP Residual norm 7.537064831605e-06 46 KSP Residual norm 7.316511381129e-06 47 KSP Residual norm 7.185951262668e-06 48 KSP Residual norm 7.117216131634e-06 49 KSP Residual norm 7.104770082988e-06 50 KSP Residual norm 7.099139305978e-06 51 KSP Residual norm 7.038487040610e-06 52 KSP Residual norm 6.861458611935e-06 53 KSP Residual norm 6.625293689513e-06 54 KSP Residual norm 6.329429663991e-06 55 KSP Residual norm 5.919056214664e-06 56 KSP Residual norm 5.507182558921e-06 57 KSP Residual norm 5.196803884679e-06 58 KSP Residual norm 5.002188092285e-06 59 KSP Residual norm 4.880759791404e-06 60 KSP Residual 
norm 4.770150595855e-06 61 KSP Residual norm 4.724469615685e-06 62 KSP Residual norm 4.673829760077e-06 63 KSP Residual norm 4.629705280910e-06 64 KSP Residual norm 4.601474765626e-06 65 KSP Residual norm 4.593132269745e-06 66 KSP Residual norm 4.593013889961e-06 67 KSP Residual norm 4.587628601477e-06 68 KSP Residual norm 4.552820908762e-06 69 KSP Residual norm 4.477855982146e-06 70 KSP Residual norm 4.405304333703e-06 71 KSP Residual norm 4.330447444642e-06 72 KSP Residual norm 4.237398563528e-06 73 KSP Residual norm 4.138174613148e-06 74 KSP Residual norm 4.031940389494e-06 75 KSP Residual norm 3.924707157992e-06 76 KSP Residual norm 3.802185445933e-06 77 KSP Residual norm 3.721305730027e-06 78 KSP Residual norm 3.679963259079e-06 79 KSP Residual norm 3.667845615364e-06 80 KSP Residual norm 3.667179799479e-06 81 KSP Residual norm 3.662313644020e-06 82 KSP Residual norm 3.638833884448e-06 83 KSP Residual norm 3.598532435205e-06 84 KSP Residual norm 3.535852321120e-06 85 KSP Residual norm 3.456701541505e-06 86 KSP Residual norm 3.365433403050e-06 87 KSP Residual norm 3.271989106911e-06 88 KSP Residual norm 3.176356037348e-06 89 KSP Residual norm 3.034288347877e-06 90 KSP Residual norm 2.938417615975e-06 91 KSP Residual norm 2.900711516997e-06 92 KSP Residual norm 2.869381676648e-06 93 KSP Residual norm 2.855261464067e-06 94 KSP Residual norm 2.850036755015e-06 95 KSP Residual norm 2.849803273543e-06 96 KSP Residual norm 2.849494112992e-06 97 KSP Residual norm 2.846681415018e-06 98 KSP Residual norm 2.838146221432e-06 99 KSP Residual norm 2.823475351161e-06 100 KSP Residual norm 2.804320481647e-06 101 KSP Residual norm 2.775770817616e-06 102 KSP Residual norm 2.740056888907e-06 103 KSP Residual norm 2.691643851812e-06 104 KSP Residual norm 2.631661051625e-06 105 KSP Residual norm 2.579149456214e-06 106 KSP Residual norm 2.544055525393e-06 107 KSP Residual norm 2.525575063547e-06 108 KSP Residual norm 2.511478767787e-06 109 KSP Residual norm 2.505297851010e-06 110 KSP Residual norm 2.504876255779e-06 111 KSP Residual norm 2.504461173319e-06 112 KSP Residual norm 2.500656666173e-06 113 KSP Residual norm 2.492291225825e-06 114 KSP Residual norm 2.473685710456e-06 115 KSP Residual norm 2.444819997807e-06 116 KSP Residual norm 2.412359679152e-06 117 KSP Residual norm 2.381936043848e-06 118 KSP Residual norm 2.360962621352e-06 119 KSP Residual norm 2.339143019400e-06 120 KSP Residual norm 2.312025372293e-06 121 KSP Residual norm 2.295275489137e-06 122 KSP Residual norm 2.280756556273e-06 123 KSP Residual norm 2.267434577949e-06 124 KSP Residual norm 2.258143104977e-06 125 KSP Residual norm 2.252859925246e-06 126 KSP Residual norm 2.250650586520e-06 127 KSP Residual norm 2.250303886635e-06 128 KSP Residual norm 2.250181880096e-06 129 KSP Residual norm 2.248447177342e-06 130 KSP Residual norm 2.244592613442e-06 131 KSP Residual norm 2.237578766797e-06 132 KSP Residual norm 2.225002721330e-06 133 KSP Residual norm 2.208977931816e-06 134 KSP Residual norm 2.190972169698e-06 135 KSP Residual norm 2.163454716687e-06 136 KSP Residual norm 2.113637340502e-06 137 KSP Residual norm 2.056404468594e-06 138 KSP Residual norm 2.018194032501e-06 139 KSP Residual norm 2.002756982750e-06 140 KSP Residual norm 1.998776984922e-06 141 KSP Residual norm 1.998751658408e-06 142 KSP Residual norm 1.998070253266e-06 143 KSP Residual norm 1.995347469782e-06 144 KSP Residual norm 1.989518461703e-06 145 KSP Residual norm 1.977152184939e-06 146 KSP Residual norm 1.956268435730e-06 147 KSP Residual norm 1.933741835071e-06 
148 KSP Residual norm 1.903282487554e-06 149 KSP Residual norm 1.852992801238e-06 150 KSP Residual norm 1.805399988610e-06 151 KSP Residual norm 1.780994347161e-06 152 KSP Residual norm 1.758724640678e-06 153 KSP Residual norm 1.741785139789e-06 154 KSP Residual norm 1.729960287252e-06 155 KSP Residual norm 1.722639037882e-06 156 KSP Residual norm 1.717886600510e-06 157 KSP Residual norm 1.716690823426e-06 158 KSP Residual norm 1.716687901763e-06 159 KSP Residual norm 1.715728009211e-06 160 KSP Residual norm 1.711931771052e-06 161 KSP Residual norm 1.705427293500e-06 162 KSP Residual norm 1.694134402206e-06 163 KSP Residual norm 1.681925884623e-06 164 KSP Residual norm 1.669192420569e-06 165 KSP Residual norm 1.655373298279e-06 166 KSP Residual norm 1.643029516607e-06 167 KSP Residual norm 1.633824784916e-06 168 KSP Residual norm 1.625200980894e-06 169 KSP Residual norm 1.618026284854e-06 170 KSP Residual norm 1.614455149899e-06 171 KSP Residual norm 1.613267262776e-06 172 KSP Residual norm 1.613263468733e-06 173 KSP Residual norm 1.612824286477e-06 174 KSP Residual norm 1.611353951523e-06 175 KSP Residual norm 1.608719287421e-06 176 KSP Residual norm 1.604804091012e-06 177 KSP Residual norm 1.599631030973e-06 178 KSP Residual norm 1.593875036132e-06 179 KSP Residual norm 1.587155532692e-06 180 KSP Residual norm 1.579858618170e-06 181 KSP Residual norm 1.575395248383e-06 182 KSP Residual norm 1.569657005101e-06 183 KSP Residual norm 1.562953726886e-06 184 KSP Residual norm 1.558094493575e-06 185 KSP Residual norm 1.554659285749e-06 186 KSP Residual norm 1.552539529764e-06 187 KSP Residual norm 1.551536559097e-06 188 KSP Residual norm 1.551451922610e-06 189 KSP Residual norm 1.551156446567e-06 190 KSP Residual norm 1.549585509867e-06 191 KSP Residual norm 1.546047672843e-06 192 KSP Residual norm 1.540124267251e-06 193 KSP Residual norm 1.533684533355e-06 194 KSP Residual norm 1.525913861531e-06 195 KSP Residual norm 1.514318734030e-06 196 KSP Residual norm 1.496433144904e-06 197 KSP Residual norm 1.479608675862e-06 198 KSP Residual norm 1.465580795626e-06 199 KSP Residual norm 1.457272640901e-06 200 KSP Residual norm 1.454275659455e-06 KSP Object: 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=200 tolerances: relative=1e-12, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 8 MPI processes type: gamg MG: type is MULTIPLICATIVE, levels=4 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices GAMG specific options Threshold for dropping small values from graph 0 AGG specific options Symmetric graph false Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 8 MPI processes type: bjacobi block Jacobi: number of blocks = 8 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, 
divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=36, cols=36, bs=6 package used to perform factorization: petsc total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 8 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=36, cols=36, bs=6 total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 8 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=36, cols=36, bs=6 total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 8 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 8 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.164941, max = 1.81436 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_1_esteig_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 8 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=672, cols=672, bs=6 total: nonzeros=255600, allocated nonzeros=255600 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 10 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 8 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.138297, max = 1.52126 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_2_esteig_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 8 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=8940, cols=8940, bs=6 total: nonzeros=1.99829e+06, allocated 
nonzeros=1.99829e+06 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 364 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 8 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.1, max = 1.1 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_3_esteig_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 8 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: () 8 MPI processes type: mpiaij rows=199752, cols=199752, bs=3 total: nonzeros=1.54794e+07, allocated nonzeros=1.54794e+07 total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node (on process 0) routines: found 9163 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: () 8 MPI processes type: mpiaij rows=199752, cols=199752, bs=3 total: nonzeros=1.54794e+07, allocated nonzeros=1.54794e+07 total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node (on process 0) routines: found 9163 nodes, limit used is 5 Error, conv_flag < 0! 
---------------------------------------------------------------------------------------------------------- 8 MPI processes with "-ksp_type fgmres -pc_type gamg -ksp_max_it 200": 0 KSP Residual norm 1.000000000000e+00 1 KSP Residual norm 4.670997735323e-03 2 KSP Residual norm 4.107753808617e-03 3 KSP Residual norm 2.889337330348e-03 4 KSP Residual norm 2.116306152456e-03 5 KSP Residual norm 1.627169532601e-03 6 KSP Residual norm 1.218768613562e-03 7 KSP Residual norm 8.120536960099e-04 8 KSP Residual norm 6.255941040493e-04 9 KSP Residual norm 4.871101195846e-04 10 KSP Residual norm 3.715553641296e-04 11 KSP Residual norm 3.186530628626e-04 12 KSP Residual norm 2.916461487976e-04 13 KSP Residual norm 2.617076737616e-04 14 KSP Residual norm 2.434026947079e-04 15 KSP Residual norm 2.279904817094e-04 16 KSP Residual norm 2.094017928661e-04 17 KSP Residual norm 1.920994891443e-04 18 KSP Residual norm 1.824925891815e-04 19 KSP Residual norm 1.734956192133e-04 20 KSP Residual norm 1.641803577415e-04 21 KSP Residual norm 1.588154081656e-04 22 KSP Residual norm 1.551465552751e-04 23 KSP Residual norm 1.527361686091e-04 24 KSP Residual norm 1.507178096592e-04 25 KSP Residual norm 1.497023666544e-04 26 KSP Residual norm 1.493164041697e-04 27 KSP Residual norm 1.492075267955e-04 28 KSP Residual norm 1.492068301125e-04 29 KSP Residual norm 1.489442512775e-04 30 KSP Residual norm 1.481048645778e-04 31 KSP Residual norm 1.474912906430e-04 32 KSP Residual norm 1.472011893427e-04 33 KSP Residual norm 1.471630138354e-04 34 KSP Residual norm 1.471550027371e-04 35 KSP Residual norm 1.470274749944e-04 36 KSP Residual norm 1.466892693934e-04 37 KSP Residual norm 1.460358831019e-04 38 KSP Residual norm 1.446742442148e-04 39 KSP Residual norm 1.431445836508e-04 40 KSP Residual norm 1.412836143884e-04 41 KSP Residual norm 1.388559210259e-04 42 KSP Residual norm 1.362542327411e-04 43 KSP Residual norm 1.340755156773e-04 44 KSP Residual norm 1.320627198519e-04 45 KSP Residual norm 1.304003591661e-04 46 KSP Residual norm 1.296389081606e-04 47 KSP Residual norm 1.293033497828e-04 48 KSP Residual norm 1.291691278421e-04 49 KSP Residual norm 1.291623043886e-04 50 KSP Residual norm 1.291167471680e-04 51 KSP Residual norm 1.289206750819e-04 52 KSP Residual norm 1.285100433935e-04 53 KSP Residual norm 1.279213654135e-04 54 KSP Residual norm 1.267775022913e-04 55 KSP Residual norm 1.254444474248e-04 56 KSP Residual norm 1.232791465833e-04 57 KSP Residual norm 1.208718136472e-04 58 KSP Residual norm 1.188202291365e-04 59 KSP Residual norm 1.172567876791e-04 60 KSP Residual norm 1.161519332344e-04 61 KSP Residual norm 1.157179339808e-04 62 KSP Residual norm 1.154472502196e-04 63 KSP Residual norm 1.153127504460e-04 64 KSP Residual norm 1.152777949860e-04 65 KSP Residual norm 1.152770118788e-04 66 KSP Residual norm 1.152598598481e-04 67 KSP Residual norm 1.151629129170e-04 68 KSP Residual norm 1.149661894772e-04 69 KSP Residual norm 1.146467643611e-04 70 KSP Residual norm 1.141866143599e-04 71 KSP Residual norm 1.137025615373e-04 72 KSP Residual norm 1.131870901828e-04 73 KSP Residual norm 1.125975338408e-04 74 KSP Residual norm 1.121327693566e-04 75 KSP Residual norm 1.117787078570e-04 76 KSP Residual norm 1.115324968645e-04 77 KSP Residual norm 1.114305998751e-04 78 KSP Residual norm 1.113970100465e-04 79 KSP Residual norm 1.113957597622e-04 80 KSP Residual norm 1.113847867416e-04 81 KSP Residual norm 1.113271032325e-04 82 KSP Residual norm 1.112054965331e-04 83 KSP Residual norm 1.110712297565e-04 84 KSP Residual norm 
1.108472210129e-04 85 KSP Residual norm 1.104980395243e-04 86 KSP Residual norm 1.100251900688e-04 87 KSP Residual norm 1.093770551681e-04 88 KSP Residual norm 1.087264856232e-04 89 KSP Residual norm 1.079864655032e-04 90 KSP Residual norm 1.069761018146e-04 91 KSP Residual norm 1.063544715487e-04 92 KSP Residual norm 1.059754177887e-04 93 KSP Residual norm 1.057041553935e-04 94 KSP Residual norm 1.055091860555e-04 95 KSP Residual norm 1.054083360521e-04 96 KSP Residual norm 1.053609855313e-04 97 KSP Residual norm 1.053537755585e-04 98 KSP Residual norm 1.053427425561e-04 99 KSP Residual norm 1.052632351147e-04 100 KSP Residual norm 1.050672404418e-04 101 KSP Residual norm 1.047076764607e-04 102 KSP Residual norm 1.042881690190e-04 103 KSP Residual norm 1.038070303216e-04 104 KSP Residual norm 1.032844486594e-04 105 KSP Residual norm 1.027143454602e-04 106 KSP Residual norm 1.020711306007e-04 107 KSP Residual norm 1.016161730578e-04 108 KSP Residual norm 1.012557310931e-04 109 KSP Residual norm 1.009893064643e-04 110 KSP Residual norm 1.008471358431e-04 111 KSP Residual norm 1.007996592947e-04 112 KSP Residual norm 1.007934604418e-04 113 KSP Residual norm 1.007885339167e-04 114 KSP Residual norm 1.007508141131e-04 115 KSP Residual norm 1.006505753610e-04 116 KSP Residual norm 1.004519339301e-04 117 KSP Residual norm 1.000280738585e-04 118 KSP Residual norm 9.944350404969e-05 119 KSP Residual norm 9.877759176521e-05 120 KSP Residual norm 9.818389395388e-05 121 KSP Residual norm 9.773577436930e-05 122 KSP Residual norm 9.728467175021e-05 123 KSP Residual norm 9.694926350524e-05 124 KSP Residual norm 9.672180646719e-05 125 KSP Residual norm 9.660114484763e-05 126 KSP Residual norm 9.654419729617e-05 127 KSP Residual norm 9.652671659533e-05 128 KSP Residual norm 9.652657055221e-05 129 KSP Residual norm 9.651524413669e-05 130 KSP Residual norm 9.647086564822e-05 131 KSP Residual norm 9.636753857657e-05 132 KSP Residual norm 9.618894386546e-05 133 KSP Residual norm 9.593619806440e-05 134 KSP Residual norm 9.569428638515e-05 135 KSP Residual norm 9.546096106785e-05 136 KSP Residual norm 9.525162010664e-05 137 KSP Residual norm 9.513447840595e-05 138 KSP Residual norm 9.505399070083e-05 139 KSP Residual norm 9.500332960990e-05 140 KSP Residual norm 9.497709591387e-05 141 KSP Residual norm 9.496352888228e-05 142 KSP Residual norm 9.495949083864e-05 143 KSP Residual norm 9.495948564734e-05 144 KSP Residual norm 9.495712057209e-05 145 KSP Residual norm 9.494127387935e-05 146 KSP Residual norm 9.489993555506e-05 147 KSP Residual norm 9.480961347934e-05 148 KSP Residual norm 9.469034802471e-05 149 KSP Residual norm 9.450844275660e-05 150 KSP Residual norm 9.417598101754e-05 151 KSP Residual norm 9.390512254846e-05 152 KSP Residual norm 9.376400510459e-05 153 KSP Residual norm 9.367492079299e-05 154 KSP Residual norm 9.361909990247e-05 155 KSP Residual norm 9.358335650237e-05 156 KSP Residual norm 9.356353195699e-05 157 KSP Residual norm 9.355891777491e-05 158 KSP Residual norm 9.355716324420e-05 159 KSP Residual norm 9.353449945065e-05 160 KSP Residual norm 9.346172941840e-05 161 KSP Residual norm 9.332283377696e-05 162 KSP Residual norm 9.315361593823e-05 163 KSP Residual norm 9.295337730885e-05 164 KSP Residual norm 9.270022939874e-05 165 KSP Residual norm 9.239090400786e-05 166 KSP Residual norm 9.198176650477e-05 167 KSP Residual norm 9.156845906463e-05 168 KSP Residual norm 9.110592161428e-05 169 KSP Residual norm 9.066318157752e-05 170 KSP Residual norm 9.045691253321e-05 171 KSP Residual norm 
9.040420523474e-05 172 KSP Residual norm 9.040265102252e-05 173 KSP Residual norm 9.037594896598e-05 174 KSP Residual norm 9.027023066846e-05 175 KSP Residual norm 9.005602932590e-05 176 KSP Residual norm 8.974010911762e-05 177 KSP Residual norm 8.909184657206e-05 178 KSP Residual norm 8.820223874376e-05 179 KSP Residual norm 8.713852748360e-05 180 KSP Residual norm 8.640949295202e-05 181 KSP Residual norm 8.583932475736e-05 182 KSP Residual norm 8.510819872706e-05 183 KSP Residual norm 8.457549405131e-05 184 KSP Residual norm 8.420203270956e-05 185 KSP Residual norm 8.400296218178e-05 186 KSP Residual norm 8.389188464879e-05 187 KSP Residual norm 8.383544110933e-05 188 KSP Residual norm 8.382332577117e-05 189 KSP Residual norm 8.382244057144e-05 190 KSP Residual norm 8.379390563248e-05 191 KSP Residual norm 8.369470603151e-05 192 KSP Residual norm 8.349180033405e-05 193 KSP Residual norm 8.318078193378e-05 194 KSP Residual norm 8.288342685543e-05 195 KSP Residual norm 8.260634837066e-05 196 KSP Residual norm 8.240897737972e-05 197 KSP Residual norm 8.232131544913e-05 198 KSP Residual norm 8.226983935342e-05 199 KSP Residual norm 8.223729915616e-05 200 KSP Residual norm 8.222055281446e-05 KSP Object: 8 MPI processes type: fgmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=200 tolerances: relative=1e-12, absolute=1e-50, divergence=10000 right preconditioning using nonzero initial guess using UNPRECONDITIONED norm type for convergence test PC Object: 8 MPI processes type: gamg MG: type is MULTIPLICATIVE, levels=4 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices GAMG specific options Threshold for dropping small values from graph 0 AGG specific options Symmetric graph false Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 8 MPI processes type: bjacobi block Jacobi: number of blocks = 8 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=36, cols=36, bs=6 package used to perform factorization: petsc total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 8 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=36, cols=36, bs=6 total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 8 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=36, 
cols=36, bs=6 total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 8 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 8 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.164941, max = 1.81436 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_1_esteig_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 8 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=672, cols=672, bs=6 total: nonzeros=255600, allocated nonzeros=255600 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 10 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 8 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.138297, max = 1.52126 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_2_esteig_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 8 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=8940, cols=8940, bs=6 total: nonzeros=1.99829e+06, allocated nonzeros=1.99829e+06 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 364 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 8 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.1, max = 1.1 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_3_esteig_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial 
guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 8 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: () 8 MPI processes type: mpiaij rows=199752, cols=199752, bs=3 total: nonzeros=1.54794e+07, allocated nonzeros=1.54794e+07 total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node (on process 0) routines: found 9163 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: () 8 MPI processes type: mpiaij rows=199752, cols=199752, bs=3 total: nonzeros=1.54794e+07, allocated nonzeros=1.54794e+07 total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node (on process 0) routines: found 9163 nodes, limit used is 5 Error, conv_flag < 0! ---------------------------------------------------------------------------------------------------------- 8 MPI processes with "-ksp_type cg -pc_type gamg": 0 KSP Residual norm 1.243143122667e-03 1 KSP Residual norm 7.792397481334e-04 KSP Object: 8 MPI processes type: cg maximum iterations=200 tolerances: relative=1e-12, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 8 MPI processes type: gamg MG: type is MULTIPLICATIVE, levels=4 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices GAMG specific options Threshold for dropping small values from graph 0 AGG specific options Symmetric graph false Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 8 MPI processes type: bjacobi block Jacobi: number of blocks = 8 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=36, cols=36, bs=6 package used to perform factorization: petsc total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 8 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=36, cols=36, bs=6 total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 8 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=36, cols=36, bs=6 total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 8 nodes, limit used is 5 Down solver 
(pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 8 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.164941, max = 1.81436 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_1_esteig_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 8 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=672, cols=672, bs=6 total: nonzeros=255600, allocated nonzeros=255600 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 10 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 8 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.138297, max = 1.52126 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_2_esteig_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 8 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=8940, cols=8940, bs=6 total: nonzeros=1.99829e+06, allocated nonzeros=1.99829e+06 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 364 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 8 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.1, max = 1.1 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_3_esteig_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 8 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = 
precond matrix: Mat Object: () 8 MPI processes type: mpiaij rows=199752, cols=199752, bs=3 total: nonzeros=1.54794e+07, allocated nonzeros=1.54794e+07 total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node (on process 0) routines: found 9163 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: () 8 MPI processes type: mpiaij rows=199752, cols=199752, bs=3 total: nonzeros=1.54794e+07, allocated nonzeros=1.54794e+07 total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node (on process 0) routines: found 9163 nodes, limit used is 5 Error, conv_flag < 0! ---------------------------------------------------------------------------------------------------------- 8 MPI processes with "-ksp_type cg -pc_type gamg -ksp_norm_type natural": 0 KSP Residual norm 3.519959598679e-02 1 KSP Residual norm 6.406057376702e-04 KSP Object: 8 MPI processes type: cg maximum iterations=200 tolerances: relative=1e-12, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NATURAL norm type for convergence test PC Object: 8 MPI processes type: gamg MG: type is MULTIPLICATIVE, levels=4 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices GAMG specific options Threshold for dropping small values from graph 0 AGG specific options Symmetric graph false Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 8 MPI processes type: bjacobi block Jacobi: number of blocks = 8 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=36, cols=36, bs=6 package used to perform factorization: petsc total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 8 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=36, cols=36, bs=6 total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 8 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=36, cols=36, bs=6 total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 8 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 8 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.164941, max = 1.81436 Chebyshev: 
eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_1_esteig_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 8 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=672, cols=672, bs=6 total: nonzeros=255600, allocated nonzeros=255600 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 10 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 8 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.138297, max = 1.52126 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_2_esteig_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 8 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=8940, cols=8940, bs=6 total: nonzeros=1.99829e+06, allocated nonzeros=1.99829e+06 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 364 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 8 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.1, max = 1.1 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_3_esteig_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 8 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: () 8 MPI processes type: mpiaij rows=199752, cols=199752, bs=3 total: nonzeros=1.54794e+07, allocated nonzeros=1.54794e+07 total number of mallocs used during 
MatSetValues calls =0 has attached near null space using I-node (on process 0) routines: found 9163 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: () 8 MPI processes type: mpiaij rows=199752, cols=199752, bs=3 total: nonzeros=1.54794e+07, allocated nonzeros=1.54794e+07 total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node (on process 0) routines: found 9163 nodes, limit used is 5 Error, conv_flag < 0! ---------------------------------------------------------------------------------------------------------- 8 MPI processes with "-ksp_type cg -pc_type gamg -ksp_norm_type natural -pc_mg_type full": 0 KSP Residual norm 3.520284638135e-02 1 KSP Residual norm 1.694661905493e-02 KSP Object: 8 MPI processes type: cg maximum iterations=200 tolerances: relative=1e-12, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NATURAL norm type for convergence test PC Object: 8 MPI processes type: gamg MG: type is FULL, levels=4 cycles=v Using Galerkin computed coarse grid matrices GAMG specific options Threshold for dropping small values from graph 0 AGG specific options Symmetric graph false Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 8 MPI processes type: bjacobi block Jacobi: number of blocks = 8 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=36, cols=36, bs=6 package used to perform factorization: petsc total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 8 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=36, cols=36, bs=6 total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 8 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=36, cols=36, bs=6 total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 8 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 8 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.162676, max = 1.78943 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_1_esteig_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt 
Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 8 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=672, cols=672, bs=6 total: nonzeros=255600, allocated nonzeros=255600 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 10 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 8 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.138297, max = 1.52126 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_2_esteig_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 8 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=8940, cols=8940, bs=6 total: nonzeros=1.99829e+06, allocated nonzeros=1.99829e+06 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 364 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 8 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.1, max = 1.1 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_3_esteig_) 8 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 8 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: () 8 MPI processes type: mpiaij rows=199752, cols=199752, bs=3 total: nonzeros=1.54794e+07, allocated nonzeros=1.54794e+07 total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node (on process 0) routines: found 9163 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) linear 
system matrix = precond matrix: Mat Object: () 8 MPI processes type: mpiaij rows=199752, cols=199752, bs=3 total: nonzeros=1.54794e+07, allocated nonzeros=1.54794e+07 total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node (on process 0) routines: found 9163 nodes, limit used is 5 Error, conv_flag < 0! ---------------------------------------------------------------------------------------------------------- 4 MPI processes with "-ksp_type cg -pc_type gamg -ksp_norm_type natural -pc_mg_type full": 0 KSP Residual norm 3.848292390006e-01 1 KSP Residual norm 3.394562853380e-02 2 KSP Residual norm 1.709369896514e-02 3 KSP Residual norm 6.553998141868e-03 4 KSP Residual norm 1.571187862685e-03 5 KSP Residual norm 5.542725674501e-04 6 KSP Residual norm 2.190719066108e-04 7 KSP Residual norm 8.746263629667e-05 8 KSP Residual norm 2.888658373789e-05 9 KSP Residual norm 1.070029893769e-05 10 KSP Residual norm 4.344983313819e-06 11 KSP Residual norm 1.580078249323e-06 12 KSP Residual norm 6.433222755357e-07 13 KSP Residual norm 3.060196103782e-07 14 KSP Residual norm 1.370666779262e-07 15 KSP Residual norm 5.152436428051e-08 16 KSP Residual norm 2.069305024182e-08 17 KSP Residual norm 8.324752474842e-09 18 KSP Residual norm 3.288645476001e-09 19 KSP Residual norm 1.304811823492e-09 20 KSP Residual norm 4.871732366599e-10 21 KSP Residual norm 2.006752778499e-10 22 KSP Residual norm 8.721524661501e-11 23 KSP Residual norm 3.839304119068e-11 24 KSP Residual norm 1.592613431818e-11 25 KSP Residual norm 5.759010833611e-12 26 KSP Residual norm 2.300499231327e-12 27 KSP Residual norm 9.338357091753e-13 28 KSP Residual norm 3.271574791340e-13 KSP Object: 4 MPI processes type: cg maximum iterations=200 tolerances: relative=1e-12, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NATURAL norm type for convergence test PC Object: 4 MPI processes type: gamg MG: type is FULL, levels=4 cycles=v Using Galerkin computed coarse grid matrices GAMG specific options Threshold for dropping small values from graph 0 AGG specific options Symmetric graph false Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 4 MPI processes type: bjacobi block Jacobi: number of blocks = 4 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=36, cols=36, bs=6 package used to perform factorization: petsc total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 8 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI 
processes type: seqaij rows=36, cols=36, bs=6 total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 8 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=36, cols=36, bs=6 total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 8 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 4 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.15311, max = 1.6842 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_1_esteig_) 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 4 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=690, cols=690, bs=6 total: nonzeros=271044, allocated nonzeros=271044 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 47 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 4 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.140891, max = 1.5498 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_2_esteig_) 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 4 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=8892, cols=8892, bs=6 total: nonzeros=1.9759e+06, allocated nonzeros=1.9759e+06 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 742 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 4 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.152642, max = 1.67907 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_3_esteig_) 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 
maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 4 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: () 4 MPI processes type: mpiaij rows=199752, cols=199752, bs=3 total: nonzeros=1.54794e+07, allocated nonzeros=1.54794e+07 total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node (on process 0) routines: found 17386 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: () 4 MPI processes type: mpiaij rows=199752, cols=199752, bs=3 total: nonzeros=1.54794e+07, allocated nonzeros=1.54794e+07 total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node (on process 0) routines: found 17386 nodes, limit used is 5 ---------------------------------------------------------------------------------------------------------- 4 MPI processes with "-ksp_type cg -pc_type gamg": 0 KSP Residual norm 2.475992358942e-01 1 KSP Residual norm 1.010115669246e-01 2 KSP Residual norm 5.483162946279e-02 3 KSP Residual norm 3.603244951862e-02 4 KSP Residual norm 1.685453948015e-02 5 KSP Residual norm 6.509593131622e-03 6 KSP Residual norm 2.632449026853e-03 7 KSP Residual norm 1.241917834713e-03 8 KSP Residual norm 5.224606030158e-04 9 KSP Residual norm 2.415967606517e-04 10 KSP Residual norm 1.260567680797e-04 11 KSP Residual norm 6.128703145411e-05 12 KSP Residual norm 2.679437838882e-05 13 KSP Residual norm 1.365494948139e-05 14 KSP Residual norm 7.038019548649e-06 15 KSP Residual norm 3.080810878376e-06 16 KSP Residual norm 9.636609709684e-07 17 KSP Residual norm 4.213041474760e-07 18 KSP Residual norm 2.289578124495e-07 19 KSP Residual norm 1.263375775690e-07 20 KSP Residual norm 5.703870904870e-08 21 KSP Residual norm 2.485007763011e-08 22 KSP Residual norm 1.237773305875e-08 23 KSP Residual norm 5.385311109191e-09 24 KSP Residual norm 2.329545414721e-09 25 KSP Residual norm 8.675622114080e-10 26 KSP Residual norm 3.570856507066e-10 27 KSP Residual norm 1.734555265152e-10 28 KSP Residual norm 8.579357543855e-11 29 KSP Residual norm 3.633034228722e-11 30 KSP Residual norm 1.400577345736e-11 31 KSP Residual norm 6.244917840390e-12 32 KSP Residual norm 3.389807223488e-12 33 KSP Residual norm 1.737000616285e-12 34 KSP Residual norm 8.070277344018e-13 35 KSP Residual norm 3.849555117241e-13 36 KSP Residual norm 1.738497314772e-13 KSP Object: 4 MPI processes type: cg maximum iterations=200 tolerances: relative=1e-12, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 4 MPI processes type: gamg MG: type is MULTIPLICATIVE, levels=4 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices GAMG specific options Threshold for dropping small values from graph 0 AGG specific options Symmetric graph false Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy 
breakdown tolerance 1e-30 maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 4 MPI processes type: bjacobi block Jacobi: number of blocks = 4 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=36, cols=36, bs=6 package used to perform factorization: petsc total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 8 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=36, cols=36, bs=6 total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 8 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=36, cols=36, bs=6 total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 8 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 4 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.155999, max = 1.71599 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_1_esteig_) 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 4 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=690, cols=690, bs=6 total: nonzeros=271044, allocated nonzeros=271044 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 47 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 4 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.140448, max = 1.54493 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_2_esteig_) 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning 
using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 4 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=8892, cols=8892, bs=6 total: nonzeros=1.9759e+06, allocated nonzeros=1.9759e+06 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 742 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 4 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.152642, max = 1.67907 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_3_esteig_) 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 4 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: () 4 MPI processes type: mpiaij rows=199752, cols=199752, bs=3 total: nonzeros=1.54794e+07, allocated nonzeros=1.54794e+07 total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node (on process 0) routines: found 17386 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: () 4 MPI processes type: mpiaij rows=199752, cols=199752, bs=3 total: nonzeros=1.54794e+07, allocated nonzeros=1.54794e+07 total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node (on process 0) routines: found 17386 nodes, limit used is 5 ---------------------------------------------------------------------------------------------------------- 4 MPI processes with "-ksp_type gmres -pc_type gamg": 0 KSP Residual norm 2.475992358942e-01 1 KSP Residual norm 9.788251866410e-02 2 KSP Residual norm 5.360475672741e-02 3 KSP Residual norm 3.205714918186e-02 4 KSP Residual norm 1.499738421848e-02 5 KSP Residual norm 5.656223448764e-03 6 KSP Residual norm 2.237512723165e-03 7 KSP Residual norm 1.072453752902e-03 8 KSP Residual norm 4.596044917514e-04 9 KSP Residual norm 2.133364000371e-04 10 KSP Residual norm 1.112111466773e-04 11 KSP Residual norm 5.012424812027e-05 12 KSP Residual norm 2.154120596339e-05 13 KSP Residual norm 1.113658201430e-05 14 KSP Residual norm 5.834290931458e-06 15 KSP Residual norm 2.584285286138e-06 16 KSP Residual norm 8.085531298248e-07 17 KSP Residual norm 3.730338029295e-07 18 KSP Residual norm 1.913619404427e-07 19 KSP Residual norm 1.034223785467e-07 20 KSP Residual norm 4.752281229701e-08 21 KSP Residual norm 2.194488512677e-08 22 KSP Residual norm 1.127906237520e-08 23 KSP Residual norm 4.848127907381e-09 24 KSP Residual norm 2.073720505286e-09 25 KSP Residual norm 
7.549640791079e-10 26 KSP Residual norm 3.159362670485e-10 27 KSP Residual norm 1.480081790670e-10 28 KSP Residual norm 7.315280619947e-11 29 KSP Residual norm 3.096498631652e-11 30 KSP Residual norm 1.189973158428e-11 31 KSP Residual norm 6.479291407172e-12 32 KSP Residual norm 3.492865137025e-12 33 KSP Residual norm 1.805801366839e-12 34 KSP Residual norm 8.167217552481e-13 35 KSP Residual norm 3.806289288690e-13 36 KSP Residual norm 1.782376313089e-13 KSP Object: 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=200 tolerances: relative=1e-12, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 4 MPI processes type: gamg MG: type is MULTIPLICATIVE, levels=4 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices GAMG specific options Threshold for dropping small values from graph 0 AGG specific options Symmetric graph false Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 4 MPI processes type: bjacobi block Jacobi: number of blocks = 4 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=36, cols=36, bs=6 package used to perform factorization: petsc total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 8 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=36, cols=36, bs=6 total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 8 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=36, cols=36, bs=6 total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 8 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 4 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.155999, max = 1.71599 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_1_esteig_) 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, 
divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 4 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=690, cols=690, bs=6 total: nonzeros=271044, allocated nonzeros=271044 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 47 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 4 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.140448, max = 1.54493 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_2_esteig_) 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 4 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=8892, cols=8892, bs=6 total: nonzeros=1.9759e+06, allocated nonzeros=1.9759e+06 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 742 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 4 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.152642, max = 1.67907 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_3_esteig_) 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 4 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: () 4 MPI processes type: mpiaij rows=199752, cols=199752, bs=3 total: nonzeros=1.54794e+07, allocated nonzeros=1.54794e+07 total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node (on process 0) routines: found 17386 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: () 4 MPI processes type: mpiaij rows=199752, cols=199752, bs=3 total: nonzeros=1.54794e+07, allocated nonzeros=1.54794e+07 
total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node (on process 0) routines: found 17386 nodes, limit used is 5 ---------------------------------------------------------------------------------------------------------- 4 MPI processes with "-ksp_type fgmres -pc_type gamg": 0 KSP Residual norm 1.000000000000e+00 1 KSP Residual norm 4.455458968821e-02 2 KSP Residual norm 1.488836455945e-02 3 KSP Residual norm 9.951640321741e-03 4 KSP Residual norm 4.935202239340e-03 5 KSP Residual norm 2.196694493952e-03 6 KSP Residual norm 8.473972720318e-04 7 KSP Residual norm 3.682116362426e-04 8 KSP Residual norm 1.810781420824e-04 9 KSP Residual norm 8.751313625489e-05 10 KSP Residual norm 4.184578916542e-05 11 KSP Residual norm 2.187519733560e-05 12 KSP Residual norm 9.952092586537e-06 13 KSP Residual norm 4.670524263053e-06 14 KSP Residual norm 2.273179771211e-06 15 KSP Residual norm 1.077537786223e-06 16 KSP Residual norm 5.567436295073e-07 17 KSP Residual norm 2.877739699547e-07 18 KSP Residual norm 1.308592111530e-07 19 KSP Residual norm 5.712099664096e-08 20 KSP Residual norm 2.699430333773e-08 21 KSP Residual norm 1.305214840427e-08 22 KSP Residual norm 5.720307231306e-09 23 KSP Residual norm 2.481505127019e-09 24 KSP Residual norm 1.111010201625e-09 25 KSP Residual norm 4.839255906413e-10 26 KSP Residual norm 2.109362228309e-10 27 KSP Residual norm 9.986868872181e-11 28 KSP Residual norm 4.448162227505e-11 29 KSP Residual norm 2.209195701027e-11 30 KSP Residual norm 9.636290761114e-12 31 KSP Residual norm 5.491604445745e-12 32 KSP Residual norm 2.816561935838e-12 33 KSP Residual norm 1.430060877293e-12 34 KSP Residual norm 7.093552021344e-13 KSP Object: 4 MPI processes type: fgmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=200 tolerances: relative=1e-12, absolute=1e-50, divergence=10000 right preconditioning using nonzero initial guess using UNPRECONDITIONED norm type for convergence test PC Object: 4 MPI processes type: gamg MG: type is MULTIPLICATIVE, levels=4 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices GAMG specific options Threshold for dropping small values from graph 0 AGG specific options Symmetric graph false Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 4 MPI processes type: bjacobi block Jacobi: number of blocks = 4 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=36, cols=36, bs=6 package used to perform factorization: petsc total: 
nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 8 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=36, cols=36, bs=6 total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 8 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=36, cols=36, bs=6 total: nonzeros=1296, allocated nonzeros=1296 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 8 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 4 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.155999, max = 1.71599 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_1_esteig_) 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 4 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=690, cols=690, bs=6 total: nonzeros=271044, allocated nonzeros=271044 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 47 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 4 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.140448, max = 1.54493 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 1.1] KSP Object: (mg_levels_2_esteig_) 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 4 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=8892, cols=8892, bs=6 total: nonzeros=1.9759e+06, allocated nonzeros=1.9759e+06 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 742 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 4 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.152642, max = 1.67907 Chebyshev: eigenvalues estimated using gmres with translations [0 0.1; 0 
1.1] KSP Object: (mg_levels_3_esteig_) 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 4 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1 linear system matrix = precond matrix: Mat Object: () 4 MPI processes type: mpiaij rows=199752, cols=199752, bs=3 total: nonzeros=1.54794e+07, allocated nonzeros=1.54794e+07 total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node (on process 0) routines: found 17386 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: () 4 MPI processes type: mpiaij rows=199752, cols=199752, bs=3 total: nonzeros=1.54794e+07, allocated nonzeros=1.54794e+07 total number of mallocs used during MatSetValues calls =0 has attached near null space using I-node (on process 0) routines: found 17386 nodes, limit used is 5 ---------------------------------------------------------------------------------------------------------- From zocca.marco at gmail.com Sat Nov 14 06:28:22 2015 From: zocca.marco at gmail.com (Marco Zocca) Date: Sat, 14 Nov 2015 13:28:22 +0100 Subject: [petsc-users] modeling a dual mesh Message-ID: What construct can I use to build and keep track (i.e. map over) a staggered index set, i.e. the dual (vertices, elements) meshes for a FEM application. I am looking for the equivalent, on a regular mesh, of the `sieve` construction ( http://www.mcs.anl.gov/petsc/documentation/tutorials/sieve.pdf , which I understand to be only available for DMPLEX). I could hack together my own by using DM and IS, but I'd first like to be sure about the non-existence of current implementations. Thank you in advance, Marco -------------- next part -------------- An HTML attachment was scrubbed... URL: From timothee.nicolas at gmail.com Sat Nov 14 09:21:13 2015 From: timothee.nicolas at gmail.com (=?UTF-8?Q?Timoth=C3=A9e_Nicolas?=) Date: Sun, 15 Nov 2015 00:21:13 +0900 Subject: [petsc-users] syntax for routine in PCMGSetResidual In-Reply-To: References: Message-ID: My bad, I had carefully set all the levels, but stupidly forgotten to set the operators on the main KSP... Then the error message makes a lot of sense. Best Timoth?e 2015-11-13 12:00 GMT+09:00 Matthew Knepley : > On Thu, Nov 12, 2015 at 7:56 PM, Timoth?e Nicolas < > timothee.nicolas at gmail.com> wrote: > >> Mmmh, that's strange because I define my matrices with the command >> >> call >> MatCreateShell(PETSC_COMM_WORLD,lctx(level)%localsize,lctx(level)%localsize, >> & >> & lctx(level)%ngpdof,lctx(level)%ngpdof,lctx(level), >> & lctx(level)%Mmat,ierr) >> >> and at each level I checked that the sizes "localsize" and "ngpdof" are >> well set. >> > > You should be able to trace back in the debugger to see what is sat as > pc->mat. 
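For reference, a minimal, self-contained sketch of the step reported missing above: a MATSHELL created with explicit local sizes and then attached to the outer KSP with KSPSetOperators. The identity MyMatMult, the size nlocal, and the use of PCNONE are placeholders for illustration only, not code from this thread.

    /* Sketch: shell operator attached to the outer KSP.  The identity
       MyMatMult and nlocal are stand-ins for the application's operator. */
    #include <petscksp.h>

    static PetscErrorCode MyMatMult(Mat A, Vec x, Vec y)
    {
      PetscErrorCode ierr;
      PetscFunctionBeginUser;
      ierr = VecCopy(x, y);CHKERRQ(ierr);   /* placeholder: y = x */
      PetscFunctionReturn(0);
    }

    int main(int argc, char **argv)
    {
      Mat            A;
      Vec            b, x;
      KSP            ksp;
      PC             pc;
      PetscInt       nlocal = 10;          /* local (per-process) size, as in MatCreateShell above */
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
      ierr = MatCreateShell(PETSC_COMM_WORLD, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE, NULL, &A);CHKERRQ(ierr);
      ierr = MatShellSetOperation(A, MATOP_MULT, (void (*)(void))MyMatMult);CHKERRQ(ierr);

      ierr = MatCreateVecs(A, &x, &b);CHKERRQ(ierr);
      ierr = VecSet(b, 1.0);CHKERRQ(ierr);

      ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);   /* the call that was forgotten on the outer KSP */
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCNONE);CHKERRQ(ierr);        /* a shell matrix has no entries for ILU, so no PC here */
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

      ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
      ierr = VecDestroy(&b);CHKERRQ(ierr);
      ierr = VecDestroy(&x);CHKERRQ(ierr);
      ierr = MatDestroy(&A);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }

Once the outer KSP has an operator with valid sizes, the preconditioner no longer reports "-1 local rows", which is consistent with the explanation below.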
> > Matt > > >> Timothee >> >> 2015-11-13 10:53 GMT+09:00 Matthew Knepley : >> >>> On Thu, Nov 12, 2015 at 7:39 PM, Timoth?e Nicolas < >>> timothee.nicolas at gmail.com> wrote: >>> >>>> Sorry, here is the full error message >>>> >>>> [0]PETSC ERROR: Nonconforming object sizes >>>> [0]PETSC ERROR: Preconditioner number of local rows -1 does not equal >>>> resulting vector number of rows 71808 >>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >>>> for trouble shooting. >>>> [0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 >>>> [0]PETSC ERROR: ./mips_implicit on a arch-linux2-c-opt named helios90 >>>> by tnicolas Fri Nov 13 10:39:14 2015 >>>> [0]PETSC ERROR: Configure options >>>> --prefix=/csc/softs/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real >>>> --with-debugging=0 --with-x=0 --with-cc=mpicc --with-fc=mpif90 >>>> --with-cxx=mpicxx --with-fortran --known-mpi-shared-libraries=1 >>>> --with-scalar-type=real --with-precision=double --CFLAGS="-g -O3 -mavx >>>> -mkl" --CXXFLAGS="-g -O3 -mavx -mkl" --FFLAGS="-g -O3 -mavx -mkl" >>>> [0]PETSC ERROR: #1 PCApply() line 472 in >>>> /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/pc/interface/precon.c >>>> [0]PETSC ERROR: #2 KSP_PCApply() line 242 in >>>> /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/include/petsc/private/kspimpl.h >>>> [0]PETSC ERROR: #3 KSPInitialResidual() line 63 in >>>> /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itres.c >>>> [0]PETSC ERROR: #4 KSPSolve_GMRES() line 235 in >>>> /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/impls/gmres/gmres.c >>>> [0]PETSC ERROR: #5 KSPSolve() line 604 in >>>> /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/src/ksp/ksp/interface/itfunc.c >>>> >>> >>> The PC uses the matrix it gets to determine sizes, and compare to the >>> input vectors it gets for PCApply(). The >>> preconditioner matrix is not setup or is not reporting sizes, for >>> example if its a MATSHELL it does not have any sizes. >>> >>> Matt >>> >>> >>>> 2015-11-13 10:38 GMT+09:00 Matthew Knepley : >>>> >>>>> On Thu, Nov 12, 2015 at 6:47 PM, Timoth?e Nicolas < >>>>> timothee.nicolas at gmail.com> wrote: >>>>> >>>>>> Hi all, >>>>>> >>>>>> In the manual and the documentation, the syntax for the routine to be >>>>>> given as argument of PCMGSetResidual: >>>>>> >>>>>> PCMGSetResidual (PC pc,PetscInt l,PetscErrorCode (*residual)(Mat ,Vec ,Vec ,Vec ),Mat mat) >>>>>> >>>>>> >>>>>> is not specified. I mean that the order of the vectors is not >>>>>> specified. I suppose it is something like >>>>>> residual(Mat,b,x,r) with r = b - Mat*x, but it could as well be any >>>>>> combination like residual(Mat,r,x,b). There is no example in the >>>>>> documentation of the usage so I am confused. Does it absolutely need to be >>>>>> set ? I find the manual a bit confusing on this point. Is it only if >>>>>> matrix-free matrices are used ? >>>>>> >>>>>> In the present situation, I use matrix-free operators in a multigrid >>>>>> preconditioner (but the interpolation and restriction are not matrix free) >>>>>> and have not set this residual function yet. I get the following error: >>>>>> >>>>> >>>>> Always always always give the entire error message. We want the stack. 
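On the original PCMGSetResidual question quoted above, a fragment sketching a user residual callback, assuming the argument order residual(mat, b, x, r) with r = b - mat*x, i.e. the ordering the poster guesses (and the one used by the default residual routine in the PETSc sources). pc, the level number l, and the level's shell matrix Amat_l are taken from an existing PCMG setup; only MyResidual is new here.

    /* Sketch of a user residual for PCMGSetResidual(), assuming the
       ordering residual(mat, b, x, r) with r = b - mat*x. */
    #include <petscksp.h>

    static PetscErrorCode MyResidual(Mat mat, Vec b, Vec x, Vec r)
    {
      PetscErrorCode ierr;
      PetscFunctionBeginUser;
      ierr = MatMult(mat, x, r);CHKERRQ(ierr);    /* r = A x     */
      ierr = VecAYPX(r, -1.0, b);CHKERRQ(ierr);   /* r = b - A x */
      PetscFunctionReturn(0);
    }

    /* registration on level l of an existing PCMG hierarchy:
       ierr = PCMGSetResidual(pc, l, MyResidual, Amat_l);CHKERRQ(ierr);   */

In many cases this callback is not needed at all: the default residual only requires a working MatMult, which a MATSHELL provides once MATOP_MULT has been set on it.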
>>>>> >>>>> The problem here looks like the preconditioner is reporting -1 rows >>>>> for process 13. >>>>> >>>>> Matt >>>>> >>>>> >>>>>> [13]PETSC ERROR: Preconditioner number of local rows -1 does not >>>>>> equal resulting vector number of rows 67584 >>>>>> >>>>>> Could this be related ? By the way, I don't understand what is meant >>>>>> by the "preconditioner number of local rows". I have separately tested the >>>>>> operators at each level and they are fine. >>>>>> >>>>>> Best >>>>>> >>>>>> Timothee >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Nov 14 14:03:32 2015 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 14 Nov 2015 14:03:32 -0600 Subject: [petsc-users] modeling a dual mesh In-Reply-To: References: Message-ID: On Sat, Nov 14, 2015 at 6:28 AM, Marco Zocca wrote: > What construct can I use to build and keep track (i.e. map over) a > staggered index set, i.e. the dual (vertices, elements) meshes for a FEM > application. > > I am looking for the equivalent, on a regular mesh, of the `sieve` > construction ( > http://www.mcs.anl.gov/petsc/documentation/tutorials/sieve.pdf , which I > understand to be only available for DMPLEX). > > I could hack together my own by using DM and IS, but I'd first like to be > sure about the non-existence of current implementations. > 1) Everyone who has done this in PETSc just uses another DMDA and some buffer cells 2) I think the right way to do this is to have a canonical numbering for the pieces of a DMDA and use PetscSection. I wrote a bunch of support for this, which allows arbitrary partitioning, staggered and PIC (and DG) discretizations, and FEM-style residuals. However, no one cared because using multiple DMDAs is so easy. Matt > Thank you in advance, > > Marco > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From blechta at karlin.mff.cuni.cz Sun Nov 15 13:54:17 2015 From: blechta at karlin.mff.cuni.cz (Jan Blechta) Date: Sun, 15 Nov 2015 20:54:17 +0100 Subject: [petsc-users] SuperLU_dist computes rubbish Message-ID: <20151115205417.4af84e0d@gott> Using attached petsc4py code, matrix and right-hand side, SuperLU_dist returns totally wrong solution for mixed Laplacian: $ tar -xzf report.tar.gz $ python test-solve.py -pc_factor_mat_solver_package mumps -ksp_final_residual KSP final norm of residual 3.81865e-15 $ python test-solve.py -pc_factor_mat_solver_package umfpack -ksp_final_residual KSP final norm of residual 3.68546e-14 $ python test-solve.py -pc_factor_mat_solver_package superlu_dist -ksp_final_residual KSP final norm of residual 1827.72 Moreover final residual is random when run using mpirun -np 3. 
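Returning to the dual-mesh question answered above, a small sketch of the "second DMDA" approach for a 2-D cell/vertex pair. The grid sizes, dof counts, and boundary types are illustrative only; matching parallel layouts between the two DMDAs would require passing explicit ownership ranges (the lx/ly arguments) instead of the NULLs used here.

    /* Sketch: one DMDA for cell-centered unknowns, one for vertex-centered
       unknowns; sizes and dof counts are placeholders. */
    #include <petscdmda.h>

    int main(int argc, char **argv)
    {
      DM             dmCell, dmVert;
      Vec            cellField, vertField;
      PetscInt       M = 8, N = 8;   /* number of cells per direction (example values) */
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

      /* cell-centered field: M x N points, 1 dof, stencil width 1 */
      ierr = DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                          DMDA_STENCIL_BOX, M, N, PETSC_DECIDE, PETSC_DECIDE,
                          1, 1, NULL, NULL, &dmCell);CHKERRQ(ierr);

      /* vertex-centered field: (M+1) x (N+1) points */
      ierr = DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                          DMDA_STENCIL_BOX, M+1, N+1, PETSC_DECIDE, PETSC_DECIDE,
                          1, 1, NULL, NULL, &dmVert);CHKERRQ(ierr);

      ierr = DMCreateGlobalVector(dmCell, &cellField);CHKERRQ(ierr);
      ierr = DMCreateGlobalVector(dmVert, &vertField);CHKERRQ(ierr);

      /* the ghost ("buffer") cells of each DMDA give access to the other
         field's neighbouring values during residual/Jacobian assembly */

      ierr = VecDestroy(&cellField);CHKERRQ(ierr);
      ierr = VecDestroy(&vertField);CHKERRQ(ierr);
      ierr = DMDestroy(&dmVert);CHKERRQ(ierr);
      ierr = DMDestroy(&dmCell);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }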
Maybe a memory corruption issue? This is reproducible using PETSc 3.6.2 (and
SuperLU_dist configured by PETSc) and much older, see
http://fenicsproject.org/pipermail/fenics-support/2014-March/000439.html
but has never been reported upstream.

The code for assembling the matrix and rhs using FEniCS is also
included for the sake of completeness.

Jan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: report.tar.gz
Type: application/gzip
Size: 54716 bytes
Desc: not available
URL:

From hzhang at mcs.anl.gov Sun Nov 15 20:53:29 2015
From: hzhang at mcs.anl.gov (Hong)
Date: Sun, 15 Nov 2015 20:53:29 -0600
Subject: [petsc-users] SuperLU_dist computes rubbish
In-Reply-To: <20151115205417.4af84e0d@gott>
References: <20151115205417.4af84e0d@gott>
Message-ID:

Jan:
I can reproduce reported behavior using
petsc/src/ksp/ksp/examples/tutorials/ex10.c on your mat.dat and rhs.dat.

Using petsc sequential lu with default ordering 'nd', I get
./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu
Number of iterations = 0
Residual norm 0.0220971

Changing to
./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu -pc_factor_mat_ordering_type natural
Number of iterations = 1
Residual norm < 1.e-12

Back to superlu_dist, I get
mpiexec -n 3 ./ex10 -f0 /homes/hzhang/tmp/mat.dat -rhs /homes/hzhang/tmp/rhs.dat -pc_type lu -pc_factor_mat_solver_package superlu_dist
Number of iterations = 4
Residual norm 25650.8

which uses default ordering (-ksp_view)
Row permutation LargeDiag
Column permutation METIS_AT_PLUS_A

Run it with
mpiexec -n 3 ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_rowperm NATURAL -mat_superlu_dist_colperm NATURAL
Number of iterations = 1
Residual norm < 1.e-12

i.e., your problem is sensitive to matrix ordering, which I do not know why.

I checked condition number of your mat.dat using superlu:
./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu -pc_factor_mat_solver_package superlu -mat_superlu_conditionnumber
Recip. condition number = 1.137938e-03
Number of iterations = 1
Residual norm < 1.e-12

As you see, matrix is well-conditioned. Why is it so sensitive to matrix ordering?

Hong

> Using attached petsc4py code, matrix and right-hand side, SuperLU_dist
> returns totally wrong solution for mixed Laplacian:
>
> $ tar -xzf report.tar.gz
> $ python test-solve.py -pc_factor_mat_solver_package mumps -ksp_final_residual
> KSP final norm of residual 3.81865e-15
> $ python test-solve.py -pc_factor_mat_solver_package umfpack -ksp_final_residual
> KSP final norm of residual 3.68546e-14
> $ python test-solve.py -pc_factor_mat_solver_package superlu_dist -ksp_final_residual
> KSP final norm of residual 1827.72
>
> Moreover final residual is random when run using mpirun -np 3. Maybe
> a memory corruption issue? This is reproducible using PETSc 3.6.2 (and
> SuperLU_dist configured by PETSc) and much older, see
> http://fenicsproject.org/pipermail/fenics-support/2014-March/000439.html
> but has never been reported upstream.
>
> The code for assembling the matrix and rhs using FEniCS is also
> included for the sake of completeness.
>
> Jan
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From bsmith at mcs.anl.gov Sun Nov 15 21:41:49 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 15 Nov 2015 21:41:49 -0600 Subject: [petsc-users] SuperLU_dist computes rubbish In-Reply-To: References: <20151115205417.4af84e0d@gott> Message-ID: <10DF0500-8691-49A9-BBC8-5E5096B22B54@mcs.anl.gov> $ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat -ksp_converged_reason Linear solve did not converge due to DIVERGED_NANORINF iterations 0 Number of iterations = 0 Residual norm 0.0220971 The matrix has a zero pivot with the nd ordering $ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat -ksp_converged_reason -ksp_error_if_not_converged [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Zero pivot in LU factorization: http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot [0]PETSC ERROR: Zero pivot row 5 value 0. tolerance 2.22045e-14 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.6.2-1539-g5ca2a2b GIT Date: 2015-11-13 02:00:47 -0600 [0]PETSC ERROR: ./ex10 on a arch-mpich-nemesis named Barrys-MacBook-Pro.local by barrysmith Sun Nov 15 21:37:15 2015 [0]PETSC ERROR: Configure options --download-mpich --download-mpich-device=ch3:nemesis [0]PETSC ERROR: #1 MatPivotCheck_none() line 688 in /Users/barrysmith/Src/PETSc/include/petsc/private/matimpl.h [0]PETSC ERROR: #2 MatPivotCheck() line 707 in /Users/barrysmith/Src/PETSc/include/petsc/private/matimpl.h [0]PETSC ERROR: #3 MatLUFactorNumeric_SeqAIJ_Inode() line 1332 in /Users/barrysmith/Src/PETSc/src/mat/impls/aij/seq/inode.c [0]PETSC ERROR: #4 MatLUFactorNumeric() line 2946 in /Users/barrysmith/Src/PETSc/src/mat/interface/matrix.c [0]PETSC ERROR: #5 PCSetUp_LU() line 152 in /Users/barrysmith/Src/PETSc/src/ksp/pc/impls/factor/lu/lu.c [0]PETSC ERROR: #6 PCSetUp() line 984 in /Users/barrysmith/Src/PETSc/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #7 KSPSetUp() line 332 in /Users/barrysmith/Src/PETSc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #8 main() line 312 in /Users/barrysmith/Src/petsc/src/ksp/ksp/examples/tutorials/ex10.c [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -f0 /Users/barrysmith/Downloads/mat.dat [0]PETSC ERROR: -ksp_converged_reason [0]PETSC ERROR: -ksp_error_if_not_converged [0]PETSC ERROR: -ksp_monitor_true_residual [0]PETSC ERROR: -malloc_test [0]PETSC ERROR: -pc_type lu [0]PETSC ERROR: -rhs /Users/barrysmith/Downloads/rhs.dat [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- application called MPI_Abort(MPI_COMM_WORLD, 71) - process 0 ~/Src/petsc/src/ksp/ksp/examples/tutorials (barry/utilize-hwloc *>) arch-mpich-nemesis $ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat -ksp_converged_reason -ksp_error_if_not_converged -pc_factor_nonzeros_along_diagonal" > ~/Src/petsc/src/ksp/ksp/examples/tutorials (barry/utilize-hwloc *>) arch-mpich-nemesis $ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat -ksp_converged_reason -ksp_error_if_not_converged -pc_factor_nonzeros_along_diagonal 0 KSP preconditioned resid norm 1.905901677970e+00 true resid norm 2.209708691208e-02 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.703926496877e-14 true resid norm 
5.880234823611e-15 ||r(i)||/||b|| 2.661090507997e-13 Linear solve converged due to CONVERGED_RTOL iterations 1 Number of iterations = 1 Residual norm < 1.e-12 ~/Src/petsc/src/ksp/ksp/examples/tutorials (barry/utilize-hwloc *>) arch-mpich-nemesis Jan, Remember you ALWAYS have to call KSPConvergedReason() after a KSPSolve to see what happened in the solve. > On Nov 15, 2015, at 8:53 PM, Hong wrote: > > Jan: > I can reproduce reported behavior using > petsc/src/ksp/ksp/examples/tutorials/ex10.c on your mat.dat and rhs.dat. > > Using petsc sequential lu with default ordering 'nd', I get > ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu > Number of iterations = 0 > Residual norm 0.0220971 > > Changing to > ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu -pc_factor_mat_ordering_type natural > Number of iterations = 1 > Residual norm < 1.e-12 > > Back to superlu_dist, I get > mpiexec -n 3 ./ex10 -f0 /homes/hzhang/tmp/mat.dat -rhs /homes/hzhang/tmp/rhs.dat -pc_type lu -pc_factor_mat_solver_package superlu_dist > Number of iterations = 4 > Residual norm 25650.8 > > which uses default ordering (-ksp_view) > Row permutation LargeDiag > Column permutation METIS_AT_PLUS_A > > Run it with > mpiexec -n 3 ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_rowperm NATURAL -mat_superlu_dist_colperm NATURAL > Number of iterations = 1 > Residual norm < 1.e-12 > > i.e., your problem is sensitive to matrix ordering, which I do not know why. > > I checked condition number of your mat.dat using superlu: > ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu -pc_factor_mat_solver_package superlu -mat_superlu_conditionnumber > Recip. condition number = 1.137938e-03 > Number of iterations = 1 > Residual norm < 1.e-12 > > As you see, matrix is well-conditioned. Why is it so sensitive to matrix ordering? > > Hong > > Using attached petsc4py code, matrix and right-hand side, SuperLU_dist > returns totally wrong solution for mixed Laplacian: > > $ tar -xzf report.tar.gz > $ python test-solve.py -pc_factor_mat_solver_package mumps -ksp_final_residual > KSP final norm of residual 3.81865e-15 > $ python test-solve.py -pc_factor_mat_solver_package umfpack -ksp_final_residual > KSP final norm of residual 3.68546e-14 > $ python test-solve.py -pc_factor_mat_solver_package superlu_dist -ksp_final_residual > KSP final norm of residual 1827.72 > > Moreover final residual is random when run using mpirun -np 3. Maybe > a memory corruption issue? This is reproducible using PETSc 3.6.2 (and > SuperLU_dist configured by PETSc) and much older, see > http://fenicsproject.org/pipermail/fenics-support/2014-March/000439.html > but has never been reported upstream. > > The code for assembling the matrix and rhs using FEniCS is also > included for the sake of completeness. > > Jan > From zocca.marco at gmail.com Mon Nov 16 05:37:27 2015 From: zocca.marco at gmail.com (Marco Zocca) Date: Mon, 16 Nov 2015 12:37:27 +0100 Subject: [petsc-users] shared components for real- and complex-scalar builds Message-ID: Currently, I'm re-building fblaslapack and mpich along with PETSc for both cases, but would it be possible to share either between the two builds? Thank you, Marco -------------- next part -------------- An HTML attachment was scrubbed... 
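One possible arrangement, if sharing is wanted at all (see the replies that follow), is to install MPI once outside of PETSc's configure and point both scalar-type builds at it, while simply repeating --download-fblaslapack per PETSC_ARCH. The install path below is a placeholder:

  # MPICH built and installed once, e.g. under /opt/mpich (placeholder path)
  ./configure PETSC_ARCH=arch-real    --with-scalar-type=real    --with-mpi-dir=/opt/mpich --download-fblaslapack
  ./configure PETSC_ARCH=arch-complex --with-scalar-type=complex --with-mpi-dir=/opt/mpich --download-fblaslapack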
URL: From knepley at gmail.com Mon Nov 16 06:24:22 2015 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 16 Nov 2015 06:24:22 -0600 Subject: [petsc-users] shared components for real- and complex-scalar builds In-Reply-To: References: Message-ID: On Mon, Nov 16, 2015 at 5:37 AM, Marco Zocca wrote: > Currently, I'm re-building fblaslapack and mpich along with PETSc for both > cases, but would it be possible to share either between the two builds? > If you build them yourself with the same compilers, independent of configure, yes. Matt > Thank you, > Marco > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Nov 16 07:17:48 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 16 Nov 2015 07:17:48 -0600 Subject: [petsc-users] shared components for real- and complex-scalar builds In-Reply-To: References: Message-ID: > On Nov 16, 2015, at 6:24 AM, Matthew Knepley wrote: > > On Mon, Nov 16, 2015 at 5:37 AM, Marco Zocca wrote: > Currently, I'm re-building fblaslapack and mpich along with PETSc for both cases, but would it be possible to share either between the two builds? > > If you build them yourself with the same compilers, independent of configure, yes. But it is not worth it. > > Matt > > Thank you, > Marco > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener From blechta at karlin.mff.cuni.cz Mon Nov 16 08:04:28 2015 From: blechta at karlin.mff.cuni.cz (Jan Blechta) Date: Mon, 16 Nov 2015 15:04:28 +0100 Subject: [petsc-users] SuperLU_dist computes rubbish In-Reply-To: <10DF0500-8691-49A9-BBC8-5E5096B22B54@mcs.anl.gov> References: <20151115205417.4af84e0d@gott> <10DF0500-8691-49A9-BBC8-5E5096B22B54@mcs.anl.gov> Message-ID: <20151116150428.21ac4973@gott> On Sun, 15 Nov 2015 21:41:49 -0600 Barry Smith wrote: > $ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 > ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat -ksp_converged_reason > Linear solve did not converge due to DIVERGED_NANORINF iterations 0 > Number of iterations = 0 Residual norm 0.0220971 > > > The matrix has a zero pivot with the nd ordering > > $ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 > ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat -ksp_converged_reason > -ksp_error_if_not_converged [0]PETSC ERROR: --------------------- > Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Zero pivot in LU factorization: > http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot > [0]PETSC ERROR: Zero pivot row 5 value 0. tolerance 2.22045e-14 > > [0]PETSC ERROR: See > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble > shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: > v3.6.2-1539-g5ca2a2b GIT Date: 2015-11-13 02:00:47 -0600 [0]PETSC > ERROR: ./ex10 on a arch-mpich-nemesis named Barrys-MacBook-Pro.local > by barrysmith Sun Nov 15 21:37:15 2015 [0]PETSC ERROR: Configure > options --download-mpich --download-mpich-device=ch3:nemesis [0]PETSC > ERROR: #1 MatPivotCheck_none() line 688 > in /Users/barrysmith/Src/PETSc/include/petsc/private/matimpl.h > [0]PETSC ERROR: #2 MatPivotCheck() line 707 > in /Users/barrysmith/Src/PETSc/include/petsc/private/matimpl.h > [0]PETSC ERROR: #3 MatLUFactorNumeric_SeqAIJ_Inode() line 1332 > in /Users/barrysmith/Src/PETSc/src/mat/impls/aij/seq/inode.c [0]PETSC > ERROR: #4 MatLUFactorNumeric() line 2946 > in /Users/barrysmith/Src/PETSc/src/mat/interface/matrix.c [0]PETSC > ERROR: #5 PCSetUp_LU() line 152 > in /Users/barrysmith/Src/PETSc/src/ksp/pc/impls/factor/lu/lu.c > [0]PETSC ERROR: #6 PCSetUp() line 984 > in /Users/barrysmith/Src/PETSc/src/ksp/pc/interface/precon.c [0]PETSC > ERROR: #7 KSPSetUp() line 332 > in /Users/barrysmith/Src/PETSc/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #8 main() line 312 > in /Users/barrysmith/Src/petsc/src/ksp/ksp/examples/tutorials/ex10.c > [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: > -f0 /Users/barrysmith/Downloads/mat.dat [0]PETSC ERROR: > -ksp_converged_reason [0]PETSC ERROR: -ksp_error_if_not_converged > [0]PETSC ERROR: -ksp_monitor_true_residual [0]PETSC ERROR: > -malloc_test [0]PETSC ERROR: -pc_type lu [0]PETSC ERROR: > -rhs /Users/barrysmith/Downloads/rhs.dat [0]PETSC ERROR: > ----------------End of Error Message -------send entire error message > to petsc-maint at mcs.anl.gov---------- application called > MPI_Abort(MPI_COMM_WORLD, 71) - process 0 > ~/Src/petsc/src/ksp/ksp/examples/tutorials (barry/utilize-hwloc *>) > arch-mpich-nemesis > > $ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 > ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat -ksp_converged_reason > -ksp_error_if_not_converged -pc_factor_nonzeros_along_diagonal" > > > ~/Src/petsc/src/ksp/ksp/examples/tutorials (barry/utilize-hwloc *>) > arch-mpich-nemesis $ ./ex10 -pc_type lu -ksp_monitor_true_residual > -f0 ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat > -ksp_converged_reason -ksp_error_if_not_converged > -pc_factor_nonzeros_along_diagonal 0 KSP preconditioned resid norm > 1.905901677970e+00 true resid norm 2.209708691208e-02 ||r(i)||/||b|| > 1.000000000000e+00 1 KSP preconditioned resid norm 1.703926496877e-14 > true resid norm 5.880234823611e-15 ||r(i)||/||b|| 2.661090507997e-13 > Linear solve converged due to CONVERGED_RTOL iterations 1 Number of > iterations = 1 Residual norm < 1.e-12 > ~/Src/petsc/src/ksp/ksp/examples/tutorials (barry/utilize-hwloc *>) > arch-mpich-nemesis > > Jan, > > Remember you ALWAYS have to call KSPConvergedReason() after a > KSPSolve to see what happened in the solve. Yes, it was a stupidity on my side. Thanks, Barry and Hong. Jan > > > On Nov 15, 2015, at 8:53 PM, Hong wrote: > > > > Jan: > > I can reproduce reported behavior using > > petsc/src/ksp/ksp/examples/tutorials/ex10.c on your mat.dat and > > rhs.dat. 
> > > > Using petsc sequential lu with default ordering 'nd', I get > > ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu > > Number of iterations = 0 > > Residual norm 0.0220971 > > > > Changing to > > ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu > > -pc_factor_mat_ordering_type natural Number of iterations = 1 > > Residual norm < 1.e-12 > > > > Back to superlu_dist, I get > > mpiexec -n 3 ./ex10 -f0 /homes/hzhang/tmp/mat.dat > > -rhs /homes/hzhang/tmp/rhs.dat -pc_type lu > > -pc_factor_mat_solver_package superlu_dist Number of iterations = > > 4 Residual norm 25650.8 > > > > which uses default ordering (-ksp_view) > > Row permutation LargeDiag > > Column permutation METIS_AT_PLUS_A > > > > Run it with > > mpiexec -n 3 ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu > > -pc_factor_mat_solver_package superlu_dist > > -mat_superlu_dist_rowperm NATURAL -mat_superlu_dist_colperm NATURAL > > Number of iterations = 1 Residual norm < 1.e-12 > > > > i.e., your problem is sensitive to matrix ordering, which I do not > > know why. > > > > I checked condition number of your mat.dat using superlu: > > ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu > > -pc_factor_mat_solver_package superlu -mat_superlu_conditionnumber > > Recip. condition number = 1.137938e-03 Number of iterations = 1 > > Residual norm < 1.e-12 > > > > As you see, matrix is well-conditioned. Why is it so sensitive to > > matrix ordering? > > > > Hong > > > > Using attached petsc4py code, matrix and right-hand side, > > SuperLU_dist returns totally wrong solution for mixed Laplacian: > > > > $ tar -xzf report.tar.gz > > $ python test-solve.py -pc_factor_mat_solver_package mumps > > -ksp_final_residual KSP final norm of residual 3.81865e-15 > > $ python test-solve.py -pc_factor_mat_solver_package umfpack > > -ksp_final_residual KSP final norm of residual 3.68546e-14 > > $ python test-solve.py -pc_factor_mat_solver_package superlu_dist > > -ksp_final_residual KSP final norm of residual 1827.72 > > > > Moreover final residual is random when run using mpirun -np 3. Maybe > > a memory corruption issue? This is reproducible using PETSc 3.6.2 > > (and SuperLU_dist configured by PETSc) and much older, see > > http://fenicsproject.org/pipermail/fenics-support/2014-March/000439.html > > but has never been reported upstream. > > > > The code for assembling the matrix and rhs using FEniCS is also > > included for the sake of completeness. > > > > Jan > > From hzhang at mcs.anl.gov Mon Nov 16 09:22:45 2015 From: hzhang at mcs.anl.gov (Hong) Date: Mon, 16 Nov 2015 09:22:45 -0600 Subject: [petsc-users] SuperLU_dist computes rubbish In-Reply-To: <10DF0500-8691-49A9-BBC8-5E5096B22B54@mcs.anl.gov> References: <20151115205417.4af84e0d@gott> <10DF0500-8691-49A9-BBC8-5E5096B22B54@mcs.anl.gov> Message-ID: Barry: > $ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 ~/Downloads/mat.dat > -rhs ~/Downloads/rhs.dat -ksp_converged_reason > Linear solve did not converge due to DIVERGED_NANORINF iterations 0 > Number of iterations = 0 > Residual norm 0.0220971 > Hmm, I'm working on it, and forgot to check '-ksp_converged_reason'. However, superlu_dist does not report zero pivot, might simply 'exit'. I'll contact Sherry about it. 
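For completeness, the programmatic counterpart of the checks discussed in this thread (-ksp_converged_reason and -ksp_error_if_not_converged), for codes that call KSPSolve directly; ksp, b, and x are assumed to be the application's existing objects:

    KSPConvergedReason reason;
    PetscErrorCode     ierr;

    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
    ierr = KSPGetConvergedReason(ksp, &reason);CHKERRQ(ierr);
    if (reason < 0) {   /* negative values mean the solve diverged or broke down */
      ierr = PetscPrintf(PETSC_COMM_WORLD, "KSPSolve failed: %s\n", KSPConvergedReasons[reason]);CHKERRQ(ierr);
    }
    /* alternatively, turn failures into hard errors before the solve:
       ierr = KSPSetErrorIfNotConverged(ksp, PETSC_TRUE);CHKERRQ(ierr);   */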
Hong > > The matrix has a zero pivot with the nd ordering > > $ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 ~/Downloads/mat.dat > -rhs ~/Downloads/rhs.dat -ksp_converged_reason -ksp_error_if_not_converged > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Zero pivot in LU factorization: > http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot > [0]PETSC ERROR: Zero pivot row 5 value 0. tolerance 2.22045e-14 > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.6.2-1539-g5ca2a2b GIT > Date: 2015-11-13 02:00:47 -0600 > [0]PETSC ERROR: ./ex10 on a arch-mpich-nemesis named > Barrys-MacBook-Pro.local by barrysmith Sun Nov 15 21:37:15 2015 > [0]PETSC ERROR: Configure options --download-mpich > --download-mpich-device=ch3:nemesis > [0]PETSC ERROR: #1 MatPivotCheck_none() line 688 in > /Users/barrysmith/Src/PETSc/include/petsc/private/matimpl.h > [0]PETSC ERROR: #2 MatPivotCheck() line 707 in > /Users/barrysmith/Src/PETSc/include/petsc/private/matimpl.h > [0]PETSC ERROR: #3 MatLUFactorNumeric_SeqAIJ_Inode() line 1332 in > /Users/barrysmith/Src/PETSc/src/mat/impls/aij/seq/inode.c > [0]PETSC ERROR: #4 MatLUFactorNumeric() line 2946 in > /Users/barrysmith/Src/PETSc/src/mat/interface/matrix.c > [0]PETSC ERROR: #5 PCSetUp_LU() line 152 in > /Users/barrysmith/Src/PETSc/src/ksp/pc/impls/factor/lu/lu.c > [0]PETSC ERROR: #6 PCSetUp() line 984 in > /Users/barrysmith/Src/PETSc/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #7 KSPSetUp() line 332 in > /Users/barrysmith/Src/PETSc/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #8 main() line 312 in > /Users/barrysmith/Src/petsc/src/ksp/ksp/examples/tutorials/ex10.c > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -f0 /Users/barrysmith/Downloads/mat.dat > [0]PETSC ERROR: -ksp_converged_reason > [0]PETSC ERROR: -ksp_error_if_not_converged > [0]PETSC ERROR: -ksp_monitor_true_residual > [0]PETSC ERROR: -malloc_test > [0]PETSC ERROR: -pc_type lu > [0]PETSC ERROR: -rhs /Users/barrysmith/Downloads/rhs.dat > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_WORLD, 71) - process 0 > ~/Src/petsc/src/ksp/ksp/examples/tutorials (barry/utilize-hwloc *>) > arch-mpich-nemesis > > $ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 ~/Downloads/mat.dat > -rhs ~/Downloads/rhs.dat -ksp_converged_reason -ksp_error_if_not_converged > -pc_factor_nonzeros_along_diagonal" > > > ~/Src/petsc/src/ksp/ksp/examples/tutorials (barry/utilize-hwloc *>) > arch-mpich-nemesis > $ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 ~/Downloads/mat.dat > -rhs ~/Downloads/rhs.dat -ksp_converged_reason -ksp_error_if_not_converged > -pc_factor_nonzeros_along_diagonal > 0 KSP preconditioned resid norm 1.905901677970e+00 true resid norm > 2.209708691208e-02 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 1.703926496877e-14 true resid norm > 5.880234823611e-15 ||r(i)||/||b|| 2.661090507997e-13 > Linear solve converged due to CONVERGED_RTOL iterations 1 > Number of iterations = 1 > Residual norm < 1.e-12 > ~/Src/petsc/src/ksp/ksp/examples/tutorials (barry/utilize-hwloc *>) > arch-mpich-nemesis > > Jan, > > Remember you ALWAYS have to call KSPConvergedReason() after a KSPSolve > to see what happened in the solve. 
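A minimal sketch of the convergence check being recommended here (assumes the KSP and vectors are already set up; -ksp_error_if_not_converged, used in the runs above, gives the same protection from the command line):

  #include <petscksp.h>

  /* Always interrogate the converged reason after KSPSolve(). */
  PetscErrorCode SolveAndCheck(KSP ksp, Vec b, Vec x)
  {
    PetscErrorCode     ierr;
    KSPConvergedReason reason;

    PetscFunctionBeginUser;
    ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
    ierr = KSPGetConvergedReason(ksp,&reason);CHKERRQ(ierr);
    if (reason < 0) SETERRQ1(PetscObjectComm((PetscObject)ksp),PETSC_ERR_NOT_CONVERGED,"Linear solve failed to converge: reason %d",(int)reason);
    PetscFunctionReturn(0);
  }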
> > > On Nov 15, 2015, at 8:53 PM, Hong wrote: > > > > Jan: > > I can reproduce reported behavior using > > petsc/src/ksp/ksp/examples/tutorials/ex10.c on your mat.dat and rhs.dat. > > > > Using petsc sequential lu with default ordering 'nd', I get > > ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu > > Number of iterations = 0 > > Residual norm 0.0220971 > > > > Changing to > > ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu -pc_factor_mat_ordering_type > natural > > Number of iterations = 1 > > Residual norm < 1.e-12 > > > > Back to superlu_dist, I get > > mpiexec -n 3 ./ex10 -f0 /homes/hzhang/tmp/mat.dat -rhs > /homes/hzhang/tmp/rhs.dat -pc_type lu -pc_factor_mat_solver_package > superlu_dist > > Number of iterations = 4 > > Residual norm 25650.8 > > > > which uses default ordering (-ksp_view) > > Row permutation LargeDiag > > Column permutation METIS_AT_PLUS_A > > > > Run it with > > mpiexec -n 3 ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu > -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_rowperm > NATURAL -mat_superlu_dist_colperm NATURAL > > Number of iterations = 1 > > Residual norm < 1.e-12 > > > > i.e., your problem is sensitive to matrix ordering, which I do not know > why. > > > > I checked condition number of your mat.dat using superlu: > > ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu > -pc_factor_mat_solver_package superlu -mat_superlu_conditionnumber > > Recip. condition number = 1.137938e-03 > > Number of iterations = 1 > > Residual norm < 1.e-12 > > > > As you see, matrix is well-conditioned. Why is it so sensitive to matrix > ordering? > > > > Hong > > > > Using attached petsc4py code, matrix and right-hand side, SuperLU_dist > > returns totally wrong solution for mixed Laplacian: > > > > $ tar -xzf report.tar.gz > > $ python test-solve.py -pc_factor_mat_solver_package mumps > -ksp_final_residual > > KSP final norm of residual 3.81865e-15 > > $ python test-solve.py -pc_factor_mat_solver_package umfpack > -ksp_final_residual > > KSP final norm of residual 3.68546e-14 > > $ python test-solve.py -pc_factor_mat_solver_package superlu_dist > -ksp_final_residual > > KSP final norm of residual 1827.72 > > > > Moreover final residual is random when run using mpirun -np 3. Maybe > > a memory corruption issue? This is reproducible using PETSc 3.6.2 (and > > SuperLU_dist configured by PETSc) and much older, see > > http://fenicsproject.org/pipermail/fenics-support/2014-March/000439.html > > but has never been reported upstream. > > > > The code for assembling the matrix and rhs using FEniCS is also > > included for the sake of completeness. > > > > Jan > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From naage at mek.dtu.dk Mon Nov 16 09:26:42 2015 From: naage at mek.dtu.dk (Niels Aage) Date: Mon, 16 Nov 2015 15:26:42 +0000 Subject: [petsc-users] Using STRUMPACK within PETSc Message-ID: Hi all, First of all, I hope this is the right place to pose the following question. We're trying to create a shell preconditioner based on low rank properties of a linear system. For this to be efficient it requires a fast (mpi-parallelized) method for compressing (low rank) dense matrices. A tool that offers just this is STRUMPACK (http://portal.nersc.gov/project/sparse/strumpack/). So we simply need to link our petsc program to strumpack, but we have a hard time performing the linking. We have cooked our code down to a very simple test problem (see the attached code) that tries to invoke a single method from strumpack. 
This results in an "undefined reference to SDP_C_double_init(....)" even though the method clearly is in the object file "StrumpackDensePackage_C.o". Note that my PETSc (3.6.2) is build with scalapack support which is needed by strumpack - and that I have used the PETSc libs to build and link the strumpack test example "STRUMPACK-Dense-1.1.1/examples/c_example.c". And that this works just fine:-) We hope you have suggestions on how we can proceed - and let me know if any additional information is needed. Thanks, Niels Aage Associate Professor, Ph.D. Department of Mechanical Engineering, Section for Solid Mechanics Centre for Acoustic-Mechanical Micro Systems Technical University of Denmark, Building 404, DK-2800 Lyngby, Denmark Phone: (+45) 4525 4253, Fax: (+45) 4593 1475, E-mail: naage at mek.dtu.dk, Office: b404-122 Group homepage: www.topopt.dtu.dk Centre homepage: www.camm.elektro.dtu.dk -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: code.zip Type: application/zip Size: 681109 bytes Desc: code.zip URL: From Eric.Chamberland at giref.ulaval.ca Mon Nov 16 09:38:02 2015 From: Eric.Chamberland at giref.ulaval.ca (Eric Chamberland) Date: Mon, 16 Nov 2015 10:38:02 -0500 Subject: [petsc-users] On the edge of 2^31 unknowns Message-ID: <5649F85A.9020807@giref.ulaval.ca> Hi, we just tried this morning to solve a 1.7 billion dofs problem (on 1060 processors task) . Unfortunately, petsc-3.5.3 didn't succeed within a MatMatMult product with the following backtrace: [0]PETSC ERROR: Out of memory. This could be due to allocating [0]PETSC ERROR: too large an object or bleeding by not properly [0]PETSC ERROR: destroying unneeded objects. [0]PETSC ERROR: Memory allocated 0 Memory used by process 4126371840 [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info. [0]PETSC ERROR: Memory requested 18446744070461249536 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.5.3, Jan, 31, 2015 [0]PETSC ERROR: /rap/jsf-051-aa/ericc/GIREF/bin/probGD.opt on a linux-gnu-intel named r101-n31 by [0]PETSC ERROR: Configure options PETSC_ARCH=linux-gnu-intel CFLAGS="-O3 -xHost -mkl -fPIC -m64 -n [0]PETSC ERROR: #1 PetscMallocAlign() line 46 in /software6/src/petsc-3.5.3/src/sys/memory/mal.c [0]PETSC ERROR: #2 PetscLLCondensedCreate_Scalable() line 1327 in /software6/src/petsc-3.5.3/inclu [0]PETSC ERROR: #3 MatMatMultSymbolic_MPIAIJ_MPIAIJ() line 741 in /software6/src/petsc-3.5.3/src/m [0]PETSC ERROR: #4 MatMatMult_MPIAIJ_MPIAIJ() line 33 in /software6/src/petsc-3.5.3/src/mat/impls/ [0]PETSC ERROR: #5 MatMatMult() line 8713 in /software6/src/petsc-3.5.3/src/mat/interface/matrix.c The amount of memory requested is, apparently, an overflow of a PetscInt which is a 32 bit signed int. Bad user I am, I didn't expected that to solve a problem with less than 2^31 unknowns, it could be mandatory to compile petsc with 64bit indices... Is compiling with "--with-64-bit-indices" the only solution to my problem? Is it a known limitation/bug with a "Petsc-32bit-indices"? 
Thanks, Eric From knepley at gmail.com Mon Nov 16 09:42:14 2015 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 16 Nov 2015 09:42:14 -0600 Subject: [petsc-users] On the edge of 2^31 unknowns In-Reply-To: <5649F85A.9020807@giref.ulaval.ca> References: <5649F85A.9020807@giref.ulaval.ca> Message-ID: Sometimes when we do not have exact counts, we need to overestimate sizes. This is especially true in sparse MatMat. Matt On Mon, Nov 16, 2015 at 9:38 AM, Eric Chamberland < Eric.Chamberland at giref.ulaval.ca> wrote: > Hi, > > we just tried this morning to solve a 1.7 billion dofs problem (on 1060 > processors task) . > > Unfortunately, petsc-3.5.3 didn't succeed within a MatMatMult product with > the following backtrace: > > [0]PETSC ERROR: Out of memory. This could be due to allocating > [0]PETSC ERROR: too large an object or bleeding by not properly > [0]PETSC ERROR: destroying unneeded objects. > [0]PETSC ERROR: Memory allocated 0 Memory used by process 4126371840 > [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info. > [0]PETSC ERROR: Memory requested 18446744070461249536 > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.5.3, Jan, 31, 2015 > [0]PETSC ERROR: /rap/jsf-051-aa/ericc/GIREF/bin/probGD.opt on a > linux-gnu-intel named r101-n31 by > [0]PETSC ERROR: Configure options PETSC_ARCH=linux-gnu-intel CFLAGS="-O3 > -xHost -mkl -fPIC -m64 -n > [0]PETSC ERROR: #1 PetscMallocAlign() line 46 in > /software6/src/petsc-3.5.3/src/sys/memory/mal.c > [0]PETSC ERROR: #2 PetscLLCondensedCreate_Scalable() line 1327 in > /software6/src/petsc-3.5.3/inclu > [0]PETSC ERROR: #3 MatMatMultSymbolic_MPIAIJ_MPIAIJ() line 741 in > /software6/src/petsc-3.5.3/src/m > [0]PETSC ERROR: #4 MatMatMult_MPIAIJ_MPIAIJ() line 33 in > /software6/src/petsc-3.5.3/src/mat/impls/ > [0]PETSC ERROR: #5 MatMatMult() line 8713 in > /software6/src/petsc-3.5.3/src/mat/interface/matrix.c > > The amount of memory requested is, apparently, an overflow of a PetscInt > which is a 32 bit signed int. > > Bad user I am, I didn't expected that to solve a problem with less than > 2^31 unknowns, it could be mandatory to compile petsc with 64bit indices... > > Is compiling with "--with-64-bit-indices" the only solution to my problem? > > Is it a known limitation/bug with a "Petsc-32bit-indices"? > > Thanks, > > Eric > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From Eric.Chamberland at giref.ulaval.ca Mon Nov 16 11:11:49 2015 From: Eric.Chamberland at giref.ulaval.ca (Eric Chamberland) Date: Mon, 16 Nov 2015 12:11:49 -0500 Subject: [petsc-users] On the edge of 2^31 unknowns In-Reply-To: References: <5649F85A.9020807@giref.ulaval.ca> Message-ID: <564A0E55.4010909@giref.ulaval.ca> On 16/11/15 10:42 AM, Matthew Knepley wrote: > Sometimes when we do not have exact counts, we need to overestimate > sizes. This is especially true > in sparse MatMat. Ok... so, to be sure, I am correct if I say that recompiling petsc with "--with-64-bit-indices" is the only solution to my problem? I mean, no other fixes exist for this overestimation in a more recent release of petsc, like putting the result in a "long int" instead? 
Thanks, Eric From balay at mcs.anl.gov Mon Nov 16 11:21:25 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 16 Nov 2015 11:21:25 -0600 Subject: [petsc-users] Using STRUMPACK within PETSc In-Reply-To: References: Message-ID: >From STRUMPACK-Dense-1.1.1/src/StrumpackDensePackage_C.cpp >>>>>>>>> /* This C++ file implements the functions of the C interface. */ #include "StrumpackDensePackage.hpp" extern "C" { #include "StrumpackDensePackage.h" <<<<<<<<<< So looks like you have to use StrumpackDensePackage.h with "extern C". i.e mainSDP.cc should have: #include extern "C" { #include "StrumpackDensePackage.h" } [usually such details should be handled in the public include file - and not exposed to the user..] Satish On Mon, 16 Nov 2015, Niels Aage wrote: > > Hi all, > > First of all, I hope this is the right place to pose the following question. We're trying to create a shell preconditioner based on low rank properties of a linear system. For this to be efficient it requires a fast (mpi-parallelized) method for compressing (low rank) dense matrices. A tool that offers just this is STRUMPACK (http://portal.nersc.gov/project/sparse/strumpack/). > > So we simply need to link our petsc program to strumpack, but we have a hard time performing the linking. We have cooked our code down to a very simple test problem (see the attached code) that tries to invoke a single method from strumpack. This results in an "undefined reference to SDP_C_double_init(....)" even though the method clearly is in the object file "StrumpackDensePackage_C.o". > > Note that my PETSc (3.6.2) is build with scalapack support which is needed by strumpack - and that I have used the PETSc libs to build and link the strumpack test example "STRUMPACK-Dense-1.1.1/examples/c_example.c". And that this works just fine:-) > > We hope you have suggestions on how we can proceed - and let me know if any additional information is needed. > > Thanks, > Niels Aage > > Associate Professor, Ph.D. > Department of Mechanical Engineering, > Section for Solid Mechanics > Centre for Acoustic-Mechanical Micro Systems > Technical University of Denmark, Building 404, DK-2800 Lyngby, Denmark > > Phone: (+45) 4525 4253, Fax: (+45) 4593 1475, > E-mail: naage at mek.dtu.dk, Office: b404-122 > Group homepage: www.topopt.dtu.dk > Centre homepage: www.camm.elektro.dtu.dk > From mfadams at lbl.gov Mon Nov 16 11:31:25 2015 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 16 Nov 2015 12:31:25 -0500 Subject: [petsc-users] GAMG and zero pivots follow up In-Reply-To: References: <87si4992j8.fsf@jedbrown.org> Message-ID: On Fri, Nov 13, 2015 at 2:48 PM, Matthew Knepley wrote: > On Fri, Nov 13, 2015 at 12:28 PM, Jed Brown wrote: > >> Matthew Knepley writes: >> > Something very strange is happening here. CG should converge >> monotonically, >> > but above it does not. What could be happening? >> >> Are you use -ksp_norm_type natural? CG is not monotone in other norms. >> > > Yikes! I did not check that. Why do we have PRECONDITIONED as the default > for CG? > > The two norm of the residual (ie, natural, right?) is not monotone either in CG. > Matt > > Also, if boundary conditions are enforced using a nonsymmetric >> formulation (for example), then you can get lack of monotonicity with CG >> that may not be catastrophic. >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
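For reference, the quantity CG monitors and tests can be chosen explicitly, which is what -ksp_norm_type natural does from the command line (a sketch; the KSP is assumed to be otherwise configured):

  #include <petscksp.h>

  /* Ask CG to use the natural (energy) norm instead of the default
     preconditioned residual norm for monitoring and convergence testing. */
  PetscErrorCode UseNaturalNormCG(KSP ksp)
  {
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = KSPSetType(ksp,KSPCG);CHKERRQ(ierr);
    ierr = KSPSetNormType(ksp,KSP_NORM_NATURAL);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }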
> -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Nov 16 11:40:36 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 16 Nov 2015 11:40:36 -0600 Subject: [petsc-users] On the edge of 2^31 unknowns In-Reply-To: <564A0E55.4010909@giref.ulaval.ca> References: <5649F85A.9020807@giref.ulaval.ca> <564A0E55.4010909@giref.ulaval.ca> Message-ID: <869E52FA-3DED-4274-8FBA-C7749F5BBD63@mcs.anl.gov> Eric, The behavior you get with bizarre integers and a crash is not the behavior we want. We would like to detect these overflows appropriately. If you can track through the error and determine the location where the overflow occurs then we would gladly put in additional checks and use of PetscInt64 to handle these things better. So let us know the exact cause and we'll improve the code. Barry > On Nov 16, 2015, at 11:11 AM, Eric Chamberland wrote: > > On 16/11/15 10:42 AM, Matthew Knepley wrote: >> Sometimes when we do not have exact counts, we need to overestimate >> sizes. This is especially true >> in sparse MatMat. > > Ok... so, to be sure, I am correct if I say that recompiling petsc with > "--with-64-bit-indices" is the only solution to my problem? > > I mean, no other fixes exist for this overestimation in a more recent release of petsc, like putting the result in a "long int" instead? > > Thanks, > > Eric > From jed at jedbrown.org Mon Nov 16 11:47:23 2015 From: jed at jedbrown.org (Jed Brown) Date: Mon, 16 Nov 2015 10:47:23 -0700 Subject: [petsc-users] GAMG and zero pivots follow up In-Reply-To: References: <87si4992j8.fsf@jedbrown.org> Message-ID: <87fv057s5w.fsf@jedbrown.org> Mark Adams writes: >> Yikes! I did not check that. Why do we have PRECONDITIONED as the default >> for CG? That is the preconditioned norm of the residual. I.e., || M^{-1} (A x - b) ||_2 > The two norm of the residual (ie, natural, right?) is not monotone either > in CG. "natural" is CG's energy norm, i.e., || e ||_{M^{-1/2} A M^{-T/2}} -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From Eric.Chamberland at giref.ulaval.ca Mon Nov 16 12:26:48 2015 From: Eric.Chamberland at giref.ulaval.ca (Eric Chamberland) Date: Mon, 16 Nov 2015 13:26:48 -0500 Subject: [petsc-users] On the edge of 2^31 unknowns In-Reply-To: <869E52FA-3DED-4274-8FBA-C7749F5BBD63@mcs.anl.gov> References: <5649F85A.9020807@giref.ulaval.ca> <564A0E55.4010909@giref.ulaval.ca> <869E52FA-3DED-4274-8FBA-C7749F5BBD63@mcs.anl.gov> Message-ID: <564A1FE8.2030404@giref.ulaval.ca> Barry, I can't launch the code again and retrieve other informations, since I am not allowed to do so: the cluster have around ~780 nodes and I got a very special permission to reserve 530 of them... So the best I can do is to give you the backtrace PETSc gave me... :/ (see the first post with the backtrace: http://lists.mcs.anl.gov/pipermail/petsc-users/2015-November/027644.html) And until today, all smaller meshes with the same solver succeeded to complete... (I went up to 219 millions of unknowns on 64 nodes). I understand then that there could be some use of PetscInt64 in the actual code that would help fix problems like the one I got. I found it is a big challenge to track down all occurrence of this kind of overflow in the code, due to the size of the systems you have to have to reproduce this problem.... 
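A sketch of the kind of overflow guard being discussed, assuming the PetscInt64 type Barry refers to is available; the helper name is hypothetical, not an existing PETSc routine, and the expression is of the sort that overflowed in PetscLLCondensedCreate_Scalable():

  #include <petscsys.h>

  /* Form 2*(n+2) in 64 bits and refuse to wrap silently when the result
     does not fit in a (possibly 32-bit) PetscInt. */
  PetscErrorCode CheckedSize(PetscInt n, PetscInt *len)
  {
    PetscInt64 len64 = 2*((PetscInt64)n + 2);

    PetscFunctionBeginUser;
    if (len64 > PETSC_MAX_INT) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_ARG_OUTOFRANGE,"Requested length %lld overflows PetscInt; consider --with-64-bit-indices",(long long)len64);
    *len = (PetscInt)len64;
    PetscFunctionReturn(0);
  }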
Eric On 16/11/15 12:40 PM, Barry Smith wrote: > > Eric, > > The behavior you get with bizarre integers and a crash is not the behavior we want. We would like to detect these overflows appropriately. If you can track through the error and determine the location where the overflow occurs then we would gladly put in additional checks and use of PetscInt64 to handle these things better. So let us know the exact cause and we'll improve the code. > > Barry > > > >> On Nov 16, 2015, at 11:11 AM, Eric Chamberland wrote: >> >> On 16/11/15 10:42 AM, Matthew Knepley wrote: >>> Sometimes when we do not have exact counts, we need to overestimate >>> sizes. This is especially true >>> in sparse MatMat. >> >> Ok... so, to be sure, I am correct if I say that recompiling petsc with >> "--with-64-bit-indices" is the only solution to my problem? >> >> I mean, no other fixes exist for this overestimation in a more recent release of petsc, like putting the result in a "long int" instead? >> >> Thanks, >> >> Eric >> From naage at mek.dtu.dk Mon Nov 16 12:43:12 2015 From: naage at mek.dtu.dk (Niels Aage) Date: Mon, 16 Nov 2015 18:43:12 +0000 Subject: [petsc-users] Using STRUMPACK within PETSc In-Reply-To: References: , Message-ID: Dear Satish, That totally did the trick ! Thanks a lot:-) Niels ________________________________________ Fra: Satish Balay [balay at mcs.anl.gov] Sendt: 16. november 2015 18:21 Til: Niels Aage Cc: petsc-users at mcs.anl.gov Emne: Re: [petsc-users] Using STRUMPACK within PETSc >From STRUMPACK-Dense-1.1.1/src/StrumpackDensePackage_C.cpp >>>>>>>>> /* This C++ file implements the functions of the C interface. */ #include "StrumpackDensePackage.hpp" extern "C" { #include "StrumpackDensePackage.h" <<<<<<<<<< So looks like you have to use StrumpackDensePackage.h with "extern C". i.e mainSDP.cc should have: #include extern "C" { #include "StrumpackDensePackage.h" } [usually such details should be handled in the public include file - and not exposed to the user..] Satish On Mon, 16 Nov 2015, Niels Aage wrote: > > Hi all, > > First of all, I hope this is the right place to pose the following question. We're trying to create a shell preconditioner based on low rank properties of a linear system. For this to be efficient it requires a fast (mpi-parallelized) method for compressing (low rank) dense matrices. A tool that offers just this is STRUMPACK (http://portal.nersc.gov/project/sparse/strumpack/). > > So we simply need to link our petsc program to strumpack, but we have a hard time performing the linking. We have cooked our code down to a very simple test problem (see the attached code) that tries to invoke a single method from strumpack. This results in an "undefined reference to SDP_C_double_init(....)" even though the method clearly is in the object file "StrumpackDensePackage_C.o". > > Note that my PETSc (3.6.2) is build with scalapack support which is needed by strumpack - and that I have used the PETSc libs to build and link the strumpack test example "STRUMPACK-Dense-1.1.1/examples/c_example.c". And that this works just fine:-) > > We hope you have suggestions on how we can proceed - and let me know if any additional information is needed. > > Thanks, > Niels Aage > > Associate Professor, Ph.D. 
> Department of Mechanical Engineering, > Section for Solid Mechanics > Centre for Acoustic-Mechanical Micro Systems > Technical University of Denmark, Building 404, DK-2800 Lyngby, Denmark > > Phone: (+45) 4525 4253, Fax: (+45) 4593 1475, > E-mail: naage at mek.dtu.dk, Office: b404-122 > Group homepage: www.topopt.dtu.dk > Centre homepage: www.camm.elektro.dtu.dk > From Eric.Chamberland at giref.ulaval.ca Mon Nov 16 12:59:32 2015 From: Eric.Chamberland at giref.ulaval.ca (Eric Chamberland) Date: Mon, 16 Nov 2015 13:59:32 -0500 Subject: [petsc-users] On the edge of 2^31 unknowns In-Reply-To: <564A1FE8.2030404@giref.ulaval.ca> References: <5649F85A.9020807@giref.ulaval.ca> <564A0E55.4010909@giref.ulaval.ca> <869E52FA-3DED-4274-8FBA-C7749F5BBD63@mcs.anl.gov> <564A1FE8.2030404@giref.ulaval.ca> Message-ID: <564A2794.8080605@giref.ulaval.ca> I looked into the code of PetscLLCondensedCreate_Scalable: ... ierr = PetscMalloc1(2*(nlnk_max+2),lnk);CHKERRQ(ierr); ... and just for fun, I tried this: #include int main() { int a=1741445953; // my number of unknowns... int b=2*(a+2); unsigned long int c = b; std::cout << " a: " << a << " b: " << b << " c: " << c < Barry, > > I can't launch the code again and retrieve other informations, since I > am not allowed to do so: the cluster have around ~780 nodes and I got a > very special permission to reserve 530 of them... > > So the best I can do is to give you the backtrace PETSc gave me... :/ > (see the first post with the backtrace: > http://lists.mcs.anl.gov/pipermail/petsc-users/2015-November/027644.html) > > And until today, all smaller meshes with the same solver succeeded to > complete... (I went up to 219 millions of unknowns on 64 nodes). > > I understand then that there could be some use of PetscInt64 in the > actual code that would help fix problems like the one I got. I found it > is a big challenge to track down all occurrence of this kind of overflow > in the code, due to the size of the systems you have to have to > reproduce this problem.... > > Eric > > > On 16/11/15 12:40 PM, Barry Smith wrote: >> >> Eric, >> >> The behavior you get with bizarre integers and a crash is not the >> behavior we want. We would like to detect these overflows >> appropriately. If you can track through the error and determine the >> location where the overflow occurs then we would gladly put in >> additional checks and use of PetscInt64 to handle these things better. >> So let us know the exact cause and we'll improve the code. >> >> Barry >> >> >> >>> On Nov 16, 2015, at 11:11 AM, Eric Chamberland >>> wrote: >>> >>> On 16/11/15 10:42 AM, Matthew Knepley wrote: >>>> Sometimes when we do not have exact counts, we need to overestimate >>>> sizes. This is especially true >>>> in sparse MatMat. >>> >>> Ok... so, to be sure, I am correct if I say that recompiling petsc with >>> "--with-64-bit-indices" is the only solution to my problem? >>> >>> I mean, no other fixes exist for this overestimation in a more recent >>> release of petsc, like putting the result in a "long int" instead? 
>>> >>> Thanks, >>> >>> Eric >>> From bsmith at mcs.anl.gov Mon Nov 16 14:22:39 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 16 Nov 2015 14:22:39 -0600 Subject: [petsc-users] On the edge of 2^31 unknowns In-Reply-To: <564A1FE8.2030404@giref.ulaval.ca> References: <5649F85A.9020807@giref.ulaval.ca> <564A0E55.4010909@giref.ulaval.ca> <869E52FA-3DED-4274-8FBA-C7749F5BBD63@mcs.anl.gov> <564A1FE8.2030404@giref.ulaval.ca> Message-ID: <7129FCAE-A4DE-49A1-9811-55CCA6BCC5F0@mcs.anl.gov> > On Nov 16, 2015, at 12:26 PM, Eric Chamberland wrote: > > Barry, > > I can't launch the code again and retrieve other informations, since I am not allowed to do so: the cluster have around ~780 nodes and I got a very special permission to reserve 530 of them... > > So the best I can do is to give you the backtrace PETSc gave me... :/ > (see the first post with the backtrace: http://lists.mcs.anl.gov/pipermail/petsc-users/2015-November/027644.html) > > And until today, all smaller meshes with the same solver succeeded to complete... (I went up to 219 millions of unknowns on 64 nodes). > > I understand then that there could be some use of PetscInt64 in the actual code that would help fix problems like the one I got. I found it is a big challenge to track down all occurrence of this kind of overflow in the code, due to the size of the systems you have to have to reproduce this problem.... Eric, This is exactly our problem and why I asked for you data. Doing a manual code inspection line by line looking for the potential overflow points is tedious and would take forever so we need to fix these instead as we become aware of them. You then wrote I looked into the code of PetscLLCondensedCreate_Scalable: ... ierr = PetscMalloc1(2*(nlnk_max+2),lnk);CHKERRQ(ierr); ... and just for fun, I tried this: #include int main() { int a=1741445953; // my number of unknowns... int b=2*(a+2); unsigned long int c = b; std::cout << " a: " << a << " b: " << b << " c: " << c < > Eric > > > On 16/11/15 12:40 PM, Barry Smith wrote: >> >> Eric, >> >> The behavior you get with bizarre integers and a crash is not the behavior we want. We would like to detect these overflows appropriately. If you can track through the error and determine the location where the overflow occurs then we would gladly put in additional checks and use of PetscInt64 to handle these things better. So let us know the exact cause and we'll improve the code. >> >> Barry >> >> >> >>> On Nov 16, 2015, at 11:11 AM, Eric Chamberland wrote: >>> >>> On 16/11/15 10:42 AM, Matthew Knepley wrote: >>>> Sometimes when we do not have exact counts, we need to overestimate >>>> sizes. This is especially true >>>> in sparse MatMat. >>> >>> Ok... so, to be sure, I am correct if I say that recompiling petsc with >>> "--with-64-bit-indices" is the only solution to my problem? >>> >>> I mean, no other fixes exist for this overestimation in a more recent release of petsc, like putting the result in a "long int" instead? 
>>> >>> Thanks, >>> >>> Eric >>> > From jed at jedbrown.org Mon Nov 16 13:22:51 2015 From: jed at jedbrown.org (Jed Brown) Date: Mon, 16 Nov 2015 12:22:51 -0700 Subject: [petsc-users] On the edge of 2^31 unknowns In-Reply-To: <564A2794.8080605@giref.ulaval.ca> References: <5649F85A.9020807@giref.ulaval.ca> <564A0E55.4010909@giref.ulaval.ca> <869E52FA-3DED-4274-8FBA-C7749F5BBD63@mcs.anl.gov> <564A1FE8.2030404@giref.ulaval.ca> <564A2794.8080605@giref.ulaval.ca> Message-ID: <87d1v97nqs.fsf@jedbrown.org> Eric Chamberland writes: > int main() { > int a=1741445953; // my number of unknowns... > int b=2*(a+2); > unsigned long int c = b; > std::cout << " a: " << a << " b: " << b << " c: " << c < return 0; > } > > and it gives: > > a: 1741445953 b: -812075386 c: 18446744072897476230 This comes from the integer casting rules. $ cat overflow.c #include int main(int argc,char **argv) { int a=1741445953; size_t a1 = 2*(a+2); size_t a2 = 2ll*(a+2); printf("%zd %zd\n", a1, a2); return 0; } $ make overflow && ./overflow cc overflow.c -o overflow -812075386 3482891910 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From jed at jedbrown.org Mon Nov 16 14:41:11 2015 From: jed at jedbrown.org (Jed Brown) Date: Mon, 16 Nov 2015 13:41:11 -0700 Subject: [petsc-users] On the edge of 2^31 unknowns In-Reply-To: <7129FCAE-A4DE-49A1-9811-55CCA6BCC5F0@mcs.anl.gov> References: <5649F85A.9020807@giref.ulaval.ca> <564A0E55.4010909@giref.ulaval.ca> <869E52FA-3DED-4274-8FBA-C7749F5BBD63@mcs.anl.gov> <564A1FE8.2030404@giref.ulaval.ca> <7129FCAE-A4DE-49A1-9811-55CCA6BCC5F0@mcs.anl.gov> Message-ID: <87a8qd7k48.fsf@jedbrown.org> Barry Smith writes: > Out goal is that if something won't fit in a 32 bit int we use a 64 > bit integer when possible or at least produce a very useful error > message instead of the horrible malloc error you get. The more > crashes you can give us the quicker we can fix these errors. This feels like something that we should be able to find with static analysis, though I don't know how since many of the problems are a consequence of unsuffixed numeric literals having type "int". What if we compiled for an I16LP32 architecture (emulator) so we could find these problems at small scale? -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From Eric.Chamberland at giref.ulaval.ca Mon Nov 16 14:43:26 2015 From: Eric.Chamberland at giref.ulaval.ca (Eric Chamberland) Date: Mon, 16 Nov 2015 15:43:26 -0500 Subject: [petsc-users] On the edge of 2^31 unknowns In-Reply-To: <7129FCAE-A4DE-49A1-9811-55CCA6BCC5F0@mcs.anl.gov> References: <5649F85A.9020807@giref.ulaval.ca> <564A0E55.4010909@giref.ulaval.ca> <869E52FA-3DED-4274-8FBA-C7749F5BBD63@mcs.anl.gov> <564A1FE8.2030404@giref.ulaval.ca> <7129FCAE-A4DE-49A1-9811-55CCA6BCC5F0@mcs.anl.gov> Message-ID: <564A3FEE.30404@giref.ulaval.ca> On 16/11/15 03:22 PM, Barry Smith wrote: > Out goal is that if something won't fit in a 32 bit int we use a 64 bit integer when possible or at least produce a very useful error message instead of the horrible malloc error you get. The more crashes you can give us the quicker we can fix these errors. understood. On the next try, if you have released a fix, let's say in 3.6.3, I will first launch the computation with 32bit indices. If it fails, I will send you the error but.... 
I will have on hand a 64bit indices version of everything, compiled and ready to be launched just in case of error. Doing so I will not loose my special cluster allocation... anyway, thanks a lot for your help! :) Eric From bsmith at mcs.anl.gov Mon Nov 16 14:46:05 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 16 Nov 2015 14:46:05 -0600 Subject: [petsc-users] On the edge of 2^31 unknowns In-Reply-To: <87a8qd7k48.fsf@jedbrown.org> References: <5649F85A.9020807@giref.ulaval.ca> <564A0E55.4010909@giref.ulaval.ca> <869E52FA-3DED-4274-8FBA-C7749F5BBD63@mcs.anl.gov> <564A1FE8.2030404@giref.ulaval.ca> <7129FCAE-A4DE-49A1-9811-55CCA6BCC5F0@mcs.anl.gov> <87a8qd7k48.fsf@jedbrown.org> Message-ID: <68CD7F1B-CDF8-49C6-9FEF-4D018A465A91@mcs.anl.gov> > On Nov 16, 2015, at 2:41 PM, Jed Brown wrote: > > Barry Smith writes: >> Out goal is that if something won't fit in a 32 bit int we use a 64 >> bit integer when possible or at least produce a very useful error >> message instead of the horrible malloc error you get. The more >> crashes you can give us the quicker we can fix these errors. > > This feels like something that we should be able to find with static > analysis, though I don't know how since many of the problems are a > consequence of unsuffixed numeric literals having type "int". > > What if we compiled for an I16LP32 architecture (emulator) so we could > find these problems at small scale? Or defined PetscInt to be short for test runs? From maxime.boissonneault at calculquebec.ca Mon Nov 16 14:56:14 2015 From: maxime.boissonneault at calculquebec.ca (Maxime Boissonneault) Date: Mon, 16 Nov 2015 15:56:14 -0500 Subject: [petsc-users] On the edge of 2^31 unknowns In-Reply-To: <68CD7F1B-CDF8-49C6-9FEF-4D018A465A91@mcs.anl.gov> References: <5649F85A.9020807@giref.ulaval.ca> <564A0E55.4010909@giref.ulaval.ca> <869E52FA-3DED-4274-8FBA-C7749F5BBD63@mcs.anl.gov> <564A1FE8.2030404@giref.ulaval.ca> <7129FCAE-A4DE-49A1-9811-55CCA6BCC5F0@mcs.anl.gov> <87a8qd7k48.fsf@jedbrown.org> <68CD7F1B-CDF8-49C6-9FEF-4D018A465A91@mcs.anl.gov> Message-ID: <564A42EE.70605@calculquebec.ca> Le 2015-11-16 15:46, Barry Smith a ?crit : >> On Nov 16, 2015, at 2:41 PM, Jed Brown wrote: >> >> Barry Smith writes: >>> Out goal is that if something won't fit in a 32 bit int we use a 64 >>> bit integer when possible or at least produce a very useful error >>> message instead of the horrible malloc error you get. The more >>> crashes you can give us the quicker we can fix these errors. >> This feels like something that we should be able to find with static >> analysis, though I don't know how since many of the problems are a >> consequence of unsuffixed numeric literals having type "int". >> >> What if we compiled for an I16LP32 architecture (emulator) so we could >> find these problems at small scale? > Or defined PetscInt to be short for test runs? > > PetscInt to short should work, as the integer overflow will happen at a much smaller scale, but will still result in a malloc of ~2^63 bytes of memory once the negative number is converted back to size_t. 
From jed at jedbrown.org Mon Nov 16 14:53:51 2015 From: jed at jedbrown.org (Jed Brown) Date: Mon, 16 Nov 2015 13:53:51 -0700 Subject: [petsc-users] On the edge of 2^31 unknowns In-Reply-To: <68CD7F1B-CDF8-49C6-9FEF-4D018A465A91@mcs.anl.gov> References: <5649F85A.9020807@giref.ulaval.ca> <564A0E55.4010909@giref.ulaval.ca> <869E52FA-3DED-4274-8FBA-C7749F5BBD63@mcs.anl.gov> <564A1FE8.2030404@giref.ulaval.ca> <7129FCAE-A4DE-49A1-9811-55CCA6BCC5F0@mcs.anl.gov> <87a8qd7k48.fsf@jedbrown.org> <68CD7F1B-CDF8-49C6-9FEF-4D018A465A91@mcs.anl.gov> Message-ID: <877flh7jj4.fsf@jedbrown.org> Barry Smith writes: >> On Nov 16, 2015, at 2:41 PM, Jed Brown wrote: >> >> Barry Smith writes: >>> Out goal is that if something won't fit in a 32 bit int we use a 64 >>> bit integer when possible or at least produce a very useful error >>> message instead of the horrible malloc error you get. The more >>> crashes you can give us the quicker we can fix these errors. >> >> This feels like something that we should be able to find with static >> analysis, though I don't know how since many of the problems are a >> consequence of unsuffixed numeric literals having type "int". >> >> What if we compiled for an I16LP32 architecture (emulator) so we could >> find these problems at small scale? > > Or defined PetscInt to be short for test runs? Wouldn't help in this particular case because the numeric literal "2" has type int causing the rest of the arithmetic to be done with that type. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From bsmith at mcs.anl.gov Mon Nov 16 15:31:07 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 16 Nov 2015 15:31:07 -0600 Subject: [petsc-users] On the edge of 2^31 unknowns In-Reply-To: <877flh7jj4.fsf@jedbrown.org> References: <5649F85A.9020807@giref.ulaval.ca> <564A0E55.4010909@giref.ulaval.ca> <869E52FA-3DED-4274-8FBA-C7749F5BBD63@mcs.anl.gov> <564A1FE8.2030404@giref.ulaval.ca> <7129FCAE-A4DE-49A1-9811-55CCA6BCC5F0@mcs.anl.gov> <87a8qd7k48.fsf@jedbrown.org> <68CD7F1B-CDF8-49C6-9FEF-4D018A465A91@mcs.anl.gov> <877flh7jj4.fsf@jedbrown.org> Message-ID: <7E883F0F-BA74-4D76-A7BB-3556A3D8C452@mcs.anl.gov> ah yes, and we have a bunch of those 2's running around > On Nov 16, 2015, at 2:53 PM, Jed Brown wrote: > > Barry Smith writes: > >>> On Nov 16, 2015, at 2:41 PM, Jed Brown wrote: >>> >>> Barry Smith writes: >>>> Out goal is that if something won't fit in a 32 bit int we use a 64 >>>> bit integer when possible or at least produce a very useful error >>>> message instead of the horrible malloc error you get. The more >>>> crashes you can give us the quicker we can fix these errors. >>> >>> This feels like something that we should be able to find with static >>> analysis, though I don't know how since many of the problems are a >>> consequence of unsuffixed numeric literals having type "int". >>> >>> What if we compiled for an I16LP32 architecture (emulator) so we could >>> find these problems at small scale? >> >> Or defined PetscInt to be short for test runs? > > Wouldn't help in this particular case because the numeric literal "2" > has type int causing the rest of the arithmetic to be done with that > type. 
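Jed's overflow.c above lost its #include line to the archive's HTML scrubbing; a reconstruction of the same demonstration, which is also the point about the unsuffixed literal "2":

  #include <stdio.h>

  int main(void)
  {
    int    a  = 1741445953;  /* the number of unknowns from the failing run */
    size_t a1 = 2*(a+2);     /* "2" has type int, so the multiply wraps in 32 bits
                                (formally undefined; -812075386 on common targets)
                                before being converted to size_t */
    size_t a2 = 2ll*(a+2);   /* the ll suffix promotes the arithmetic to 64 bits */

    printf("%zu %zu\n", a1, a2);  /* prints 18446744072897476230 3482891910 */
    return 0;
  }

The first value is a request of the same astronomically large kind that PetscMallocAlign() reported; the second is what the expression should have produced.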
From bsmith at mcs.anl.gov Mon Nov 16 18:12:38 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 16 Nov 2015 18:12:38 -0600 Subject: [petsc-users] On the edge of 2^31 unknowns In-Reply-To: <564A1FE8.2030404@giref.ulaval.ca> References: <5649F85A.9020807@giref.ulaval.ca> <564A0E55.4010909@giref.ulaval.ca> <869E52FA-3DED-4274-8FBA-C7749F5BBD63@mcs.anl.gov> <564A1FE8.2030404@giref.ulaval.ca> Message-ID: <30507352-1B79-4338-B839-A066159C8CA8@mcs.anl.gov> I have started a branch with utilities to help catch/handle these integer overflow issues https://bitbucket.org/petsc/petsc/pull-requests/389/add-utilities-for-handling-petscint/diff all suggestions are appreciated Barry > On Nov 16, 2015, at 12:26 PM, Eric Chamberland wrote: > > Barry, > > I can't launch the code again and retrieve other informations, since I am not allowed to do so: the cluster have around ~780 nodes and I got a very special permission to reserve 530 of them... > > So the best I can do is to give you the backtrace PETSc gave me... :/ > (see the first post with the backtrace: http://lists.mcs.anl.gov/pipermail/petsc-users/2015-November/027644.html) > > And until today, all smaller meshes with the same solver succeeded to complete... (I went up to 219 millions of unknowns on 64 nodes). > > I understand then that there could be some use of PetscInt64 in the actual code that would help fix problems like the one I got. I found it is a big challenge to track down all occurrence of this kind of overflow in the code, due to the size of the systems you have to have to reproduce this problem.... > > Eric > > > On 16/11/15 12:40 PM, Barry Smith wrote: >> >> Eric, >> >> The behavior you get with bizarre integers and a crash is not the behavior we want. We would like to detect these overflows appropriately. If you can track through the error and determine the location where the overflow occurs then we would gladly put in additional checks and use of PetscInt64 to handle these things better. So let us know the exact cause and we'll improve the code. >> >> Barry >> >> >> >>> On Nov 16, 2015, at 11:11 AM, Eric Chamberland wrote: >>> >>> On 16/11/15 10:42 AM, Matthew Knepley wrote: >>>> Sometimes when we do not have exact counts, we need to overestimate >>>> sizes. This is especially true >>>> in sparse MatMat. >>> >>> Ok... so, to be sure, I am correct if I say that recompiling petsc with >>> "--with-64-bit-indices" is the only solution to my problem? >>> >>> I mean, no other fixes exist for this overestimation in a more recent release of petsc, like putting the result in a "long int" instead? >>> >>> Thanks, >>> >>> Eric >>> > From bsmith at mcs.anl.gov Mon Nov 16 20:17:25 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 16 Nov 2015 20:17:25 -0600 Subject: [petsc-users] Fwd: IDEAS Project xSDK documents open for comment References: Message-ID: PETSc users and developers, For your information. Please comment if you have suggestions for improvement. Barry > > > From: Michael A Heroux > > Date: Thursday, November 12, 2015 at 2:53 PM > To: Trilinos Users > > Subject: IDEAS Project xSDK documents open for comment > > Dear Trilinos Users, > > The IDEAS Scientific SW Developer Productivity project is focused on specific activities intended to enhance developer productivity. One major deliverable for the project is an Extreme-scale Scientific SW Developer Kit (xSDK), which is a collection of policies, tools and interoperability layers on top of widely used libraries. 
In early 2016, we will release the first version of the xSDK, which will encompass the hypre, PETSc, SuperLU and Trilinos libraries. > > Although each of these libraries will continue with their own independent development and distribution efforts, the IDEAS xSDK will provide common configure, build capabilities, the first phase of increased interoperability between the libraries, and explicit policies for expected practices across all libraries. These enhancements should help users work more easily with all of these libraries in combination. > > In preparation for the first xSDK release, we have two draft documents (package compliance standards and standard configure/CMake options) open for community comment. > > These documents are available from the IDEAS Project website: https://ideas-productivity.org/resources/xsdk-docs / > > We would value your comments on these documents over the next few weeks. > > For general information about the IDEAS Project, please visit the main webpage: https://ideas-productivity.org > > Thanks. > > Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: From hong at aspiritech.org Mon Nov 16 20:47:29 2015 From: hong at aspiritech.org (hong at aspiritech.org) Date: Mon, 16 Nov 2015 20:47:29 -0600 Subject: [petsc-users] Fwd: funny kids on Jimmy Kimmel In-Reply-To: References: Message-ID: My daughter sent me this fun youtube, enjoy :-D with Hillary Clinton! https://www.youtube.com/watch?v=db94fvXK2ww -------------- next part -------------- An HTML attachment was scrubbed... URL: From davydden at gmail.com Tue Nov 17 00:38:51 2015 From: davydden at gmail.com (Denis Davydov) Date: Tue, 17 Nov 2015 07:38:51 +0100 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> <4E4A4E38-B9C3-4F41-8D52-1AFF3E4E8C8C@gmail.com> <5ED4A81C-002B-4A6B-BC6D-F93BE200F0A5@mcs.anl.gov> <11E7B1E9-D812-4717-A9F2-929A218573E0@mcs.anl.gov> Message-ID: Hi Mark, > On 12 Nov 2015, at 21:16, Mark Adams wrote: > > There is a valgrind for El Capitan now and I have it. It runs perfectly clean. Do you compile it yourself or use Homebrew / MacPorts? I always seem to have some noise it valgrind at least from OpenMPI (even with suppression file), perhaps it?s better with MPICH. Kind regards, Denis From schor.pavel at gmail.com Tue Nov 17 09:06:46 2015 From: schor.pavel at gmail.com (Pavel Schor) Date: Tue, 17 Nov 2015 16:06:46 +0100 Subject: [petsc-users] Understanding PETSC: boundary layer flow with SNES Message-ID: <564B4286.30701@gmail.com> Dear PETSC users, I am a newbie to PETSC and I try to understand it. My first attempt is to solve 2D laminar boundary layer equations: u*du/dx +v*du/dy = Ue*(dUe/dx). -nu*d2u/dy2 dudx + dvdy = 0 B.C.: y=0: u=0, v=0 y=H: u(x)=Ue(x) Where H is height of the domain, Ue(x) is edge velocity given by a potential flow solver. I assume Ue(x)=10.0 I have several questions regarding the workflow in PETSC, I took SNES example29 (2D cavity flow) and I tried to modify it: 1) There are vectors x and f. I suppose vector x to represent the values of velocity and vector f to represent velocity residuals. Please is this correct? 
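For reference on question 1): in the DMDA local residual routine the example uses, the array built from the solution vector x holds the current iterate and the array built from f receives the residual F(x) that SNES drives to zero. A bare-bones sketch of that convention (the Field layout with u and v follows the example; the residual bodies are placeholders, not the boundary-layer discretization, and Ue=10.0 is the assumed edge velocity):

  #include <petscsnes.h>
  #include <petscdmda.h>

  typedef struct { PetscScalar u,v; } Field;

  /* x[j][i] is the current guess at node (i,j); f[j][i] is where its residual goes.
     A routine like this is typically registered with DMDASNESSetFunctionLocal(). */
  PetscErrorCode FormFunctionLocal(DMDALocalInfo *info, Field **x, Field **f, void *ctx)
  {
    PetscInt i,j;

    PetscFunctionBeginUser;
    for (j=info->ys; j<info->ys+info->ym; j++) {
      for (i=info->xs; i<info->xs+info->xm; i++) {
        if (j == info->my-1) {          /* top edge: Dirichlet row, written as x minus the prescribed value */
          f[j][i].u = x[j][i].u - 10.0;
          f[j][i].v = x[j][i].v;
        } else {
          f[j][i].u = 0.0;              /* interior u-momentum residual goes here */
          f[j][i].v = 0.0;              /* interior continuity residual goes here */
        }
      }
    }
    PetscFunctionReturn(0);
  }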
Based on example29 and my assumption, I calculated the velocity: dhx = (PetscReal)(info->mx-1); dhy = (PetscReal)(info->my-1); dx = 1.0/dhx; dy = 1./dhy; u = x[j][i].u; v = x[j][i].v; dudx = (x[j][i+1].u - x[j][i-1].u)/2.0/dx; dudy = (x[j+1][i].u - x[j-1][i].u)/2.0/dy; dvdy = (x[j+1][i].v - x[j-1][i].v)/2.0/dy; d2udy2 = (-2.0*u + x[j-1][i].u + x[j+1][i].u)/dy/dy; /* U velocity */ f[j][i].u = u*dudx +v*dudy -Ue*dUedx -nu*d2udy2; /* V velocity */ f[j][i].v = dudx + dvdy; The code does not work. With initial conditions x[j][i].u=Ue; and x[0][i].u=0.0; the result is the same as the initial conditions. 2) In SNES example29, there are boundary conditions specified on vector f? For example: /* Test whether we are on the top edge of the global array */ if (yinte == info->my) { j = info->my - 1; yinte = yinte - 1; /* top edge */ for (i=info->xs; ixs+info->xm; i++) { f[j][i].u = x[j][i].u - lid; f[j][i].v = x[j][i].v; I don't understand the last two lines. I just deleted the conditions for left and right edges and replaced f[j][i].u = x[j][i].u - lid; with f[j][i].u = x[j][i].u - Ue; 3) Please could someone explain the normalization on following lines? (Taken from example 29) /* Define mesh intervals ratios for uniform grid. Note: FD formulae below are normalized by multiplying through by local volume element (i.e. hx*hy) to obtain coefficients O(1) in two dimensions. */ dhx = (PetscReal)(info->mx-1); dhy = (PetscReal) (info->my-1); hx = 1.0/dhx; hy = 1.0/dhy; hxdhy = hx*dhy; hydhx = hy*dhx; Thanks in advance & Kind regards Pavel Schor PhD. student, Institute of aerospace engineering, Brno University of technology From mono at mek.dtu.dk Tue Nov 17 10:11:10 2015 From: mono at mek.dtu.dk (=?utf-8?B?TW9ydGVuIE5vYmVsLUrDuHJnZW5zZW4=?=) Date: Tue, 17 Nov 2015 16:11:10 +0000 Subject: [petsc-users] Problem Iterating DMPlex Message-ID: <1231A5C4-CAE0-4219-BBC5-8885C959F24D@dtu.dk> After distributing a DMPlex it seems like my cells are appearing twice (or rather multiple cells maps onto the same vertices). I?m assuming the way I?m iterating the DMPlex is wrong. Essentially I iterate the DMPlex the following way after distribution (see code snippet below ? or attached file). A related problem; Since distribution of a DMPlex reorders the point indices, how to do I map between distributed point indices and the original point indices. And a final question: After distributing a DMPlex, some of the vertices are shared and exists in multiple instances. When adding dofs to these, how I I know if dof is owned by the current instance or it is a ghost dof? I hope someone can point me in the right direction :) Kind regards, Morten Code snippet PetscInt from,to,dof,off; DMPlexGetHeightStratum(dm, 0,&from, &to); for (int cellIndex=from;cellIndex 2 0 --> 3 0 --> 4 1 --> 4 1 --> 5 1 --> 6 But when distributing (on two cores) it looks like this, where both cells maps to the same edges (true for both cores): 0 --> 11 0 --> 12 0 --> 13 1 --> 11 1 --> 12 1 --> 13 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
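The loop bounds in the code snippet above were eaten by the archive's HTML scrubbing (text between a '<' and the following '>' was dropped); a reconstruction of the intended cell to edge to vertex traversal, assuming the usual index < limit bounds:

  #include <petscdmplex.h>

  /* Walk each cell's cone (its edges), then each edge's cone (its vertices).
     DMPlexGetTransitiveClosure() is an alternative way to reach a cell's vertices. */
  PetscErrorCode TraverseCells(DM dm)
  {
    PetscErrorCode ierr;
    PetscInt       cStart,cEnd,c;

    PetscFunctionBeginUser;
    ierr = DMPlexGetHeightStratum(dm,0,&cStart,&cEnd);CHKERRQ(ierr);
    for (c = cStart; c < cEnd; c++) {
      const PetscInt *edges;
      PetscInt        numEdges,e;

      ierr = DMPlexGetConeSize(dm,c,&numEdges);CHKERRQ(ierr);
      ierr = DMPlexGetCone(dm,c,&edges);CHKERRQ(ierr);
      for (e = 0; e < numEdges; e++) {
        const PetscInt *vertices;
        PetscInt        numVertices,v;

        ierr = DMPlexGetConeSize(dm,edges[e],&numVertices);CHKERRQ(ierr);
        ierr = DMPlexGetCone(dm,edges[e],&vertices);CHKERRQ(ierr);
        for (v = 0; v < numVertices; v++) {
          ierr = PetscPrintf(PETSC_COMM_SELF,"cell %D edge %D vertex %D\n",c,edges[e],vertices[v]);CHKERRQ(ierr);
        }
      }
    }
    PetscFunctionReturn(0);
  }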
Name: twotriangles.cc Type: application/octet-stream Size: 3823 bytes Desc: twotriangles.cc URL: From knepley at gmail.com Tue Nov 17 10:53:01 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 17 Nov 2015 10:53:01 -0600 Subject: [petsc-users] Problem Iterating DMPlex In-Reply-To: <1231A5C4-CAE0-4219-BBC5-8885C959F24D@dtu.dk> References: <1231A5C4-CAE0-4219-BBC5-8885C959F24D@dtu.dk> Message-ID: On Tue, Nov 17, 2015 at 10:11 AM, Morten Nobel-J?rgensen wrote: > After distributing a DMPlex it seems like my cells are appearing twice (or > rather multiple cells maps onto the same vertices). > Your code is creating the serial mesh on each process. You only want nonzero sizes on one proc. You can see the mesh using -dm_view, and everything using -dm_view ::ascii_info_detail Thanks, Matt > I?m assuming the way I?m iterating the DMPlex is wrong. Essentially I > iterate the DMPlex the following way after distribution (see code snippet > below ? or attached file). > > A related problem; Since distribution of a DMPlex reorders the point > indices, how to do I map between distributed point indices and the original > point indices. > > And a final question: After distributing a DMPlex, some of the vertices > are shared and exists in multiple instances. When adding dofs to these, how > I I know if dof is owned by the current instance or it is a ghost dof? > > I hope someone can point me in the right direction :) > > Kind regards, > Morten > > > Code snippet > > PetscInt from,to,dof,off; > DMPlexGetHeightStratum(dm, 0,&from, &to); > for (int cellIndex=from;cellIndex const PetscInt *edges; > PetscInt numEdges; > DMPlexGetConeSize(dm, cellIndex, &numEdges); > DMPlexGetCone(dm, cellIndex, &edges); > for (int e = 0;e int edgeIndex = edges[e]; > const PetscInt *vertices; > PetscInt numVertices; > DMPlexGetConeSize(dm, edgeIndex, &numVertices); > DMPlexGetCone(dm, edgeIndex, &vertices); > for (int v = 0;v int vertexIndex = vertices[v]; > > For a non distibuted mesh the top of the hasse diagram looks like this: > 0 --> 2 > 0 --> 3 > 0 --> 4 > 1 --> 4 > 1 --> 5 > 1 --> 6 > > But when distributing (on two cores) it looks like this, where both cells > maps to the same edges (true for both cores): > 0 --> 11 > 0 --> 12 > 0 --> 13 > 1 --> 11 > 1 --> 12 > 1 --> 13 > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From rlmackie862 at gmail.com Tue Nov 17 11:07:20 2015 From: rlmackie862 at gmail.com (Randall Mackie) Date: Tue, 17 Nov 2015 09:07:20 -0800 Subject: [petsc-users] problem using 64-bit-indices Message-ID: <06F81B6A-E173-4E22-A650-C75E2DC28993@gmail.com> I ran into a problem yesterday where a call to DMDACreate3d gave an error message about the size being too big and that I should use ?with-64-bit-indices. 
So I recompiled PETSc (latest version, 3.6.2) with that option, but when I recompiled my code, I found the following errors: call VecGetArrayF90(vSeqZero,ptr_a,ierr) 1 Error: Type mismatch in argument 'ierr' at (1); passed INTEGER(8) to INTEGER(4) call VecRestoreArrayF90(vSeqZero,ptr_a,ierr) 1 Error: Type mismatch in argument 'ierr' at (1); passed INTEGER(8) to INTEGER(4) call DMDAVecGetArrayF90(da,J,ptr_j,ierr) 1 Error: There is no specific subroutine for the generic 'dmdavecgetarrayf90' at (1) So strangely, even though all my variables are defined like PetscInt, PetscReal, etc, VecGetArrayF90 is still expecting a 4 byte integer, which is easy enough for me to do, but there seems to be a problem with DMDAVecGetArrayF90. In the meantime, I?ll switch to VecGetArrayF90, and retry my code, but I thought I?d pass along these error messages. Randy M. From balay at mcs.anl.gov Tue Nov 17 11:12:43 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 17 Nov 2015 11:12:43 -0600 Subject: [petsc-users] problem using 64-bit-indices In-Reply-To: <06F81B6A-E173-4E22-A650-C75E2DC28993@gmail.com> References: <06F81B6A-E173-4E22-A650-C75E2DC28993@gmail.com> Message-ID: ierr is of type 'PetscErrorCode' - not 'PetscInt' Perhaps the dmdavecgetarrayf90 issue is also due to the difference in types used for parameters Satish On Tue, 17 Nov 2015, Randall Mackie wrote: > I ran into a problem yesterday where a call to DMDACreate3d gave an error message about the size being too big and that I should use ?with-64-bit-indices. > > So I recompiled PETSc (latest version, 3.6.2) with that option, but when I recompiled my code, I found the following errors: > > > call VecGetArrayF90(vSeqZero,ptr_a,ierr) > 1 > Error: Type mismatch in argument 'ierr' at (1); passed INTEGER(8) to INTEGER(4) > > call VecRestoreArrayF90(vSeqZero,ptr_a,ierr) > 1 > Error: Type mismatch in argument 'ierr' at (1); passed INTEGER(8) to INTEGER(4) > > call DMDAVecGetArrayF90(da,J,ptr_j,ierr) > 1 > Error: There is no specific subroutine for the generic 'dmdavecgetarrayf90' at (1) > > > So strangely, even though all my variables are defined like PetscInt, PetscReal, etc, VecGetArrayF90 is still expecting a 4 byte integer, which is easy enough for me to do, but there seems to be a problem with DMDAVecGetArrayF90. In the meantime, I?ll switch to VecGetArrayF90, and retry my code, but I thought I?d pass along these error messages. > > > Randy M. From rlmackie862 at gmail.com Tue Nov 17 17:06:16 2015 From: rlmackie862 at gmail.com (Randall Mackie) Date: Tue, 17 Nov 2015 15:06:16 -0800 Subject: [petsc-users] problem using 64-bit-indices In-Reply-To: References: <06F81B6A-E173-4E22-A650-C75E2DC28993@gmail.com> Message-ID: <9CF19846-50D0-43DE-B9BF-7E787F21E9C8@gmail.com> Thanks Satish, This fixed all error messages. Randy > On Nov 17, 2015, at 9:12 AM, Satish Balay wrote: > > ierr is of type 'PetscErrorCode' - not 'PetscInt' > > Perhaps the dmdavecgetarrayf90 issue is also due to the difference in types used for parameters > > Satish > > On Tue, 17 Nov 2015, Randall Mackie wrote: > >> I ran into a problem yesterday where a call to DMDACreate3d gave an error message about the size being too big and that I should use ?with-64-bit-indices. 
>> >> So I recompiled PETSc (latest version, 3.6.2) with that option, but when I recompiled my code, I found the following errors: >> >> >> call VecGetArrayF90(vSeqZero,ptr_a,ierr) >> 1 >> Error: Type mismatch in argument 'ierr' at (1); passed INTEGER(8) to INTEGER(4) >> >> call VecRestoreArrayF90(vSeqZero,ptr_a,ierr) >> 1 >> Error: Type mismatch in argument 'ierr' at (1); passed INTEGER(8) to INTEGER(4) >> >> call DMDAVecGetArrayF90(da,J,ptr_j,ierr) >> 1 >> Error: There is no specific subroutine for the generic 'dmdavecgetarrayf90' at (1) >> >> >> So strangely, even though all my variables are defined like PetscInt, PetscReal, etc, VecGetArrayF90 is still expecting a 4 byte integer, which is easy enough for me to do, but there seems to be a problem with DMDAVecGetArrayF90. In the meantime, I?ll switch to VecGetArrayF90, and retry my code, but I thought I?d pass along these error messages. >> >> >> Randy M. From aotero at fi.uba.ar Wed Nov 18 08:03:47 2015 From: aotero at fi.uba.ar (Alejandro D Otero) Date: Wed, 18 Nov 2015 11:03:47 -0300 Subject: [petsc-users] DMPlex for high order elements Message-ID: Hi, I am trying to define a spectral element mesh in the frame of DMPlex structure. To begin with I am thinking of quadrilateral 2d elements. Lets say I have a 16 node element with 2 nodes on each edge besides the 2 corners and 4 nodes inside the element. I'd like to treat all nodes as vertex in the DMPlex structure. I am wondering whether I can make internal edge nodes (those not in the extremes of the edge) depend on the corresponding edge and internal nodes (those not on the element edges) depend on the corresponding element? If this is not the right way of doing this from DMPlex philosophical point of view, could you please give me an example? Thanks in advance, best regards, Alejandro -------------- next part -------------- An HTML attachment was scrubbed... URL: From mpovolot at purdue.edu Wed Nov 18 12:28:55 2015 From: mpovolot at purdue.edu (Michael Povolotskyi) Date: Wed, 18 Nov 2015 13:28:55 -0500 Subject: [petsc-users] set limits for a nonlinear step Message-ID: <564CC367.1060403@purdue.edu> Dear Petsc developers, in the old days there was a function SNESLineSearchSetParams How to set a maximum value for the nonlinear step now? Thank you, Michael. -- Michael Povolotskyi, PhD Research Assistant Professor Network for Computational Nanotechnology Hall for Discover and Learning Research, Room 441 West Lafayette, IN 47907 Phone (765) 4949396 From jed at jedbrown.org Wed Nov 18 13:29:58 2015 From: jed at jedbrown.org (Jed Brown) Date: Wed, 18 Nov 2015 12:29:58 -0700 Subject: [petsc-users] DMPlex for high order elements In-Reply-To: References: Message-ID: <87twoj3y2x.fsf@jedbrown.org> Alejandro D Otero writes: > Hi, I am trying to define a spectral element mesh in the frame of DMPlex > structure. > To begin with I am thinking of quadrilateral 2d elements. Lets say I have a > 16 node element with 2 nodes on each edge besides the 2 corners and 4 nodes > inside the element. I'd like to treat all nodes as vertex in the DMPlex > structure. That's not the intended way to use DMPlex. You associate an arbitrary number of dofs with each topological entity (vertex, edge, face, cell). So for your Q_3 elements in 2D, you would put 1 dof at vertices, 2 at edges, and 4 at cell interiors. -------------- next part -------------- A non-text attachment was scrubbed... 
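A sketch of how the layout Jed describes can be set up with a PetscSection over the DMPlex strata (PETSc 3.6-era names; the 1/2/4 dofs per vertex/edge/cell correspond to the Q_3 example):

  #include <petscdmplex.h>

  PetscErrorCode SetupQ3Section(DM dm)
  {
    PetscErrorCode ierr;
    PetscSection   s;
    PetscInt       pStart,pEnd,vStart,vEnd,eStart,eEnd,cStart,cEnd,p;

    PetscFunctionBeginUser;
    ierr = PetscSectionCreate(PetscObjectComm((PetscObject)dm),&s);CHKERRQ(ierr);
    ierr = DMPlexGetChart(dm,&pStart,&pEnd);CHKERRQ(ierr);
    ierr = PetscSectionSetChart(s,pStart,pEnd);CHKERRQ(ierr);
    ierr = DMPlexGetDepthStratum(dm,0,&vStart,&vEnd);CHKERRQ(ierr);   /* vertices */
    ierr = DMPlexGetDepthStratum(dm,1,&eStart,&eEnd);CHKERRQ(ierr);   /* edges */
    ierr = DMPlexGetHeightStratum(dm,0,&cStart,&cEnd);CHKERRQ(ierr);  /* cells */
    for (p = vStart; p < vEnd; p++) {ierr = PetscSectionSetDof(s,p,1);CHKERRQ(ierr);}
    for (p = eStart; p < eEnd; p++) {ierr = PetscSectionSetDof(s,p,2);CHKERRQ(ierr);}
    for (p = cStart; p < cEnd; p++) {ierr = PetscSectionSetDof(s,p,4);CHKERRQ(ierr);}
    ierr = PetscSectionSetUp(s);CHKERRQ(ierr);
    ierr = DMSetDefaultSection(dm,s);CHKERRQ(ierr);  /* DMSetLocalSection in later releases */
    ierr = PetscSectionDestroy(&s);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }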
Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From aotero at fi.uba.ar Wed Nov 18 13:32:51 2015 From: aotero at fi.uba.ar (Alejandro D Otero) Date: Wed, 18 Nov 2015 16:32:51 -0300 Subject: [petsc-users] [Posible SPAM] Re: DMPlex for high order elements In-Reply-To: <87twoj3y2x.fsf@jedbrown.org> References: <87twoj3y2x.fsf@jedbrown.org> Message-ID: Ok, I'll think about it. Thank you! On Wed, Nov 18, 2015 at 4:29 PM, Jed Brown wrote: > Alejandro D Otero writes: > > > Hi, I am trying to define a spectral element mesh in the frame of DMPlex > > structure. > > To begin with I am thinking of quadrilateral 2d elements. Lets say I > have a > > 16 node element with 2 nodes on each edge besides the 2 corners and 4 > nodes > > inside the element. I'd like to treat all nodes as vertex in the DMPlex > > structure. > > That's not the intended way to use DMPlex. You associate an arbitrary > number of dofs with each topological entity (vertex, edge, face, cell). > So for your Q_3 elements in 2D, you would put 1 dof at vertices, 2 at > edges, and 4 at cell interiors. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Nov 18 14:09:01 2015 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 18 Nov 2015 14:09:01 -0600 Subject: [petsc-users] set limits for a nonlinear step In-Reply-To: <564CC367.1060403@purdue.edu> References: <564CC367.1060403@purdue.edu> Message-ID: On Wed, Nov 18, 2015 at 12:28 PM, Michael Povolotskyi wrote: > Dear Petsc developers, > in the old days there was a function SNESLineSearchSetParams > How to set a maximum value for the nonlinear step now? > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESLineSearchSetTolerances.html Matt > Thank you, > Michael. > > -- > Michael Povolotskyi, PhD > Research Assistant Professor > Network for Computational Nanotechnology > Hall for Discover and Learning Research, Room 441 > West Lafayette, IN 47907 > Phone (765) 4949396 > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From may at bu.edu Wed Nov 18 14:45:31 2015 From: may at bu.edu (Young, Matthew, Adam) Date: Wed, 18 Nov 2015 20:45:31 +0000 Subject: [petsc-users] Citation for hypre BoomerAMG Message-ID: <17A35C213185A84BB8ED54C88FBFD712D355304D@IST-EX10MBX-3.ad.bu.edu> When I run my code with -citations, it produces @manual{hypre-web-page, title = {{\sl hypre}: High Performance Preconditioners}, organization = {Lawrence Livermore National Laboratory}, note = {\url{http://www.llnl.gov/CASC/hypre/}} } for BoomerAMG. However, the result of \citep{hypre-web-page} is "[hyp]" in the text and a bibliography entry with only "()" for the author. This seems incorrect. If this is the intended output, can you tell me what I'm doing wrong in LaTeX? The other references (two for PETSc and one for GMRES) are fine. --Matt -------------------------------------------------------------- Matthew Young Graduate Student Boston University Dept. of Astronomy -------------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at mcs.anl.gov Wed Nov 18 15:03:46 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 18 Nov 2015 15:03:46 -0600 Subject: [petsc-users] Citation for hypre BoomerAMG In-Reply-To: <17A35C213185A84BB8ED54C88FBFD712D355304D@IST-EX10MBX-3.ad.bu.edu> References: <17A35C213185A84BB8ED54C88FBFD712D355304D@IST-EX10MBX-3.ad.bu.edu> Message-ID: > On Nov 18, 2015, at 2:45 PM, Young, Matthew, Adam wrote: > > When I run my code with -citations, it produces > > @manual{hypre-web-page, > title = {{\sl hypre}: High Performance Preconditioners}, > organization = {Lawrence Livermore National Laboratory}, > note = {\url{http://www.llnl.gov/CASC/hypre/}} > } > > for BoomerAMG. However, the result of \citep{hypre-web-page} is "[hyp]" in the text Is this ok? Or should it produce something different in the text? I think your bibliography style file uses some shortened form of the bibtex key for the this. > and a bibliography entry with only "()" for the author. This seems incorrect. If this is the intended output, can you tell me what I'm doing wrong in LaTeX? The other references (two for PETSc and one for GMRES) are fine. Likely your bibliography style file requires an author field and so is putting in () since this bibliography entry doesn't have an author entry. You can try adding an author field manually and see if that improves things. hypre folks, We'd be happy to list some authors here (or not) just let us know what you prefer? Barry > > --Matt > > -------------------------------------------------------------- > Matthew Young > Graduate Student > Boston University Dept. of Astronomy > -------------------------------------------------------------- From may at bu.edu Wed Nov 18 15:28:13 2015 From: may at bu.edu (Young, Matthew, Adam) Date: Wed, 18 Nov 2015 21:28:13 +0000 Subject: [petsc-users] Citation for hypre BoomerAMG In-Reply-To: References: <17A35C213185A84BB8ED54C88FBFD712D355304D@IST-EX10MBX-3.ad.bu.edu>, Message-ID: <17A35C213185A84BB8ED54C88FBFD712D355312B@IST-EX10MBX-3.ad.bu.edu> Barry, You seem to be right that my style file was looking for an author; it seems to have also been looking for a year, hence the empty brackets. Unfortunately, adding an author and a year caused only those two fields to be printed in the bibliography (leaving out the rest of the info generated by -citation). I'm using agufull08.bst. I'll wait to see what the Hypre folks say. I just want to give them proper citation. --Matt -------------------------------------------------------------- Matthew Young Graduate Student Boston University Dept. of Astronomy -------------------------------------------------------------- ________________________________________ From: Barry Smith [bsmith at mcs.anl.gov] Sent: Wednesday, November 18, 2015 4:03 PM To: Young, Matthew, Adam Cc: petsc-users at mcs.anl.gov; hypre-support at llnl.gov Subject: Re: [petsc-users] Citation for hypre BoomerAMG > On Nov 18, 2015, at 2:45 PM, Young, Matthew, Adam wrote: > > When I run my code with -citations, it produces > > @manual{hypre-web-page, > title = {{\sl hypre}: High Performance Preconditioners}, > organization = {Lawrence Livermore National Laboratory}, > note = {\url{http://www.llnl.gov/CASC/hypre/}} > } > > for BoomerAMG. However, the result of \citep{hypre-web-page} is "[hyp]" in the text Is this ok? Or should it produce something different in the text? I think your bibliography style file uses some shortened form of the bibtex key for the this. > and a bibliography entry with only "()" for the author. 
This seems incorrect. If this is the intended output, can you tell me what I'm doing wrong in LaTeX? The other references (two for PETSc and one for GMRES) are fine. Likely your bibliography style file requires an author field and so is putting in () since this bibliography entry doesn't have an author entry. You can try adding an author field manually and see if that improves things. hypre folks, We'd be happy to list some authors here (or not) just let us know what you prefer? Barry > > --Matt > > -------------------------------------------------------------- > Matthew Young > Graduate Student > Boston University Dept. of Astronomy > -------------------------------------------------------------- From hypre-support at llnl.gov Wed Nov 18 18:28:03 2015 From: hypre-support at llnl.gov (Rob Falgout hypre Tracker) Date: Thu, 19 Nov 2015 07:28:03 +0700 Subject: [petsc-users] [issue1376] Citation for hypre BoomerAMG In-Reply-To: <17A35C213185A84BB8ED54C88FBFD712D355312B@IST-EX10MBX-3.ad.bu.edu> Message-ID: <09ACAD942F7D15419E1F3DBAEAB0786786D21EFC@PRDEXMBX-02.the-lab.llnl.gov> Rob Falgout added the comment: Hi Matt, We have recently been using the web page as the reference for hypre, with no author list (the team and code are continually changing). So, the bib entry looks fine. To get rid of the error, try changing '@manual' to '@misc'. Hope this helps. Thanks for asking! -Rob > -----Original Message----- > From: Young, Matthew, Adam hypre Tracker [mailto:hypre- > support at llnl.gov] > Sent: Wednesday, November 18, 2015 1:34 PM > To: Li, Ruipeng; may at bu.edu; Osei-Kuffuor, Daniel; rfalgout at llnl.gov; > Schroder, Jacob B.; tzanio at llnl.gov; umyang at llnl.gov; Wang, Lu > Subject: [issue1376] [petsc-users] Citation for hypre BoomerAMG > > > New submission from Young, Matthew, Adam : > > Barry, > > You seem to be right that my style file was looking for an author; it seems to > have also been looking for a year, hence the empty brackets. Unfortunately, > adding an author and a year caused only those two fields to be printed in the > bibliography (leaving out the rest of the info generated by -citation). I'm > using agufull08.bst. > > I'll wait to see what the Hypre folks say. I just want to give them proper > citation. > > --Matt > -------------------------------------------------------------- > Matthew Young > Graduate Student > Boston University Dept. of Astronomy > -------------------------------------------------------------- > > ________________________________________ > From: Barry Smith [bsmith at mcs.anl.gov] > Sent: Wednesday, November 18, 2015 4:03 PM > To: Young, Matthew, Adam > Cc: petsc-users at mcs.anl.gov; hypre-support at llnl.gov > Subject: Re: [petsc-users] Citation for hypre BoomerAMG > > > On Nov 18, 2015, at 2:45 PM, Young, Matthew, Adam > wrote: > > > > When I run my code with -citations, it produces > > > > @manual{hypre-web-page, > > title = {{\sl hypre}: High Performance Preconditioners}, > > organization = {Lawrence Livermore National Laboratory}, > > note = {\url{http://www.llnl.gov/CASC/hypre/}} > > } > > > > for BoomerAMG. However, the result of \citep{hypre-web-page} is > > "[hyp]" in the text > > Is this ok? Or should it produce something different in the text? I think your > bibliography style file uses some shortened form of the bibtex key for the > this. > > > and a bibliography entry with only "()" for the author. This seems incorrect. > If this is the intended output, can you tell me what I'm doing wrong in LaTeX? 
> The other references (two for PETSc and one for GMRES) are fine. > > Likely your bibliography style file requires an author field and so is putting in > () since this bibliography entry doesn't have an author entry. You can try > adding an author field manually and see if that improves things. > > hypre folks, > > We'd be happy to list some authors here (or not) just let us know what > you prefer? > > Barry > > > > > --Matt > > > > -------------------------------------------------------------- > > Matthew Young > > Graduate Student > > Boston University Dept. of Astronomy > > -------------------------------------------------------------- > > ---------- > messages: 6809 > nosy: bsmith, li50, may, oseikuffuor1, petsc-users, rfalgout, schroder2, > tzanio, ulrikey, wang84 > status: unread > title: [petsc-users] Citation for hypre BoomerAMG > > ____________________________________________ > hypre Issue Tracker > > ____________________________________________ ____________________________________________ hypre Issue Tracker ____________________________________________ From bsmith at mcs.anl.gov Wed Nov 18 23:08:31 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 18 Nov 2015 23:08:31 -0600 Subject: [petsc-users] Understanding PETSC: boundary layer flow with SNES In-Reply-To: <564B4286.30701@gmail.com> References: <564B4286.30701@gmail.com> Message-ID: <47846740-0CE7-4FC9-AF73-FA4E6E07FDF9@mcs.anl.gov> Sorry no one answered sooner. > On Nov 17, 2015, at 9:06 AM, Pavel Schor wrote: > > Dear PETSC users, > I am a newbie to PETSC and I try to understand it. My first attempt is to solve 2D laminar boundary layer equations: > > u*du/dx +v*du/dy = Ue*(dUe/dx). -nu*d2u/dy2 > dudx + dvdy = 0 > > B.C.: > y=0: u=0, v=0 > y=H: u(x)=Ue(x) > Where H is height of the domain, Ue(x) is edge velocity given by a potential flow solver. I assume Ue(x)=10.0 > > > I have several questions regarding the workflow in PETSC, I took SNES example29 (2D cavity flow) and I tried to modify it: > > 1) There are vectors x and f. I suppose vector x to represent the values of velocity and vector f to represent velocity residuals. Please is this correct? > > Based on example29 and my assumption, I calculated the velocity: > dhx = (PetscReal)(info->mx-1); dhy = (PetscReal)(info->my-1); > dx = 1.0/dhx; dy = 1./dhy; > u = x[j][i].u; > v = x[j][i].v; > dudx = (x[j][i+1].u - x[j][i-1].u)/2.0/dx; > dudy = (x[j+1][i].u - x[j-1][i].u)/2.0/dy; > dvdy = (x[j+1][i].v - x[j-1][i].v)/2.0/dy; > d2udy2 = (-2.0*u + x[j-1][i].u + x[j+1][i].u)/dy/dy; > > /* U velocity */ > f[j][i].u = u*dudx +v*dudy -Ue*dUedx -nu*d2udy2; > > /* V velocity */ > f[j][i].v = dudx + dvdy; > > The code does not work. With initial conditions x[j][i].u=Ue; and x[0][i].u=0.0; the result is the same as the initial conditions. I haven't look in detailed but your general approach seems ok. You can run for a very small grid and print out the input and out vectors to see why they are not changing. > > 2) In SNES example29, there are boundary conditions specified on vector f? For example: > > /* Test whether we are on the top edge of the global array */ > if (yinte == info->my) { > j = info->my - 1; > yinte = yinte - 1; > /* top edge */ > for (i=info->xs; ixs+info->xm; i++) { > f[j][i].u = x[j][i].u - lid; > f[j][i].v = x[j][i].v; > > I don't understand the last two lines. 
I just deleted the conditions for left and right edges and replaced f[j][i].u = x[j][i].u - lid; with f[j][i].u = x[j][i].u - Ue; Basically on the boundaries instead of using finite differences to satisfy the PDE we use an equation to satisfy the boundary conditions so what you do seem reasonable. > > 3) Please could someone explain the normalization on following lines? (Taken from example 29) > > /* > Define mesh intervals ratios for uniform grid. > > Note: FD formulae below are normalized by multiplying through by > local volume element (i.e. hx*hy) to obtain coefficients O(1) in two dimensions. > > */ > dhx = (PetscReal)(info->mx-1); dhy = (PetscReal) (info->my-1); > hx = 1.0/dhx; hy = 1.0/dhy; > hxdhy = hx*dhy; hydhx = hy*dhx; When using geometric multigrid one must be careful that the interpolation function used between levels is correct. The easiest way to insure this is to use the same scaling for equations as the one gets with the finite element method. If you are not using multigrid the scaling is not important. We always like to use finite element scaling with our examples so one can use multigrid to solve the linear systems. Barry > > > Thanks in advance & Kind regards > Pavel Schor > PhD. student, Institute of aerospace engineering, Brno University of technology From davydden at gmail.com Thu Nov 19 03:49:29 2015 From: davydden at gmail.com (Denis Davydov) Date: Thu, 19 Nov 2015 10:49:29 +0100 Subject: [petsc-users] [SLEPc] GD is not deterministic when using different number of cores Message-ID: <8E937148-1B39-443F-9A80-40776619472F@gmail.com> Dear all, I was trying to get some scaling results for the GD eigensolver as applied to the density functional theory. Interestingly enough, the number of self-consistent iterations (solution of couple eigenvalue problem and poisson equations) depends on the number of MPI cores used. For my case the range of iterations is 19-24 for MPI cores between 2 and 160. That makes the whole scaling check useless as the eigenproblem is solved different number of times. That is **not** the case when I use Krylov-Schur eigensolver with zero shift, which makes me believe that I am missing some settings on GD to make it fully deterministic. The only non-deterministic part I am currently aware of is the initial subspace for the first SC iterations. But that?s the case for both KS and GD. For subsequent iterations I provide previously obtained eigenvectors as initial subspace. Certainly there will be some round-off error due to different partition of DoFs for different number of MPI cores, but i don?t expect it to have such a strong influence. Especially given the fact that I don?t see this problem with KS. Below is the output of -eps-view for GD with -eps_type gd -eps_harmonic -st_pc_type bjacobi -eps_gd_krylov_start -eps_target -10.0 I would appreciate any suggestions on how to address the issue. As a side question, why GD uses KSP pre-only? It could as well be using a proper linear solver to apply K^{-1} in the expansion state -- I assume the Olsen variant is the default in SLEPc? 
Kind regards, Denis EPS Object: 4 MPI processes type: gd Davidson: search subspace is B-orthogonalized Davidson: block size=1 Davidson: type of the initial subspace: Krylov Davidson: size of the subspace after restarting: 6 Davidson: number of vectors after restarting from the previous iteration: 0 problem type: generalized symmetric eigenvalue problem extraction type: harmonic Ritz selected portion of the spectrum: closest to target: -10 (in magnitude) postprocessing eigenvectors with purification number of eigenvalues (nev): 87 number of column vectors (ncv): 175 maximum dimension of projected problem (mpd): 175 maximum number of iterations: 57575 tolerance: 1e-10 convergence test: absolute dimension of user-provided initial space: 87 BV Object: 4 MPI processes type: svec 175 columns of global length 57575 vector orthogonalization method: classical Gram-Schmidt orthogonalization refinement: if needed (eta: 0.7071) block orthogonalization method: Gram-Schmidt non-standard inner product Mat Object: 4 MPI processes type: mpiaij rows=57575, cols=57575 total: nonzeros=1.51135e+06, allocated nonzeros=1.51135e+06 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines doing matmult as a single matrix-matrix product DS Object: 4 MPI processes type: gnhep ST Object: 4 MPI processes type: precond shift: -10 number of matrices: 2 all matrices have different nonzero pattern KSP Object: (st_) 4 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-08, absolute=1e-50, divergence=10000 left preconditioning using DEFAULT norm type for convergence test PC Object: (st_) 4 MPI processes type: bjacobi block Jacobi: number of blocks = 4 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (st_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (st_sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=15557, cols=15557 package used to perform factorization: petsc total: nonzeros=388947, allocated nonzeros=388947 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=15557, cols=15557 total: nonzeros=388947, allocated nonzeros=388947 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=57575, cols=57575 total: nonzeros=1.51135e+06, allocated nonzeros=1.51135e+06 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines From jroman at dsic.upv.es Thu Nov 19 04:19:31 2015 From: jroman at dsic.upv.es (Jose E. 
Roman) Date: Thu, 19 Nov 2015 11:19:31 +0100 Subject: [petsc-users] [SLEPc] GD is not deterministic when using different number of cores In-Reply-To: <8E937148-1B39-443F-9A80-40776619472F@gmail.com> References: <8E937148-1B39-443F-9A80-40776619472F@gmail.com> Message-ID: <24297832-E5E4-419B-91D3-09D38F267E9F@dsic.upv.es> > El 19 nov 2015, a las 10:49, Denis Davydov escribi?: > > Dear all, > > I was trying to get some scaling results for the GD eigensolver as applied to the density functional theory. > Interestingly enough, the number of self-consistent iterations (solution of couple eigenvalue problem and poisson equations) > depends on the number of MPI cores used. For my case the range of iterations is 19-24 for MPI cores between 2 and 160. > That makes the whole scaling check useless as the eigenproblem is solved different number of times. > > That is **not** the case when I use Krylov-Schur eigensolver with zero shift, which makes me believe that I am missing some settings on GD to make it fully deterministic. The only non-deterministic part I am currently aware of is the initial subspace for the first SC iterations. But that?s the case for both KS and GD. For subsequent iterations I provide previously obtained eigenvectors as initial subspace. > > Certainly there will be some round-off error due to different partition of DoFs for different number of MPI cores, > but i don?t expect it to have such a strong influence. Especially given the fact that I don?t see this problem with KS. > > Below is the output of -eps-view for GD with -eps_type gd -eps_harmonic -st_pc_type bjacobi -eps_gd_krylov_start -eps_target -10.0 > I would appreciate any suggestions on how to address the issue. The block Jacobi preconditioner differs when you change the number of processes. This will probably make GD iterate more when you use more processes. > > As a side question, why GD uses KSP pre-only? It could as well be using a proper linear solver to apply K^{-1} in the expansion state -- You can achieve that with PCKSP. But if you are going to do that, why not using JD instead of GD? > I assume the Olsen variant is the default in SLEPc? Yes. 
Jose > > Kind regards, > Denis > > > EPS Object: 4 MPI processes > type: gd > Davidson: search subspace is B-orthogonalized > Davidson: block size=1 > Davidson: type of the initial subspace: Krylov > Davidson: size of the subspace after restarting: 6 > Davidson: number of vectors after restarting from the previous iteration: 0 > problem type: generalized symmetric eigenvalue problem > extraction type: harmonic Ritz > selected portion of the spectrum: closest to target: -10 (in magnitude) > postprocessing eigenvectors with purification > number of eigenvalues (nev): 87 > number of column vectors (ncv): 175 > maximum dimension of projected problem (mpd): 175 > maximum number of iterations: 57575 > tolerance: 1e-10 > convergence test: absolute > dimension of user-provided initial space: 87 > BV Object: 4 MPI processes > type: svec > 175 columns of global length 57575 > vector orthogonalization method: classical Gram-Schmidt > orthogonalization refinement: if needed (eta: 0.7071) > block orthogonalization method: Gram-Schmidt > non-standard inner product > Mat Object: 4 MPI processes > type: mpiaij > rows=57575, cols=57575 > total: nonzeros=1.51135e+06, allocated nonzeros=1.51135e+06 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > doing matmult as a single matrix-matrix product > DS Object: 4 MPI processes > type: gnhep > ST Object: 4 MPI processes > type: precond > shift: -10 > number of matrices: 2 > all matrices have different nonzero pattern > KSP Object: (st_) 4 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-08, absolute=1e-50, divergence=10000 > left preconditioning > using DEFAULT norm type for convergence test > PC Object: (st_) 4 MPI processes > type: bjacobi > block Jacobi: number of blocks = 4 > Local solve is same for all blocks, in the following KSP and PC objects: > KSP Object: (st_sub_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > left preconditioning > using NONE norm type for convergence test > PC Object: (st_sub_) 1 MPI processes > type: ilu > ILU: out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1, needed 1 > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=15557, cols=15557 > package used to perform factorization: petsc > total: nonzeros=388947, allocated nonzeros=388947 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=15557, cols=15557 > total: nonzeros=388947, allocated nonzeros=388947 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 4 MPI processes > type: mpiaij > rows=57575, cols=57575 > total: nonzeros=1.51135e+06, allocated nonzeros=1.51135e+06 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > > > From carlesbona at gmail.com Thu Nov 19 11:25:13 2015 From: carlesbona at gmail.com (Carles Bona) Date: Thu, 19 Nov 2015 18:25:13 +0100 Subject: [petsc-users] adding mat rows Message-ID: Dear all, I would like to add some of my equations before I modify them. 
I haven't found any high level function that would allow me to add rows of a matrix (I am working with a parallel BAIJ). Is there any nice way of doing this? I have tried with MatGetRow/MatRestoreRow, but I am struggling a bit to retain the cols and vals, as only one processor can call MatGetRow but if only that processor tries to allocate memory then one gets a segmentation fault. I guess I should allocate enough memory on all processors... If I refrain from storing the cols and vals I need to call MatSetValues before returning the pointer, with a subsequent call to assemblybegin/end for each row, which slows down the code. The other option would be to forget about these row additions after the matrix has been filled and try to fill it while taking into account these row additions at the same time. I guess I need to be constantly checking for indices then. So, which option (not necessarily mentioned here) would you reccomend? Thanks a lot, Carles -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Nov 19 12:55:40 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 19 Nov 2015 12:55:40 -0600 Subject: [petsc-users] adding mat rows In-Reply-To: References: Message-ID: <3326A6C3-43D6-45D3-9431-E4D995F2CE0D@mcs.anl.gov> > On Nov 19, 2015, at 11:25 AM, Carles Bona wrote: > > Dear all, > > I would like to add some of my equations before I modify them. Please explain what you mean by this. Algebraically exactly what do you want to do? > I haven't found any high level function that would allow me to add rows of a matrix (I am working with a parallel BAIJ). Is there any nice way of doing this? > > I have tried with MatGetRow/MatRestoreRow, but I am struggling a bit to retain the cols and vals, as only one processor can call MatGetRow but if only that processor tries to allocate memory then one gets a segmentation fault. I guess I should allocate enough memory on all processors... > > If I refrain from storing the cols and vals I need to call MatSetValues before returning the pointer, with a subsequent call to assemblybegin/end for each row, which slows down the code. > > The other option would be to forget about these row additions after the matrix has been filled and try to fill it while taking into account these row additions at the same time. I guess I need to be constantly checking for indices then. > > So, which option (not necessarily mentioned here) would you reccomend? > > Thanks a lot, > > Carles From carlesbona at gmail.com Thu Nov 19 13:48:09 2015 From: carlesbona at gmail.com (Carles Bona) Date: Thu, 19 Nov 2015 20:48:09 +0100 Subject: [petsc-users] adding mat rows In-Reply-To: <3326A6C3-43D6-45D3-9431-E4D995F2CE0D@mcs.anl.gov> References: <3326A6C3-43D6-45D3-9431-E4D995F2CE0D@mcs.anl.gov> Message-ID: I have a system of equations: a11*x1 + a12*x2 + ... + a1n*xn = b1 a21*x1 + a22*x2 + ... + a2n*xn = b2 . . . an1*x1 + an2*x2 + ... + ann*xn = bn Let's say I want to modify the first equation, but I don't want to lose the information there, so I will add first the first and second equations, store the result in the second equation and then modify the first equation. Like this: x1 - x2 = 0 (a11+a21)*x1 + (a12+a22)*x2 + ... + (a1n+a2n)*xn = b1+b2 . . . an1*x1 + an2*x2 + ... + ann*xn = bn And I want to do this for a few rows of my matrix. 
I have already built the nonzero structure so that these additions can be done without hitting a non preallocated location (for example a case where a21 was never allocated because it was meant to be zero always and now, with a11 present, it's different than zero). Any hints? Thanks! Carles El dia 19/11/2015 19:55, "Barry Smith" va escriure: > > > On Nov 19, 2015, at 11:25 AM, Carles Bona wrote: > > > > Dear all, > > > > I would like to add some of my equations before I modify them. > > Please explain what you mean by this. Algebraically exactly what do you > want to do? > > > > I haven't found any high level function that would allow me to add rows > of a matrix (I am working with a parallel BAIJ). Is there any nice way of > doing this? > > > > I have tried with MatGetRow/MatRestoreRow, but I am struggling a bit to > retain the cols and vals, as only one processor can call MatGetRow but if > only that processor tries to allocate memory then one gets a segmentation > fault. I guess I should allocate enough memory on all processors... > > > > If I refrain from storing the cols and vals I need to call MatSetValues > before returning the pointer, with a subsequent call to assemblybegin/end > for each row, which slows down the code. > > > > The other option would be to forget about these row additions after the > matrix has been filled and try to fill it while taking into account these > row additions at the same time. I guess I need to be constantly checking > for indices then. > > > > So, which option (not necessarily mentioned here) would you reccomend? > > > > Thanks a lot, > > > > Carles > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Nov 19 14:05:25 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 19 Nov 2015 14:05:25 -0600 Subject: [petsc-users] adding mat rows In-Reply-To: References: <3326A6C3-43D6-45D3-9431-E4D995F2CE0D@mcs.anl.gov> Message-ID: So long as you arrange things so that "partner rows" are always on the same process I think you should be able to do the following. Call MatGetRow() on the first row, pull out the values of interest, call MatRestoreRow() then call MatSetValues() on the second row with ADD_VALUES. Call MatSetValues() on the first row with the values you want to set with INSERT_VALUES. Not that each process can call MatGetRow (and hence the MatSetValues) a different number of times. Since you cannot mix calls of ADD and INSERT. Have all the processes do their ADDING to all the second partner rows then call MatAssemblyBegin/End once then have all the processes do their setting of values into the first partner rows and call MatAssemblyBegin/End again. There is no need to call the MatAssemblyBegin/End once for each row in this case. Barry If the partner rows are on different processes it would require much more work. This seems like an odd thing to do, by changing some rows you are still losing information even if you have changed other rows by summing in the partner row. > On Nov 19, 2015, at 1:48 PM, Carles Bona wrote: > > I have a system of equations: > > a11*x1 + a12*x2 + ... + a1n*xn = b1 > a21*x1 + a22*x2 + ... + a2n*xn = b2 > . > . > . > an1*x1 + an2*x2 + ... + ann*xn = bn > > Let's say I want to modify the first equation, but I don't want to lose the information there, so I will add first the first and second equations, store the result in the second equation and then modify the first equation. Like this: > > x1 - x2 = 0 > (a11+a21)*x1 + (a12+a22)*x2 + ... + (a1n+a2n)*xn = b1+b2 > . 
> . > . > an1*x1 + an2*x2 + ... + ann*xn = bn > > And I want to do this for a few rows of my matrix. I have already built the nonzero structure so that these additions can be done without hitting a non preallocated location (for example a case where a21 was never allocated because it was meant to be zero always and now, with a11 present, it's different than zero). > > Any hints? > > Thanks! > > Carles > > El dia 19/11/2015 19:55, "Barry Smith" va escriure: > > > On Nov 19, 2015, at 11:25 AM, Carles Bona wrote: > > > > Dear all, > > > > I would like to add some of my equations before I modify them. > > Please explain what you mean by this. Algebraically exactly what do you want to do? > > > > I haven't found any high level function that would allow me to add rows of a matrix (I am working with a parallel BAIJ). Is there any nice way of doing this? > > > > I have tried with MatGetRow/MatRestoreRow, but I am struggling a bit to retain the cols and vals, as only one processor can call MatGetRow but if only that processor tries to allocate memory then one gets a segmentation fault. I guess I should allocate enough memory on all processors... > > > > If I refrain from storing the cols and vals I need to call MatSetValues before returning the pointer, with a subsequent call to assemblybegin/end for each row, which slows down the code. > > > > The other option would be to forget about these row additions after the matrix has been filled and try to fill it while taking into account these row additions at the same time. I guess I need to be constantly checking for indices then. > > > > So, which option (not necessarily mentioned here) would you reccomend? > > > > Thanks a lot, > > > > Carles > From jed at jedbrown.org Thu Nov 19 14:11:46 2015 From: jed at jedbrown.org (Jed Brown) Date: Thu, 19 Nov 2015 13:11:46 -0700 Subject: [petsc-users] adding mat rows In-Reply-To: References: <3326A6C3-43D6-45D3-9431-E4D995F2CE0D@mcs.anl.gov> Message-ID: <874mgh4um5.fsf@jedbrown.org> Carles Bona writes: > Let's say I want to modify the first equation, but I don't want to lose the > information there, 1. The operation you describe still loses information. 2. Consider expressing your row operations as matrix-matrix multiplication. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From carlesbona at gmail.com Thu Nov 19 14:28:52 2015 From: carlesbona at gmail.com (Carles Bona) Date: Thu, 19 Nov 2015 21:28:52 +0100 Subject: [petsc-users] adding mat rows In-Reply-To: <874mgh4um5.fsf@jedbrown.org> References: <3326A6C3-43D6-45D3-9431-E4D995F2CE0D@mcs.anl.gov> <874mgh4um5.fsf@jedbrown.org> Message-ID: Sorry, yes, I meant losing less information than I would lose without adding beforehand. Rows are in fact on different processors... So yes, elementary matrix multiplication sounds like a cool idea! Thanks a lot!! Carles El dia 19/11/2015 21:12, "Jed Brown" va escriure: > Carles Bona writes: > > > Let's say I want to modify the first equation, but I don't want to lose > the > > information there, > > 1. The operation you describe still loses information. > > 2. Consider expressing your row operations as matrix-matrix multiplication. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jed at jedbrown.org Thu Nov 19 14:49:46 2015 From: jed at jedbrown.org (Jed Brown) Date: Thu, 19 Nov 2015 13:49:46 -0700 Subject: [petsc-users] adding mat rows In-Reply-To: References: <3326A6C3-43D6-45D3-9431-E4D995F2CE0D@mcs.anl.gov> <874mgh4um5.fsf@jedbrown.org> Message-ID: <871tbl4sut.fsf@jedbrown.org> Carles Bona writes: > Sorry, yes, I meant losing less information than I would lose without > adding beforehand. That's not true either (as described). You're replacing two equations with one equation, so no matter how you do it, you're opening a 1D space. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From may at bu.edu Thu Nov 19 18:55:56 2015 From: may at bu.edu (Young, Matthew, Adam) Date: Fri, 20 Nov 2015 00:55:56 +0000 Subject: [petsc-users] [issue1376] Citation for hypre BoomerAMG In-Reply-To: <09ACAD942F7D15419E1F3DBAEAB0786786D21EFC@PRDEXMBX-02.the-lab.llnl.gov> References: <17A35C213185A84BB8ED54C88FBFD712D355312B@IST-EX10MBX-3.ad.bu.edu>, <09ACAD942F7D15419E1F3DBAEAB0786786D21EFC@PRDEXMBX-02.the-lab.llnl.gov> Message-ID: <17A35C213185A84BB8ED54C88FBFD712D35548DA@IST-EX10MBX-3.ad.bu.edu> Hi Rob, I tried '@misc', as well as '@online', with no success. I also played around with various fields (e.g. key, author, and date) but didn't get anything from BibTex that looked presentable. This is not a complete disaster, since I need to submit the manuscript with the .bbl file pasted into the document, so I can just hardcode a suitable reference then. Alternatively, I can include the url in a footnote. Thanks for your suggestions and for the great solver. --Matt -------------------------------------------------------------- Matthew Young Graduate Student Boston University Dept. of Astronomy -------------------------------------------------------------- ________________________________________ From: Rob Falgout hypre Tracker [hypre-support at llnl.gov] Sent: Wednesday, November 18, 2015 7:28 PM To: bsmith at mcs.anl.gov; li50 at llnl.gov; Young, Matthew, Adam; oseikuffuor1 at llnl.gov; petsc-users at mcs.anl.gov; schroder2 at llnl.gov; tzanio at llnl.gov; umyang at llnl.gov; wang84 at llnl.gov Subject: [issue1376] [petsc-users] Citation for hypre BoomerAMG Rob Falgout added the comment: Hi Matt, We have recently been using the web page as the reference for hypre, with no author list (the team and code are continually changing). So, the bib entry looks fine. To get rid of the error, try changing '@manual' to '@misc'. Hope this helps. Thanks for asking! -Rob > -----Original Message----- > From: Young, Matthew, Adam hypre Tracker [mailto:hypre- > support at llnl.gov] > Sent: Wednesday, November 18, 2015 1:34 PM > To: Li, Ruipeng; may at bu.edu; Osei-Kuffuor, Daniel; rfalgout at llnl.gov; > Schroder, Jacob B.; tzanio at llnl.gov; umyang at llnl.gov; Wang, Lu > Subject: [issue1376] [petsc-users] Citation for hypre BoomerAMG > > > New submission from Young, Matthew, Adam : > > Barry, > > You seem to be right that my style file was looking for an author; it seems to > have also been looking for a year, hence the empty brackets. Unfortunately, > adding an author and a year caused only those two fields to be printed in the > bibliography (leaving out the rest of the info generated by -citation). I'm > using agufull08.bst. > > I'll wait to see what the Hypre folks say. I just want to give them proper > citation. 
> > --Matt > -------------------------------------------------------------- > Matthew Young > Graduate Student > Boston University Dept. of Astronomy > -------------------------------------------------------------- > > ________________________________________ > From: Barry Smith [bsmith at mcs.anl.gov] > Sent: Wednesday, November 18, 2015 4:03 PM > To: Young, Matthew, Adam > Cc: petsc-users at mcs.anl.gov; hypre-support at llnl.gov > Subject: Re: [petsc-users] Citation for hypre BoomerAMG > > > On Nov 18, 2015, at 2:45 PM, Young, Matthew, Adam > wrote: > > > > When I run my code with -citations, it produces > > > > @manual{hypre-web-page, > > title = {{\sl hypre}: High Performance Preconditioners}, > > organization = {Lawrence Livermore National Laboratory}, > > note = {\url{http://www.llnl.gov/CASC/hypre/}} > > } > > > > for BoomerAMG. However, the result of \citep{hypre-web-page} is > > "[hyp]" in the text > > Is this ok? Or should it produce something different in the text? I think your > bibliography style file uses some shortened form of the bibtex key for the > this. > > > and a bibliography entry with only "()" for the author. This seems incorrect. > If this is the intended output, can you tell me what I'm doing wrong in LaTeX? > The other references (two for PETSc and one for GMRES) are fine. > > Likely your bibliography style file requires an author field and so is putting in > () since this bibliography entry doesn't have an author entry. You can try > adding an author field manually and see if that improves things. > > hypre folks, > > We'd be happy to list some authors here (or not) just let us know what > you prefer? > > Barry > > > > > --Matt > > > > -------------------------------------------------------------- > > Matthew Young > > Graduate Student > > Boston University Dept. of Astronomy > > -------------------------------------------------------------- > > ---------- > messages: 6809 > nosy: bsmith, li50, may, oseikuffuor1, petsc-users, rfalgout, schroder2, > tzanio, ulrikey, wang84 > status: unread > title: [petsc-users] Citation for hypre BoomerAMG > > ____________________________________________ > hypre Issue Tracker > > ____________________________________________ ____________________________________________ hypre Issue Tracker ____________________________________________ From davydden at gmail.com Fri Nov 20 05:06:11 2015 From: davydden at gmail.com (Denis Davydov) Date: Fri, 20 Nov 2015 12:06:11 +0100 Subject: [petsc-users] [SLEPc] GD is not deterministic when using different number of cores In-Reply-To: <24297832-E5E4-419B-91D3-09D38F267E9F@dsic.upv.es> References: <8E937148-1B39-443F-9A80-40776619472F@gmail.com> <24297832-E5E4-419B-91D3-09D38F267E9F@dsic.upv.es> Message-ID: <6B95A687-8393-4B21-9152-2789519EF62C@gmail.com> > On 19 Nov 2015, at 11:19, Jose E. Roman wrote: > >> >> El 19 nov 2015, a las 10:49, Denis Davydov escribi?: >> >> Dear all, >> >> I was trying to get some scaling results for the GD eigensolver as applied to the density functional theory. >> Interestingly enough, the number of self-consistent iterations (solution of couple eigenvalue problem and poisson equations) >> depends on the number of MPI cores used. For my case the range of iterations is 19-24 for MPI cores between 2 and 160. >> That makes the whole scaling check useless as the eigenproblem is solved different number of times. 
>> >> That is **not** the case when I use Krylov-Schur eigensolver with zero shift, which makes me believe that I am missing some settings on GD to make it fully deterministic. The only non-deterministic part I am currently aware of is the initial subspace for the first SC iterations. But that?s the case for both KS and GD. For subsequent iterations I provide previously obtained eigenvectors as initial subspace. >> >> Certainly there will be some round-off error due to different partition of DoFs for different number of MPI cores, >> but i don?t expect it to have such a strong influence. Especially given the fact that I don?t see this problem with KS. >> >> Below is the output of -eps-view for GD with -eps_type gd -eps_harmonic -st_pc_type bjacobi -eps_gd_krylov_start -eps_target -10.0 >> I would appreciate any suggestions on how to address the issue. > > The block Jacobi preconditioner differs when you change the number of processes. This will probably make GD iterate more when you use more processes. Switching to Jacobi preconditioner reduced variation in number of SC iterations, but does not remove it. Any other options but initial vector space which may introduce non-deterministic behaviour? >> >> As a side question, why GD uses KSP pre-only? It could as well be using a proper linear solver to apply K^{-1} in the expansion state -- > > You can achieve that with PCKSP. But if you are going to do that, why not using JD instead of GD? It was more a general question why the inverse is implemented by pre-only for GD and is done properly with a full control of KSP for JD. I will try JD as well because so far GD for my problems has a bottleneck in: BVDot (13% time), BVOrthogonalize (10% time), DSSolve (62% time); whereas only 11% of time is spent in MatMult. I suppose BVDot is mostly used in BVOrthogonalize and partly in calculation of Ritz vectors? My best bet with DSSolve (with mpd=175 only) is a better preconditioner and thus reduced number of iterations or double expansion with simple preconditioner? Regards, Denis. -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlesbona at gmail.com Fri Nov 20 05:43:59 2015 From: carlesbona at gmail.com (Carles Bona) Date: Fri, 20 Nov 2015 12:43:59 +0100 Subject: [petsc-users] adding mat rows In-Reply-To: <871tbl4sut.fsf@jedbrown.org> References: <3326A6C3-43D6-45D3-9431-E4D995F2CE0D@mcs.anl.gov> <874mgh4um5.fsf@jedbrown.org> <871tbl4sut.fsf@jedbrown.org> Message-ID: Hello again, MatMatMult not supported for B of type mpibaij (where A*B = C). Can A be of type mpibaij? Will C be of type mpibaij if A is but B is not? I see the limitation to aij/dense is mentioned in the documentation in matmatmultsymbolic and matmatmultnumeric, but not in matmatmult. Probably they have the same limitation... I guess I'm back to row-reading-adding. Carles 2015-11-19 21:49 GMT+01:00 Jed Brown : > Carles Bona writes: > > > Sorry, yes, I meant losing less information than I would lose without > > adding beforehand. > > That's not true either (as described). You're replacing two equations > with one equation, so no matter how you do it, you're opening a 1D > space. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Fri Nov 20 06:10:59 2015 From: jroman at dsic.upv.es (Jose E. 
Roman) Date: Fri, 20 Nov 2015 13:10:59 +0100 Subject: [petsc-users] [SLEPc] GD is not deterministic when using different number of cores In-Reply-To: <6B95A687-8393-4B21-9152-2789519EF62C@gmail.com> References: <8E937148-1B39-443F-9A80-40776619472F@gmail.com> <24297832-E5E4-419B-91D3-09D38F267E9F@dsic.upv.es> <6B95A687-8393-4B21-9152-2789519EF62C@gmail.com> Message-ID: > El 20 nov 2015, a las 12:06, Denis Davydov escribi?: > >> >> On 19 Nov 2015, at 11:19, Jose E. Roman wrote: >> >>> >>> El 19 nov 2015, a las 10:49, Denis Davydov escribi?: >>> >>> Dear all, >>> >>> I was trying to get some scaling results for the GD eigensolver as applied to the density functional theory. >>> Interestingly enough, the number of self-consistent iterations (solution of couple eigenvalue problem and poisson equations) >>> depends on the number of MPI cores used. For my case the range of iterations is 19-24 for MPI cores between 2 and 160. >>> That makes the whole scaling check useless as the eigenproblem is solved different number of times. >>> >>> That is **not** the case when I use Krylov-Schur eigensolver with zero shift, which makes me believe that I am missing some settings on GD to make it fully deterministic. The only non-deterministic part I am currently aware of is the initial subspace for the first SC iterations. But that?s the case for both KS and GD. For subsequent iterations I provide previously obtained eigenvectors as initial subspace. >>> >>> Certainly there will be some round-off error due to different partition of DoFs for different number of MPI cores, >>> but i don?t expect it to have such a strong influence. Especially given the fact that I don?t see this problem with KS. >>> >>> Below is the output of -eps-view for GD with -eps_type gd -eps_harmonic -st_pc_type bjacobi -eps_gd_krylov_start -eps_target -10.0 >>> I would appreciate any suggestions on how to address the issue. >> >> The block Jacobi preconditioner differs when you change the number of processes. This will probably make GD iterate more when you use more processes. > > Switching to Jacobi preconditioner reduced variation in number of SC iterations, but does not remove it. > Any other options but initial vector space which may introduce non-deterministic behaviour? > >>> >>> As a side question, why GD uses KSP pre-only? It could as well be using a proper linear solver to apply K^{-1} in the expansion state -- >> >> You can achieve that with PCKSP. But if you are going to do that, why not using JD instead of GD? > > It was more a general question why the inverse is implemented by pre-only for GD and is done properly with a full control of KSP for JD. GD uses the preconditioned residual to expand the subspace. JD uses the (approximate) solution of the correction equation. > > I will try JD as well because so far GD for my problems has a bottleneck in: BVDot (13% time), BVOrthogonalize (10% time), DSSolve (62% time); > whereas only 11% of time is spent in MatMult. > I suppose BVDot is mostly used in BVOrthogonalize and partly in calculation of Ritz vectors? > My best bet with DSSolve (with mpd=175 only) is a better preconditioner and thus reduced number of iterations or double expansion with simple preconditioner? DSSolve should always be small. Try reducing mpd. > > Regards, > Denis. 
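As a concrete illustration of the two suggestions above (use JD so the correction equation is solved by a real KSP, and reduce mpd), a minimal C sketch of the solver setup follows. It assumes the matrices A and B and the integer nev already exist inside an initialized SLEPc program; the mpd of 40, the inner GMRES tolerances and the pointwise Jacobi preconditioner are illustrative choices, not values taken from this thread, and error checking is omitted for brevity:

#include <slepceps.h>

EPS eps;
ST  st;
KSP ksp;
PC  pc;

EPSCreate(PETSC_COMM_WORLD, &eps);
EPSSetOperators(eps, A, B);
EPSSetProblemType(eps, EPS_GHEP);
EPSSetType(eps, EPSJD);                          /* Jacobi-Davidson instead of GD */
EPSSetExtraction(eps, EPS_HARMONIC);             /* as in the runs reported above */
EPSSetTarget(eps, -10.0);
EPSSetWhichEigenpairs(eps, EPS_TARGET_MAGNITUDE);
EPSSetDimensions(eps, nev, PETSC_DEFAULT, 40);   /* reduce mpd from 175 */
EPSGetST(eps, &st);
STGetKSP(st, &ksp);
KSPSetType(ksp, KSPGMRES);                       /* real inner solve of the correction equation */
KSPSetTolerances(ksp, 1e-3, PETSC_DEFAULT, PETSC_DEFAULT, 50);
KSPGetPC(ksp, &pc);
PCSetType(pc, PCJACOBI);                         /* independent of the process count, unlike bjacobi */
EPSSetFromOptions(eps);
EPSSolve(eps);

The same setup can be selected at run time with options such as -eps_type jd -eps_harmonic -eps_target -10.0 -eps_mpd 40 -st_ksp_type gmres -st_ksp_rtol 1e-3 -st_pc_type jacobi.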
From jed at jedbrown.org Fri Nov 20 08:12:27 2015 From: jed at jedbrown.org (Jed Brown) Date: Fri, 20 Nov 2015 07:12:27 -0700 Subject: [petsc-users] adding mat rows In-Reply-To: References: <3326A6C3-43D6-45D3-9431-E4D995F2CE0D@mcs.anl.gov> <874mgh4um5.fsf@jedbrown.org> <871tbl4sut.fsf@jedbrown.org> Message-ID: <87fv003gl0.fsf@jedbrown.org> Carles Bona writes: > Hello again, > > MatMatMult not supported for B of type mpibaij (where A*B = C). Can A be of > type mpibaij? Will C be of type mpibaij if A is but B is not? Use MPIAIJ for this. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From adlinds3 at ncsu.edu Fri Nov 20 12:40:31 2015 From: adlinds3 at ncsu.edu (Alex Lindsay) Date: Fri, 20 Nov 2015 13:40:31 -0500 Subject: [petsc-users] Debugging failed solve (what's an acceptable upper bound to the condition number?) Message-ID: <564F691F.9020302@ncsu.edu> Hello, I have an application built on top of the Moose framework, and I'm trying to debug a solve that is not converging. My linear solve converges very nicely. However, my non-linear solve does not, and the problem appears to be in the line search. Reading the PetSc FAQ, I see that the most common cause of poor line searches are bad Jacobians. However, I'm using a finite-differenced Jacobian; if I run -snes_type=test, I get "norm of matrix ratios" < 1e-15. Thus in this case the Jacobian should be accurate. I'm wondering then if my problem might be these (taken from the FAQ page): * The matrix is very ill-conditioned. Check the condition number . o Try to improve it by choosing the relative scaling of components/boundary conditions. o Try |-ksp_diagonal_scale -ksp_diagonal_scale_fix|. o Perhaps change the formulation of the problem to produce more friendly algebraic equations. * The matrix is nonlinear (e.g. evaluated using finite differencing of a nonlinear function). Try different differencing parameters, |./configure --with-precision=__float128 --download-f2cblaslapack|, check if it converges in "easier" parameter regimes. I'm almost ashamed to share my condition number because I'm sure it must be absurdly high. Without applying -ksp_diagonal_scale and -ksp_diagonal_scale_fix, the condition number is around 1e25. When I do apply those two parameters, the condition number is reduced to 1e17. Even after scaling all my variable residuals so that they were all on the order of unity (a suggestion on the Moose list), I still have a condition number of 1e12. I have no experience with condition numbers, but knowing that perfect condition number is unity, 1e12 seems unacceptable. What's an acceptable upper limit on the condition number? Is it problem dependent? Having already tried scaling the individual variable residuals, I'm not exactly sure what my next method would be for trying to reduce the condition number. I definitely have a nonlinear problem. Could I be having problems because I'm finite differencing non-linear residuals to form my Jacobian? I can see about using a different differencing parameter. I'm also going to consider trying quad precision. However, my hypothesis is that my condition number is the fundamental problem. Is that a reasonable hypothesis? 
If it's useful, below is console output with -pc_type=svd Time Step 1, time = 1e-10 dt = 1e-10 |residual|_2 of individual variables: potential: 8.12402e+07 potentialliq: 0.000819748 em: 49.206 emliq: 3.08187e-11 Arp: 2375.94 0 Nonlinear |R| = 8.124020e+07 SVD: condition number 1.457087640207e+12, 0 of 851 singular values are (nearly) zero SVD: smallest singular values: 5.637144317564e-09 9.345415388433e-08 4.106132915572e-05 1.017339655185e-04 1.147649477723e-04 SVD: largest singular values : 1.498505466947e+03 1.577560767570e+03 1.719172328193e+03 2.344218235296e+03 8.213813311188e+03 0 KSP unpreconditioned resid norm 3.185019606208e+05 true resid norm 3.185019606208e+05 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 6.382886902896e-07 true resid norm 6.382761808414e-07 ||r(i)||/||b|| 2.003994511046e-12 Linear solve converged due to CONVERGED_RTOL iterations 1 Line search: Using full step: fnorm 8.124020470169e+07 gnorm 1.097605946684e+01 |residual|_2 of individual variables: potential: 8.60047 potentialliq: 0.335436 em: 2.26472 emliq: 0.642578 Arp: 6.39151 1 Nonlinear |R| = 1.097606e+01 SVD: condition number 1.457473763066e+12, 0 of 851 singular values are (nearly) zero SVD: smallest singular values: 5.637185516434e-09 9.347128557672e-08 1.017339655587e-04 1.146760266781e-04 4.064422034774e-04 SVD: largest singular values : 1.498505466944e+03 1.577544976882e+03 1.718956369043e+03 2.343692402876e+03 8.216049987736e+03 0 KSP unpreconditioned resid norm 2.653715381459e+01 true resid norm 2.653715381459e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 6.031179341420e-05 true resid norm 6.031183387732e-05 ||r(i)||/||b|| 2.272731819648e-06 Linear solve converged due to CONVERGED_RTOL iterations 1 Line search: gnorm after quadratic fit 2.485190757827e+11 Line search: Cubic step no good, shrinking lambda, current gnorm 2.632996340352e+10 lambda=5.0000000000000003e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 4.290675557416e+09 lambda=2.5000000000000001e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 4.332980055153e+08 lambda=1.2500000000000001e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 1.677118626669e+07 lambda=6.2500000000000003e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 1.024469780306e+05 lambda=3.1250000000000002e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 7.011543252988e+03 lambda=1.5625000000000001e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 1.750171277470e+03 lambda=7.8125000000000004e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 3.486970625406e+02 lambda=3.4794637057251714e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 7.830624839582e+01 lambda=1.5977866967992950e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 2.147529381328e+01 lambda=6.8049915671999093e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 1.138950943123e+01 lambda=1.7575203052774536e-05 Line search: Cubically determined step, current gnorm 1.095195976135e+01 lambda=1.7575203052774537e-06 |residual|_2 of individual variables: potential: 8.59984 potentialliq: 0.395753 em: 2.26492 emliq: 0.642578 Arp: 6.34735 2 Nonlinear |R| = 1.095196e+01 SVD: condition number 1.457459214030e+12, 0 of 851 singular values are (nearly) zero SVD: smallest singular values: 5.637295371943e-09 9.347057884198e-08 1.017339655949e-04 1.146738253493e-04 4.064421554132e-04 SVD: 
largest singular values : 1.498505466946e+03 1.577543742603e+03 1.718948052797e+03 2.343672206864e+03 8.216128082047e+03 0 KSP unpreconditioned resid norm 2.653244141805e+01 true resid norm 2.653244141805e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 4.480869560737e-05 true resid norm 4.480686665183e-05 ||r(i)||/||b|| 1.688757771886e-06 Linear solve converged due to CONVERGED_RTOL iterations 1 Line search: gnorm after quadratic fit 2.481752147885e+11 Line search: Cubic step no good, shrinking lambda, current gnorm 2.631959989642e+10 lambda=5.0000000000000003e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 4.289110800463e+09 lambda=2.5000000000000001e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 4.332043942482e+08 lambda=1.2500000000000001e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 1.677933337886e+07 lambda=6.2500000000000003e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 1.027980597206e+05 lambda=3.1250000000000002e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 7.054113639063e+03 lambda=1.5625000000000001e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 1.771258630210e+03 lambda=7.8125000000000004e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 3.517070127496e+02 lambda=3.4519087020105563e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 7.844350966118e+01 lambda=1.5664532891249369e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 2.114833995101e+01 lambda=6.5367917100814859e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 1.144636844292e+01 lambda=1.6044984646715980e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 1.095640770627e+01 lambda=1.6044984646715980e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 1.095196729511e+01 lambda=1.6044984646715980e-07 Line search: Cubically determined step, current gnorm 1.095195451041e+01 lambda=2.3994454223607641e-08 |residual|_2 of individual variables: potential: 8.59983 potentialliq: 0.396107 em: 2.26492 emliq: 0.642578 Arp: 6.34733 3 Nonlinear |R| = 1.095195e+01 SVD: condition number 1.457474387942e+12, 0 of 851 singular values are (nearly) zero SVD: smallest singular values: 5.637237413167e-09 9.347057670885e-08 1.017339654798e-04 1.146737961973e-04 4.064420550524e-04 SVD: largest singular values : 1.498505466946e+03 1.577543716995e+03 1.718947893048e+03 2.343671853830e+03 8.216129148438e+03 0 KSP unpreconditioned resid norm 2.653237816527e+01 true resid norm 2.653237816527e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 8.525213442515e-05 true resid norm 8.527696332776e-05 ||r(i)||/||b|| 3.214071607022e-06 Linear solve converged due to CONVERGED_RTOL iterations 1 Line search: gnorm after quadratic fit 2.481576195523e+11 Line search: Cubic step no good, shrinking lambda, current gnorm 2.632005412624e+10 lambda=5.0000000000000003e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 4.289212002697e+09 lambda=2.5000000000000001e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 4.332196637845e+08 lambda=1.2500000000000001e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 1.678040222943e+07 lambda=6.2500000000000003e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 1.027868984884e+05 lambda=3.1250000000000002e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 
7.010733464460e+03 lambda=1.5625000000000001e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 1.751519860441e+03 lambda=7.8125000000000004e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 3.497889916171e+02 lambda=3.4753778542938795e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 7.932631084466e+01 lambda=1.5879606741873878e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 2.194608479634e+01 lambda=6.5716583192912669e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 1.117190149691e+01 lambda=1.1541218569257328e-05 Line search: Cubically determined step, current gnorm 1.093879875464e+01 lambda=1.1541218569257329e-06 |residual|_2 of individual variables: potential: 8.59942 potentialliq: 0.403326 em: 2.26505 emliq: 0.714844 Arp: 6.3169 4 Nonlinear |R| = 1.093880e+01 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Fri Nov 20 13:33:09 2015 From: jed at jedbrown.org (Jed Brown) Date: Fri, 20 Nov 2015 12:33:09 -0700 Subject: [petsc-users] Debugging failed solve (what's an acceptable upper bound to the condition number?) In-Reply-To: <564F691F.9020302@ncsu.edu> References: <564F691F.9020302@ncsu.edu> Message-ID: <877flc31qi.fsf@jedbrown.org> Alex Lindsay writes: > I'm almost ashamed to share my condition number because I'm sure it must > be absurdly high. Without applying -ksp_diagonal_scale and > -ksp_diagonal_scale_fix, the condition number is around 1e25. When I do > apply those two parameters, the condition number is reduced to 1e17. > Even after scaling all my variable residuals so that they were all on > the order of unity (a suggestion on the Moose list), I still have a > condition number of 1e12. Double precision provides 16 digits of accuracy in the best case. When you finite difference, the accuracy is reduced to 8 digits if the differencing parameter is chosen optimally. With the condition numbers you're reporting, your matrix is singular up to available precision. > I have no experience with condition numbers, but knowing that perfect > condition number is unity, 1e12 seems unacceptable. What's an > acceptable upper limit on the condition number? Is it problem > dependent? Having already tried scaling the individual variable > residuals, I'm not exactly sure what my next method would be for > trying to reduce the condition number. Singular operators are often caused by incorrect boundary conditions. You should try a small and simple version of your problem and find out why it's producing a singular (or so close to singular we can't tell) operator. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From bsmith at mcs.anl.gov Fri Nov 20 14:24:29 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 20 Nov 2015 14:24:29 -0600 Subject: [petsc-users] Debugging failed solve (what's an acceptable upper bound to the condition number?) In-Reply-To: <564F691F.9020302@ncsu.edu> References: <564F691F.9020302@ncsu.edu> Message-ID: <60DF658D-60DC-4CA6-B64E-40E2D31C91D2@mcs.anl.gov> Do you really only have 851 variables? 
SVD: condition number 1.457087640207e+12, 0 of 851 singular values are (nearly) zero if so you can use -snes_fd and -ksp_view_pmat binary:filename to save the small matrix and then load it up into MATLAB or similar tool to fully analyze its eigenstructure and see the distribution from the tiny values to the large values; is it just a small number of tiny ones, etc.? Note that with such a large condition number, the fact that the linear system "converges" quickly may be meaningless, since a small residual doesn't always mean a small error. The error could still be huge. Barry
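A note for anyone repeating the check above without MATLAB: roughly the same inspection can be done with petsc4py and numpy. The following is only an untested sketch, not PETSc-endorsed tooling; the file name jacobian.bin merely stands in for whatever is passed to -ksp_view_pmat binary:, a serial run is assumed, and the matrix is pulled into a dense array, which is reasonable only because the system here is 851x851.

    import sys
    import numpy as np
    import petsc4py
    petsc4py.init(sys.argv)
    from petsc4py import PETSc

    # Load the Jacobian saved with, e.g., -snes_fd -ksp_view_pmat binary:jacobian.bin
    # ('jacobian.bin' is just an example name)
    viewer = PETSc.Viewer().createBinary('jacobian.bin', 'r')
    A = PETSc.Mat().load(viewer)

    # Densify the small matrix and look at its full singular spectrum
    n = A.getSize()[0]
    rows = np.arange(n, dtype=np.int32)
    dense = np.asarray(A.getValues(rows, rows)).reshape(n, n)
    sigma = np.sort(np.linalg.svd(dense, compute_uv=False))
    print("condition number:", sigma[-1] / sigma[0])
    print("ten smallest singular values:", sigma[:10])

To put Jed's precision comment in concrete numbers: double precision carries roughly 16 digits, an optimally chosen differencing parameter leaves roughly 8 accurate digits in a finite-differenced Jacobian, and a condition number near 1e12 can amplify that 1e-8 relative error past unity, so the computed Newton direction may contain no accurate digits at all.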
> On Nov 20, 2015, at 12:40 PM, Alex Lindsay wrote: > > Hello, > > I have an application built on top of the Moose framework, and I'm trying to debug a solve that is not converging. My linear solve converges very nicely. However, my non-linear solve does not, and the problem appears to be in the line search. Reading the PETSc FAQ, I see that the most common cause of a poor line search is a bad Jacobian. However, I'm using a finite-differenced Jacobian; if I run -snes_type=test, I get "norm of matrix ratios" < 1e-15. Thus in this case the Jacobian should be accurate. I'm wondering then if my problem might be these (taken from the FAQ page): > > - The matrix is very ill-conditioned. Check the condition number. > - Try to improve it by choosing the relative scaling of components/boundary conditions. > - Try -ksp_diagonal_scale -ksp_diagonal_scale_fix. > - Perhaps change the formulation of the problem to produce more friendly algebraic equations. > - The matrix is nonlinear (e.g. evaluated using finite differencing of a nonlinear function). Try different differencing parameters, ./configure --with-precision=__float128 --download-f2cblaslapack, check if it converges in "easier" parameter regimes. > I'm almost ashamed to share my condition number because I'm sure it must be absurdly high. Without applying -ksp_diagonal_scale and -ksp_diagonal_scale_fix, the condition number is around 1e25. When I do apply those two parameters, the condition number is reduced to 1e17. Even after scaling all my variable residuals so that they were all on the order of unity (a suggestion on the Moose list), I still have a condition number of 1e12. I have no experience with condition numbers, but knowing that perfect condition number is unity, 1e12 seems unacceptable. What's an acceptable upper limit on the condition number? Is it problem dependent? Having already tried scaling the individual variable residuals, I'm not exactly sure what my next method would be for trying to reduce the condition number. > > I definitely have a nonlinear problem. Could I be having problems because I'm finite differencing non-linear residuals to form my Jacobian? I can see about using a different differencing parameter. I'm also going to consider trying quad precision. However, my hypothesis is that my condition number is the fundamental problem. Is that a reasonable hypothesis? > > If it's useful, below is console output with -pc_type=svd > > [the -pc_type=svd console output quoted here duplicates the listing shown above and is elided] >
From bsmith at mcs.anl.gov Fri Nov 20 19:36:29 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 20 Nov 2015 19:36:29 -0600 Subject: [petsc-users] Debugging failed solve (what's an acceptable upper bound to the condition number?) In-Reply-To: <564FC45C.8020300@ncsu.edu> References: <564F691F.9020302@ncsu.edu> <60DF658D-60DC-4CA6-B64E-40E2D31C91D2@mcs.anl.gov> <564FC45C.8020300@ncsu.edu> Message-ID: Always make sure that when you reply it goes to everyone on the mailing list; otherwise you're stuck with only stupid old me trying to understand what is going on. Can you run with everything else the same but use equal permittivities? Do all the huge condition numbers and no convergence of the nonlinear solve go away? Barry > On Nov 20, 2015, at 7:09 PM, Alex Lindsay wrote: > > I think I may be honing in on what's causing my problems. I have an interface where I am coupling two different subdomains. Among the physics at the interface is a jump discontinuity in the gradient of the electrical potential (e.g. a jump discontinuity in the electric field), governed by the ratio of the permittivities on either side of the interface. This is implemented in my code like this: > > Real > DGMatDiffusionInt::computeQpResidual(Moose::DGResidualType type) > { > if (_D_neighbor[_qp] < std::numeric_limits::epsilon()) > mooseError("It doesn't appear that DG material properties got passed."); > > Real r = 0; > > switch (type) > { > case Moose::Element: > r += 0.5 * (-_D[_qp] * _grad_u[_qp] * _normals[_qp] + -_D_neighbor[_qp] * _grad_neighbor_value[_qp] * _normals[_qp]) * _test[_i][_qp]; > break; > > case Moose::Neighbor: > r += 0.5 * (_D[_qp] * _grad_u[_qp] * _normals[_qp] + _D_neighbor[_qp] * _grad_neighbor_value[_qp] * _normals[_qp]) * _test_neighbor[_i][_qp]; > break; > } > > return r; > } > > where here _D and _D_neighbor are the permittivities on either side of the interface. Attached are pictures showing the solution using Newton, and the solution using a Jacobian-free method. Newton's method yields electric fields with opposite signs on either side of the interface, which is physically impossible. The Jacobian-free solution yields electric fields with the same sign, and with the proper ratio (a ratio of 5, equivalent to the ratio of the permittivities). I'm sure if I had the proper numerical analysis background, I might know why Newton's method has a hard time here, but I don't. Could someone explain why? > > Alex
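For reference, the condition such an interface solution has to satisfy is the standard dielectric matching relation; as a sketch, assuming no free surface charge at the interface (which is what the averaged-flux kernel above appears to encode),

    \epsilon_1 \, \nabla\phi_1 \cdot \hat{n} \;=\; \epsilon_2 \, \nabla\phi_2 \cdot \hat{n}
    \qquad\Longrightarrow\qquad
    \frac{\nabla\phi_1 \cdot \hat{n}}{\nabla\phi_2 \cdot \hat{n}} \;=\; \frac{\epsilon_2}{\epsilon_1},

so the normal fields on the two sides must have the same sign, with magnitudes in the inverse ratio of the permittivities (the factor of 5 mentioned above). The Newton solution with opposite-sign fields violates this; the Jacobian-free solution satisfies it.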
> > On 11/20/2015 03:24 PM, Barry Smith wrote: >> [Barry's message of Nov 20 14:24 and the -pc_type=svd console output quoted with it appear in full earlier in this thread and are elided here]
From zonexo at gmail.com Sat Nov 21 07:47:33 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Sat, 21 Nov 2015 21:47:33 +0800 Subject: [petsc-users] Use of DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 In-Reply-To: References: <5641B879.4020504@gmail.com> <5641E37E.5070401@gmail.com> Message-ID: <565075F5.9050004@gmail.com> On 10/11/2015 8:47 PM, Matthew Knepley wrote: > On Tue, Nov 10, 2015 at 6:30 AM, TAY wee-beng > wrote: > > > On 10/11/2015 8:27 PM, Matthew Knepley wrote: >> On Tue, Nov 10, 2015 at 3:27 AM, TAY wee-beng > > wrote: >> >> Hi, >> >> Inside my subroutine, I need to access the DA variable >> cu_types_array frequently. >> >> So I need to call DMDAVecGetArrayF90 and >> DMDAVecRestoreArrayF90 before and after frequently. >> >> Is this necessary? Can I call DMDAVecGetArrayF90 at the start >> and only call DMDAVecRestoreArrayF90 towards the end, where I >> don't need to modify the values of cu_types_array anymore? >> >> Will this cause memory corruption? >> >> >> You cannot use any other vector operations before you have called >> Restore. > > Hi, > > What do you mean by vector operations? I will just be doing some > maths operation to change the values in cu_types_array. Is that fine? > > > While you have the array, no other operation can change the values. Hi, Let me clarify this. I declare in this way: DM da_cu_types Vec cu_types_local,cu_types_global PetscScalar,pointer :: cu_types_array(:,:,:) call DMDACreate3d(MPI_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DMDA_STENCIL_STAR,(end_ijk_uniform(1) - sta_ijk_uniform(1) + 1),(end_ijk_uniform(2) - sta_ijk_uniform(2) + 1),& (end_ijk_uniform(3) - sta_ijk_uniform(3) + 1),PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,stencil_width_IIB,PETSC_NULL_INTEGER,PETSC_NULL_INTEGER,PETSC_NULL_INTEGER,da_cu_types,ierr) call DMCreateGlobalVector(da_cu_types,cu_types_global,ierr) call DMCreateLocalVector(da_cu_types,cu_types_local,ierr) So when I need to change the values in DA variable cu_types, I call: call DMDAVecGetArrayF90(da_cu_types,cu_types_local,cu_types_array,ierr) ....
math operations, changing the values of cu_types_array, such as: cu_types_array = 0.d0 call DMDAVecRestoreArrayF90(da_cu_types,cu_types_local,cu_types_array,ierr) 1st of all, does these DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 take a lot of time, especially if I call them many times. Next qn is whether if I can call DMDAVecGetArrayF90 at the start, and DMDAVecRestoreArrayF90 after operations similar to the one above is finished. Thanks. > > Matt > >> Also, must the array be restored using DMDAVecRestoreArrayF90 >> before calling DMLocalToLocalBegin,DMLocalToLocalEnd? >> >> >> Yes. >> >> Matt >> >> >> -- >> Thank you. >> >> Yours sincerely, >> >> TAY wee-beng >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to >> which their experiments lead. >> -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Nov 21 07:58:06 2015 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 21 Nov 2015 07:58:06 -0600 Subject: [petsc-users] Use of DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 In-Reply-To: <565075F5.9050004@gmail.com> References: <5641B879.4020504@gmail.com> <5641E37E.5070401@gmail.com> <565075F5.9050004@gmail.com> Message-ID: On Sat, Nov 21, 2015 at 7:47 AM, TAY wee-beng wrote: > > On 10/11/2015 8:47 PM, Matthew Knepley wrote: > > On Tue, Nov 10, 2015 at 6:30 AM, TAY wee-beng wrote: > >> >> On 10/11/2015 8:27 PM, Matthew Knepley wrote: >> >> On Tue, Nov 10, 2015 at 3:27 AM, TAY wee-beng < >> zonexo at gmail.com> wrote: >> >>> Hi, >>> >>> Inside my subroutine, I need to access the DA variable cu_types_array >>> frequently. >>> >>> So I need to call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 before >>> and after frequently. >>> >>> Is this necessary? Can I call DMDAVecGetArrayF90 at the start and only >>> call DMDAVecRestoreArrayF90 towards the end, where I don't need to modify >>> the values of cu_types_array anymore? >>> >>> Will this cause memory corruption? >>> >> >> You cannot use any other vector operations before you have called Restore. >> >> >> Hi, >> >> What do you mean by vector operations? I will just be doing some maths >> operation to change the values in cu_types_array. Is that fine? >> > > While you have the array, no other operation can change the values. > > Hi, > > Let me clarify this. I declare in this way: > > DM da_cu_types > > Vec cu_types_local,cu_types_global > > PetscScalar,pointer :: cu_types_array(:,:,:) > > call > DMDACreate3d(MPI_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DMDA_STENCIL_STAR,(end_ijk_uniform(1) > - sta_ijk_uniform(1) + 1),(end_ijk_uniform(2) - sta_ijk_uniform(2) + 1),& > > (end_ijk_uniform(3) - sta_ijk_uniform(3) + > 1),PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,stencil_width_IIB,PETSC_NULL_INTEGER,PETSC_NULL_INTEGER,PETSC_NULL_INTEGER,da_cu_types,ierr) > > call DMCreateGlobalVector(da_cu_types,cu_types_global,ierr) > > call DMCreateLocalVector(da_cu_types,cu_types_local,ierr) > > So when I need to change the values in DA variable cu_types, I call: > > call DMDAVecGetArrayF90(da_cu_types,cu_types_local,cu_types_array,ierr) > > .... 
math operations, changing the values of cu_types_array, such as: > > cu_types_array = 0.d0 > > call DMDAVecRestoreArrayF90(da_cu_types,cu_types_local,cu_types_array,ierr) > > 1st of all, does these DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 take > a lot of time, especially if I call them many times. > No. > Next qn is whether if I can call DMDAVecGetArrayF90 at the start, and > DMDAVecRestoreArrayF90 after operations similar to the one above is > finished. > You cannot do other Vec operations before you call Restore. Matt > Thanks. > > > Matt > > >> >> >> Also, must the array be restored using DMDAVecRestoreArrayF90 before >>> calling DMLocalToLocalBegin,DMLocalToLocalEnd? >> >> >> Yes. >> >> Matt >> >> >>> >>> -- >>> Thank you. >>> >>> Yours sincerely, >>> >>> TAY wee-beng >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bhmerchant at gmail.com Sat Nov 21 14:08:41 2015 From: bhmerchant at gmail.com (Brian Merchant) Date: Sat, 21 Nov 2015 12:08:41 -0800 Subject: [petsc-users] Compiling PETSc on Windows with Cygwin: errors with `win32fe cl`; okay to use `gcc`, `g++`, `gfortran` as compilers instead of `win32cf cl`? Message-ID: Hi all, I'd like to try and install PETSc on Windows (even though I have been forewarned that it is better to simply install it on a unix system). Some information about my system: * Windows 10 Pro, 64 bit * latest Cygwin with make, gcc, g++, gfortran installed The PETSc installation instructions suggest that I run the following command: ./configure --with-cc="win32fe cl" --with-fc="win32fe ifort" --with-cxx="win32fe\ cl" --download-fblaslapack\ However, that results in the following error: C compiler you provided with -with-cc=win32fe cl does not work. Cannot compile C with /cygdrive/a/petsc/bin/win32fe/win32fe cl. The answer to this stackexchange question: http://stackoverflow.com/questions/30229620/petsc-build-error-c-compiler-does-not-work recommends the following command instead (with escaped whitespace): ./configure --with-cc="win32fe\ cl" --with-fc="win32fe\ ifort" --with-cxx="win32fe\ cl" --download-fblaslapack Running that command results in the following error (note no second line in error): C compiler you provided with -with-cc=win32fe\ cl does not work. In both cases, the configure.log output only contains (I do not use Chinese characters, these were in the file): > ?????arch-mswin-c-debug/lib/petsc/conf/configure.log What should I do? Also, do `gcc`/`g++` compiled programs not work on Windows? I ask because if I simply run the suggested command in the "Quick Installation" guide, using `gcc` and `g++`; then I don't get compilation errors. Would it be okay to simply use those compilers then? Kind regards, Brian -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From balay at mcs.anl.gov Sat Nov 21 14:55:57 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Sat, 21 Nov 2015 14:55:57 -0600 Subject: [petsc-users] Compiling PETSc on Windows with Cygwin: errors with `win32fe cl`; okay to use `gcc`, `g++`, `gfortran` as compilers instead of `win32cf cl`? In-Reply-To: References: Message-ID: Hm - If you can work with cygwin/gnu compilers - then you can build petsc with them - and use it. [however you can't build petsc with cygwin/gnu compilers - and then link in with an application using MS compilers] Satish On Sat, 21 Nov 2015, Brian Merchant wrote: > Hi all, > > I'd like to try and install PETSc on Windows (even though I have been > forewarned that it is better to simply install it on a unix system). > > Some information about my system: > * Windows 10 Pro, 64 bit > * latest Cygwin with make, gcc, g++, gfortran installed > > The PETSc installation instructions suggest that I run the following > command: > > ./configure --with-cc="win32fe cl" --with-fc="win32fe ifort" > --with-cxx="win32fe\ cl" --download-fblaslapack\ > > However, that results in the following error: > > C compiler you provided with -with-cc=win32fe cl does not work. > Cannot compile C with /cygdrive/a/petsc/bin/win32fe/win32fe cl. > > > The answer to this stackexchange question: > http://stackoverflow.com/questions/30229620/petsc-build-error-c-compiler-does-not-work > > recommends the following command instead (with escaped whitespace): > > ./configure --with-cc="win32fe\ cl" --with-fc="win32fe\ ifort" > --with-cxx="win32fe\ cl" --download-fblaslapack > > Running that command results in the following error (note no second line in > error): > > C compiler you provided with -with-cc=win32fe\ cl does not work. > > In both cases, the configure.log output only contains (I do not use Chinese > characters, these were in the file): > > > ?????arch-mswin-c-debug/lib/petsc/conf/configure.log > > What should I do? Also, do `gcc`/`g++` compiled programs not work on > Windows? I ask because if I simply run the suggested command in the "Quick > Installation" guide, using `gcc` and `g++`; then I don't get compilation > errors. Would it be okay to simply use those compilers then? > > Kind regards, > Brian > From bhmerchant at gmail.com Sat Nov 21 15:05:05 2015 From: bhmerchant at gmail.com (Brian Merchant) Date: Sat, 21 Nov 2015 13:05:05 -0800 Subject: [petsc-users] Compiling PETSc on Windows with Cygwin: errors with `win32fe cl`; okay to use `gcc`, `g++`, `gfortran` as compilers instead of `win32cf cl`? In-Reply-To: References: Message-ID: > [however you can't build petsc with cygwin/gnu compilers - and then link in with an application using MS compilers] Could you tell me a little more about what you mean here? (sorry, I am a newbie) Is it that if I make an application say in Visual Studio, and attempt to use that application to call PETSc functions compiled with Cygwin, then I will run into trouble? I contacted this group: http://www2.msic.ch/Software and got the latest build from them that is supposed to work with Visual Studio; which addresses your MS compilers concern? However, I also want to use petsc4py Python bindings with PETSc, so I shouldn't be in trouble there if I use Cygwin, right? Kind regards, Brian On Sat, Nov 21, 2015 at 12:55 PM, Satish Balay wrote: > Hm - If you can work with cygwin/gnu compilers - then you can build > petsc with them - and use it. 
> > [however you can't build petsc with cygwin/gnu compilers - and then > link in with an application using MS compilers] > > Satish > > On Sat, 21 Nov 2015, Brian Merchant wrote: > > > Hi all, > > > > I'd like to try and install PETSc on Windows (even though I have been > > forewarned that it is better to simply install it on a unix system). > > > > Some information about my system: > > * Windows 10 Pro, 64 bit > > * latest Cygwin with make, gcc, g++, gfortran installed > > > > The PETSc installation instructions suggest that I run the following > > command: > > > > ./configure --with-cc="win32fe cl" --with-fc="win32fe ifort" > > --with-cxx="win32fe\ cl" --download-fblaslapack\ > > > > However, that results in the following error: > > > > C compiler you provided with -with-cc=win32fe cl does not work. > > Cannot compile C with /cygdrive/a/petsc/bin/win32fe/win32fe cl. > > > > > > The answer to this stackexchange question: > > > http://stackoverflow.com/questions/30229620/petsc-build-error-c-compiler-does-not-work > > > > recommends the following command instead (with escaped whitespace): > > > > ./configure --with-cc="win32fe\ cl" --with-fc="win32fe\ ifort" > > --with-cxx="win32fe\ cl" --download-fblaslapack > > > > Running that command results in the following error (note no second line > in > > error): > > > > C compiler you provided with -with-cc=win32fe\ cl does not work. > > > > In both cases, the configure.log output only contains (I do not use > Chinese > > characters, these were in the file): > > > > > ?????arch-mswin-c-debug/lib/petsc/conf/configure.log > > > > What should I do? Also, do `gcc`/`g++` compiled programs not work on > > Windows? I ask because if I simply run the suggested command in the > "Quick > > Installation" guide, using `gcc` and `g++`; then I don't get compilation > > errors. Would it be okay to simply use those compilers then? > > > > Kind regards, > > Brian > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Nov 21 15:17:26 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 21 Nov 2015 15:17:26 -0600 Subject: [petsc-users] Compiling PETSc on Windows with Cygwin: errors with `win32fe cl`; okay to use `gcc`, `g++`, `gfortran` as compilers instead of `win32cf cl`? In-Reply-To: References: Message-ID: <759C3878-52D0-45CB-BAE7-43E7EFBDAFBA@mcs.anl.gov> > On Nov 21, 2015, at 3:05 PM, Brian Merchant wrote: > > > [however you can't build petsc with cygwin/gnu compilers - and then > link in with an application using MS compilers] > > Could you tell me a little more about what you mean here? (sorry, I am a newbie) Is it that if I make an application say in Visual Studio, and attempt to use that application to call PETSc functions compiled with Cygwin, then I will run into trouble? Brian, If your application is in Visual Studio then you MUST build PETSc with the Microsoft/Intel compilers, you cannot use the cygwin gnu compilers. > > I contacted this group: http://www2.msic.ch/Software > and got the latest build from them that is supposed to work with Visual Studio; which addresses your MS compilers concern? Unfortunately this is the previous release of PETSc so is somewhat out of date. It is very strange that when you tried building you got garbage in the configure.log file. Can you try again (without the \ in compiler name business for any compiler.) > > However, I also want to use petsc4py Python bindings with PETSc, so I shouldn't be in trouble there if I use Cygwin, right? 
If you want to use petsc4py Python on a Windows machine then you can try that with the Cygwin Gnu compilers (and the Cygwin python). I don't know if anyone has ever done it so some issues that need fixing may come up but it should be possible. It is likely a great deal of work to get the petsc4py Python bindings working with Visual Studio code and the Windows python all together. That would be completely new territory. Barry > > Kind regards, > Brian > > On Sat, Nov 21, 2015 at 12:55 PM, Satish Balay wrote: > Hm - If you can work with cygwin/gnu compilers - then you can build > petsc with them - and use it. > > [however you can't build petsc with cygwin/gnu compilers - and then > link in with an application using MS compilers] > > Satish > > On Sat, 21 Nov 2015, Brian Merchant wrote: > > > Hi all, > > > > I'd like to try and install PETSc on Windows (even though I have been > > forewarned that it is better to simply install it on a unix system). > > > > Some information about my system: > > * Windows 10 Pro, 64 bit > > * latest Cygwin with make, gcc, g++, gfortran installed > > > > The PETSc installation instructions suggest that I run the following > > command: > > > > ./configure --with-cc="win32fe cl" --with-fc="win32fe ifort" > > --with-cxx="win32fe\ cl" --download-fblaslapack\ > > > > However, that results in the following error: > > > > C compiler you provided with -with-cc=win32fe cl does not work. > > Cannot compile C with /cygdrive/a/petsc/bin/win32fe/win32fe cl. > > > > > > The answer to this stackexchange question: > > http://stackoverflow.com/questions/30229620/petsc-build-error-c-compiler-does-not-work > > > > recommends the following command instead (with escaped whitespace): > > > > ./configure --with-cc="win32fe\ cl" --with-fc="win32fe\ ifort" > > --with-cxx="win32fe\ cl" --download-fblaslapack > > > > Running that command results in the following error (note no second line in > > error): > > > > C compiler you provided with -with-cc=win32fe\ cl does not work. > > > > In both cases, the configure.log output only contains (I do not use Chinese > > characters, these were in the file): > > > > > ?????arch-mswin-c-debug/lib/petsc/conf/configure.log > > > > What should I do? Also, do `gcc`/`g++` compiled programs not work on > > Windows? I ask because if I simply run the suggested command in the "Quick > > Installation" guide, using `gcc` and `g++`; then I don't get compilation > > errors. Would it be okay to simply use those compilers then? > > > > Kind regards, > > Brian > > > From balay at mcs.anl.gov Sat Nov 21 15:23:36 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Sat, 21 Nov 2015 15:23:36 -0600 Subject: [petsc-users] Compiling PETSc on Windows with Cygwin: errors with `win32fe cl`; okay to use `gcc`, `g++`, `gfortran` as compilers instead of `win32cf cl`? In-Reply-To: References: Message-ID: On Sat, 21 Nov 2015, Brian Merchant wrote: > > [however you can't build petsc with cygwin/gnu compilers - and then > link in with an application using MS compilers] > > Could you tell me a little more about what you mean here? (sorry, I am a > newbie) Is it that if I make an application say in Visual Studio, and > attempt to use that application to call PETSc functions compiled with > Cygwin, then I will run into trouble? yes. Visual Studio uses Microsoft compilers - and you can link in libraries compiled with cygwin/gnu compilers. 
> > I contacted this group: http://www2.msic.ch/Software > and got the latest build from them that is supposed to work with Visual > Studio; which addresses your MS compilers concern? If you require to use PETSc from Visual studio - then this would be one option. [however the website lists petsc-3.5 - not the latest pets-3.6] > > However, I also want to use petsc4py Python bindings with PETSc, so I > shouldn't be in trouble there if I use Cygwin, right? Yes - petsc4py should work with cygwin/gnu compiler build of PETSc [with cygwin python] With cygwin - you can install liblapack-devel, libopenmpi-devel [with cygwin setup.exe] and use them to build PETSc and petsc4py Satish From balay at mcs.anl.gov Sat Nov 21 15:28:03 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Sat, 21 Nov 2015 15:28:03 -0600 Subject: [petsc-users] Compiling PETSc on Windows with Cygwin: errors with `win32fe cl`; okay to use `gcc`, `g++`, `gfortran` as compilers instead of `win32cf cl`? In-Reply-To: References: Message-ID: On Sat, 21 Nov 2015, Satish Balay wrote: > On Sat, 21 Nov 2015, Brian Merchant wrote: > > > > [however you can't build petsc with cygwin/gnu compilers - and then > > link in with an application using MS compilers] > > > > Could you tell me a little more about what you mean here? (sorry, I am a > > newbie) Is it that if I make an application say in Visual Studio, and > > attempt to use that application to call PETSc functions compiled with > > Cygwin, then I will run into trouble? > > yes. Visual Studio uses Microsoft compilers - and you can link in > libraries compiled with cygwin/gnu compilers. can -> can't > > > > > I contacted this group: http://www2.msic.ch/Software > > and got the latest build from them that is supposed to work with Visual > > Studio; which addresses your MS compilers concern? > > If you require to use PETSc from Visual studio - then this would be one option. > [however the website lists petsc-3.5 - not the latest pets-3.6] > > > > > However, I also want to use petsc4py Python bindings with PETSc, so I > > shouldn't be in trouble there if I use Cygwin, right? > > Yes - petsc4py should work with cygwin/gnu compiler build of PETSc > [with cygwin python] > > With cygwin - you can install liblapack-devel, libopenmpi-devel [with > cygwin setup.exe] and use them to build PETSc and petsc4py > > Satish > From bhmerchant at gmail.com Sat Nov 21 15:31:30 2015 From: bhmerchant at gmail.com (Brian Merchant) Date: Sat, 21 Nov 2015 13:31:30 -0800 Subject: [petsc-users] Compiling PETSc on Windows with Cygwin: errors with `win32fe cl`; okay to use `gcc`, `g++`, `gfortran` as compilers instead of `win32cf cl`? In-Reply-To: <759C3878-52D0-45CB-BAE7-43E7EFBDAFBA@mcs.anl.gov> References: <759C3878-52D0-45CB-BAE7-43E7EFBDAFBA@mcs.anl.gov> Message-ID: Barry, Satish, Thank you for your time and explanations. Based on what you have explained, I think it will indeed be much easier for me to just build and use PETSc on my Linux system, since I don't build my own Python (I use whatever comes with the Anaconda Python distribution package). So, for me, it would be better to use PETSc with the Anaconda distribution for Unix-like systems. Thanks again, Brian On Sat, Nov 21, 2015 at 1:17 PM, Barry Smith wrote: > > > On Nov 21, 2015, at 3:05 PM, Brian Merchant > wrote: > > > > > [however you can't build petsc with cygwin/gnu compilers - and then > > link in with an application using MS compilers] > > > > Could you tell me a little more about what you mean here? 
(sorry, I am a > newbie) Is it that if I make an application say in Visual Studio, and > attempt to use that application to call PETSc functions compiled with > Cygwin, then I will run into trouble? > > Brian, > > If your application is in Visual Studio then you MUST build PETSc with > the Microsoft/Intel compilers, you cannot use the cygwin gnu compilers. > > > > > I contacted this group: http://www2.msic.ch/Software > > and got the latest build from them that is supposed to work with Visual > Studio; which addresses your MS compilers concern? > > Unfortunately this is the previous release of PETSc so is somewhat out > of date. > > It is very strange that when you tried building you got garbage in the > configure.log file. Can you try again (without the \ in compiler name > business for any compiler.) > > > > However, I also want to use petsc4py Python bindings with PETSc, so I > shouldn't be in trouble there if I use Cygwin, right? > > If you want to use petsc4py Python on a Windows machine then you can > try that with the Cygwin Gnu compilers (and the Cygwin python). I don't > know if anyone has ever done it so some issues that need fixing may come up > but it should be possible. > > It is likely a great deal of work to get the petsc4py Python bindings > working with Visual Studio code and the Windows python all together. That > would be completely new territory. > > Barry > > > > > Kind regards, > > Brian > > > > On Sat, Nov 21, 2015 at 12:55 PM, Satish Balay > wrote: > > Hm - If you can work with cygwin/gnu compilers - then you can build > > petsc with them - and use it. > > > > [however you can't build petsc with cygwin/gnu compilers - and then > > link in with an application using MS compilers] > > > > Satish > > > > On Sat, 21 Nov 2015, Brian Merchant wrote: > > > > > Hi all, > > > > > > I'd like to try and install PETSc on Windows (even though I have been > > > forewarned that it is better to simply install it on a unix system). > > > > > > Some information about my system: > > > * Windows 10 Pro, 64 bit > > > * latest Cygwin with make, gcc, g++, gfortran installed > > > > > > The PETSc installation instructions suggest that I run the following > > > command: > > > > > > ./configure --with-cc="win32fe cl" --with-fc="win32fe ifort" > > > --with-cxx="win32fe\ cl" --download-fblaslapack\ > > > > > > However, that results in the following error: > > > > > > C compiler you provided with -with-cc=win32fe cl does not work. > > > Cannot compile C with /cygdrive/a/petsc/bin/win32fe/win32fe cl. > > > > > > > > > The answer to this stackexchange question: > > > > http://stackoverflow.com/questions/30229620/petsc-build-error-c-compiler-does-not-work > > > > > > recommends the following command instead (with escaped whitespace): > > > > > > ./configure --with-cc="win32fe\ cl" --with-fc="win32fe\ ifort" > > > --with-cxx="win32fe\ cl" --download-fblaslapack > > > > > > Running that command results in the following error (note no second > line in > > > error): > > > > > > C compiler you provided with -with-cc=win32fe\ cl does not work. > > > > > > In both cases, the configure.log output only contains (I do not use > Chinese > > > characters, these were in the file): > > > > > > > ?????arch-mswin-c-debug/lib/petsc/conf/configure.log > > > > > > What should I do? Also, do `gcc`/`g++` compiled programs not work on > > > Windows? I ask because if I simply run the suggested command in the > "Quick > > > Installation" guide, using `gcc` and `g++`; then I don't get > compilation > > > errors. 
Would it be okay to simply use those compilers then? > > > > > > Kind regards, > > > Brian > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Sat Nov 21 15:44:07 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Sat, 21 Nov 2015 15:44:07 -0600 Subject: [petsc-users] Compiling PETSc on Windows with Cygwin: errors with `win32fe cl`; okay to use `gcc`, `g++`, `gfortran` as compilers instead of `win32cf cl`? In-Reply-To: References: <759C3878-52D0-45CB-BAE7-43E7EFBDAFBA@mcs.anl.gov> Message-ID: Linux with Anaconda Python should work. We've primarily used system default python [on linux] - and that works. Using Windows with cygwin compilers, python should be pretty close to using linux. Satish On Sat, 21 Nov 2015, Brian Merchant wrote: > Barry, Satish, > > Thank you for your time and explanations. > > Based on what you have explained, I think it will indeed be much easier for > me to just build and use PETSc on my Linux system, since I don't build my > own Python (I use whatever comes with the Anaconda Python distribution > package). So, for me, it would be better to use PETSc with the Anaconda > distribution for Unix-like systems. > > Thanks again, > Brian From zonexo at gmail.com Sat Nov 21 20:01:53 2015 From: zonexo at gmail.com (TAY wee-beng) Date: Sun, 22 Nov 2015 10:01:53 +0800 Subject: [petsc-users] Use of DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 In-Reply-To: References: <5641B879.4020504@gmail.com> <5641E37E.5070401@gmail.com> <565075F5.9050004@gmail.com> Message-ID: <56512211.2080502@gmail.com> On 21/11/2015 9:58 PM, Matthew Knepley wrote: > On Sat, Nov 21, 2015 at 7:47 AM, TAY wee-beng > wrote: > > > On 10/11/2015 8:47 PM, Matthew Knepley wrote: >> On Tue, Nov 10, 2015 at 6:30 AM, TAY wee-beng > > wrote: >> >> >> On 10/11/2015 8:27 PM, Matthew Knepley wrote: >>> On Tue, Nov 10, 2015 at 3:27 AM, TAY wee-beng >>> > wrote: >>> >>> Hi, >>> >>> Inside my subroutine, I need to access the DA variable >>> cu_types_array frequently. >>> >>> So I need to call DMDAVecGetArrayF90 and >>> DMDAVecRestoreArrayF90 before and after frequently. >>> >>> Is this necessary? Can I call DMDAVecGetArrayF90 at the >>> start and only call DMDAVecRestoreArrayF90 towards the >>> end, where I don't need to modify the values of >>> cu_types_array anymore? >>> >>> Will this cause memory corruption? >>> >>> >>> You cannot use any other vector operations before you have >>> called Restore. >> >> Hi, >> >> What do you mean by vector operations? I will just be doing >> some maths operation to change the values in cu_types_array. >> Is that fine? >> >> >> While you have the array, no other operation can change the values. > Hi, > > Let me clarify this. I declare in this way: > > DM da_cu_types > > Vec cu_types_local,cu_types_global > > PetscScalar,pointer :: cu_types_array(:,:,:) > > call > DMDACreate3d(MPI_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DMDA_STENCIL_STAR,(end_ijk_uniform(1) > - sta_ijk_uniform(1) + 1),(end_ijk_uniform(2) - sta_ijk_uniform(2) > + 1),& > > (end_ijk_uniform(3) - sta_ijk_uniform(3) + > 1),PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,stencil_width_IIB,PETSC_NULL_INTEGER,PETSC_NULL_INTEGER,PETSC_NULL_INTEGER,da_cu_types,ierr) > > call DMCreateGlobalVector(da_cu_types,cu_types_global,ierr) > > call DMCreateLocalVector(da_cu_types,cu_types_local,ierr) > > So when I need to change the values in DA variable cu_types, I call: > > call > DMDAVecGetArrayF90(da_cu_types,cu_types_local,cu_types_array,ierr) > > .... 
math operations, changing the values of cu_types_array, such as: > > cu_types_array = 0.d0 > > call > DMDAVecRestoreArrayF90(da_cu_types,cu_types_local,cu_types_array,ierr) > > 1st of all, does these DMDAVecGetArrayF90 and > DMDAVecRestoreArrayF90 take a lot of time, especially if I call > them many times. > > > No. Hi, Another qn is supposed the sta_ijk_uniform and end_ijk_uniform change every time step. Hence I need to destroy the DM etc and re-create at each time step. If that's the case, will this slow down my code? Thanks. > > Next qn is whether if I can call DMDAVecGetArrayF90 at the start, > and DMDAVecRestoreArrayF90 after operations similar to the one > above is finished. > > > You cannot do other Vec operations before you call Restore. > > Matt > > Thanks. >> >> Matt >> >>> Also, must the array be restored using >>> DMDAVecRestoreArrayF90 before calling >>> DMLocalToLocalBegin,DMLocalToLocalEnd? >>> >>> >>> Yes. >>> >>> Matt >>> >>> >>> -- >>> Thank you. >>> >>> Yours sincerely, >>> >>> TAY wee-beng >>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin >>> their experiments is infinitely more interesting than any >>> results to which their experiments lead. >>> -- Norbert Wiener >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to >> which their experiments lead. >> -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Nov 21 21:10:16 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 21 Nov 2015 21:10:16 -0600 Subject: [petsc-users] Use of DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 In-Reply-To: <56512211.2080502@gmail.com> References: <5641B879.4020504@gmail.com> <5641E37E.5070401@gmail.com> <565075F5.9050004@gmail.com> <56512211.2080502@gmail.com> Message-ID: <06439A58-217A-4701-B76D-5D8330DBCD59@mcs.anl.gov> > On Nov 21, 2015, at 8:01 PM, TAY wee-beng wrote: > > > On 21/11/2015 9:58 PM, Matthew Knepley wrote: >> On Sat, Nov 21, 2015 at 7:47 AM, TAY wee-beng wrote: >> >> On 10/11/2015 8:47 PM, Matthew Knepley wrote: >>> On Tue, Nov 10, 2015 at 6:30 AM, TAY wee-beng wrote: >>> >>> On 10/11/2015 8:27 PM, Matthew Knepley wrote: >>>> On Tue, Nov 10, 2015 at 3:27 AM, TAY wee-beng wrote: >>>> Hi, >>>> >>>> Inside my subroutine, I need to access the DA variable cu_types_array frequently. >>>> >>>> So I need to call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 before and after frequently. >>>> >>>> Is this necessary? Can I call DMDAVecGetArrayF90 at the start and only call DMDAVecRestoreArrayF90 towards the end, where I don't need to modify the values of cu_types_array anymore? >>>> >>>> Will this cause memory corruption? >>>> >>>> You cannot use any other vector operations before you have called Restore. >>> >>> Hi, >>> >>> What do you mean by vector operations? I will just be doing some maths operation to change the values in cu_types_array. Is that fine? >>> >>> While you have the array, no other operation can change the values. >> Hi, >> >> Let me clarify this. 
I declare in this way: >> >> DM da_cu_types >> >> Vec cu_types_local,cu_types_global >> >> PetscScalar,pointer :: cu_types_array(:,:,:) >> >> call DMDACreate3d(MPI_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DMDA_STENCIL_STAR,(end_ijk_uniform(1) - sta_ijk_uniform(1) + 1),(end_ijk_uniform(2) - sta_ijk_uniform(2) + 1),& >> >> (end_ijk_uniform(3) - sta_ijk_uniform(3) + 1),PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,stencil_width_IIB,PETSC_NULL_INTEGER,PETSC_NULL_INTEGER,PETSC_NULL_INTEGER,da_cu_types,ierr) >> >> call DMCreateGlobalVector(da_cu_types,cu_types_global,ierr) >> >> call DMCreateLocalVector(da_cu_types,cu_types_local,ierr) >> >> So when I need to change the values in DA variable cu_types, I call: >> >> call DMDAVecGetArrayF90(da_cu_types,cu_types_local,cu_types_array,ierr) >> >> .... math operations, changing the values of cu_types_array, such as: >> >> cu_types_array = 0.d0 >> >> call DMDAVecRestoreArrayF90(da_cu_types,cu_types_local,cu_types_array,ierr) >> >> 1st of all, does these DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 take a lot of time, especially if I call them many times. >> >> No. > Hi, > > Another qn is supposed the sta_ijk_uniform and end_ijk_uniform change every time step. Hence I need to destroy the DM etc and re-create at each time step. > > If that's the case, will this slow down my code? Well what choice do you have? If they change you need to create the DA again so just go ahead and do it. Worry about writing your simulation and getting good simulation results, worry less about optimizations that may or may not matter. Barry > > Thanks. >> >> Next qn is whether if I can call DMDAVecGetArrayF90 at the start, and DMDAVecRestoreArrayF90 after operations similar to the one above is finished. >> >> You cannot do other Vec operations before you call Restore. >> >> Matt >> >> Thanks. >>> >>> Matt >>> >>> >>>> Also, must the array be restored using DMDAVecRestoreArrayF90 before calling DMLocalToLocalBegin,DMLocalToLocalEnd? >>>> >>>> Yes. >>>> >>>> Matt >>>> >>>> >>>> -- >>>> Thank you. >>>> >>>> Yours sincerely, >>>> >>>> TAY wee-beng >>>> >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener > From qince168 at gmail.com Sun Nov 22 02:26:45 2015 From: qince168 at gmail.com (Ce Qin) Date: Sun, 22 Nov 2015 16:26:45 +0800 Subject: [petsc-users] petsc master build error Message-ID: Dear all, I got this error when compile petsc master branch. Best regards, Ce Qin -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log Type: text/x-log Size: 69357 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: configure.log Type: text/x-log Size: 4043062 bytes Desc: not available URL: From qince168 at gmail.com Sun Nov 22 02:34:07 2015 From: qince168 at gmail.com (Ce Qin) Date: Sun, 22 Nov 2015 16:34:07 +0800 Subject: [petsc-users] petsc master build error In-Reply-To: References: Message-ID: Sorry for the Chinese character in the previous make.log. 2015-11-22 16:26 GMT+08:00 Ce Qin : > Dear all, > > I got this error when compile petsc master branch. > > Best regards, > Ce Qin > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log Type: text/x-log Size: 69166 bytes Desc: not available URL: From knepley at gmail.com Sun Nov 22 07:32:21 2015 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 22 Nov 2015 07:32:21 -0600 Subject: [petsc-users] petsc master build error In-Reply-To: References: Message-ID: You have an old 'sowing' package. Delete your PETSC_ARCH directory (perhaps saving the reconfigure-$PETSC_ARCH.py) and rebuild everything. Thanks, Matt On Sun, Nov 22, 2015 at 2:34 AM, Ce Qin wrote: > Sorry for the Chinese character in the previous make.log. > > 2015-11-22 16:26 GMT+08:00 Ce Qin : > >> Dear all, >> >> I got this error when compile petsc master branch. >> >> Best regards, >> Ce Qin >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From elbueler at alaska.edu Sun Nov 22 13:40:11 2015 From: elbueler at alaska.edu (Ed Bueler) Date: Sun, 22 Nov 2015 10:40:11 -0900 Subject: [petsc-users] master branch option "-snes_monitor_solution" Message-ID: Dear PETSc -- When I use option -snes_monitor_solution in master branch I get the error below. I have a sense that this is related to the change listed at http://www.mcs.anl.gov/petsc/documentation/changes/dev.html, namely "SNESSetMonitor(SNESMonitorXXX, calls now require passing a viewer as the final argument, you can no longer pass a NULL)" but the error message below is not informative enough to tell me what to do at the command line. Note that my X11 windows do work, as other options successfully give line graphs etc. Do I need -snes_monitor_solution Z with some value for Z? If so, where are the possibilities documented? Thanks! Ed $ ./ex5 -snes_monitor_solution [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Null argument, when expecting valid pointer [0]PETSC ERROR: Null Object: Parameter # 4 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.6.2-1635-g5e95a8a GIT Date: 2015-11-21 16:14:08 -0600 [0]PETSC ERROR: ./ex5 on a linux-c-dbg named bueler-leopard by ed Sun Nov 22 10:31:33 2015 [0]PETSC ERROR: Configure options --download-mpich --download-triangle --with-debugging=1 [0]PETSC ERROR: #1 SNESMonitorSolution() line 33 in /home/ed/petsc/src/snes/interface/snesut.c [0]PETSC ERROR: #2 SNESMonitor() line 3383 in /home/ed/petsc/src/snes/interface/snes.c [0]PETSC ERROR: #3 SNESSolve_NEWTONLS() line 191 in /home/ed/petsc/src/snes/impls/ls/ls.c [0]PETSC ERROR: #4 SNESSolve() line 3984 in /home/ed/petsc/src/snes/interface/snes.c [0]PETSC ERROR: #5 main() line 171 in /home/ed/petsc/src/snes/examples/tutorials/ex5.c [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -snes_monitor_solution [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- application called MPI_Abort(MPI_COMM_WORLD, 85) - process 0 [unset]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 85) - process 0 -- Ed Bueler Dept of Math and Stat and Geophysical Institute University of Alaska Fairbanks Fairbanks, AK 99775-6660 301C Chapman and 410D Elvey 907 474-7693 and 907 474-7199 (fax 907 474-5394) -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sun Nov 22 20:51:14 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 22 Nov 2015 20:51:14 -0600 Subject: [petsc-users] master branch option "-snes_monitor_solution" In-Reply-To: References: Message-ID: <49DD4FA4-8C55-43D8-A6E3-9266231110EC@mcs.anl.gov> I totally botched that update; looks like I broke a lot of the command line monitor options in master. Fixing it properly will take some work but also enhance the command line monitor and reduce the code a bit. Thanks for letting us know. Barry > On Nov 22, 2015, at 1:40 PM, Ed Bueler wrote: > > Dear PETSc -- > > When I use option -snes_monitor_solution in master branch I get the error below. I have a sense that this is related to the change listed at http://www.mcs.anl.gov/petsc/documentation/changes/dev.html, namely > > "SNESSetMonitor(SNESMonitorXXX, calls now require passing a viewer as the final argument, you can no longer pass a NULL)" > > but the error message below is not informative enough to tell me what to do at the command line. > > Note that my X11 windows do work, as other options successfully give line graphs etc. > > Do I need > > -snes_monitor_solution Z > > with some value for Z? If so, where are the possibilities documented? > > Thanks! > > Ed > > > > $ ./ex5 -snes_monitor_solution > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Null argument, when expecting valid pointer > [0]PETSC ERROR: Null Object: Parameter # 4 > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
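Until that is fixed in master, a minimal user-side workaround is to register the solution monitor in code rather than through the command-line option. The sketch below is untested against that development snapshot, and MyMonitorSolution is only an illustrative name:

#include <petscsnes.h>

static PetscErrorCode MyMonitorSolution(SNES snes, PetscInt its, PetscReal fnorm, void *ctx)
{
  Vec            x;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = SNESGetSolution(snes, &x);CHKERRQ(ierr);            /* current Newton iterate          */
  ierr = VecView(x, PETSC_VIEWER_DRAW_WORLD);CHKERRQ(ierr);  /* draw it in an X11 window        */
  PetscFunctionReturn(0);
}

/* in the application, after SNESCreate()/SNESSetFromOptions():
     ierr = SNESMonitorSet(snes, MyMonitorSolution, NULL, NULL);CHKERRQ(ierr);
   this sidesteps the viewer argument that the built-in monitor now requires. */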
> [0]PETSC ERROR: Petsc Development GIT revision: v3.6.2-1635-g5e95a8a GIT Date: 2015-11-21 16:14:08 -0600 > [0]PETSC ERROR: ./ex5 on a linux-c-dbg named bueler-leopard by ed Sun Nov 22 10:31:33 2015 > [0]PETSC ERROR: Configure options --download-mpich --download-triangle --with-debugging=1 > [0]PETSC ERROR: #1 SNESMonitorSolution() line 33 in /home/ed/petsc/src/snes/interface/snesut.c > [0]PETSC ERROR: #2 SNESMonitor() line 3383 in /home/ed/petsc/src/snes/interface/snes.c > [0]PETSC ERROR: #3 SNESSolve_NEWTONLS() line 191 in /home/ed/petsc/src/snes/impls/ls/ls.c > [0]PETSC ERROR: #4 SNESSolve() line 3984 in /home/ed/petsc/src/snes/interface/snes.c > [0]PETSC ERROR: #5 main() line 171 in /home/ed/petsc/src/snes/examples/tutorials/ex5.c > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -snes_monitor_solution > [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_WORLD, 85) - process 0 > [unset]: aborting job: > application called MPI_Abort(MPI_COMM_WORLD, 85) - process 0 > > > > -- > Ed Bueler > Dept of Math and Stat and Geophysical Institute > University of Alaska Fairbanks > Fairbanks, AK 99775-6660 > 301C Chapman and 410D Elvey > 907 474-7693 and 907 474-7199 (fax 907 474-5394) From gpitton at sissa.it Mon Nov 23 01:31:25 2015 From: gpitton at sissa.it (Giuseppe Pitton) Date: Mon, 23 Nov 2015 08:31:25 +0100 Subject: [petsc-users] Using PFFT within PETSc Message-ID: <5652C0CD.40100@sissa.it> Dear users and developers, I am trying to interface PETSc with the parallel fast Fourier transform library PFFT (https://www-user.tu-chemnitz.de/~potts/workgroup/pippig/software.php.en), based in turn on FFTW. My plan is to build a spectral differentiation code, and in the attached files you can see a simple example. The code works correctly in serial, but in parallel there are some problems regarding the output of the results, I think due to some differences in the way PETSc and PFFT store data, but I'm not sure if this is really the issue. In the attached code, the number of processors used should be specified at compile time in the variable "ncpus". As long as ncpus = 1, everything works fine, but if ncpus = 2 or an higher power of 2, the code terminates correctly but the results show some artifacts, as you can see from the generated hdf5 file, named "output-pfft.h5". In the makefile the variables PFFTINC, PFFTLIB, FFTWINC and FFTWLIB should be set correctly. Thank you, Giuseppe -------------- next part -------------- PFFTINC = /scratch/opt/pfft-dev/include FFTWINC = /scratch/opt/fftw-3.3.4/include PFFTLIB = /scratch/opt/pfft-dev/lib FFTWLIB = /scratch/opt/fftw-3.3.4/lib CFLAGS =-std=c99 -I${PFFTINC} -I${FFTWINC} -L${PFFTLIB} -L${FFTWLIB} CLEANFILES = output-pfft.h5 test-pfft NP = 1 include ${PETSC_DIR}lib/petsc/conf/variables include ${PETSC_DIR}lib/petsc/conf/rules test-pfft: test-pfft.o chkopts -${CLINKER} -o test-pfft test-pfft.o -lm -lpfft -lfftw3 -lfftw3_mpi ${PETSC_LIB} ${RM} test-pfft.o include ${PETSC_DIR}lib/petsc/conf/test -------------- next part -------------- A non-text attachment was scrubbed... 
Name: test-pfft.c Type: text/x-csrc Size: 5736 bytes Desc: not available URL: From knepley at gmail.com Mon Nov 23 08:09:53 2015 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 23 Nov 2015 08:09:53 -0600 Subject: [petsc-users] Using PFFT within PETSc In-Reply-To: <5652C0CD.40100@sissa.it> References: <5652C0CD.40100@sissa.it> Message-ID: On Mon, Nov 23, 2015 at 1:31 AM, Giuseppe Pitton wrote: > Dear users and developers, > I am trying to interface PETSc with the parallel fast Fourier transform > library PFFT ( > https://www-user.tu-chemnitz.de/~potts/workgroup/pippig/software.php.en), > based in turn on FFTW. My plan is to build a spectral differentiation code, > and in the attached files you can see a simple example. > Is it possible to write this example using the existing calls? http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateFFT.html That way we would have a baseline we could both run that works, and then we could look at something broken. Thanks, Matt > The code works correctly in serial, but in parallel there are some > problems regarding the output of the results, I think due to some > differences in the way PETSc and PFFT store data, but I'm not sure if this > is really the issue. > In the attached code, the number of processors used should be specified at > compile time in the variable "ncpus". As long as ncpus = 1, everything > works fine, but if ncpus = 2 or an higher power of 2, the code terminates > correctly but the results show some artifacts, as you can see from the > generated hdf5 file, named "output-pfft.h5". > In the makefile the variables PFFTINC, PFFTLIB, FFTWINC and FFTWLIB should > be set correctly. > Thank you, > Giuseppe > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From gpitton at sissa.it Mon Nov 23 09:51:11 2015 From: gpitton at sissa.it (Giuseppe Pitton) Date: Mon, 23 Nov 2015 16:51:11 +0100 Subject: [petsc-users] Using PFFT within PETSc In-Reply-To: References: <5652C0CD.40100@sissa.it> Message-ID: <565335EF.7020901@sissa.it> Dear Matt, I cannot rewrite this example using the MatCreateFFT calls because it is not clear to me which is the shape and the ordering of the vectors produced by MatScatterPetscToFFTW. Furthermore, if I understand correctly this function returns the c2r / r2c Fourier transforms, and not the c2c transforms (unless PETSc is configured for complex data). However, I have rewritten the example using FFTW instead of PFFT (attached), and it runs fine both in serial and in parallel (with a cpu count that should be a power of two). However, I prefer using PFFT over FFTW, so if somebody has comments on this library, it would be great to know. Thank you, Giuseppe On 11/23/2015 03:09 PM, Matthew Knepley wrote: > On Mon, Nov 23, 2015 at 1:31 AM, Giuseppe Pitton > wrote: > > Dear users and developers, > I am trying to interface PETSc with the parallel fast Fourier > transform library PFFT > (https://www-user.tu-chemnitz.de/~potts/workgroup/pippig/software.php.en > ), > based in turn on FFTW. My plan is to build a spectral > differentiation code, and in the attached files you can see a > simple example. > > > Is it possible to write this example using the existing calls? 
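For reference, a bare-bones version of that MatCreateFFT baseline might look like the sketch below. It assumes a complex-scalar build of PETSc (so MATFFTW performs c2c transforms) and uses MatCreateVecsFFTW to obtain vectors with FFTW's parallel layout; it only illustrates the call sequence and is not a port of the PFFT example:

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            F;
  Vec            x, xhat, xback;
  PetscInt       dim[1] = {64};
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

  /* MatMult applies the (unnormalized) forward DFT, MatMultTranspose the backward one */
  ierr = MatCreateFFT(PETSC_COMM_WORLD, 1, dim, MATFFTW, &F);CHKERRQ(ierr);

  /* vectors with the parallel data layout FFTW expects */
  ierr = MatCreateVecsFFTW(F, &x, &xhat, &xback);CHKERRQ(ierr);

  ierr = VecSet(x, 1.0);CHKERRQ(ierr);                       /* some input data          */
  ierr = MatMult(F, x, xhat);CHKERRQ(ierr);                  /* forward transform        */
  /* ... scale xhat entrywise by i*k here for spectral differentiation ... */
  ierr = MatMultTranspose(F, xhat, xback);CHKERRQ(ierr);     /* backward transform       */
  ierr = VecScale(xback, 1.0/dim[0]);CHKERRQ(ierr);          /* FFTW is unnormalized     */

  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&xhat);CHKERRQ(ierr);
  ierr = VecDestroy(&xback);CHKERRQ(ierr);
  ierr = MatDestroy(&F);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

Once that round trip reproduces the input on several processes, the PFFT-specific layout questions can be isolated by comparing against this baseline.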
> > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateFFT.html > > That way we would have a baseline we could both run that works, and > then we could look > at something broken. > > Thanks, > > Matt > > The code works correctly in serial, but in parallel there are some > problems regarding the output of the results, I think due to some > differences in the way PETSc and PFFT store data, but I'm not sure > if this is really the issue. > In the attached code, the number of processors used should be > specified at compile time in the variable "ncpus". As long as > ncpus = 1, everything works fine, but if ncpus = 2 or an higher > power of 2, the code terminates correctly but the results show > some artifacts, as you can see from the generated hdf5 file, named > "output-pfft.h5". > In the makefile the variables PFFTINC, PFFTLIB, FFTWINC and > FFTWLIB should be set correctly. > Thank you, > Giuseppe > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- PFFTINC = /pfft-1.0.8/include FFTWINC = /fftw-3.3.4/include PFFTLIB = /pfft-1.0.8/lib FFTWLIB = /fftw-3.3.4/lib CFLAGS =-std=c99 -I${PFFTINC} -I${FFTWINC} -L${PFFTLIB} -L${FFTWLIB} CLEANFILES = output-pfft.h5 test-pfft NP = 1 include ${PETSC_DIR}lib/petsc/conf/variables include ${PETSC_DIR}lib/petsc/conf/rules test-fftw: test-fftw.o chkopts -${CLINKER} -o test-fftw test-fftw.o -lm -lfftw3 -lfftw3_mpi ${PETSC_LIB} ${RM} test-fftw.o test-pfft: test-pfft.o chkopts -${CLINKER} -o test-pfft test-pfft.o -lm -lpfft -lfftw3 -lfftw3_mpi ${PETSC_LIB} ${RM} test-pfft.o include ${PETSC_DIR}lib/petsc/conf/test -------------- next part -------------- A non-text attachment was scrubbed... Name: test-fftw.c Type: text/x-csrc Size: 5423 bytes Desc: not available URL: From adlinds3 at ncsu.edu Mon Nov 23 10:46:18 2015 From: adlinds3 at ncsu.edu (Alex Lindsay) Date: Mon, 23 Nov 2015 11:46:18 -0500 Subject: [petsc-users] Debugging failed solve (what's an acceptable upper bound to the condition number?) In-Reply-To: References: <564F691F.9020302@ncsu.edu> <60DF658D-60DC-4CA6-B64E-40E2D31C91D2@mcs.anl.gov> <564FC45C.8020300@ncsu.edu> Message-ID: <565342DA.60306@ncsu.edu> On 11/20/2015 08:36 PM, Barry Smith wrote: > Always make sure that when you reply it goes to everyone on the mailing list; otherwise you're stuck with only stupid old me trying to understand what is going on. Oops, used to just hitting ctrl-r for the Moose list. > > Can you run with everything else the same but use equal permittivities? Do all the huge condition numbers and no convergence of the nonlinear solve go away? With equal permittivities, I still see the opposite electric field signs with Newton. When I run with Jacobian-free, I still have non-convergence issues and a large condition number, so I definitely still have more work to do. > > > Barry > > >> On Nov 20, 2015, at 7:09 PM, Alex Lindsay wrote: >> >> I think I may be honing in on what's causing my problems. I have an interface where I am coupling two different subdomains. Among the physics at the interface is a jump discontinuity in the gradient of the electrical potential (e.g. a jump discontinuity in the electric field), governed by the ratio of the permittivities on either side of the interface. 
This is implemented in my code like this:
>>
>> Real
>> DGMatDiffusionInt::computeQpResidual(Moose::DGResidualType type)
>> {
>>   if (_D_neighbor[_qp] < std::numeric_limits<Real>::epsilon())
>>     mooseError("It doesn't appear that DG material properties got passed.");
>>
>>   Real r = 0;
>>
>>   switch (type)
>>   {
>>     case Moose::Element:
>>       r += 0.5 * (-_D[_qp] * _grad_u[_qp] * _normals[_qp] + -_D_neighbor[_qp] * _grad_neighbor_value[_qp] * _normals[_qp]) * _test[_i][_qp];
>>       break;
>>
>>     case Moose::Neighbor:
>>       r += 0.5 * (_D[_qp] * _grad_u[_qp] * _normals[_qp] + _D_neighbor[_qp] * _grad_neighbor_value[_qp] * _normals[_qp]) * _test_neighbor[_i][_qp];
>>       break;
>>   }
>>
>>   return r;
>> }
>>
>> where _D and _D_neighbor are the permittivities on either side of the interface. Attached are pictures showing the solution using Newton, and the solution using a Jacobian-free method. Newton's method yields electric fields with opposite signs on either side of the interface, which is physically impossible. The Jacobian-free solution yields electric fields with the same sign, and with the proper ratio (a ratio of 5, equivalent to the ratio of the permittivities). I'm sure if I had the proper numerical analysis background, I might know why Newton's method has a hard time here, but I don't. Could someone explain why?
>>
>> Alex
>>
>> On 11/20/2015 03:24 PM, Barry Smith wrote:
>>> Do you really only have 851 variables?
>>>
>>> SVD: condition number 1.457087640207e+12, 0 of 851 singular values are (nearly) zero
>>>
>>> if so you can use -snes_fd and -ksp_view_pmat binary:filename to save the small matrix and then load it up into
>>> MATLAB or similar tool to fully analyze its eigenstructure to see the distribution from the tiny values to the large values; is it just a small number of tiny ones etc.
>>>
>>> Note that with such a large condition number the fact that the linear system "converges" quickly may be meaningless since a small residual doesn't always mean a small error. The error could still be huge.
>>>
>>> Barry
>>>
>>>
>>>> On Nov 20, 2015, at 12:40 PM, Alex Lindsay wrote:
>>>>
>>>> Hello,
>>>>
>>>> I have an application built on top of the Moose framework, and I'm trying to debug a solve that is not converging. My linear solve converges very nicely. However, my non-linear solve does not, and the problem appears to be in the line search. Reading the PETSc FAQ, I see that the most common cause of poor line searches is a bad Jacobian. However, I'm using a finite-differenced Jacobian; if I run -snes_type=test, I get "norm of matrix ratios" < 1e-15. Thus in this case the Jacobian should be accurate. I'm wondering then if my problem might be these (taken from the FAQ page):
>>>>
>>>> - The matrix is very ill-conditioned. Check the condition number.
>>>> - Try to improve it by choosing the relative scaling of components/boundary conditions.
>>>> - Try -ksp_diagonal_scale -ksp_diagonal_scale_fix.
>>>> - Perhaps change the formulation of the problem to produce more friendly algebraic equations.
>>>> - The matrix is nonlinear (e.g. evaluated using finite differencing of a nonlinear function). Try different differencing parameters, ./configure --with-precision=__float128 --download-f2cblaslapack, check if it converges in "easier" parameter regimes.
>>>> I'm almost ashamed to share my condition number because I'm sure it must be absurdly high. Without applying -ksp_diagonal_scale and -ksp_diagonal_scale_fix, the condition number is around 1e25.
When I do apply those two parameters, the condition number is reduced to 1e17. Even after scaling all my variable residuals so that they were all on the order of unity (a suggestion on the Moose list), I still have a condition number of 1e12. I have no experience with condition numbers, but knowing that perfect condition number is unity, 1e12 seems unacceptable. What's an acceptable upper limit on the condition number? Is it problem dependent? Having already tried scaling the individual variable residuals, I'm not exactly sure what my next method would be for trying to reduce the condition number. >>>> >>>> I definitely have a nonlinear problem. Could I be having problems because I'm finite differencing non-linear residuals to form my Jacobian? I can see about using a different differencing parameter. I'm also going to consider trying quad precision. However, my hypothesis is that my condition number is the fundamental problem. Is that a reasonable hypothesis? >>>> >>>> If it's useful, below is console output with -pc_type=svd >>>> >>>> Time Step 1, time = 1e-10 >>>> dt = 1e-10 >>>> |residual|_2 of individual variables: >>>> potential: 8.12402e+07 >>>> potentialliq: 0.000819748 >>>> em: 49.206 >>>> emliq: 3.08187e-11 >>>> Arp: 2375.94 >>>> >>>> 0 Nonlinear |R| = 8.124020e+07 >>>> SVD: condition number 1.457087640207e+12, 0 of 851 singular values are (nearly) zero >>>> SVD: smallest singular values: 5.637144317564e-09 9.345415388433e-08 4.106132915572e-05 1.017339655185e-04 1.147649477723e-04 >>>> SVD: largest singular values : 1.498505466947e+03 1.577560767570e+03 1.719172328193e+03 2.344218235296e+03 8.213813311188e+03 >>>> 0 KSP unpreconditioned resid norm 3.185019606208e+05 true resid norm 3.185019606208e+05 ||r(i)||/||b|| 1.000000000000e+00 >>>> 1 KSP unpreconditioned resid norm 6.382886902896e-07 true resid norm 6.382761808414e-07 ||r(i)||/||b|| 2.003994511046e-12 >>>> Linear solve converged due to CONVERGED_RTOL iterations 1 >>>> Line search: Using full step: fnorm 8.124020470169e+07 gnorm 1.097605946684e+01 >>>> |residual|_2 of individual variables: >>>> potential: 8.60047 >>>> potentialliq: 0.335436 >>>> em: 2.26472 >>>> emliq: 0.642578 >>>> Arp: 6.39151 >>>> >>>> 1 Nonlinear |R| = 1.097606e+01 >>>> SVD: condition number 1.457473763066e+12, 0 of 851 singular values are (nearly) zero >>>> SVD: smallest singular values: 5.637185516434e-09 9.347128557672e-08 1.017339655587e-04 1.146760266781e-04 4.064422034774e-04 >>>> SVD: largest singular values : 1.498505466944e+03 1.577544976882e+03 1.718956369043e+03 2.343692402876e+03 8.216049987736e+03 >>>> 0 KSP unpreconditioned resid norm 2.653715381459e+01 true resid norm 2.653715381459e+01 ||r(i)||/||b|| 1.000000000000e+00 >>>> 1 KSP unpreconditioned resid norm 6.031179341420e-05 true resid norm 6.031183387732e-05 ||r(i)||/||b|| 2.272731819648e-06 >>>> Linear solve converged due to CONVERGED_RTOL iterations 1 >>>> Line search: gnorm after quadratic fit 2.485190757827e+11 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 2.632996340352e+10 lambda=5.0000000000000003e-02 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 4.290675557416e+09 lambda=2.5000000000000001e-02 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 4.332980055153e+08 lambda=1.2500000000000001e-02 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 1.677118626669e+07 lambda=6.2500000000000003e-03 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 1.024469780306e+05 
lambda=3.1250000000000002e-03 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 7.011543252988e+03 lambda=1.5625000000000001e-03 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 1.750171277470e+03 lambda=7.8125000000000004e-04 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 3.486970625406e+02 lambda=3.4794637057251714e-04 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 7.830624839582e+01 lambda=1.5977866967992950e-04 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 2.147529381328e+01 lambda=6.8049915671999093e-05 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 1.138950943123e+01 lambda=1.7575203052774536e-05 >>>> Line search: Cubically determined step, current gnorm 1.095195976135e+01 lambda=1.7575203052774537e-06 >>>> |residual|_2 of individual variables: >>>> potential: 8.59984 >>>> potentialliq: 0.395753 >>>> em: 2.26492 >>>> emliq: 0.642578 >>>> Arp: 6.34735 >>>> >>>> 2 Nonlinear |R| = 1.095196e+01 >>>> SVD: condition number 1.457459214030e+12, 0 of 851 singular values are (nearly) zero >>>> SVD: smallest singular values: 5.637295371943e-09 9.347057884198e-08 1.017339655949e-04 1.146738253493e-04 4.064421554132e-04 >>>> SVD: largest singular values : 1.498505466946e+03 1.577543742603e+03 1.718948052797e+03 2.343672206864e+03 8.216128082047e+03 >>>> 0 KSP unpreconditioned resid norm 2.653244141805e+01 true resid norm 2.653244141805e+01 ||r(i)||/||b|| 1.000000000000e+00 >>>> 1 KSP unpreconditioned resid norm 4.480869560737e-05 true resid norm 4.480686665183e-05 ||r(i)||/||b|| 1.688757771886e-06 >>>> Linear solve converged due to CONVERGED_RTOL iterations 1 >>>> Line search: gnorm after quadratic fit 2.481752147885e+11 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 2.631959989642e+10 lambda=5.0000000000000003e-02 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 4.289110800463e+09 lambda=2.5000000000000001e-02 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 4.332043942482e+08 lambda=1.2500000000000001e-02 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 1.677933337886e+07 lambda=6.2500000000000003e-03 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 1.027980597206e+05 lambda=3.1250000000000002e-03 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 7.054113639063e+03 lambda=1.5625000000000001e-03 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 1.771258630210e+03 lambda=7.8125000000000004e-04 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 3.517070127496e+02 lambda=3.4519087020105563e-04 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 7.844350966118e+01 lambda=1.5664532891249369e-04 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 2.114833995101e+01 lambda=6.5367917100814859e-05 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 1.144636844292e+01 lambda=1.6044984646715980e-05 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 1.095640770627e+01 lambda=1.6044984646715980e-06 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 1.095196729511e+01 lambda=1.6044984646715980e-07 >>>> Line search: Cubically determined step, current gnorm 1.095195451041e+01 lambda=2.3994454223607641e-08 >>>> |residual|_2 of individual variables: >>>> potential: 8.59983 >>>> potentialliq: 0.396107 >>>> em: 2.26492 >>>> emliq: 
0.642578 >>>> Arp: 6.34733 >>>> >>>> 3 Nonlinear |R| = 1.095195e+01 >>>> SVD: condition number 1.457474387942e+12, 0 of 851 singular values are (nearly) zero >>>> SVD: smallest singular values: 5.637237413167e-09 9.347057670885e-08 1.017339654798e-04 1.146737961973e-04 4.064420550524e-04 >>>> SVD: largest singular values : 1.498505466946e+03 1.577543716995e+03 1.718947893048e+03 2.343671853830e+03 8.216129148438e+03 >>>> 0 KSP unpreconditioned resid norm 2.653237816527e+01 true resid norm 2.653237816527e+01 ||r(i)||/||b|| 1.000000000000e+00 >>>> 1 KSP unpreconditioned resid norm 8.525213442515e-05 true resid norm 8.527696332776e-05 ||r(i)||/||b|| 3.214071607022e-06 >>>> Linear solve converged due to CONVERGED_RTOL iterations 1 >>>> Line search: gnorm after quadratic fit 2.481576195523e+11 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 2.632005412624e+10 lambda=5.0000000000000003e-02 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 4.289212002697e+09 lambda=2.5000000000000001e-02 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 4.332196637845e+08 lambda=1.2500000000000001e-02 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 1.678040222943e+07 lambda=6.2500000000000003e-03 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 1.027868984884e+05 lambda=3.1250000000000002e-03 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 7.010733464460e+03 lambda=1.5625000000000001e-03 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 1.751519860441e+03 lambda=7.8125000000000004e-04 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 3.497889916171e+02 lambda=3.4753778542938795e-04 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 7.932631084466e+01 lambda=1.5879606741873878e-04 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 2.194608479634e+01 lambda=6.5716583192912669e-05 >>>> Line search: Cubic step no good, shrinking lambda, current gnorm 1.117190149691e+01 lambda=1.1541218569257328e-05 >>>> Line search: Cubically determined step, current gnorm 1.093879875464e+01 lambda=1.1541218569257329e-06 >>>> |residual|_2 of individual variables: >>>> potential: 8.59942 >>>> potentialliq: 0.403326 >>>> em: 2.26505 >>>> emliq: 0.714844 >>>> Arp: 6.3169 >>>> >>>> 4 Nonlinear |R| = 1.093880e+01 >>>> >> From adlinds3 at ncsu.edu Mon Nov 23 12:29:07 2015 From: adlinds3 at ncsu.edu (Alex Lindsay) Date: Mon, 23 Nov 2015 13:29:07 -0500 Subject: [petsc-users] Debugging failed solve (what's an acceptable upper bound to the condition number?) In-Reply-To: <877flc31qi.fsf@jedbrown.org> References: <564F691F.9020302@ncsu.edu> <877flc31qi.fsf@jedbrown.org> Message-ID: <56535AF3.4010007@ncsu.edu> On 11/20/2015 02:33 PM, Jed Brown wrote: > Alex Lindsay writes: >> I'm almost ashamed to share my condition number because I'm sure it must >> be absurdly high. Without applying -ksp_diagonal_scale and >> -ksp_diagonal_scale_fix, the condition number is around 1e25. When I do >> apply those two parameters, the condition number is reduced to 1e17. >> Even after scaling all my variable residuals so that they were all on >> the order of unity (a suggestion on the Moose list), I still have a >> condition number of 1e12. > Double precision provides 16 digits of accuracy in the best case. When > you finite difference, the accuracy is reduced to 8 digits if the > differencing parameter is chosen optimally. 
With the condition numbers > you're reporting, your matrix is singular up to available precision. > >> I have no experience with condition numbers, but knowing that perfect >> condition number is unity, 1e12 seems unacceptable. What's an >> acceptable upper limit on the condition number? Is it problem >> dependent? Having already tried scaling the individual variable >> residuals, I'm not exactly sure what my next method would be for >> trying to reduce the condition number. > Singular operators are often caused by incorrect boundary conditions. > You should try a small and simple version of your problem and find out > why it's producing a singular (or so close to singular we can't tell) > operator. Could large variable values also create singular operators? I'm essentially solving an advection-diffusion-reaction problem for several species where the advection is driven by an electric field. The species concentrations are in a logarithmic form such that the true concentration is given by exp(u). With my current units (# of particles / m^3) exp(u) is anywhere from 1e13 to 1e20, and thus the initial residuals are probably on the same order of magnitude. After I've assembled the total residual for each variable and before the residual is passed to the solver, I apply scaling to the residuals such that the sum of the variable residuals is around 1e3. But perhaps I lose some accuracy during the residual assembly process? I'm equating "incorrect" boundary conditions to "unphysical" or "unrealistic" boundary conditions. Hopefully that's fair. From adlinds3 at ncsu.edu Mon Nov 23 14:44:33 2015 From: adlinds3 at ncsu.edu (Alex Lindsay) Date: Mon, 23 Nov 2015 15:44:33 -0500 Subject: [petsc-users] Debugging failed solve (what's an acceptable upper bound to the condition number?) In-Reply-To: <56535AF3.4010007@ncsu.edu> References: <564F691F.9020302@ncsu.edu> <877flc31qi.fsf@jedbrown.org> <56535AF3.4010007@ncsu.edu> Message-ID: <56537AB1.6080404@ncsu.edu> I've found that with a "full" variable set, I can get convergence. However, if I remove two of my variables (the other variables have no dependence on the variables that I remove; the coupling is one-way), then I no longer get convergence. I've attached logs of one time-step for both the converged and non-converged cases. On 11/23/2015 01:29 PM, Alex Lindsay wrote: > On 11/20/2015 02:33 PM, Jed Brown wrote: >> Alex Lindsay writes: >>> I'm almost ashamed to share my condition number because I'm sure it >>> must >>> be absurdly high. Without applying -ksp_diagonal_scale and >>> -ksp_diagonal_scale_fix, the condition number is around 1e25. When I do >>> apply those two parameters, the condition number is reduced to 1e17. >>> Even after scaling all my variable residuals so that they were all on >>> the order of unity (a suggestion on the Moose list), I still have a >>> condition number of 1e12. >> Double precision provides 16 digits of accuracy in the best case. When >> you finite difference, the accuracy is reduced to 8 digits if the >> differencing parameter is chosen optimally. With the condition numbers >> you're reporting, your matrix is singular up to available precision. >> >>> I have no experience with condition numbers, but knowing that perfect >>> condition number is unity, 1e12 seems unacceptable. What's an >>> acceptable upper limit on the condition number? Is it problem >>> dependent? 
Having already tried scaling the individual variable >>> residuals, I'm not exactly sure what my next method would be for >>> trying to reduce the condition number. >> Singular operators are often caused by incorrect boundary conditions. >> You should try a small and simple version of your problem and find out >> why it's producing a singular (or so close to singular we can't tell) >> operator. > Could large variable values also create singular operators? I'm > essentially solving an advection-diffusion-reaction problem for > several species where the advection is driven by an electric field. > The species concentrations are in a logarithmic form such that the > true concentration is given by exp(u). With my current units (# of > particles / m^3) exp(u) is anywhere from 1e13 to 1e20, and thus the > initial residuals are probably on the same order of magnitude. After > I've assembled the total residual for each variable and before the > residual is passed to the solver, I apply scaling to the residuals > such that the sum of the variable residuals is around 1e3. But perhaps > I lose some accuracy during the residual assembly process? > > I'm equating "incorrect" boundary conditions to "unphysical" or > "unrealistic" boundary conditions. Hopefully that's fair. -------------- next part -------------- Time Step 1, time = 6.4e-11 dt = 6.4e-11 |residual|_2 of individual variables: potential: 1250 potentialliq: 0 em: 5.73385e-12 emliq: 3.08187e-10 Arp: 5.73385e-12 0 Nonlinear |R| = 1.250000e+03 0 KSP unpreconditioned resid norm 1.250000000000e+03 true resid norm 1.250000000000e+03 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 1.250000000000e+03 true resid norm 1.250000000000e+03 ||r(i)||/||b|| 1.000000000000e+00 2 KSP unpreconditioned resid norm 2.182679427760e+02 true resid norm 7.819529716916e+08 ||r(i)||/||b|| 6.255623773533e+05 3 KSP unpreconditioned resid norm 1.011652745364e+02 true resid norm 4.461678857470e+01 ||r(i)||/||b|| 3.569343085976e-02 4 KSP unpreconditioned resid norm 8.125623676015e+00 true resid norm 8.053940519499e+08 ||r(i)||/||b|| 6.443152415599e+05 5 KSP unpreconditioned resid norm 5.805247155944e+00 true resid norm 8.054105876447e+08 ||r(i)||/||b|| 6.443284701157e+05 6 KSP unpreconditioned resid norm 4.756488143441e+00 true resid norm 8.054162537433e+08 ||r(i)||/||b|| 6.443330029946e+05 7 KSP unpreconditioned resid norm 4.126450902175e+00 true resid norm 8.054191165808e+08 ||r(i)||/||b|| 6.443352932646e+05 8 KSP unpreconditioned resid norm 3.694696196953e+00 true resid norm 8.054208439360e+08 ||r(i)||/||b|| 6.443366751488e+05 9 KSP unpreconditioned resid norm 3.375152117403e+00 true resid norm 8.054219995564e+08 ||r(i)||/||b|| 6.443375996451e+05 10 KSP unpreconditioned resid norm 3.126354693526e+00 true resid norm 8.054228269923e+08 ||r(i)||/||b|| 6.443382615939e+05 Linear solve converged due to CONVERGED_ITS iterations 10 Line search: gnorm after quadratic fit 2.835243959849e+07 Line search: Cubic step no good, shrinking lambda, current gnorm 2.692665331241e+06 lambda=1.0000000000000002e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 2.678836859543e+05 lambda=1.0000000000000002e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 2.680339963946e+04 lambda=1.0000000000000003e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 2.954721111888e+03 lambda=1.0000000000000004e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 1.278349531522e+03 lambda=1.0000000000000004e-06 Line search: 
Cubic step no good, shrinking lambda, current gnorm 1.250287112903e+03 lambda=1.0000000000000005e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250002830553e+03 lambda=1.0000000000000005e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000030832e+03 lambda=1.0000000000000005e-09 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000000426e+03 lambda=1.0000000000000006e-10 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000000481e+03 lambda=1.0922440478434965e-11 Line search: Cubically determined step, current gnorm 1.249999999999e+03 lambda=1.0922440478434966e-12 |residual|_2 of individual variables: potential: 1250 potentialliq: 8.09094e-26 em: 1.07274e-11 emliq: 3.08187e-10 Arp: 5.73508e-12 1 Nonlinear |R| = 1.250000e+03 0 KSP unpreconditioned resid norm 1.249999999999e+03 true resid norm 1.249999999999e+03 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 1.249999999999e+03 true resid norm 1.249999999999e+03 ||r(i)||/||b|| 1.000000000000e+00 2 KSP unpreconditioned resid norm 2.183465611429e+02 true resid norm 8.427616262825e+08 ||r(i)||/||b|| 6.742093010267e+05 3 KSP unpreconditioned resid norm 1.599839731484e+02 true resid norm 8.129044290551e+08 ||r(i)||/||b|| 6.503235432448e+05 4 KSP unpreconditioned resid norm 1.320218637568e+02 true resid norm 8.182745495747e+08 ||r(i)||/||b|| 6.546196396604e+05 5 KSP unpreconditioned resid norm 1.149512781910e+02 true resid norm 1.687718402197e+09 ||r(i)||/||b|| 1.350174721759e+06 6 KSP unpreconditioned resid norm 1.031509670396e+02 true resid norm 1.645498956462e+09 ||r(i)||/||b|| 1.316399165171e+06 7 KSP unpreconditioned resid norm 9.436932085660e+01 true resid norm 1.647788157739e+09 ||r(i)||/||b|| 1.318230526193e+06 8 KSP unpreconditioned resid norm 8.750587817698e+01 true resid norm 1.649436300901e+09 ||r(i)||/||b|| 1.319549040722e+06 9 KSP unpreconditioned resid norm 8.195066818457e+01 true resid norm 1.696531801097e+09 ||r(i)||/||b|| 1.357225440879e+06 10 KSP unpreconditioned resid norm 7.733475365117e+01 true resid norm 1.651650881484e+09 ||r(i)||/||b|| 1.321320705189e+06 Linear solve converged due to CONVERGED_ITS iterations 10 Line search: gnorm after quadratic fit 1.594871627961e+15 Line search: Cubic step no good, shrinking lambda, current gnorm 1.130742402617e+08 lambda=5.0000000000000003e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 3.597173061575e+07 lambda=2.5000000000000001e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 7.589836896768e+06 lambda=5.3308924183071470e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 7.570310699594e+05 lambda=5.3308924183071468e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 7.569384032198e+04 lambda=5.3308924183071473e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 7.670683394580e+03 lambda=5.3308924183071480e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 1.461254084896e+03 lambda=5.3308924183071482e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 1.252290220827e+03 lambda=5.3308924183071487e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250022917003e+03 lambda=5.3308924183071492e-09 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000221997e+03 lambda=5.3308924183071499e-10 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000001910e+03 lambda=5.3308924183071504e-11 Line search: 
Cubically determined step, current gnorm 1.249999999992e+03 lambda=5.3308924183071507e-12 |residual|_2 of individual variables: potential: 1250 potentialliq: 7.42554e-26 em: 1.22453e-09 emliq: 3.08187e-10 Arp: 1.38635e-11 2 Nonlinear |R| = 1.250000e+03 0 KSP unpreconditioned resid norm 1.249999999992e+03 true resid norm 1.249999999992e+03 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 1.249999999992e+03 true resid norm 1.249999999992e+03 ||r(i)||/||b|| 1.000000000000e+00 2 KSP unpreconditioned resid norm 2.182596482393e+02 true resid norm 8.253967021927e+08 ||r(i)||/||b|| 6.603173617584e+05 3 KSP unpreconditioned resid norm 1.947039771924e+01 true resid norm 2.075931294022e+01 ||r(i)||/||b|| 1.660745035228e-02 4 KSP unpreconditioned resid norm 1.436461464059e+01 true resid norm 2.083785869768e+01 ||r(i)||/||b|| 1.667028695825e-02 5 KSP unpreconditioned resid norm 1.190526501926e+01 true resid norm 2.086697733900e+01 ||r(i)||/||b|| 1.669358187131e-02 6 KSP unpreconditioned resid norm 1.038939480233e+01 true resid norm 2.088215650170e+01 ||r(i)||/||b|| 1.670572520146e-02 7 KSP unpreconditioned resid norm 9.335802160607e+00 true resid norm 2.089153252677e+01 ||r(i)||/||b|| 1.671322602152e-02 8 KSP unpreconditioned resid norm 8.549016490263e+00 true resid norm 2.089795904301e+01 ||r(i)||/||b|| 1.671836723451e-02 9 KSP unpreconditioned resid norm 7.932601421080e+00 true resid norm 2.090238701435e+01 ||r(i)||/||b|| 1.672190961159e-02 10 KSP unpreconditioned resid norm 7.432799238568e+00 true resid norm 2.090615112323e+01 ||r(i)||/||b|| 1.672492089869e-02 Linear solve converged due to CONVERGED_ITS iterations 10 Line search: gnorm after quadratic fit 1.142198665507e+07 Line search: Cubic step no good, shrinking lambda, current gnorm 5.550401765706e+06 lambda=5.0000000000000003e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 5.409748027962e+05 lambda=5.0000000000000010e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 5.397311265735e+04 lambda=5.0000000000000012e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 5.537394894252e+03 lambda=5.0000000000000016e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 1.361427787561e+03 lambda=5.0000000000000021e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 1.251161609975e+03 lambda=5.0000000000000019e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250011611567e+03 lambda=5.0000000000000024e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000107464e+03 lambda=5.0000000000000026e-09 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000001345e+03 lambda=5.0000000000000024e-10 Line search: Cubically determined step, current gnorm 1.249999999897e+03 lambda=7.6233253179964001e-11 |residual|_2 of individual variables: potential: 1250 potentialliq: 3.15588e-24 em: 2.84656e-09 emliq: 3.08187e-10 Arp: 2.99446e-11 3 Nonlinear |R| = 1.250000e+03 0 KSP unpreconditioned resid norm 1.249999999897e+03 true resid norm 1.249999999897e+03 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 1.249999999897e+03 true resid norm 1.249999999897e+03 ||r(i)||/||b|| 1.000000000000e+00 2 KSP unpreconditioned resid norm 2.181586411076e+02 true resid norm 7.819766499349e+08 ||r(i)||/||b|| 6.255813199996e+05 3 KSP unpreconditioned resid norm 1.854352309003e+01 true resid norm 2.003012339454e+01 ||r(i)||/||b|| 1.602409871696e-02 4 KSP unpreconditioned resid norm 1.362119502191e+01 true 
resid norm 8.053751742035e+08 ||r(i)||/||b|| 6.443001394160e+05 5 KSP unpreconditioned resid norm 1.127240232708e+01 true resid norm 8.054050843536e+08 ||r(i)||/||b|| 6.443240675361e+05 6 KSP unpreconditioned resid norm 9.828809807849e+00 true resid norm 8.054206656511e+08 ||r(i)||/||b|| 6.443365325741e+05 7 KSP unpreconditioned resid norm 8.827500453712e+00 true resid norm 8.054302209816e+08 ||r(i)||/||b|| 6.443441768386e+05 8 KSP unpreconditioned resid norm 8.080717820462e+00 true resid norm 8.054366796615e+08 ||r(i)||/||b|| 6.443493437824e+05 9 KSP unpreconditioned resid norm 7.496176409465e+00 true resid norm 8.054413371079e+08 ||r(i)||/||b|| 6.443530697396e+05 10 KSP unpreconditioned resid norm 7.022528708078e+00 true resid norm 8.054448546154e+08 ||r(i)||/||b|| 6.443558837456e+05 Linear solve converged due to CONVERGED_ITS iterations 10 Line search: gnorm after quadratic fit 2.511396331623e+07 Line search: Cubic step no good, shrinking lambda, current gnorm 1.220228888591e+07 lambda=5.0000000000000003e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 1.189164603546e+06 lambda=5.0000000000000010e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 1.186167166565e+05 lambda=5.0000000000000012e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 1.192365831756e+04 lambda=5.0000000000000016e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 1.722937964326e+03 lambda=5.0000000000000021e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 1.255613303497e+03 lambda=5.0000000000000019e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250056305277e+03 lambda=5.0000000000000024e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000578304e+03 lambda=5.0000000000000026e-09 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000003888e+03 lambda=5.0000000000000024e-10 Line search: Cubically determined step, current gnorm 1.249999999834e+03 lambda=5.0000000000000028e-11 |residual|_2 of individual variables: potential: 1250 potentialliq: 4.64775e-24 em: 3.8499e-09 emliq: 3.08187e-10 Arp: 4.01495e-11 4 Nonlinear |R| = 1.250000e+03 0 KSP unpreconditioned resid norm 1.249999999834e+03 true resid norm 1.249999999834e+03 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 1.249999999834e+03 true resid norm 1.249999999834e+03 ||r(i)||/||b|| 1.000000000000e+00 2 KSP unpreconditioned resid norm 2.180911377684e+02 true resid norm 7.819912747111e+08 ||r(i)||/||b|| 6.255930198518e+05 3 KSP unpreconditioned resid norm 1.858951272351e+01 true resid norm 2.008399501169e+01 ||r(i)||/||b|| 1.606719601148e-02 4 KSP unpreconditioned resid norm 1.366214280005e+01 true resid norm 8.054474955190e+08 ||r(i)||/||b|| 6.443579965007e+05 5 KSP unpreconditioned resid norm 1.130864341976e+01 true resid norm 2.012535124271e+01 ||r(i)||/||b|| 1.610028099630e-02 6 KSP unpreconditioned resid norm 9.862166661146e+00 true resid norm 2.013215117729e+01 ||r(i)||/||b|| 1.610572094397e-02 7 KSP unpreconditioned resid norm 8.858424099310e+00 true resid norm 8.055390402102e+08 ||r(i)||/||b|| 6.444312322537e+05 8 KSP unpreconditioned resid norm 8.109622559839e+00 true resid norm 2.013927312439e+01 ||r(i)||/||b|| 1.611141850165e-02 9 KSP unpreconditioned resid norm 7.523390433943e+00 true resid norm 2.014118037135e+01 ||r(i)||/||b|| 1.611294429922e-02 10 KSP unpreconditioned resid norm 7.048305660327e+00 true resid norm 2.014299220373e+01 ||r(i)||/||b|| 1.611439376512e-02 Linear 
solve converged due to CONVERGED_ITS iterations 10 Line search: gnorm after quadratic fit 2.391135555969e+07 Line search: Cubic step no good, shrinking lambda, current gnorm 1.161799021499e+07 lambda=5.0000000000000003e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 1.132224103053e+06 lambda=5.0000000000000010e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 1.129376510639e+05 lambda=5.0000000000000012e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 1.135915546728e+04 lambda=5.0000000000000016e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 1.684379468954e+03 lambda=5.0000000000000021e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 1.255085688155e+03 lambda=5.0000000000000019e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250051037016e+03 lambda=5.0000000000000024e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000478960e+03 lambda=5.0000000000000026e-09 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000003825e+03 lambda=5.0000000000000024e-10 Line search: Cubically determined step, current gnorm 1.249999999772e+03 lambda=5.0000000000000028e-11 |residual|_2 of individual variables: potential: 1250 potentialliq: 5.64851e-24 em: 4.85699e-09 emliq: 3.08187e-10 Arp: 5.04572e-11 5 Nonlinear |R| = 1.250000e+03 0 KSP unpreconditioned resid norm 1.249999999772e+03 true resid norm 1.249999999772e+03 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 1.249999999772e+03 true resid norm 1.249999999772e+03 ||r(i)||/||b|| 1.000000000000e+00 2 KSP unpreconditioned resid norm 2.180245879330e+02 true resid norm 7.820056805196e+08 ||r(i)||/||b|| 6.256045445299e+05 3 KSP unpreconditioned resid norm 1.946205886553e+01 true resid norm 2.109700215024e+01 ||r(i)||/||b|| 1.687760172328e-02 4 KSP unpreconditioned resid norm 1.435814972977e+01 true resid norm 2.117498811000e+01 ||r(i)||/||b|| 1.693999049110e-02 5 KSP unpreconditioned resid norm 1.189985941072e+01 true resid norm 2.120350622398e+01 ||r(i)||/||b|| 1.696280498229e-02 6 KSP unpreconditioned resid norm 1.038474534940e+01 true resid norm 2.121871510654e+01 ||r(i)||/||b|| 1.697497208834e-02 7 KSP unpreconditioned resid norm 9.331644867005e+00 true resid norm 2.122782552136e+01 ||r(i)||/||b|| 1.698226042019e-02 8 KSP unpreconditioned resid norm 8.545199194359e+00 true resid norm 2.123399509830e+01 ||r(i)||/||b|| 1.698719608174e-02 9 KSP unpreconditioned resid norm 7.929057651795e+00 true resid norm 2.123866776923e+01 ||r(i)||/||b|| 1.699093421849e-02 10 KSP unpreconditioned resid norm 7.429473836034e+00 true resid norm 2.124192017166e+01 ||r(i)||/||b|| 1.699353614043e-02 Linear solve converged due to CONVERGED_ITS iterations 10 Line search: gnorm after quadratic fit 1.137552646194e+07 Line search: Cubic step no good, shrinking lambda, current gnorm 5.527796109975e+06 lambda=5.0000000000000003e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 5.387690215403e+05 lambda=5.0000000000000010e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 5.375314646450e+04 lambda=5.0000000000000012e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 5.515964934581e+03 lambda=5.0000000000000016e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 1.360556335854e+03 lambda=5.0000000000000021e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 1.251153963289e+03 lambda=5.0000000000000019e-07 Line 
search: Cubic step no good, shrinking lambda, current gnorm 1.250011459902e+03 lambda=5.0000000000000024e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000107244e+03 lambda=5.0000000000000026e-09 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000001236e+03 lambda=5.0000000000000024e-10 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000000233e+03 lambda=7.1940637925727652e-11 Line search: Cubically determined step, current gnorm 1.249999999763e+03 lambda=7.1940637925727654e-12 |residual|_2 of individual variables: potential: 1250 potentialliq: 6.10417e-24 em: 5.0098e-09 emliq: 3.08187e-10 Arp: 5.20276e-11 6 Nonlinear |R| = 1.250000e+03 0 KSP unpreconditioned resid norm 1.249999999763e+03 true resid norm 1.249999999763e+03 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 1.249999999763e+03 true resid norm 1.249999999763e+03 ||r(i)||/||b|| 1.000000000000e+00 2 KSP unpreconditioned resid norm 2.180044693796e+02 true resid norm 7.802198484180e+08 ||r(i)||/||b|| 6.241758788529e+05 3 KSP unpreconditioned resid norm 1.732135088416e+02 true resid norm 1.563215239340e+09 ||r(i)||/||b|| 1.250572191709e+06 4 KSP unpreconditioned resid norm 1.061485630366e+02 true resid norm 1.577335315211e+09 ||r(i)||/||b|| 1.261868252409e+06 5 KSP unpreconditioned resid norm 8.357822500760e+01 true resid norm 1.580790144294e+09 ||r(i)||/||b|| 1.264632115675e+06 6 KSP unpreconditioned resid norm 7.114455942454e+01 true resid norm 1.582376132218e+09 ||r(i)||/||b|| 1.265900906015e+06 7 KSP unpreconditioned resid norm 6.299675123208e+01 true resid norm 1.583286480061e+09 ||r(i)||/||b|| 1.266629184289e+06 8 KSP unpreconditioned resid norm 5.712987155691e+01 true resid norm 1.583877062081e+09 ||r(i)||/||b|| 1.267101649905e+06 9 KSP unpreconditioned resid norm 5.264617509968e+01 true resid norm 2.420444849004e+09 ||r(i)||/||b|| 1.936355879571e+06 10 KSP unpreconditioned resid norm 4.907562644314e+01 true resid norm 2.420913038054e+09 ||r(i)||/||b|| 1.936730430811e+06 Linear solve converged due to CONVERGED_ITS iterations 10 Line search: gnorm after quadratic fit 2.034499848293e+08 Line search: Cubic step no good, shrinking lambda, current gnorm 9.871552786457e+07 lambda=5.0000000000000003e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 9.608320969172e+06 lambda=5.0000000000000010e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 9.582394863587e+05 lambda=5.0000000000000012e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 9.580610519546e+04 lambda=5.0000000000000016e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 9.660746121724e+03 lambda=5.0000000000000021e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 1.574849255364e+03 lambda=5.0000000000000019e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 1.253664335218e+03 lambda=5.0000000000000024e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250036823987e+03 lambda=5.0000000000000026e-09 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000366999e+03 lambda=5.0000000000000024e-10 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000001678e+03 lambda=5.0000000000000028e-11 Line search: Cubically determined step, current gnorm 1.249999999756e+03 lambda=5.0000000000000029e-12 |residual|_2 of individual variables: potential: 1250 potentialliq: 6.45789e-24 em: 4.39941e-09 emliq: 3.08187e-10 Arp: 4.57523e-11 7 
Nonlinear |R| = 1.250000e+03 0 KSP unpreconditioned resid norm 1.249999999756e+03 true resid norm 1.249999999756e+03 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 1.249999999756e+03 true resid norm 1.249999999756e+03 ||r(i)||/||b|| 1.000000000000e+00 2 KSP unpreconditioned resid norm 2.180081873135e+02 true resid norm 7.820092290130e+08 ||r(i)||/||b|| 6.256073833323e+05 3 KSP unpreconditioned resid norm 1.946129832795e+01 true resid norm 2.109616756847e+01 ||r(i)||/||b|| 1.687693405807e-02 4 KSP unpreconditioned resid norm 1.435756568789e+01 true resid norm 2.117410765832e+01 ||r(i)||/||b|| 1.693928612996e-02 5 KSP unpreconditioned resid norm 1.189936827192e+01 true resid norm 2.120240607210e+01 ||r(i)||/||b|| 1.696192486099e-02 6 KSP unpreconditioned resid norm 1.038420125742e+01 true resid norm 2.121758216460e+01 ||r(i)||/||b|| 1.697406573498e-02 7 KSP unpreconditioned resid norm 9.331121524454e+00 true resid norm 2.122668148932e+01 ||r(i)||/||b|| 1.698134519476e-02 8 KSP unpreconditioned resid norm 8.544713991399e+00 true resid norm 2.123285231780e+01 ||r(i)||/||b|| 1.698628185755e-02 9 KSP unpreconditioned resid norm 7.928603436376e+00 true resid norm 2.123726676664e+01 ||r(i)||/||b|| 1.698981341662e-02 10 KSP unpreconditioned resid norm 7.429045402840e+00 true resid norm 2.124077912149e+01 ||r(i)||/||b|| 1.699262330050e-02 Linear solve converged due to CONVERGED_ITS iterations 10 Line search: gnorm after quadratic fit 1.149326227191e+07 Line search: Cubic step no good, shrinking lambda, current gnorm 5.585045811793e+06 lambda=5.0000000000000003e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 5.443521396286e+05 lambda=5.0000000000000010e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 5.430987061393e+04 lambda=5.0000000000000012e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 5.570191486810e+03 lambda=5.0000000000000016e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 1.362768603478e+03 lambda=5.0000000000000021e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 1.251176978273e+03 lambda=5.0000000000000019e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250011763766e+03 lambda=5.0000000000000024e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000107229e+03 lambda=5.0000000000000026e-09 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000001109e+03 lambda=5.0000000000000024e-10 Line search: Cubically determined step, current gnorm 1.249999999661e+03 lambda=7.6233447816453448e-11 |residual|_2 of individual variables: potential: 1250 potentialliq: 9.08761e-24 em: 6.01857e-09 emliq: 3.08187e-10 Arp: 6.23999e-11 8 Nonlinear |R| = 1.250000e+03 0 KSP unpreconditioned resid norm 1.249999999661e+03 true resid norm 1.249999999661e+03 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 1.249999999661e+03 true resid norm 1.249999999661e+03 ||r(i)||/||b|| 1.000000000000e+00 2 KSP unpreconditioned resid norm 2.179064067278e+02 true resid norm 7.820312571339e+08 ||r(i)||/||b|| 6.256250058767e+05 3 KSP unpreconditioned resid norm 1.945676272501e+01 true resid norm 2.109203237667e+01 ||r(i)||/||b|| 1.687362590591e-02 4 KSP unpreconditioned resid norm 1.435481284529e+01 true resid norm 2.117033993758e+01 ||r(i)||/||b|| 1.693627195466e-02 5 KSP unpreconditioned resid norm 1.189760945248e+01 true resid norm 8.068396959495e+08 ||r(i)||/||b|| 6.454717569346e+05 6 KSP unpreconditioned resid norm 
1.038290484991e+01 true resid norm 2.121519397161e+01 ||r(i)||/||b|| 1.697215518189e-02 7 KSP unpreconditioned resid norm 9.330088374361e+00 true resid norm 2.122447838136e+01 ||r(i)||/||b|| 1.697958270969e-02 8 KSP unpreconditioned resid norm 8.543849617441e+00 true resid norm 2.123076797725e+01 ||r(i)||/||b|| 1.698461438641e-02 9 KSP unpreconditioned resid norm 7.927856131237e+00 true resid norm 2.123553406701e+01 ||r(i)||/||b|| 1.698842725821e-02 10 KSP unpreconditioned resid norm 7.428383953238e+00 true resid norm 2.123911387058e+01 ||r(i)||/||b|| 1.699129110107e-02 Linear solve converged due to CONVERGED_ITS iterations 10 Line search: gnorm after quadratic fit 1.044209236144e+07 Line search: Cubic step no good, shrinking lambda, current gnorm 5.074235494424e+06 lambda=5.0000000000000003e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 4.945654291498e+05 lambda=5.0000000000000010e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 4.934540687239e+04 lambda=5.0000000000000012e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 5.087618901961e+03 lambda=5.0000000000000016e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 1.343763724232e+03 lambda=5.0000000000000021e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250972379117e+03 lambda=5.0000000000000019e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250009557978e+03 lambda=5.0000000000000024e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000092632e+03 lambda=5.0000000000000026e-09 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000001014e+03 lambda=5.0000000000000024e-10 Line search: Cubically determined step, current gnorm 1.249999999566e+03 lambda=7.5768409431551880e-11 |residual|_2 of individual variables: potential: 1250 potentialliq: 1.95779e-23 em: 7.62773e-09 emliq: 3.08187e-10 Arp: 7.89936e-11 9 Nonlinear |R| = 1.250000e+03 0 KSP unpreconditioned resid norm 1.249999999566e+03 true resid norm 1.249999999566e+03 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 1.249999999566e+03 true resid norm 1.249999999566e+03 ||r(i)||/||b|| 1.000000000000e+00 2 KSP unpreconditioned resid norm 2.177996374504e+02 true resid norm 2.177992926509e+02 ||r(i)||/||b|| 1.742394341812e-01 3 KSP unpreconditioned resid norm 7.296868244457e+01 true resid norm 8.007747954267e+08 ||r(i)||/||b|| 6.406198365636e+05 4 KSP unpreconditioned resid norm 3.789417960370e+01 true resid norm 4.444792493019e+01 ||r(i)||/||b|| 3.555833995648e-02 5 KSP unpreconditioned resid norm 3.401642135341e+01 true resid norm 5.080285249605e+01 ||r(i)||/||b|| 4.064228201094e-02 6 KSP unpreconditioned resid norm 2.788071299035e+01 true resid norm 5.009980145722e+01 ||r(i)||/||b|| 4.007984117968e-02 7 KSP unpreconditioned resid norm 2.415622001384e+01 true resid norm 4.966990407429e+01 ||r(i)||/||b|| 3.973592327321e-02 8 KSP unpreconditioned resid norm 2.124505872401e+01 true resid norm 8.043149971256e+08 ||r(i)||/||b|| 6.434519979236e+05 9 KSP unpreconditioned resid norm 1.936316058299e+01 true resid norm 8.043550613379e+08 ||r(i)||/||b|| 6.434840492935e+05 10 KSP unpreconditioned resid norm 1.790786512505e+01 true resid norm 8.043834705860e+08 ||r(i)||/||b|| 6.435067766920e+05 Linear solve converged due to CONVERGED_ITS iterations 10 Line search: gnorm after quadratic fit 4.565500932636e+07 Line search: Cubic step no good, shrinking lambda, current gnorm 2.218074635601e+07 
lambda=5.0000000000000003e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 2.161434937923e+06 lambda=5.0000000000000010e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 2.155886905194e+05 lambda=5.0000000000000012e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 2.158915141138e+04 lambda=5.0000000000000016e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 2.491477605019e+03 lambda=5.0000000000000021e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 1.268443119324e+03 lambda=5.0000000000000019e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250186068288e+03 lambda=5.0000000000000024e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250001812872e+03 lambda=5.0000000000000026e-09 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000017071e+03 lambda=5.0000000000000024e-10 Line search: Cubically determined step, current gnorm 1.249999999504e+03 lambda=5.0000000000000028e-11 |residual|_2 of individual variables: potential: 1250 potentialliq: 1.38731e-23 em: 5.22255e-09 emliq: 3.08187e-10 Arp: 5.42863e-11 10 Nonlinear |R| = 1.250000e+03 0 KSP unpreconditioned resid norm 1.249999999504e+03 true resid norm 1.249999999504e+03 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 1.249999999504e+03 true resid norm 1.249999999504e+03 ||r(i)||/||b|| 1.000000000000e+00 2 KSP unpreconditioned resid norm 2.177384421333e+02 true resid norm 7.820675831572e+08 ||r(i)||/||b|| 6.256540667740e+05 3 KSP unpreconditioned resid norm 1.081777973894e+02 true resid norm 4.201341309844e+01 ||r(i)||/||b|| 3.361073049209e-02 4 KSP unpreconditioned resid norm 2.640821579390e+01 true resid norm 2.805535044414e+01 ||r(i)||/||b|| 2.244428036422e-02 5 KSP unpreconditioned resid norm 1.905945882684e+01 true resid norm 8.069791145039e+08 ||r(i)||/||b|| 6.455832918593e+05 6 KSP unpreconditioned resid norm 1.567136385119e+01 true resid norm 8.070387542176e+08 ||r(i)||/||b|| 6.456310036303e+05 7 KSP unpreconditioned resid norm 1.361991788497e+01 true resid norm 8.070692097320e+08 ||r(i)||/||b|| 6.456553680418e+05 8 KSP unpreconditioned resid norm 1.220806886567e+01 true resid norm 8.070876916759e+08 ||r(i)||/||b|| 6.456701535969e+05 9 KSP unpreconditioned resid norm 1.116032600035e+01 true resid norm 8.071001011474e+08 ||r(i)||/||b|| 6.456800811741e+05 10 KSP unpreconditioned resid norm 1.034303066560e+01 true resid norm 8.071090086418e+08 ||r(i)||/||b|| 6.456872071697e+05 Linear solve converged due to CONVERGED_ITS iterations 10 Line search: gnorm after quadratic fit 1.089937861665e+07 Line search: Cubic step no good, shrinking lambda, current gnorm 1.035457409406e+06 lambda=1.0000000000000002e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 1.030236386438e+05 lambda=1.0000000000000002e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 1.037190239506e+04 lambda=1.0000000000000003e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 1.619425850339e+03 lambda=1.0000000000000004e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 1.254229242392e+03 lambda=1.0000000000000004e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250042364531e+03 lambda=1.0000000000000005e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000412395e+03 lambda=1.0000000000000005e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000002870e+03 
lambda=1.0000000000000005e-09 Line search: Cubically determined step, current gnorm 1.249999999336e+03 lambda=1.3446088162829657e-10 |residual|_2 of individual variables: potential: 1250 potentialliq: 1.43966e-23 em: 1.50979e-09 emliq: 3.08188e-10 Arp: 1.75738e-11 11 Nonlinear |R| = 1.250000e+03 0 KSP unpreconditioned resid norm 1.249999999336e+03 true resid norm 1.249999999336e+03 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 1.249999999336e+03 true resid norm 1.249999999336e+03 ||r(i)||/||b|| 1.000000000000e+00 2 KSP unpreconditioned resid norm 2.175597245573e+02 true resid norm 7.821062023649e+08 ||r(i)||/||b|| 6.256849622243e+05 3 KSP unpreconditioned resid norm 1.021043830525e+02 true resid norm 4.377529678057e+01 ||r(i)||/||b|| 3.502023744306e-02 4 KSP unpreconditioned resid norm 5.286981533874e+00 true resid norm 5.395069109689e+00 ||r(i)||/||b|| 4.316055290044e-03 5 KSP unpreconditioned resid norm 3.764995339701e+00 true resid norm 5.379330980748e+00 ||r(i)||/||b|| 4.303464786885e-03 6 KSP unpreconditioned resid norm 3.081422795569e+00 true resid norm 5.373914648069e+00 ||r(i)||/||b|| 4.299131720739e-03 7 KSP unpreconditioned resid norm 2.671775781134e+00 true resid norm 5.370824562979e+00 ||r(i)||/||b|| 4.296659652666e-03 8 KSP unpreconditioned resid norm 2.391423255607e+00 true resid norm 5.369627011698e+00 ||r(i)||/||b|| 4.295701611640e-03 9 KSP unpreconditioned resid norm 2.184105961898e+00 true resid norm 5.368237080665e+00 ||r(i)||/||b|| 4.294589666813e-03 10 KSP unpreconditioned resid norm 2.022781236997e+00 true resid norm 5.367390500679e+00 ||r(i)||/||b|| 4.293912402824e-03 Linear solve converged due to CONVERGED_ITS iterations 10 Line search: gnorm after quadratic fit 2.861209708657e+07 Line search: Cubic step no good, shrinking lambda, current gnorm 2.717339432175e+06 lambda=1.0000000000000002e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 2.703384896028e+05 lambda=1.0000000000000002e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 2.704852379134e+04 lambda=1.0000000000000003e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 2.976958286838e+03 lambda=1.0000000000000004e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 1.278865140534e+03 lambda=1.0000000000000004e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250291693482e+03 lambda=1.0000000000000005e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250002905364e+03 lambda=1.0000000000000005e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000030612e+03 lambda=1.0000000000000005e-09 Line search: Cubic step no good, shrinking lambda, current gnorm 1.249999999762e+03 lambda=1.0000000000000006e-10 Line search: Cubically determined step, current gnorm 1.249999999322e+03 lambda=1.0930423985251739e-11 |residual|_2 of individual variables: potential: 1250 potentialliq: 1.57218e-23 em: 1.45111e-09 emliq: 3.08188e-10 Arp: 1.70541e-11 12 Nonlinear |R| = 1.250000e+03 0 KSP unpreconditioned resid norm 1.249999999322e+03 true resid norm 1.249999999322e+03 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 1.249999999322e+03 true resid norm 1.249999999322e+03 ||r(i)||/||b|| 1.000000000000e+00 2 KSP unpreconditioned resid norm 2.175450305482e+02 true resid norm 7.821093769274e+08 ||r(i)||/||b|| 6.256875018812e+05 3 KSP unpreconditioned resid norm 1.856598278624e+01 true resid norm 8.053047975903e+08 ||r(i)||/||b|| 6.442438384215e+05 4 KSP 
unpreconditioned resid norm 1.364130029651e+01 true resid norm 8.053862298593e+08 ||r(i)||/||b|| 6.443089842368e+05 5 KSP unpreconditioned resid norm 1.128861070753e+01 true resid norm 2.009050004965e+01 ||r(i)||/||b|| 1.607240004843e-02 6 KSP unpreconditioned resid norm 9.844049423257e+00 true resid norm 2.009560936027e+01 ||r(i)||/||b|| 1.607648749694e-02 7 KSP unpreconditioned resid norm 8.841796624314e+00 true resid norm 8.054761688339e+08 ||r(i)||/||b|| 6.443809354165e+05 8 KSP unpreconditioned resid norm 8.094181518398e+00 true resid norm 8.054867356846e+08 ||r(i)||/||b|| 6.443893888970e+05 9 KSP unpreconditioned resid norm 7.508918360359e+00 true resid norm 2.010262861965e+01 ||r(i)||/||b|| 1.608210290444e-02 10 KSP unpreconditioned resid norm 7.034644000207e+00 true resid norm 2.010415273871e+01 ||r(i)||/||b|| 1.608332219969e-02 Linear solve converged due to CONVERGED_ITS iterations 10 Line search: gnorm after quadratic fit 2.456914540663e+07 Line search: Cubic step no good, shrinking lambda, current gnorm 1.193761683055e+07 lambda=5.0000000000000003e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 1.163374916793e+06 lambda=5.0000000000000010e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 1.160445421994e+05 lambda=5.0000000000000012e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 1.166795782185e+04 lambda=5.0000000000000016e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 1.705330567194e+03 lambda=5.0000000000000021e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 1.255370779993e+03 lambda=5.0000000000000019e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250053955588e+03 lambda=5.0000000000000024e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000542389e+03 lambda=5.0000000000000026e-09 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000003313e+03 lambda=5.0000000000000024e-10 Line search: Cubically determined step, current gnorm 1.249999999260e+03 lambda=5.0000000000000028e-11 |residual|_2 of individual variables: potential: 1250 potentialliq: 1.7191e-23 em: 2.45625e-09 emliq: 3.08188e-10 Arp: 2.66652e-11 13 Nonlinear |R| = 1.250000e+03 0 KSP unpreconditioned resid norm 1.249999999260e+03 true resid norm 1.249999999260e+03 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 1.249999999260e+03 true resid norm 1.249999999260e+03 ||r(i)||/||b|| 1.000000000000e+00 2 KSP unpreconditioned resid norm 2.174776207655e+02 true resid norm 7.821239394476e+08 ||r(i)||/||b|| 6.256991519286e+05 3 KSP unpreconditioned resid norm 1.943727378418e+01 true resid norm 8.066314023718e+08 ||r(i)||/||b|| 6.453051222796e+05 4 KSP unpreconditioned resid norm 1.433914491640e+01 true resid norm 8.067188574536e+08 ||r(i)||/||b|| 6.453750863451e+05 5 KSP unpreconditioned resid norm 1.188388041985e+01 true resid norm 8.067515271785e+08 ||r(i)||/||b|| 6.454012221250e+05 6 KSP unpreconditioned resid norm 1.037058464541e+01 true resid norm 8.067686157693e+08 ||r(i)||/||b|| 6.454148929976e+05 7 KSP unpreconditioned resid norm 9.318830051429e+00 true resid norm 8.067791235915e+08 ||r(i)||/||b|| 6.454232992554e+05 8 KSP unpreconditioned resid norm 8.533423866217e+00 true resid norm 8.067862382342e+08 ||r(i)||/||b|| 6.454289909695e+05 9 KSP unpreconditioned resid norm 7.918104223305e+00 true resid norm 8.067913747994e+08 ||r(i)||/||b|| 6.454331002218e+05 10 KSP unpreconditioned resid norm 7.419191319918e+00 true resid norm 
8.067952575497e+08 ||r(i)||/||b|| 6.454362064219e+05 Linear solve converged due to CONVERGED_ITS iterations 10 Line search: gnorm after quadratic fit 1.160995970104e+07 Line search: Cubic step no good, shrinking lambda, current gnorm 5.641751396461e+06 lambda=5.0000000000000003e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 5.498787894103e+05 lambda=5.0000000000000010e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 5.486101720170e+04 lambda=5.0000000000000012e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 5.623924275130e+03 lambda=5.0000000000000016e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 1.364972618139e+03 lambda=5.0000000000000021e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 1.251201759974e+03 lambda=5.0000000000000019e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250012071103e+03 lambda=5.0000000000000024e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000122223e+03 lambda=5.0000000000000026e-09 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000000724e+03 lambda=5.0000000000000024e-10 Line search: Cubically determined step, current gnorm 1.249999999169e+03 lambda=7.2394763111969298e-11 |residual|_2 of individual variables: potential: 1250 potentialliq: 2.18503e-23 em: 3.99192e-09 emliq: 3.08188e-10 Arp: 4.21813e-11 14 Nonlinear |R| = 1.250000e+03 0 KSP unpreconditioned resid norm 1.249999999169e+03 true resid norm 1.249999999169e+03 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 1.249999999169e+03 true resid norm 1.249999999169e+03 ||r(i)||/||b|| 1.000000000000e+00 2 KSP unpreconditioned resid norm 2.173795908399e+02 true resid norm 7.821379143780e+08 ||r(i)||/||b|| 6.257103319182e+05 3 KSP unpreconditioned resid norm 1.943273247043e+01 true resid norm 2.106828788472e+01 ||r(i)||/||b|| 1.685463031898e-02 4 KSP unpreconditioned resid norm 1.433566777424e+01 true resid norm 2.114336582255e+01 ||r(i)||/||b|| 1.691469266928e-02 5 KSP unpreconditioned resid norm 1.188095918186e+01 true resid norm 2.117103567694e+01 ||r(i)||/||b|| 1.693682855281e-02 6 KSP unpreconditioned resid norm 1.036801734617e+01 true resid norm 2.118538176387e+01 ||r(i)||/||b|| 1.694830542236e-02 7 KSP unpreconditioned resid norm 9.316513162844e+00 true resid norm 2.119418433075e+01 ||r(i)||/||b|| 1.695534747587e-02 8 KSP unpreconditioned resid norm 8.531296064394e+00 true resid norm 2.120033082506e+01 ||r(i)||/||b|| 1.696026467132e-02 9 KSP unpreconditioned resid norm 7.916125715076e+00 true resid norm 2.120472884008e+01 ||r(i)||/||b|| 1.696378308334e-02 10 KSP unpreconditioned resid norm 7.417334541413e+00 true resid norm 2.120771141093e+01 ||r(i)||/||b|| 1.696616914002e-02 Linear solve converged due to CONVERGED_ITS iterations 10 Line search: gnorm after quadratic fit 1.163854034154e+07 Line search: Cubic step no good, shrinking lambda, current gnorm 5.655577109047e+06 lambda=5.0000000000000003e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 5.512208074702e+05 lambda=5.0000000000000010e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 5.499476969807e+04 lambda=5.0000000000000012e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 5.636988389459e+03 lambda=5.0000000000000016e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 1.365501184305e+03 lambda=5.0000000000000021e-06 Line search: Cubic step no good, shrinking lambda, current 
gnorm 1.251207931333e+03 lambda=5.0000000000000019e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250011916602e+03 lambda=5.0000000000000024e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 1.250000106642e+03 lambda=5.0000000000000026e-09 Line search: Cubically determined step, current gnorm 1.249999999095e+03 lambda=5.0000000000000024e-10 |residual|_2 of individual variables: potential: 1250 potentialliq: 8.77002e-23 em: 1.45953e-08 emliq: 0.0371094 Arp: 1.51539e-10 15 Nonlinear |R| = 1.250000e+03 Nonlinear solve did not converge due to DIVERGED_MAX_IT iterations 15 Solve Did NOT Converge! -------------- next part -------------- Time Step 1, time = 6.4e-11 dt = 6.4e-11 |residual|_2 of individual variables: potential: 1250 potentialliq: 0 em: 5.73385e-12 emliq: 3.08187e-10 Arp: 5.73385e-12 OHm: 5.14298e-23 H3Op: 5.14298e-23 0 Nonlinear |R| = 1.250000e+03 0 KSP unpreconditioned resid norm 1.250000000000e+03 true resid norm 1.250000000000e+03 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 1.247314056942e+03 true resid norm 1.247314056942e+03 ||r(i)||/||b|| 9.978512455538e-01 2 KSP unpreconditioned resid norm 4.860927105197e-02 true resid norm 4.776005315904e-02 ||r(i)||/||b|| 3.820804252723e-05 3 KSP unpreconditioned resid norm 4.787844580540e-02 true resid norm 4.752835483387e-02 ||r(i)||/||b|| 3.802268386709e-05 4 KSP unpreconditioned resid norm 4.437366678888e-02 true resid norm 2.756372625290e-01 ||r(i)||/||b|| 2.205098100232e-04 5 KSP unpreconditioned resid norm 1.696908986557e-05 true resid norm 5.105761919450e+00 ||r(i)||/||b|| 4.084609535560e-03 Linear solve converged due to CONVERGED_RTOL iterations 5 Line search: gnorm after quadratic fit 1.041879543556e+03 Line search: Quadratically determined step, lambda=1.6649745633364457e-01 |residual|_2 of individual variables: potential: 1041.88 potentialliq: 5.8088e-17 em: 0.072466 emliq: 0.000102405 Arp: 0.000565487 OHm: 0.56393 H3Op: 1.01493 1 Nonlinear |R| = 1.041880e+03 0 KSP unpreconditioned resid norm 1.041879543556e+03 true resid norm 1.041879543556e+03 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 1.039102369962e+03 true resid norm 1.039102369962e+03 ||r(i)||/||b|| 9.973344580846e-01 2 KSP unpreconditioned resid norm 5.950166555328e-01 true resid norm 5.675973956612e-01 ||r(i)||/||b|| 5.447821671629e-04 3 KSP unpreconditioned resid norm 3.889464332642e-02 true resid norm 7.643703143268e-01 ||r(i)||/||b|| 7.336455726137e-04 4 KSP unpreconditioned resid norm 3.068800383788e-02 true resid norm 7.487202250394e-01 ||r(i)||/||b|| 7.186245566198e-04 5 KSP unpreconditioned resid norm 3.016106994371e-02 true resid norm 6.513904475182e-01 ||r(i)||/||b|| 6.252070611684e-04 6 KSP unpreconditioned resid norm 1.208157846839e-02 true resid norm 9.354083556895e+00 ||r(i)||/||b|| 8.978085436793e-03 7 KSP unpreconditioned resid norm 1.419355549112e-04 true resid norm 1.067868324404e+01 ||r(i)||/||b|| 1.024944131986e-02 Linear solve converged due to CONVERGED_RTOL iterations 7 Line search: Using full step: fnorm 1.041879543556e+03 gnorm 1.603618773063e+01 |residual|_2 of individual variables: potential: 0.00484951 potentialliq: 1.5564e-05 em: 0.0536872 emliq: 0.125383 Arp: 0.00243548 OHm: 5.68961 H3Op: 14.9923 2 Nonlinear |R| = 1.603619e+01 0 KSP unpreconditioned resid norm 1.603618773063e+01 true resid norm 1.603618773063e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 2.105432058619e+00 true resid norm 2.105432058619e+00 ||r(i)||/||b|| 
1.312925549380e-01 2 KSP unpreconditioned resid norm 1.098264610573e-04 true resid norm 9.432163694076e-04 ||r(i)||/||b|| 5.881799248371e-05 Linear solve converged due to CONVERGED_RTOL iterations 2 Line search: gnorm after quadratic fit 1.443471881195e+01 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 |residual|_2 of individual variables: potential: 0.00436455 potentialliq: 1.40075e-05 em: 0.0483446 emliq: 0.112845 Arp: 0.00219194 OHm: 5.12895 H3Op: 13.4922 3 Nonlinear |R| = 1.443472e+01 0 KSP unpreconditioned resid norm 1.443471881195e+01 true resid norm 1.443471881195e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 1.884409964965e+00 true resid norm 1.884409964965e+00 ||r(i)||/||b|| 1.305470504493e-01 2 KSP unpreconditioned resid norm 2.375784959854e-04 true resid norm 3.914568823513e-04 ||r(i)||/||b|| 2.711912074291e-05 3 KSP unpreconditioned resid norm 5.044976302331e-05 true resid norm 2.951075274163e-04 ||r(i)||/||b|| 2.044428653310e-05 Linear solve converged due to CONVERGED_RTOL iterations 3 Line search: Using full step: fnorm 1.443471881195e+01 gnorm 1.229473661441e+00 |residual|_2 of individual variables: potential: 2.08202e-07 potentialliq: 6.92938e-10 em: 0.00232657 emliq: 2.5874e-05 Arp: 4.9962e-07 OHm: 0.90422 H3Op: 0.833058 4 Nonlinear |R| = 1.229474e+00 0 KSP unpreconditioned resid norm 1.229473661441e+00 true resid norm 1.229473661441e+00 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 2.888383837807e-03 true resid norm 2.888383837807e-03 ||r(i)||/||b|| 2.349284843095e-03 2 KSP unpreconditioned resid norm 2.027495745286e-03 true resid norm 2.038450555261e-03 ||r(i)||/||b|| 1.657986355618e-03 3 KSP unpreconditioned resid norm 1.984035502182e-04 true resid norm 2.605402233762e-04 ||r(i)||/||b|| 2.119120006775e-04 4 KSP unpreconditioned resid norm 3.857309277933e-05 true resid norm 1.325600705450e-02 ||r(i)||/||b|| 1.078185525257e-02 5 KSP unpreconditioned resid norm 5.087499897324e-07 true resid norm 1.454419650915e-02 ||r(i)||/||b|| 1.182961210580e-02 Linear solve converged due to CONVERGED_RTOL iterations 5 Line search: Using full step: fnorm 1.229473661441e+00 gnorm 2.408927331884e-01 |residual|_2 of individual variables: potential: 1.00981e-07 potentialliq: 2.64367e-08 em: 1.33448e-05 emliq: 5.26969e-05 Arp: 9.8925e-07 OHm: 0.210391 H3Op: 0.117323 5 Nonlinear |R| = 2.408927e-01 0 KSP unpreconditioned resid norm 2.408927331884e-01 true resid norm 2.408927331884e-01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 6.838178821298e-04 true resid norm 6.838178821298e-04 ||r(i)||/||b|| 2.838682068483e-03 2 KSP unpreconditioned resid norm 3.862120943656e-06 true resid norm 6.724384237429e-05 ||r(i)||/||b|| 2.791443373333e-04 3 KSP unpreconditioned resid norm 1.405230248854e-07 true resid norm 7.134981350742e-05 ||r(i)||/||b|| 2.961891484357e-04 Linear solve converged due to CONVERGED_RTOL iterations 3 Line search: Using full step: fnorm 2.408927331884e-01 gnorm 3.538570473573e-02 |residual|_2 of individual variables: potential: 1.65546e-08 potentialliq: 8.63114e-11 em: 6.27176e-08 emliq: 3.26416e-07 Arp: 6.13426e-09 OHm: 0.0333149 H3Op: 0.0119275 6 Nonlinear |R| = 3.538570e-02 0 KSP unpreconditioned resid norm 3.538570473573e-02 true resid norm 3.538570473573e-02 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 3.691161652629e-05 true resid norm 3.691161652629e-05 ||r(i)||/||b|| 1.043122266519e-03 2 KSP unpreconditioned resid norm 5.495132451034e-06 true resid norm 
2.671458554487e-05 ||r(i)||/||b|| 7.549541755458e-04 3 KSP unpreconditioned resid norm 7.269366746948e-08 true resid norm 2.624083112879e-05 ||r(i)||/||b|| 7.415658759593e-04 Linear solve converged due to CONVERGED_RTOL iterations 3 Line search: Using full step: fnorm 3.538570473573e-02 gnorm 1.647449494568e-03 |residual|_2 of individual variables: potential: 1.43289e-08 potentialliq: 2.5783e-11 em: 2.34555e-08 emliq: 8.30641e-08 Arp: 1.43537e-09 OHm: 0.00163337 H3Op: 0.000214957 7 Nonlinear |R| = 1.647449e-03 0 KSP unpreconditioned resid norm 1.647449494568e-03 true resid norm 1.647449494568e-03 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 1.677765659074e-06 true resid norm 1.677765659074e-06 ||r(i)||/||b|| 1.018401877937e-03 2 KSP unpreconditioned resid norm 7.379250907830e-07 true resid norm 1.186104631721e-06 ||r(i)||/||b|| 7.199641844145e-04 3 KSP unpreconditioned resid norm 1.512636252550e-07 true resid norm 1.191859851853e-06 ||r(i)||/||b|| 7.234575965958e-04 4 KSP unpreconditioned resid norm 1.499662963136e-07 true resid norm 1.507725064021e-06 ||r(i)||/||b|| 9.151874269849e-04 5 KSP unpreconditioned resid norm 8.585859763891e-08 true resid norm 5.572122911654e-05 ||r(i)||/||b|| 3.382272373161e-02 6 KSP unpreconditioned resid norm 9.699214415589e-11 true resid norm 5.632188557407e-05 ||r(i)||/||b|| 3.418732152930e-02 Linear solve converged due to CONVERGED_RTOL iterations 6 Line search: Using full step: fnorm 1.647449494568e-03 gnorm 5.673172147266e-05 |residual|_2 of individual variables: potential: 5.69344e-09 potentialliq: 1.05299e-10 em: 1.18164e-07 emliq: 3.30266e-07 Arp: 6.25748e-09 OHm: 2.53197e-05 H3Op: 5.07669e-05 8 Nonlinear |R| = 5.673172e-05 0 KSP unpreconditioned resid norm 5.673172147266e-05 true resid norm 5.673172147266e-05 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 2.283987852633e-05 true resid norm 2.283987852633e-05 ||r(i)||/||b|| 4.025944909382e-01 2 KSP unpreconditioned resid norm 1.620979330079e-09 true resid norm 9.175446330767e-09 ||r(i)||/||b|| 1.617339663347e-04 3 KSP unpreconditioned resid norm 3.189928680388e-11 true resid norm 8.666557521570e-09 ||r(i)||/||b|| 1.527638734838e-04 Linear solve converged due to CONVERGED_RTOL iterations 3 Line search: Using full step: fnorm 5.673172147266e-05 gnorm 5.106381275877e-09 |residual|_2 of individual variables: potential: 6.59384e-12 potentialliq: 8.77051e-15 em: 9.64886e-12 emliq: 3.45014e-11 Arp: 6.84834e-13 OHm: 2.14381e-09 H3Op: 4.63442e-09 9 Nonlinear |R| = 5.106381e-09 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 9 Solve Converged! From bsmith at mcs.anl.gov Mon Nov 23 15:51:28 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 23 Nov 2015 15:51:28 -0600 Subject: [petsc-users] Debugging failed solve (what's an acceptable upper bound to the condition number?) In-Reply-To: <56537AB1.6080404@ncsu.edu> References: <564F691F.9020302@ncsu.edu> <877flc31qi.fsf@jedbrown.org> <56535AF3.4010007@ncsu.edu> <56537AB1.6080404@ncsu.edu> Message-ID: So just keep the "full" variable set. 
Note that without the full set the true residual is not tracking the preconditioned residual 0 KSP unpreconditioned resid norm 1.250000000000e+03 true resid norm 1.250000000000e+03 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 1.250000000000e+03 true resid norm 1.250000000000e+03 ||r(i)||/||b|| 1.000000000000e+00 2 KSP unpreconditioned resid norm 2.182679427760e+02 true resid norm 7.819529716916e+08 ||r(i)||/||b|| 6.255623773533e+05 3 KSP unpreconditioned resid norm 1.011652745364e+02 true resid norm 4.461678857470e+01 ||r(i)||/||b|| 3.569343085976e-02 4 KSP unpreconditioned resid norm 8.125623676015e+00 true resid norm 8.053940519499e+08 ||r(i)||/||b|| 6.443152415599e+05 5 KSP unpreconditioned resid norm 5.805247155944e+00 true resid norm 8.054105876447e+08 ||r(i)||/||b|| 6.443284701157e+05 6 KSP unpreconditioned resid norm 4.756488143441e+00 true resid norm 8.054162537433e+08 ||r(i)||/||b|| 6.443330029946e+05 7 KSP unpreconditioned resid norm 4.126450902175e+00 true resid norm 8.054191165808e+08 ||r(i)||/||b|| 6.443352932646e+05 8 KSP unpreconditioned resid norm 3.694696196953e+00 true resid norm 8.054208439360e+08 ||r(i)||/||b|| 6.443366751488e+05 9 KSP unpreconditioned resid norm 3.375152117403e+00 true resid norm 8.054219995564e+08 ||r(i)||/||b|| 6.443375996451e+05 10 KSP unpreconditioned resid norm 3.126354693526e+00 true resid norm 8.054228269923e+08 ||r(i)||/||b|| 6.443382615939e+05 meaning that the linear solver is not making any progress. With the full set the as the preconditioned residual gets smaller the true one does as well, to some degree 0 KSP unpreconditioned resid norm 1.250000000000e+03 true resid norm 1.250000000000e+03 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 1.247314056942e+03 true resid norm 1.247314056942e+03 ||r(i)||/||b|| 9.978512455538e-01 2 KSP unpreconditioned resid norm 4.860927105197e-02 true resid norm 4.776005315904e-02 ||r(i)||/||b|| 3.820804252723e-05 3 KSP unpreconditioned resid norm 4.787844580540e-02 true resid norm 4.752835483387e-02 ||r(i)||/||b|| 3.802268386709e-05 4 KSP unpreconditioned resid norm 4.437366678888e-02 true resid norm 2.756372625290e-01 ||r(i)||/||b|| 2.205098100232e-04 5 KSP unpreconditioned resid norm 1.696908986557e-05 true resid norm 5.105761919450e+00 ||r(i)||/||b|| 4.084609535560e-03 Try using right preconditioning where the preconditioned residual does not appear -ksp_pc_side right what happens in both cases? Barry > On Nov 23, 2015, at 2:44 PM, Alex Lindsay wrote: > > I've found that with a "full" variable set, I can get convergence. However, if I remove two of my variables (the other variables have no dependence on the variables that I remove; the coupling is one-way), then I no longer get convergence. I've attached logs of one time-step for both the converged and non-converged cases. > > On 11/23/2015 01:29 PM, Alex Lindsay wrote: >> On 11/20/2015 02:33 PM, Jed Brown wrote: >>> Alex Lindsay writes: >>>> I'm almost ashamed to share my condition number because I'm sure it must >>>> be absurdly high. Without applying -ksp_diagonal_scale and >>>> -ksp_diagonal_scale_fix, the condition number is around 1e25. When I do >>>> apply those two parameters, the condition number is reduced to 1e17. >>>> Even after scaling all my variable residuals so that they were all on >>>> the order of unity (a suggestion on the Moose list), I still have a >>>> condition number of 1e12. >>> Double precision provides 16 digits of accuracy in the best case. 
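A note on Barry's -ksp_pc_side suggestion above, with the caveat that the exact way options reach PETSc from MOOSE (e.g. through petsc_options_iname / petsc_options_value in the Executioner block) is assumed here rather than taken from the thread. The two experiments he asks for would be run with roughly

    -ksp_type gmres -ksp_pc_side left  -ksp_monitor_true_residual -ksp_converged_reason
    -ksp_type gmres -ksp_pc_side right -ksp_monitor_true_residual -ksp_converged_reason

With left preconditioning GMRES drives down the preconditioned residual, which is why the iteration can look like it is converging while the true residual stalls; with right preconditioning the monitored norm is already the unpreconditioned one, so the two columns printed by -ksp_monitor_true_residual should essentially agree.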
When >>> you finite difference, the accuracy is reduced to 8 digits if the >>> differencing parameter is chosen optimally. With the condition numbers >>> you're reporting, your matrix is singular up to available precision. >>> >>>> I have no experience with condition numbers, but knowing that perfect >>>> condition number is unity, 1e12 seems unacceptable. What's an >>>> acceptable upper limit on the condition number? Is it problem >>>> dependent? Having already tried scaling the individual variable >>>> residuals, I'm not exactly sure what my next method would be for >>>> trying to reduce the condition number. >>> Singular operators are often caused by incorrect boundary conditions. >>> You should try a small and simple version of your problem and find out >>> why it's producing a singular (or so close to singular we can't tell) >>> operator. >> Could large variable values also create singular operators? I'm essentially solving an advection-diffusion-reaction problem for several species where the advection is driven by an electric field. The species concentrations are in a logarithmic form such that the true concentration is given by exp(u). With my current units (# of particles / m^3) exp(u) is anywhere from 1e13 to 1e20, and thus the initial residuals are probably on the same order of magnitude. After I've assembled the total residual for each variable and before the residual is passed to the solver, I apply scaling to the residuals such that the sum of the variable residuals is around 1e3. But perhaps I lose some accuracy during the residual assembly process? >> >> I'm equating "incorrect" boundary conditions to "unphysical" or "unrealistic" boundary conditions. Hopefully that's fair. > > From bsmith at mcs.anl.gov Mon Nov 23 16:03:34 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 23 Nov 2015 16:03:34 -0600 Subject: [petsc-users] Using PFFT within PETSc In-Reply-To: <5652C0CD.40100@sissa.it> References: <5652C0CD.40100@sissa.it> Message-ID: Your issues are likely due to a difference in how PETSc and pfft() think the "vectors" are laid out across processes You seem to assume that DMDACreate2d() and pfft_create_procmesh() will make the same decisions about parallel layout; you print some information from the pfft_create() pfft_init(); pfft_create_procmesh_1d(PETSC_COMM_WORLD,ncpus,&comm_cart_1d); alloc_local = pfft_local_size_dft(2,n,comm_cart_1d,PFFT_TRANSPOSED_NONE,local_no,local_o_start,local_ni,local_i_start); PetscPrintf(PETSC_COMM_SELF,"alloc_local: %d\n",(PetscInt)alloc_local); PetscPrintf(PETSC_COMM_SELF,"local_no: %d %d\n",(PetscInt)local_no[0],(PetscInt)local_no[1]); PetscPrintf(PETSC_COMM_SELF,"local_ni: %d %d\n",(PetscInt)local_ni[0],(PetscInt)local_ni[1]); PetscPrintf(PETSC_COMM_SELF,"local_o_start: %d %d\n",(PetscInt)local_o_start[0],(PetscInt)local_o_start[1]); PetscPrintf(PETSC_COMM_SELF,"local_i_start: %d %d\n",(PetscInt)local_i_start[0],(PetscInt)local_i_start[1]); but do not check anything about the layout DMDACreate2d() selected. First you need to call DMDAGetInfo() and DMDAGetLocalInfo() to see the layout PETSc is using and make sure it matches pfft Barry > On Nov 23, 2015, at 1:31 AM, Giuseppe Pitton wrote: > > Dear users and developers, > I am trying to interface PETSc with the parallel fast Fourier transform library PFFT (https://www-user.tu-chemnitz.de/~potts/workgroup/pippig/software.php.en), based in turn on FFTW. My plan is to build a spectral differentiation code, and in the attached files you can see a simple example. 
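A minimal sketch of the check Barry asks for above, i.e. printing the DMDA's local block alongside the numbers already being printed for pfft; the helper name CheckLayouts and its signature are illustrative assumptions, not part of the attached code:

    #include <stddef.h>
    #include <petscdmda.h>

    /* Print the DMDA's local block next to the block reported by pfft_local_size_dft()
       so the two decompositions can be compared rank by rank. */
    static PetscErrorCode CheckLayouts(DM da, const ptrdiff_t local_ni[2], const ptrdiff_t local_i_start[2])
    {
      DMDALocalInfo  info;
      PetscErrorCode ierr;

      ierr = DMDAGetLocalInfo(da, &info);CHKERRQ(ierr);
      ierr = PetscPrintf(PETSC_COMM_SELF, "DMDA local block: %d x %d starting at (%d,%d)\n",
                         (int)info.xm, (int)info.ym, (int)info.xs, (int)info.ys);CHKERRQ(ierr);
      ierr = PetscPrintf(PETSC_COMM_SELF, "pfft local block: %d x %d starting at (%d,%d)\n",
                         (int)local_ni[0], (int)local_ni[1],
                         (int)local_i_start[0], (int)local_i_start[1]);CHKERRQ(ierr);
      /* The two blocks must cover the same index range on every rank. Note that
         pfft/FFTW treat the last index as the fastest-varying one while the DMDA's
         x index is the fastest, so the dimensions may appear transposed here. */
      return 0;
    }

Calling this on every rank immediately after DMDACreate2d() and pfft_local_size_dft() shows at a glance whether the two libraries agreed on the parallel decomposition.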
> The code works correctly in serial, but in parallel there are some problems regarding the output of the results, I think due to some differences in the way PETSc and PFFT store data, but I'm not sure if this is really the issue. > In the attached code, the number of processors used should be specified at compile time in the variable "ncpus". As long as ncpus = 1, everything works fine, but if ncpus = 2 or an higher power of 2, the code terminates correctly but the results show some artifacts, as you can see from the generated hdf5 file, named "output-pfft.h5". > In the makefile the variables PFFTINC, PFFTLIB, FFTWINC and FFTWLIB should be set correctly. > Thank you, > Giuseppe > > From adlinds3 at ncsu.edu Mon Nov 23 18:15:13 2015 From: adlinds3 at ncsu.edu (Alex Lindsay) Date: Mon, 23 Nov 2015 19:15:13 -0500 Subject: [petsc-users] Debugging failed solve (what's an acceptable upper bound to the condition number?) In-Reply-To: References: <564F691F.9020302@ncsu.edu> <877flc31qi.fsf@jedbrown.org> <56535AF3.4010007@ncsu.edu> <56537AB1.6080404@ncsu.edu> Message-ID: <5653AC11.2070405@ncsu.edu> I probably should have said that I'm getting convergence with the Jacobian-free method. I still haven't had any luck with Newton. It appears that the default for Jacobian-free is right preconditioning; I don't know if that's a Moose setting or PetSc. Anyways, if I try left preconditioning, I get comparable or slightly better performance with the full set. Still no convergence with the smaller set. I'll move forward with the full set, but I still want to understand why I can't achieve convergence with the smaller set. I think I'm going to review my linear algebra and do some research on preconditioning and iterative solutions so I have a better grasp of what's going on here. On 11/23/2015 04:51 PM, Barry Smith wrote: > So just keep the "full" variable set. > > Note that without the full set the true residual is not tracking the preconditioned residual > > 0 KSP unpreconditioned resid norm 1.250000000000e+03 true resid norm 1.250000000000e+03 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP unpreconditioned resid norm 1.250000000000e+03 true resid norm 1.250000000000e+03 ||r(i)||/||b|| 1.000000000000e+00 > 2 KSP unpreconditioned resid norm 2.182679427760e+02 true resid norm 7.819529716916e+08 ||r(i)||/||b|| 6.255623773533e+05 > 3 KSP unpreconditioned resid norm 1.011652745364e+02 true resid norm 4.461678857470e+01 ||r(i)||/||b|| 3.569343085976e-02 > 4 KSP unpreconditioned resid norm 8.125623676015e+00 true resid norm 8.053940519499e+08 ||r(i)||/||b|| 6.443152415599e+05 > 5 KSP unpreconditioned resid norm 5.805247155944e+00 true resid norm 8.054105876447e+08 ||r(i)||/||b|| 6.443284701157e+05 > 6 KSP unpreconditioned resid norm 4.756488143441e+00 true resid norm 8.054162537433e+08 ||r(i)||/||b|| 6.443330029946e+05 > 7 KSP unpreconditioned resid norm 4.126450902175e+00 true resid norm 8.054191165808e+08 ||r(i)||/||b|| 6.443352932646e+05 > 8 KSP unpreconditioned resid norm 3.694696196953e+00 true resid norm 8.054208439360e+08 ||r(i)||/||b|| 6.443366751488e+05 > 9 KSP unpreconditioned resid norm 3.375152117403e+00 true resid norm 8.054219995564e+08 ||r(i)||/||b|| 6.443375996451e+05 > 10 KSP unpreconditioned resid norm 3.126354693526e+00 true resid norm 8.054228269923e+08 ||r(i)||/||b|| 6.443382615939e+05 > > meaning that the linear solver is not making any progress. 
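A quick reading aid for the numbers quoted here: the last column printed by -ksp_monitor_true_residual is ||r(i)||/||b||, so for the stalled case

    8.054228269923e+08 / 1.250000000000e+03 ~= 6.44e+05,

i.e. after ten iterations the true (unpreconditioned) residual is still about five orders of magnitude larger than the right-hand side, which is what "not making any progress" refers to. In the converged log below, by contrast, the same ratio drops to about 4e-05 by the second iteration.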
With the full set the as the preconditioned residual gets smaller the true one does as well, to some degree > > 0 KSP unpreconditioned resid norm 1.250000000000e+03 true resid norm 1.250000000000e+03 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP unpreconditioned resid norm 1.247314056942e+03 true resid norm 1.247314056942e+03 ||r(i)||/||b|| 9.978512455538e-01 > 2 KSP unpreconditioned resid norm 4.860927105197e-02 true resid norm 4.776005315904e-02 ||r(i)||/||b|| 3.820804252723e-05 > 3 KSP unpreconditioned resid norm 4.787844580540e-02 true resid norm 4.752835483387e-02 ||r(i)||/||b|| 3.802268386709e-05 > 4 KSP unpreconditioned resid norm 4.437366678888e-02 true resid norm 2.756372625290e-01 ||r(i)||/||b|| 2.205098100232e-04 > 5 KSP unpreconditioned resid norm 1.696908986557e-05 true resid norm 5.105761919450e+00 ||r(i)||/||b|| 4.084609535560e-03 > > > Try using right preconditioning where the preconditioned residual does not appear -ksp_pc_side right what happens in both cases? > > Barry > > > > >> On Nov 23, 2015, at 2:44 PM, Alex Lindsay wrote: >> >> I've found that with a "full" variable set, I can get convergence. However, if I remove two of my variables (the other variables have no dependence on the variables that I remove; the coupling is one-way), then I no longer get convergence. I've attached logs of one time-step for both the converged and non-converged cases. >> >> On 11/23/2015 01:29 PM, Alex Lindsay wrote: >>> On 11/20/2015 02:33 PM, Jed Brown wrote: >>>> Alex Lindsay writes: >>>>> I'm almost ashamed to share my condition number because I'm sure it must >>>>> be absurdly high. Without applying -ksp_diagonal_scale and >>>>> -ksp_diagonal_scale_fix, the condition number is around 1e25. When I do >>>>> apply those two parameters, the condition number is reduced to 1e17. >>>>> Even after scaling all my variable residuals so that they were all on >>>>> the order of unity (a suggestion on the Moose list), I still have a >>>>> condition number of 1e12. >>>> Double precision provides 16 digits of accuracy in the best case. When >>>> you finite difference, the accuracy is reduced to 8 digits if the >>>> differencing parameter is chosen optimally. With the condition numbers >>>> you're reporting, your matrix is singular up to available precision. >>>> >>>>> I have no experience with condition numbers, but knowing that perfect >>>>> condition number is unity, 1e12 seems unacceptable. What's an >>>>> acceptable upper limit on the condition number? Is it problem >>>>> dependent? Having already tried scaling the individual variable >>>>> residuals, I'm not exactly sure what my next method would be for >>>>> trying to reduce the condition number. >>>> Singular operators are often caused by incorrect boundary conditions. >>>> You should try a small and simple version of your problem and find out >>>> why it's producing a singular (or so close to singular we can't tell) >>>> operator. >>> Could large variable values also create singular operators? I'm essentially solving an advection-diffusion-reaction problem for several species where the advection is driven by an electric field. The species concentrations are in a logarithmic form such that the true concentration is given by exp(u). With my current units (# of particles / m^3) exp(u) is anywhere from 1e13 to 1e20, and thus the initial residuals are probably on the same order of magnitude. 
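An aside on actually measuring the condition number being discussed, using stock PETSc options (a hedged sketch; the option names should be checked against the installed PETSc version). On a small test problem

    -pc_type svd -pc_svd_monitor

replaces the preconditioner with a dense SVD and reports the singular values, while for full-size runs

    -ksp_monitor_singular_value

prints running estimates of the largest and smallest singular values of the preconditioned operator at each Krylov iteration; their ratio is the condition-number estimate.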
After I've assembled the total residual for each variable and before the residual is passed to the solver, I apply scaling to the residuals such that the sum of the variable residuals is around 1e3. But perhaps I lose some accuracy during the residual assembly process? >>> >>> I'm equating "incorrect" boundary conditions to "unphysical" or "unrealistic" boundary conditions. Hopefully that's fair. >>
From bsmith at mcs.anl.gov Mon Nov 23 18:25:22 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 23 Nov 2015 18:25:22 -0600 Subject: [petsc-users] Debugging failed solve (what's an acceptable upper bound to the condition number?) In-Reply-To: <5653AC11.2070405@ncsu.edu> References: <564F691F.9020302@ncsu.edu> <877flc31qi.fsf@jedbrown.org> <56535AF3.4010007@ncsu.edu> <56537AB1.6080404@ncsu.edu> <5653AC11.2070405@ncsu.edu> Message-ID: <14425800-90BE-4C3D-B093-27CF43B921A4@mcs.anl.gov>
> On Nov 23, 2015, at 6:15 PM, Alex Lindsay wrote: > > I probably should have said that I'm getting convergence with the Jacobian-free method. I still haven't had any luck with Newton. It appears that the default for Jacobian-free is right preconditioning; I don't know if that's a Moose setting or PetSc
I don't know what you mean when you say "Jacobian-free method" and then "Newton" as if they are different methods. Jacobian-free Newton is a particular type of Newton method where the matrix representing the full Jacobian is not explicitly computed and stored; instead, the action of the Jacobian is "somehow" applied to vectors, often using a differencing of the function evaluation at two points (which is likely what Moose does). Read up on http://www.mcs.anl.gov/petsc/documentation/faq.html#newton
Barry
> . Anyways, if I try left preconditioning, I get comparable or slightly better performance with the full set. Still no convergence with the smaller set. > > I'll move forward with the full set, but I still want to understand why I can't achieve convergence with the smaller set. I think I'm going to review my linear algebra and do some research on preconditioning and iterative solutions so I have a better grasp of what's going on here. > > On 11/23/2015 04:51 PM, Barry Smith wrote: >> So just keep the "full" variable set.
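A short sketch of the differencing Barry describes above (the notation is generic, not from the thread): the Jacobian is never formed, only its action on a Krylov vector v,

    J(u) v ~= ( F(u + h*v) - F(u) ) / h

with h chosen automatically by PETSc's MatMFFD machinery, so each Krylov iteration costs one extra residual evaluation. On the PETSc command line this corresponds to -snes_mf (fully matrix-free) or -snes_mf_operator (matrix-free action combined with a separately assembled matrix used only to build the preconditioner); presumably the MOOSE "PJFNK" solve type maps to the latter, though that mapping is an assumption here rather than something stated in the thread.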
>> >> Note that without the full set the true residual is not tracking the preconditioned residual >> >> 0 KSP unpreconditioned resid norm 1.250000000000e+03 true resid norm 1.250000000000e+03 ||r(i)||/||b|| 1.000000000000e+00 >> 1 KSP unpreconditioned resid norm 1.250000000000e+03 true resid norm 1.250000000000e+03 ||r(i)||/||b|| 1.000000000000e+00 >> 2 KSP unpreconditioned resid norm 2.182679427760e+02 true resid norm 7.819529716916e+08 ||r(i)||/||b|| 6.255623773533e+05 >> 3 KSP unpreconditioned resid norm 1.011652745364e+02 true resid norm 4.461678857470e+01 ||r(i)||/||b|| 3.569343085976e-02 >> 4 KSP unpreconditioned resid norm 8.125623676015e+00 true resid norm 8.053940519499e+08 ||r(i)||/||b|| 6.443152415599e+05 >> 5 KSP unpreconditioned resid norm 5.805247155944e+00 true resid norm 8.054105876447e+08 ||r(i)||/||b|| 6.443284701157e+05 >> 6 KSP unpreconditioned resid norm 4.756488143441e+00 true resid norm 8.054162537433e+08 ||r(i)||/||b|| 6.443330029946e+05 >> 7 KSP unpreconditioned resid norm 4.126450902175e+00 true resid norm 8.054191165808e+08 ||r(i)||/||b|| 6.443352932646e+05 >> 8 KSP unpreconditioned resid norm 3.694696196953e+00 true resid norm 8.054208439360e+08 ||r(i)||/||b|| 6.443366751488e+05 >> 9 KSP unpreconditioned resid norm 3.375152117403e+00 true resid norm 8.054219995564e+08 ||r(i)||/||b|| 6.443375996451e+05 >> 10 KSP unpreconditioned resid norm 3.126354693526e+00 true resid norm 8.054228269923e+08 ||r(i)||/||b|| 6.443382615939e+05 >> >> meaning that the linear solver is not making any progress. With the full set the as the preconditioned residual gets smaller the true one does as well, to some degree >> >> 0 KSP unpreconditioned resid norm 1.250000000000e+03 true resid norm 1.250000000000e+03 ||r(i)||/||b|| 1.000000000000e+00 >> 1 KSP unpreconditioned resid norm 1.247314056942e+03 true resid norm 1.247314056942e+03 ||r(i)||/||b|| 9.978512455538e-01 >> 2 KSP unpreconditioned resid norm 4.860927105197e-02 true resid norm 4.776005315904e-02 ||r(i)||/||b|| 3.820804252723e-05 >> 3 KSP unpreconditioned resid norm 4.787844580540e-02 true resid norm 4.752835483387e-02 ||r(i)||/||b|| 3.802268386709e-05 >> 4 KSP unpreconditioned resid norm 4.437366678888e-02 true resid norm 2.756372625290e-01 ||r(i)||/||b|| 2.205098100232e-04 >> 5 KSP unpreconditioned resid norm 1.696908986557e-05 true resid norm 5.105761919450e+00 ||r(i)||/||b|| 4.084609535560e-03 >> >> >> Try using right preconditioning where the preconditioned residual does not appear -ksp_pc_side right what happens in both cases? >> >> Barry >> >> >> >> >>> On Nov 23, 2015, at 2:44 PM, Alex Lindsay wrote: >>> >>> I've found that with a "full" variable set, I can get convergence. However, if I remove two of my variables (the other variables have no dependence on the variables that I remove; the coupling is one-way), then I no longer get convergence. I've attached logs of one time-step for both the converged and non-converged cases. >>> >>> On 11/23/2015 01:29 PM, Alex Lindsay wrote: >>>> On 11/20/2015 02:33 PM, Jed Brown wrote: >>>>> Alex Lindsay writes: >>>>>> I'm almost ashamed to share my condition number because I'm sure it must >>>>>> be absurdly high. Without applying -ksp_diagonal_scale and >>>>>> -ksp_diagonal_scale_fix, the condition number is around 1e25. When I do >>>>>> apply those two parameters, the condition number is reduced to 1e17. 
>>>>>> Even after scaling all my variable residuals so that they were all on >>>>>> the order of unity (a suggestion on the Moose list), I still have a >>>>>> condition number of 1e12. >>>>> Double precision provides 16 digits of accuracy in the best case. When >>>>> you finite difference, the accuracy is reduced to 8 digits if the >>>>> differencing parameter is chosen optimally. With the condition numbers >>>>> you're reporting, your matrix is singular up to available precision. >>>>> >>>>>> I have no experience with condition numbers, but knowing that perfect >>>>>> condition number is unity, 1e12 seems unacceptable. What's an >>>>>> acceptable upper limit on the condition number? Is it problem >>>>>> dependent? Having already tried scaling the individual variable >>>>>> residuals, I'm not exactly sure what my next method would be for >>>>>> trying to reduce the condition number. >>>>> Singular operators are often caused by incorrect boundary conditions. >>>>> You should try a small and simple version of your problem and find out >>>>> why it's producing a singular (or so close to singular we can't tell) >>>>> operator. >>>> Could large variable values also create singular operators? I'm essentially solving an advection-diffusion-reaction problem for several species where the advection is driven by an electric field. The species concentrations are in a logarithmic form such that the true concentration is given by exp(u). With my current units (# of particles / m^3) exp(u) is anywhere from 1e13 to 1e20, and thus the initial residuals are probably on the same order of magnitude. After I've assembled the total residual for each variable and before the residual is passed to the solver, I apply scaling to the residuals such that the sum of the variable residuals is around 1e3. But perhaps I lose some accuracy during the residual assembly process? >>>> >>>> I'm equating "incorrect" boundary conditions to "unphysical" or "unrealistic" boundary conditions. Hopefully that's fair. >>> > From gpitton at sissa.it Tue Nov 24 08:00:34 2015 From: gpitton at sissa.it (Giuseppe Pitton) Date: Tue, 24 Nov 2015 15:00:34 +0100 Subject: [petsc-users] Using PFFT within PETSc In-Reply-To: References: <5652C0CD.40100@sissa.it> Message-ID: <56546D82.1030502@sissa.it> Thanks Barry. Indeed, the problem is the different vector distribution between PETSc and FFTW/PFFT. This however still leaves open the issue of how to deal with an array coming from FFTW. In the attached code, the vectors u and uy are saved correctly in hdf5, but for some reason the vector ux is not (not in parallel at least). I cannot find the error, to me the three vectors look as they are written exactly in the same way. 
Giuseppe On 11/23/2015 11:03 PM, Barry Smith wrote: > Your issues are likely due to a difference in how PETSc and pfft() think the "vectors" are laid out across processes > > You seem to assume that DMDACreate2d() and pfft_create_procmesh() will make the same decisions about parallel layout; you print some information from the pfft_create() > > pfft_init(); > pfft_create_procmesh_1d(PETSC_COMM_WORLD,ncpus,&comm_cart_1d); > alloc_local = pfft_local_size_dft(2,n,comm_cart_1d,PFFT_TRANSPOSED_NONE,local_no,local_o_start,local_ni,local_i_start); > > PetscPrintf(PETSC_COMM_SELF,"alloc_local: %d\n",(PetscInt)alloc_local); > PetscPrintf(PETSC_COMM_SELF,"local_no: %d %d\n",(PetscInt)local_no[0],(PetscInt)local_no[1]); > PetscPrintf(PETSC_COMM_SELF,"local_ni: %d %d\n",(PetscInt)local_ni[0],(PetscInt)local_ni[1]); > PetscPrintf(PETSC_COMM_SELF,"local_o_start: %d %d\n",(PetscInt)local_o_start[0],(PetscInt)local_o_start[1]); > PetscPrintf(PETSC_COMM_SELF,"local_i_start: %d %d\n",(PetscInt)local_i_start[0],(PetscInt)local_i_start[1]); > > but do not check anything about the layout DMDACreate2d() selected. First you need to call DMDAGetInfo() and DMDAGetLocalInfo() to see the layout PETSc is using and make sure it matches pfft > > > Barry > > >> On Nov 23, 2015, at 1:31 AM, Giuseppe Pitton wrote: >> >> Dear users and developers, >> I am trying to interface PETSc with the parallel fast Fourier transform library PFFT (https://www-user.tu-chemnitz.de/~potts/workgroup/pippig/software.php.en), based in turn on FFTW. My plan is to build a spectral differentiation code, and in the attached files you can see a simple example. >> The code works correctly in serial, but in parallel there are some problems regarding the output of the results, I think due to some differences in the way PETSc and PFFT store data, but I'm not sure if this is really the issue. >> In the attached code, the number of processors used should be specified at compile time in the variable "ncpus". As long as ncpus = 1, everything works fine, but if ncpus = 2 or an higher power of 2, the code terminates correctly but the results show some artifacts, as you can see from the generated hdf5 file, named "output-pfft.h5". >> In the makefile the variables PFFTINC, PFFTLIB, FFTWINC and FFTWLIB should be set correctly. >> Thank you, >> Giuseppe >> >> -------------- next part -------------- A non-text attachment was scrubbed... Name: test-fftw.c Type: text/x-csrc Size: 8040 bytes Desc: not available URL: From bsmith at mcs.anl.gov Tue Nov 24 10:41:28 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 24 Nov 2015 10:41:28 -0600 Subject: [petsc-users] Using PFFT within PETSc In-Reply-To: <56546D82.1030502@sissa.it> References: <5652C0CD.40100@sissa.it> <56546D82.1030502@sissa.it> Message-ID: <4328A558-59EE-4307-8DED-4A4F0A22343B@mcs.anl.gov> I don't understand what you mean but likely the issue comes from the fact that DMDA vectors are automatically saved to disk in the natural ordering (unlike other vectors you may create) independent of the parallel layout of the vector. Barry > On Nov 24, 2015, at 8:00 AM, Giuseppe Pitton wrote: > > Thanks Barry. Indeed, the problem is the different vector distribution between PETSc and FFTW/PFFT. > This however still leaves open the issue of how to deal with an array coming from FFTW. > In the attached code, the vectors u and uy are saved correctly in hdf5, but for some reason the vector ux is not (not in parallel at least). 
I cannot find the error, to me the three vectors look as they are written exactly in the same way. > > Giuseppe > > > > On 11/23/2015 11:03 PM, Barry Smith wrote: >> Your issues are likely due to a difference in how PETSc and pfft() think the "vectors" are laid out across processes >> >> You seem to assume that DMDACreate2d() and pfft_create_procmesh() will make the same decisions about parallel layout; you print some information from the pfft_create() >> >> pfft_init(); >> pfft_create_procmesh_1d(PETSC_COMM_WORLD,ncpus,&comm_cart_1d); >> alloc_local = pfft_local_size_dft(2,n,comm_cart_1d,PFFT_TRANSPOSED_NONE,local_no,local_o_start,local_ni,local_i_start); >> >> PetscPrintf(PETSC_COMM_SELF,"alloc_local: %d\n",(PetscInt)alloc_local); >> PetscPrintf(PETSC_COMM_SELF,"local_no: %d %d\n",(PetscInt)local_no[0],(PetscInt)local_no[1]); >> PetscPrintf(PETSC_COMM_SELF,"local_ni: %d %d\n",(PetscInt)local_ni[0],(PetscInt)local_ni[1]); >> PetscPrintf(PETSC_COMM_SELF,"local_o_start: %d %d\n",(PetscInt)local_o_start[0],(PetscInt)local_o_start[1]); >> PetscPrintf(PETSC_COMM_SELF,"local_i_start: %d %d\n",(PetscInt)local_i_start[0],(PetscInt)local_i_start[1]); >> >> but do not check anything about the layout DMDACreate2d() selected. First you need to call DMDAGetInfo() and DMDAGetLocalInfo() to see the layout PETSc is using and make sure it matches pfft >> >> >> Barry >> >> >>> On Nov 23, 2015, at 1:31 AM, Giuseppe Pitton wrote: >>> >>> Dear users and developers, >>> I am trying to interface PETSc with the parallel fast Fourier transform library PFFT (https://www-user.tu-chemnitz.de/~potts/workgroup/pippig/software.php.en), based in turn on FFTW. My plan is to build a spectral differentiation code, and in the attached files you can see a simple example. >>> The code works correctly in serial, but in parallel there are some problems regarding the output of the results, I think due to some differences in the way PETSc and PFFT store data, but I'm not sure if this is really the issue. >>> In the attached code, the number of processors used should be specified at compile time in the variable "ncpus". As long as ncpus = 1, everything works fine, but if ncpus = 2 or an higher power of 2, the code terminates correctly but the results show some artifacts, as you can see from the generated hdf5 file, named "output-pfft.h5". >>> In the makefile the variables PFFTINC, PFFTLIB, FFTWINC and FFTWLIB should be set correctly. >>> Thank you, >>> Giuseppe >>> >>> > > From mfadams at lbl.gov Tue Nov 24 14:31:06 2015 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 24 Nov 2015 15:31:06 -0500 Subject: [petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs In-Reply-To: References: <084F486B-00B4-4803-A95B-EE390EB25391@gmail.com> <2724FD04-9FED-43B8-B8A4-9F66BB8BD43B@dsic.upv.es> <4E4A4E38-B9C3-4F41-8D52-1AFF3E4E8C8C@gmail.com> <5ED4A81C-002B-4A6B-BC6D-F93BE200F0A5@mcs.anl.gov> <11E7B1E9-D812-4717-A9F2-929A218573E0@mcs.anl.gov> Message-ID: On Tue, Nov 17, 2015 at 1:38 AM, Denis Davydov wrote: > Hi Mark, > > > On 12 Nov 2015, at 21:16, Mark Adams wrote: > > > > There is a valgrind for El Capitan now and I have it. It runs perfectly > clean. > Do you compile it yourself or use Homebrew / MacPorts? > Sorry for the delay. I ended up downloading it, but Homebrew works also (now, I think it did not work and I had to download it). Note, the web site said El Capitan support was partial so I don't know if they cover everything now. 
> I always seem to have some noise in valgrind at least from OpenMPI (even > with suppression file), > perhaps it's better with MPICH. > > Kind regards, > Denis > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From davydden at gmail.com Wed Nov 25 03:10:37 2015 From: davydden at gmail.com (Denis Davydov) Date: Wed, 25 Nov 2015 10:10:37 +0100 Subject: [petsc-users] [SLEPc] GD is not deterministic when using different number of cores In-Reply-To: <24297832-E5E4-419B-91D3-09D38F267E9F@dsic.upv.es> References: <8E937148-1B39-443F-9A80-40776619472F@gmail.com> <24297832-E5E4-419B-91D3-09D38F267E9F@dsic.upv.es> Message-ID: <2AF785CF-A552-4E24-BD43-B7F1065867A0@gmail.com> > On 19 Nov 2015, at 11:19, Jose E.
Roman wrote: > >> >> El 19 nov 2015, a las 10:49, Denis Davydov escribió: >> >> Dear all, >> >> I was trying to get some scaling results for the GD eigensolver as applied to density functional theory. >> Interestingly enough, the number of self-consistent iterations (solution of a coupled eigenvalue problem and Poisson equations) >> depends on the number of MPI cores used. For my case the range of iterations is 19-24 for MPI cores between 2 and 160. >> That makes the whole scaling check useless as the eigenproblem is solved a different number of times. >> >> That is **not** the case when I use the Krylov-Schur eigensolver with zero shift, which makes me believe that I am missing some settings on GD to make it fully deterministic. The only non-deterministic part I am currently aware of is the initial subspace for the first SC iterations. But that's the case for both KS and GD. For subsequent iterations I provide previously obtained eigenvectors as initial subspace. >> >> Certainly there will be some round-off error due to different partition of DoFs for different number of MPI cores, >> but i don't expect it to have such a strong influence. Especially given the fact that I don't see this problem with KS. >> >> Below is the output of -eps_view for GD with -eps_type gd -eps_harmonic -st_pc_type bjacobi -eps_gd_krylov_start -eps_target -10.0 >> I would appreciate any suggestions on how to address the issue. > > The block Jacobi preconditioner differs when you change the number of processes. This will probably make GD iterate more when you use more processes. I figured out what else was causing the different solutions for different numbers of MPI cores: -eps_harmonic. As soon as I remove it from GD and JD, i have the same number of eigenproblems solved until convergence for all MPI cores (1,2,4,10,20) and for all methods (KS/GD/JD). Regards, Denis. -------------- next part -------------- An HTML attachment was scrubbed... URL: From arne.morten.kvarving at sintef.no Wed Nov 25 06:16:01 2015 From: arne.morten.kvarving at sintef.no (Arne Morten Kvarving) Date: Wed, 25 Nov 2015 13:16:01 +0100 Subject: [petsc-users] .pc file does not include dependencies Message-ID: <5655A681.1070703@sintef.no> hi, a while back i was told to use the .pc file for petsc if i wanted pkgconfig style (duh) configuration. i used to use a (confusingly named) PETScConfig.cmake file for this, which was not intended for such use. problem now is that the pc file of petsc is a bit broken. in particular, i want to link petsc static, but not all the dependencies. if i build petsc (static or not), the lib entry is listed as Libs: -L/lib -lpetsc now, the dependencies are listed in Libs.private: /usr/lib/liblapack.a /usr/lib/libblas.a libX11.a libpthread.a libssl.a libcrypto.a libhwloc.a libm.a -L/usr/lib/gcc/x86_64-linux-gnu/4.9 -L/usr/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lgfortran -lm -lgfortran -lm -lquadmath -lm -L/usr/lib/gcc/x86_64-linux-gnu/4.9 -L/usr/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lstdc++ -L/usr/lib/gcc/x86_64-linux-gnu/4.9 -L/usr/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -lgcc_s -ldl -ldl which are for a totally static link (you'd use --static with pkg-config to obtain it).
the problem with a totally static link, however, is that pkg-config returns this: pkg-config --static --libs PETSc -> /usr/lib/liblapack.a /usr/lib/libblas.a libX11.a libpthread.a libssl.a libcrypto.a libhwloc.a libm.a -L/home/akva/kode/petsc/current/linux-gnu-cxx-opt/lib -L/usr/lib/gcc/x86_64-linux-gnu/4.9 -lpetsc -lgfortran -lquadmath -lm -lstdc++ -lgcc_s -ldl which breaks at linking because petsc won't see any of the symbols in those .a files before (static link is order dependent). so i cannot use this file to do a fully static build either (unless i add -Wl,--whole-archive or equivalent hacks). my suggested fix would be to always list the link dependencies in the Libs: line (and in the proper order - -lpetsc goes first). this works fine with dynamic links (even though -lpetsc would bring in the dependencies) and it works with petsc static. before i try to grok the petsc buildsystem i wanted to check if this would be welcome, or even better if i there is something i'm missing.. arnem From knepley at gmail.com Wed Nov 25 07:07:50 2015 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 25 Nov 2015 07:07:50 -0600 Subject: [petsc-users] .pc file does not include dependencies In-Reply-To: <5655A681.1070703@sintef.no> References: <5655A681.1070703@sintef.no> Message-ID: On Wed, Nov 25, 2015 at 6:16 AM, Arne Morten Kvarving < arne.morten.kvarving at sintef.no> wrote: > hi, > > a while back i was told to use the .pc file for petsc if i wanted > pkgconfig style (duh) configuration. > i used to use a (confusingly named) PETScConfig.cmake file for this, which > was not intended for such use. > > problem now is that the pc file of petsc is a bit broken. > > in particular, i want to link petsc static, but not all the dependencies. > if i build petsc (static or not), > the lib entry is listed as > > Libs: -L/lib -lpetsc > > now, the dependencies are listed in > > Libs.private: /usr/lib/liblapack.a /usr/lib/libblas.a libX11.a > libpthread.a libssl.a libcrypto.a libhwloc.a libm.a > -L/usr/lib/gcc/x86_64-linux-gnu/4.9 -L/usr/lib/x86_64-linux-gnu > -L/lib/x86_64-linux-gnu -lgfortran -lm -lgfortran -lm -lquadmath -lm > -L/usr/lib/gcc/x86_64-linux-gnu/4.9 -L/usr/lib/x86_64-linux-gnu > -L/lib/x86_64-linux-gnu -lstdc++ -L/usr/lib/gcc/x86_64-linux-gnu/4.9 > -L/usr/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu > -L/usr/lib/x86_64-linux-gnu -ldl -lgcc_s -ldl -ldl > > which are for a totally static link (you'd use --static with pkg-config to > obtain it). > > the problem with a totally static link, however, is that pkg-config > returns this: > > pkg-config --static --libs PETSc -> > > /usr/lib/liblapack.a /usr/lib/libblas.a libX11.a libpthread.a libssl.a > libcrypto.a libhwloc.a libm.a > -L/home/akva/kode/petsc/current/linux-gnu-cxx-opt/lib > -L/usr/lib/gcc/x86_64-linux-gnu/4.9 -lpetsc -lgfortran -lquadmath -lm > -lstdc++ -lgcc_s -ldl > > which breaks at linking because petsc won't see any of the symbols in > those .a files before (static link is order dependent). > I do not understand this explanation. 1) Why does the link fail? Please send the error. 2) Is pkgconfig changing/filtering the link line? That would seem like a bug in pkgconfig. Matt > so i cannot use this file to do a fully static build either (unless i add > -Wl,--whole-archive or equivalent hacks). > > my suggested fix would be to always list the link dependencies in the > Libs: line (and in the proper order - -lpetsc goes first). 
this works fine > with dynamic links (even though -lpetsc would bring in the dependencies) > and it works with petsc static. > > before i try to grok the petsc buildsystem i wanted to check if this would > be welcome, or even better if i there is something i'm missing.. > > arnem > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From arne.morten.kvarving at sintef.no Wed Nov 25 07:50:59 2015 From: arne.morten.kvarving at sintef.no (Arne Morten Kvarving) Date: Wed, 25 Nov 2015 14:50:59 +0100 Subject: [petsc-users] .pc file does not include dependencies In-Reply-To: <5655B8C2.4060708@sintef.no> References: <5655B8C2.4060708@sintef.no> Message-ID: <5655BCC3.4000808@sintef.no> On 25/11/15 14:07, Matthew Knepley wrote: > On Wed, Nov 25, 2015 at 6:16 AM, Arne Morten Kvarving > wrote: > > hi, > > a while back i was told to use the .pc file for petsc if i wanted > pkgconfig style (duh) configuration. > i used to use a (confusingly named) PETScConfig.cmake file for > this, which was not intended for such use. > > problem now is that the pc file of petsc is a bit broken. > > in particular, i want to link petsc static, but not all the > dependencies. if i build petsc (static or not), > the lib entry is listed as > > Libs: -L/lib -lpetsc > > now, the dependencies are listed in > > Libs.private: /usr/lib/liblapack.a /usr/lib/libblas.a libX11.a > libpthread.a libssl.a libcrypto.a libhwloc.a libm.a > -L/usr/lib/gcc/x86_64-linux-gnu/4.9 -L/usr/lib/x86_64-linux-gnu > -L/lib/x86_64-linux-gnu -lgfortran -lm -lgfortran -lm -lquadmath > -lm -L/usr/lib/gcc/x86_64-linux-gnu/4.9 > -L/usr/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lstdc++ > -L/usr/lib/gcc/x86_64-linux-gnu/4.9 -L/usr/lib/x86_64-linux-gnu > -L/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -lgcc_s > -ldl -ldl > > which are for a totally static link (you'd use --static with > pkg-config to obtain it). > > the problem with a totally static link, however, is that > pkg-config returns this: > > pkg-config --static --libs PETSc -> > > /usr/lib/liblapack.a /usr/lib/libblas.a libX11.a libpthread.a > libssl.a libcrypto.a libhwloc.a libm.a > -L/home/akva/kode/petsc/current/linux-gnu-cxx-opt/lib > -L/usr/lib/gcc/x86_64-linux-gnu/4.9 -lpetsc -lgfortran -lquadmath > -lm -lstdc++ -lgcc_s -ldl > > which breaks at linking because petsc won't see any of the symbols > in those .a files before (static link is order dependent). > > > I do not understand this explanation. > > 1) Why does the link fail? Please send the error. it's because with static linking, the order of libraries matter. if you do -lfoo -lbar the linker will see the symbols in the bar library when linking foo, but not the other way around. thus the linker do not see the symbols in e.g. libX11.a when linking petsc and you get a lots of inker errors, random line; xops.c:(.text+0x2a9): undefined reference to `XSetForeground' xops.c:(.text+0x3e0): undefined reference to `XDrawLine' /home/akva/kode/petsc/current/linux-gnu-cxx-opt/lib/libpetsc.a(xops.o): In function `PetscDrawArrow_X(_p_PetscDraw*, double, double, double, double, int)': > > 2) Is pkgconfig changing/filtering the link line? That would seem like > a bug in pkgconfig. yes, pkg-config is behaving very weird. i have no idea of the logic here, it's likely a bug. 
if I 1) move Libs.private below Libs in the .pc file 2) replace the absolute references to archive foo.a with -lfoo it works. i.e. using http://paste.ubuntu.com/13502223/ (ignore the irrelevant changes, that is using the variables instead of absolute paths), i get /usr/lib/liblapack.a /usr/lib/libblas.a libX11.a libpthread.a libssl.a libcrypto.a libhwloc.a libm.a -L/home/akva/kode/petsc/current/linux-gnu-cxx-opt/lib -L/usr/lib/gcc/x86_64-linux-gnu/4.9 -lpetsc -lgfortran -lquadmath -lm -lstdc++ -lgcc_s -ldl which breaks (so step 1 is not enough). if i however use http://paste.ubuntu.com/13502228/ i get -L/home/akva/kode/petsc/current/linux-gnu-cxx-opt/lib -L/usr/lib/gcc/x86_64-linux-gnu/4.9 -lpetsc -llapack -lblas -lX11 -lpthread -lssl -lcrypto -lhwloc -lm -lstdc++ -lgcc_s -ldl as wanted (missing a few -L's in principle for the last file). this is with pkg-config 0.26 (ubuntu trusty). arnem -------------- next part -------------- An HTML attachment was scrubbed... URL: From arne.morten.kvarving at sintef.no Wed Nov 25 07:52:36 2015 From: arne.morten.kvarving at sintef.no (Arne Morten Kvarving) Date: Wed, 25 Nov 2015 14:52:36 +0100 Subject: [petsc-users] .pc file does not include dependencies In-Reply-To: <5655BC70.2020901@sintef.no> References: <5655BC70.2020901@sintef.no> Message-ID: <5655BD24.4050308@sintef.no> actually, it's proper behavior according to pkgconfig documentation. all options which do not start with -l or -L is stuck in the '--libs-only-other' output, and these will be first when it builds up the output string with --static --libs. so the proper fix is indeed for petsc to use -l and -L for those static dependencies i believe. sorry for the dupes matthew, reply-to-all-nub-error on my behalf.. From bsmith at mcs.anl.gov Wed Nov 25 12:34:52 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 25 Nov 2015 12:34:52 -0600 Subject: [petsc-users] .pc file does not include dependencies In-Reply-To: <5655BD24.4050308@sintef.no> References: <5655BC70.2020901@sintef.no> <5655BD24.4050308@sintef.no> Message-ID: It is very possible (make that extremely possible) that we are not generating the .pc correctly. We are not that familiar with pkg-config. If you could let us know what it should contain we can fix its generation to put in what is needed. Ideally you would send use two test make rules (one static one shared) so that as we try to fix the .pc file we have some confirmation that we have fixed it correctly. Thanks for you help, Barry > On Nov 25, 2015, at 7:52 AM, Arne Morten Kvarving wrote: > > actually, it's proper behavior according to pkgconfig documentation. all options which do not start with -l or -L is stuck in the '--libs-only-other' output, and these will be first when it builds up the output string with --static --libs. so the proper fix is indeed for petsc to use -l and -L for those static dependencies i believe. > > sorry for the dupes matthew, reply-to-all-nub-error on my behalf.. > > From balay at mcs.anl.gov Wed Nov 25 12:53:47 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 25 Nov 2015 12:53:47 -0600 Subject: [petsc-users] .pc file does not include dependencies In-Reply-To: <5655A681.1070703@sintef.no> References: <5655A681.1070703@sintef.no> Message-ID: I don't see this behavior on my linux box. 
i.e [with --static option] -lpetsc is listed before -lx11 Satish balay at asterix /home/balay/petsc/arch-maint/lib/pkgconfig (maint>) $ cat PETSc.pc prefix=/home/balay/petsc exec_prefix=${prefix} includedir=${prefix}/include libdir=/home/balay/petsc/arch-maint/lib ccompiler=mpicc fcompiler=mpif90 blaslapacklibs=-llapack -lblas Name: PETSc Description: Library to solve ODEs and algebraic equations Version: 3.6.2 Cflags: -I/home/balay/petsc/include -I/home/balay/petsc/arch-maint/include -I/home/balay/soft/mpich-3.1.4/include Libs: -L/home/balay/petsc/arch-maint/lib -lpetsc Libs.private: liblapack.a libblas.a libX11.a libpthread.a libm.a -Wl,-rpath,/home/balay/soft/mpich-3.1.4/lib -L/home/balay/soft/mpich-3.1.4/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/5.1.1 -L/usr/lib/gcc/x86_64-redhat-linux/5.1.1 -lmpifort -lgfortran -lm -Wl,-rpath,/home/balay/soft/mpich-3.1.4/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/5.1.1 -lgfortran -lm -lquadmath -lm -Wl,-rpath,/home/balay/soft/mpich-3.1.4/lib -L/home/balay/soft/mpich-3.1.4/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/5.1.1 -L/usr/lib/gcc/x86_64-redhat-linux/5.1.1 -Wl,-rpath,/home/balay/soft/mpich-3.1.4/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/5.1.1 -lmpicxx -lstdc++ -Wl,-rpath,/home/balay/soft/mpich-3.1.4/lib -L/home/balay/soft/mpich-3.1.4/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/5.1.1 -L/usr/lib/gcc/x86_64-redhat-linux/5.1.1 -ldl -Wl,-rpath,/home/balay/soft/mpich-3.1.4/lib -lmpi -lgcc_s -ldl balay at asterix /home/balay/petsc/arch-maint/lib/pkgconfig (maint>) $ pkg-config --libs PETSc -L/home/balay/petsc/arch-maint/lib -lpetsc balay at asterix /home/balay/petsc/arch-maint/lib/pkgconfig (maint>) $ pkg-config --static --libs PETSc -L/home/balay/petsc/arch-maint/lib -L/home/balay/soft/mpich-3.1.4/lib -L/usr/lib/gcc/x86_64-redhat-linux/5.1.1 -L/home/balay/soft/mpich-3.1.4/lib -L/usr/lib/gcc/x86_64-redhat-linux/5.1.1 -L/home/balay/soft/mpich-3.1.4/lib -L/usr/lib/gcc/x86_64-redhat-linux/5.1.1 -lpetsc liblapack.a libblas.a libX11.a libpthread.a libm.a -Wl,-rpath,/home/balay/soft/mpich-3.1.4/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/5.1.1 -lmpifort -lgfortran -lm -Wl,-rpath,/home/balay/soft/mpich-3.1.4/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/5.1.1 -lgfortran -lm -lquadmath -lm -Wl,-rpath,/home/balay/soft/mpich-3.1.4/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/5.1.1 -Wl,-rpath,/home/balay/soft/mpich-3.1.4/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/5.1.1 -lmpicxx -lstdc++ -Wl,-rpath,/home/balay/soft/mpich-3.1.4/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/5.1.1 -ldl -Wl,-rpath,/home/balay/soft/mpich-3.1.4/lib -lmpi -lgcc_s -ldl balay at asterix /home/balay/petsc/arch-maint/lib/pkgconfig (maint>) $ pkg-config --version 0.28 balay at asterix /home/balay/petsc/arch-maint/lib/pkgconfig (maint>) $ On Wed, 25 Nov 2015, Arne Morten Kvarving wrote: > hi, > > a while back i was told to use the .pc file for petsc if i wanted pkgconfig > style (duh) configuration. > i used to use a (confusingly named) PETScConfig.cmake file for this, which was > not intended for such use. > > problem now is that the pc file of petsc is a bit broken. > > in particular, i want to link petsc static, but not all the dependencies. 
if i > build petsc (static or not), > the lib entry is listed as > > Libs: -L/lib -lpetsc > > now, the dependencies are listed in > > Libs.private: /usr/lib/liblapack.a /usr/lib/libblas.a libX11.a libpthread.a > libssl.a libcrypto.a libhwloc.a libm.a -L/usr/lib/gcc/x86_64-linux-gnu/4.9 > -L/usr/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lgfortran -lm -lgfortran > -lm -lquadmath -lm -L/usr/lib/gcc/x86_64-linux-gnu/4.9 > -L/usr/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lstdc++ > -L/usr/lib/gcc/x86_64-linux-gnu/4.9 -L/usr/lib/x86_64-linux-gnu > -L/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -lgcc_s -ldl -ldl > > which are for a totally static link (you'd use --static with pkg-config to > obtain it). > > the problem with a totally static link, however, is that pkg-config returns > this: > > pkg-config --static --libs PETSc -> > > /usr/lib/liblapack.a /usr/lib/libblas.a libX11.a libpthread.a libssl.a > libcrypto.a libhwloc.a libm.a > -L/home/akva/kode/petsc/current/linux-gnu-cxx-opt/lib > -L/usr/lib/gcc/x86_64-linux-gnu/4.9 -lpetsc -lgfortran -lquadmath -lm -lstdc++ > -lgcc_s -ldl > > which breaks at linking because petsc won't see any of the symbols in those .a > files before (static link is order dependent). > > so i cannot use this file to do a fully static build either (unless i add > -Wl,--whole-archive or equivalent hacks). > > my suggested fix would be to always list the link dependencies in the Libs: > line (and in the proper order - -lpetsc goes first). this works fine with > dynamic links (even though -lpetsc would bring in the dependencies) and it > works with petsc static. > > before i try to grok the petsc buildsystem i wanted to check if this would be > welcome, or even better if i there is something i'm missing.. > > arnem > From balay at mcs.anl.gov Wed Nov 25 13:17:40 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 25 Nov 2015 13:17:40 -0600 Subject: [petsc-users] .pc file does not include dependencies In-Reply-To: <5655BCC3.4000808@sintef.no> References: <5655B8C2.4060708@sintef.no> <5655BCC3.4000808@sintef.no> Message-ID: On Wed, 25 Nov 2015, Arne Morten Kvarving wrote: > if I > 1) move Libs.private below Libs in the .pc file > 2) replace the absolute references to archive foo.a with -lfoo > > it works. > this is with pkg-config 0.26 (ubuntu trusty). Ok - I can reproduce this on ubuntu 12.04 with pkg-config 0.26 For one - the Libs.private are already below Libs in PETSc.pc I'll check why libs are listed as libfoo.a instead of -lfoo in this file. Satish From balay at mcs.anl.gov Wed Nov 25 13:29:46 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 25 Nov 2015 13:29:46 -0600 Subject: [petsc-users] .pc file does not include dependencies In-Reply-To: References: <5655B8C2.4060708@sintef.no> <5655BCC3.4000808@sintef.no> Message-ID: On Wed, 25 Nov 2015, Satish Balay wrote: > I'll check why libs are listed as libfoo.a instead of -lfoo in this file. Ok - the following patch should fix the issue. Could you try it out? 
Thanks, Satish -------- $ git diff |cat diff --git a/config/PETSc/Configure.py b/config/PETSc/Configure.py index 2a83a71..885bb01 100644 --- a/config/PETSc/Configure.py +++ b/config/PETSc/Configure.py @@ -173,7 +173,7 @@ class Configure(config.base.Configure): fd.write('Libs: '+plibs.replace(os.path.join(self.petscdir.dir,self.arch.arch),self.framework.argDB['prefix'])+'\n') else: fd.write('Libs: '+plibs+'\n') - fd.write('Libs.private: '+' '.join(self.packagelibs+self.libraries.math+self.compilers.flibs+self.compilers.cxxlibs)+' '+self.compilers.LIBS) + fd.write('Libs.private: '+self.libraries.toStringNoDupes(self.packagelibs+self.libraries.math+self.compilers.flibs+self.compilers.cxxlibs)+' '+self.compilers.LIBS+'\n') fd.close() return From bsmith at mcs.anl.gov Wed Nov 25 19:54:39 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 25 Nov 2015 19:54:39 -0600 Subject: [petsc-users] master branch option "-snes_monitor_solution" In-Reply-To: References: <49DD4FA4-8C55-43D8-A6E3-9266231110EC@mcs.anl.gov> Message-ID: Ed, I have fixed the error in the branch barry/update-monitors now in next for testing. There is one API change associated with the fix. To graphically visualize the solution one now needs -ksp/snes/ts_monitor_solution draw the default behavior is now to ASCII print the solution to the screen. Barry Further work is needed to unify and simplified the various monitor options. > On Nov 22, 2015, at 8:55 PM, Ed Bueler wrote: > > Barry -- > > That is reassuring, actually. That is, knowing that occasionally ya'll botch something, and that the problem is not entirely on this leaf of the internets. > > Ed > > > On Sun, Nov 22, 2015 at 5:51 PM, Barry Smith wrote: > > I totally botched that update; looks like I broke a lot of the command line monitor options in master. > > Fixing it properly will take some work but also enhance the command line monitor and reduce the code a bit. > > Thanks for letting us know. > > > Barry > > > On Nov 22, 2015, at 1:40 PM, Ed Bueler wrote: > > > > Dear PETSc -- > > > > When I use option -snes_monitor_solution in master branch I get the error below. I have a sense that this is related to the change listed at http://www.mcs.anl.gov/petsc/documentation/changes/dev.html, namely > > > > "SNESSetMonitor(SNESMonitorXXX, calls now require passing a viewer as the final argument, you can no longer pass a NULL)" > > > > but the error message below is not informative enough to tell me what to do at the command line. > > > > Note that my X11 windows do work, as other options successfully give line graphs etc. > > > > Do I need > > > > -snes_monitor_solution Z > > > > with some value for Z? If so, where are the possibilities documented? > > > > Thanks! > > > > Ed > > > > > > > > $ ./ex5 -snes_monitor_solution > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > [0]PETSC ERROR: Null argument, when expecting valid pointer > > [0]PETSC ERROR: Null Object: Parameter # 4 > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> > [0]PETSC ERROR: Petsc Development GIT revision: v3.6.2-1635-g5e95a8a GIT Date: 2015-11-21 16:14:08 -0600 > > [0]PETSC ERROR: ./ex5 on a linux-c-dbg named bueler-leopard by ed Sun Nov 22 10:31:33 2015 > > [0]PETSC ERROR: Configure options --download-mpich --download-triangle --with-debugging=1 > > [0]PETSC ERROR: #1 SNESMonitorSolution() line 33 in /home/ed/petsc/src/snes/interface/snesut.c > > [0]PETSC ERROR: #2 SNESMonitor() line 3383 in /home/ed/petsc/src/snes/interface/snes.c > > [0]PETSC ERROR: #3 SNESSolve_NEWTONLS() line 191 in /home/ed/petsc/src/snes/impls/ls/ls.c > > [0]PETSC ERROR: #4 SNESSolve() line 3984 in /home/ed/petsc/src/snes/interface/snes.c > > [0]PETSC ERROR: #5 main() line 171 in /home/ed/petsc/src/snes/examples/tutorials/ex5.c > > [0]PETSC ERROR: PETSc Option Table entries: > > [0]PETSC ERROR: -snes_monitor_solution > > [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- > > application called MPI_Abort(MPI_COMM_WORLD, 85) - process 0 > > [unset]: aborting job: > > application called MPI_Abort(MPI_COMM_WORLD, 85) - process 0 > > > > > > > > -- > > Ed Bueler > > Dept of Math and Stat and Geophysical Institute > > University of Alaska Fairbanks > > Fairbanks, AK 99775-6660 > > 301C Chapman and 410D Elvey > > 907 474-7693 and 907 474-7199 (fax 907 474-5394) > > > > > -- > Ed Bueler > Dept of Math and Stat and Geophysical Institute > University of Alaska Fairbanks > Fairbanks, AK 99775-6660 > 301C Chapman and 410D Elvey > 907 474-7693 and 907 474-7199 (fax 907 474-5394) From rlmackie862 at gmail.com Thu Nov 26 11:01:04 2015 From: rlmackie862 at gmail.com (Randall Mackie) Date: Thu, 26 Nov 2015 09:01:04 -0800 Subject: [petsc-users] problem compiling Message-ID: <412960B4-F952-4937-AD25-186AF6DF718C@gmail.com> I was trying to recompile PETSc using superlu_dist on a linux system, and I had configure download the necessary packages, and the configure went fine. Compilation bombed out with the error message: /usr/bin/ld cannot find -ldat this came immediately after the ztaulinesearch compile in the make.log. I thought before I mailed off the make.log to petsc-maint I would see if anyone knew the issue here. Thanks, Randy M. From balay at mcs.anl.gov Thu Nov 26 11:40:27 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 26 Nov 2015 11:40:27 -0600 Subject: [petsc-users] problem compiling In-Reply-To: <412960B4-F952-4937-AD25-186AF6DF718C@gmail.com> References: <412960B4-F952-4937-AD25-186AF6DF718C@gmail.com> Message-ID: Its always best to send logs so that we know whats hapenning. To be sure - you can try a clean build. Satish On Thu, 26 Nov 2015, Randall Mackie wrote: > I was trying to recompile PETSc using superlu_dist on a linux system, and I had configure download the necessary packages, and the configure went fine. > > Compilation bombed out with the error message: > > /usr/bin/ld cannot find -ldat > > this came immediately after the ztaulinesearch compile in the make.log. > > I thought before I mailed off the make.log to petsc-maint I would see if anyone knew the issue here. > > > Thanks, > > Randy M. 
From rlmackie862 at gmail.com Thu Nov 26 12:22:18 2015 From: rlmackie862 at gmail.com (Randall Mackie) Date: Thu, 26 Nov 2015 10:22:18 -0800 Subject: [petsc-users] problem compiling In-Reply-To: References: <412960B4-F952-4937-AD25-186AF6DF718C@gmail.com> Message-ID: <58949A45-D6F1-41FF-B841-021736C32280@gmail.com> Thanks Satish, Turns out it was a local system problem that has now been resolved. Randy M. > On Nov 26, 2015, at 9:40 AM, Satish Balay wrote: > > Its always best to send logs so that we know whats hapenning. To be > sure - you can try a clean build. > > Satish > > On Thu, 26 Nov 2015, Randall Mackie wrote: > >> I was trying to recompile PETSc using superlu_dist on a linux system, and I had configure download the necessary packages, and the configure went fine. >> >> Compilation bombed out with the error message: >> >> /usr/bin/ld cannot find -ldat >> >> this came immediately after the ztaulinesearch compile in the make.log. >> >> I thought before I mailed off the make.log to petsc-maint I would see if anyone knew the issue here. >> >> >> Thanks, >> >> Randy M. > From arne.morten.kvarving at sintef.no Fri Nov 27 04:10:25 2015 From: arne.morten.kvarving at sintef.no (Arne Morten Kvarving) Date: Fri, 27 Nov 2015 11:10:25 +0100 Subject: [petsc-users] .pc file does not include dependencies In-Reply-To: References: <5655B8C2.4060708@sintef.no> <5655BCC3.4000808@sintef.no> Message-ID: <56582C11.8050402@sintef.no> On 25/11/15 20:29, Satish Balay wrote: > On Wed, 25 Nov 2015, Satish Balay wrote: > >> I'll check why libs are listed as libfoo.a instead of -lfoo in this file. > Ok - the following patch should fix the issue. Could you try it out? > sorry for the late response, time zone differences and out-of-officing. it works just fine. thanks a lot satish. hopefully this can land and there's a 3.6.3 not too far the line =) arnem From balay at mcs.anl.gov Fri Nov 27 09:25:31 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 27 Nov 2015 09:25:31 -0600 Subject: [petsc-users] .pc file does not include dependencies In-Reply-To: <56582C11.8050402@sintef.no> References: <5655B8C2.4060708@sintef.no> <5655BCC3.4000808@sintef.no> <56582C11.8050402@sintef.no> Message-ID: On Fri, 27 Nov 2015, Arne Morten Kvarving wrote: > On 25/11/15 20:29, Satish Balay wrote: > > On Wed, 25 Nov 2015, Satish Balay wrote: > > > > > I'll check why libs are listed as libfoo.a instead of -lfoo in this file. > > Ok - the following patch should fix the issue. Could you try it out? > > > sorry for the late response, time zone differences and out-of-officing. > > it works just fine. thanks a lot satish. hopefully this can land and there's a > 3.6.3 not too far the line =) Yes - the patch is now in 'maint' branch - and will be in 3.6.3 patch update. Satish From rlmackie862 at gmail.com Fri Nov 27 12:00:18 2015 From: rlmackie862 at gmail.com (Randall Mackie) Date: Fri, 27 Nov 2015 10:00:18 -0800 Subject: [petsc-users] question about MPI_Bcast and 64-bit-indices Message-ID: <82984BE9-ADA7-42FD-8EBF-455F08CE6B5C@gmail.com> If my program is compiled using 64-bit-indices, and I have an integer variable defined as PetscInt, what is the right way to broadcast that using MPI_Bcast? I currently have: call MPI_Bcast(n, 1, MPI_INTEGER, ? which is the right way to do it for regular integers, but what do I use in place of MPI_INTEGER when Petsc is compiled with 64-bit-indices. Thanks, Randy M. From jroman at dsic.upv.es Fri Nov 27 12:09:08 2015 From: jroman at dsic.upv.es (Jose E. 
Roman) Date: Fri, 27 Nov 2015 19:09:08 +0100 Subject: [petsc-users] question about MPI_Bcast and 64-bit-indices In-Reply-To: <82984BE9-ADA7-42FD-8EBF-455F08CE6B5C@gmail.com> References: <82984BE9-ADA7-42FD-8EBF-455F08CE6B5C@gmail.com> Message-ID: <722D82E0-01EF-440F-B435-58233B4F4103@dsic.upv.es> > El 27 nov 2015, a las 19:00, Randall Mackie escribi?: > > If my program is compiled using 64-bit-indices, and I have an integer variable defined as PetscInt, what is the right way to broadcast that using MPI_Bcast? > > I currently have: > > call MPI_Bcast(n, 1, MPI_INTEGER, ? > > which is the right way to do it for regular integers, but what do I use in place of MPI_INTEGER when Petsc is compiled with 64-bit-indices. > > > Thanks, > > Randy M. There are PETSc-defined MPI types for basic PETSc datatypes: MPIU_INT for PetscInt, MPIU_SCALAR for PetscScalar, and so on. See petscsys.h for details. Jose From bsmith at mcs.anl.gov Fri Nov 27 12:27:55 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 27 Nov 2015 12:27:55 -0600 Subject: [petsc-users] question about MPI_Bcast and 64-bit-indices In-Reply-To: <722D82E0-01EF-440F-B435-58233B4F4103@dsic.upv.es> References: <82984BE9-ADA7-42FD-8EBF-455F08CE6B5C@gmail.com> <722D82E0-01EF-440F-B435-58233B4F4103@dsic.upv.es> Message-ID: <87A72092-177B-4682-BD35-1952BB7256D0@mcs.anl.gov> Use MPIU_INTEGER for Fortran > On Nov 27, 2015, at 12:09 PM, Jose E. Roman wrote: > > >> El 27 nov 2015, a las 19:00, Randall Mackie escribi?: >> >> If my program is compiled using 64-bit-indices, and I have an integer variable defined as PetscInt, what is the right way to broadcast that using MPI_Bcast? >> >> I currently have: >> >> call MPI_Bcast(n, 1, MPI_INTEGER, ? >> >> which is the right way to do it for regular integers, but what do I use in place of MPI_INTEGER when Petsc is compiled with 64-bit-indices. >> >> >> Thanks, >> >> Randy M. > > There are PETSc-defined MPI types for basic PETSc datatypes: MPIU_INT for PetscInt, MPIU_SCALAR for PetscScalar, and so on. See petscsys.h for details. > > Jose > From rlmackie862 at gmail.com Fri Nov 27 12:32:11 2015 From: rlmackie862 at gmail.com (Randall Mackie) Date: Fri, 27 Nov 2015 10:32:11 -0800 Subject: [petsc-users] question about MPI_Bcast and 64-bit-indices In-Reply-To: <87A72092-177B-4682-BD35-1952BB7256D0@mcs.anl.gov> References: <82984BE9-ADA7-42FD-8EBF-455F08CE6B5C@gmail.com> <722D82E0-01EF-440F-B435-58233B4F4103@dsic.upv.es> <87A72092-177B-4682-BD35-1952BB7256D0@mcs.anl.gov> Message-ID: <98A7E8B2-43B1-4DB4-995D-77142544BADA@gmail.com> Thanks Barry and Jose. > On Nov 27, 2015, at 10:27 AM, Barry Smith wrote: > > > Use MPIU_INTEGER for Fortran > > >> On Nov 27, 2015, at 12:09 PM, Jose E. Roman wrote: >> >> >>> El 27 nov 2015, a las 19:00, Randall Mackie escribi?: >>> >>> If my program is compiled using 64-bit-indices, and I have an integer variable defined as PetscInt, what is the right way to broadcast that using MPI_Bcast? >>> >>> I currently have: >>> >>> call MPI_Bcast(n, 1, MPI_INTEGER, ? >>> >>> which is the right way to do it for regular integers, but what do I use in place of MPI_INTEGER when Petsc is compiled with 64-bit-indices. >>> >>> >>> Thanks, >>> >>> Randy M. >> >> There are PETSc-defined MPI types for basic PETSc datatypes: MPIU_INT for PetscInt, MPIU_SCALAR for PetscScalar, and so on. See petscsys.h for details. 
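A minimal C sketch of the same idea; per Barry's note above, the Fortran analogue of MPIU_INT is MPIU_INTEGER, and the value 42 and the variable names here are purely illustrative:

   #include <petscsys.h>

   int main(int argc,char **argv)
   {
     PetscInt    n = 0;
     PetscMPIInt rank;

     PetscInitialize(&argc,&argv,NULL,NULL);
     MPI_Comm_rank(PETSC_COMM_WORLD,&rank);
     if (!rank) n = 42;                              /* value known only on rank 0 */
     /* MPIU_INT always matches PetscInt, with or without --with-64-bit-indices */
     MPI_Bcast(&n,1,MPIU_INT,0,PETSC_COMM_WORLD);
     PetscPrintf(PETSC_COMM_SELF,"rank %d has n = %D\n",(int)rank,n);
     PetscFinalize();
     return 0;
   }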
>> >> Jose >> > From fdkong.jd at gmail.com Fri Nov 27 13:05:02 2015 From: fdkong.jd at gmail.com (Fande Kong) Date: Fri, 27 Nov 2015 12:05:02 -0700 Subject: [petsc-users] parallel IO messages Message-ID: Hi all, I implemented a parallel IO based on the Vec and IS which uses HDF5. I am testing this loader on a supercomputer. I occasionally (not always) encounter the following errors (using 8192 cores): [7689]PETSC ERROR: ------------------------------------------------------------------------ [7689]PETSC ERROR: Caught signal number 5 TRAP [7689]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [7689]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [7689]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [7689]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run [7689]PETSC ERROR: to get more information on the crash. [7689]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [7689]PETSC ERROR: Signal received [7689]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [7689]PETSC ERROR: Petsc Release Version 3.6.2, unknown [7689]PETSC ERROR: ./fsi on a arch-linux2-cxx-opt named ys6103 by fandek Fri Nov 27 11:26:30 2015 [7689]PETSC ERROR: Configure options --with-clanguage=cxx --with-shared-libraries=1 --download-fblaslapack=1 --with-mpi=1 --download-parmetis=1 --download-metis=1 --with-netcdf=1 --download-exodusii=1 --with-hdf5-dir=/glade/apps/opt/hdf5-mpi/1.8.12/intel/12.1.5 --with-debugging=no --with-c2html=0 --with-64-bit-indices=1 [7689]PETSC ERROR: #1 User provided function() line 0 in unknown file Abort(59) on node 7689 (rank 7689 in comm 1140850688): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 7689 ERROR: 0031-300 Forcing all remote tasks to exit due to exit code 1 in task 7689 Make and configure logs are attached. Thanks, Fande Kong, -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure_log Type: application/octet-stream Size: 4911198 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make_log Type: application/octet-stream Size: 103842 bytes Desc: not available URL: From dave.mayhem23 at gmail.com Fri Nov 27 13:08:48 2015 From: dave.mayhem23 at gmail.com (Dave May) Date: Fri, 27 Nov 2015 20:08:48 +0100 Subject: [petsc-users] parallel IO messages In-Reply-To: References: Message-ID: There is little information in this stack trace. You would get more information if you use a debug build of petsc. e.g. configure with --with-debugging=yes It is recommended to always debug problems using a debug build of petsc and a debug build of your application. Thanks, Dave On 27 November 2015 at 20:05, Fande Kong wrote: > Hi all, > > I implemented a parallel IO based on the Vec and IS which uses HDF5. I am > testing this loader on a supercomputer. 
I occasionally (not always) > encounter the following errors (using 8192 cores): > > [7689]PETSC ERROR: > ------------------------------------------------------------------------ > [7689]PETSC ERROR: Caught signal number 5 TRAP > [7689]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > [7689]PETSC ERROR: or see > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [7689]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > OS X to find memory corruption errors > [7689]PETSC ERROR: configure using --with-debugging=yes, recompile, link, > and run > [7689]PETSC ERROR: to get more information on the crash. > [7689]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [7689]PETSC ERROR: Signal received > [7689]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [7689]PETSC ERROR: Petsc Release Version 3.6.2, unknown > [7689]PETSC ERROR: ./fsi on a arch-linux2-cxx-opt named ys6103 by fandek > Fri Nov 27 11:26:30 2015 > [7689]PETSC ERROR: Configure options --with-clanguage=cxx > --with-shared-libraries=1 --download-fblaslapack=1 --with-mpi=1 > --download-parmetis=1 --download-metis=1 --with-netcdf=1 > --download-exodusii=1 > --with-hdf5-dir=/glade/apps/opt/hdf5-mpi/1.8.12/intel/12.1.5 > --with-debugging=no --with-c2html=0 --with-64-bit-indices=1 > [7689]PETSC ERROR: #1 User provided function() line 0 in unknown file > Abort(59) on node 7689 (rank 7689 in comm 1140850688): application called > MPI_Abort(MPI_COMM_WORLD, 59) - process 7689 > ERROR: 0031-300 Forcing all remote tasks to exit due to exit code 1 in > task 7689 > > Make and configure logs are attached. > > Thanks, > > Fande Kong, > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Nov 27 13:08:50 2015 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 27 Nov 2015 13:08:50 -0600 Subject: [petsc-users] parallel IO messages In-Reply-To: References: Message-ID: On Fri, Nov 27, 2015 at 1:05 PM, Fande Kong wrote: > Hi all, > > I implemented a parallel IO based on the Vec and IS which uses HDF5. I am > testing this loader on a supercomputer. I occasionally (not always) > encounter the following errors (using 8192 cores): > What is different from the current HDF5 output routines? Thanks, Matt > [7689]PETSC ERROR: > ------------------------------------------------------------------------ > [7689]PETSC ERROR: Caught signal number 5 TRAP > [7689]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > [7689]PETSC ERROR: or see > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [7689]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > OS X to find memory corruption errors > [7689]PETSC ERROR: configure using --with-debugging=yes, recompile, link, > and run > [7689]PETSC ERROR: to get more information on the crash. > [7689]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [7689]PETSC ERROR: Signal received > [7689]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> [7689]PETSC ERROR: Petsc Release Version 3.6.2, unknown > [7689]PETSC ERROR: ./fsi on a arch-linux2-cxx-opt named ys6103 by fandek > Fri Nov 27 11:26:30 2015 > [7689]PETSC ERROR: Configure options --with-clanguage=cxx > --with-shared-libraries=1 --download-fblaslapack=1 --with-mpi=1 > --download-parmetis=1 --download-metis=1 --with-netcdf=1 > --download-exodusii=1 > --with-hdf5-dir=/glade/apps/opt/hdf5-mpi/1.8.12/intel/12.1.5 > --with-debugging=no --with-c2html=0 --with-64-bit-indices=1 > [7689]PETSC ERROR: #1 User provided function() line 0 in unknown file > Abort(59) on node 7689 (rank 7689 in comm 1140850688): application called > MPI_Abort(MPI_COMM_WORLD, 59) - process 7689 > ERROR: 0031-300 Forcing all remote tasks to exit due to exit code 1 in > task 7689 > > Make and configure logs are attached. > > Thanks, > > Fande Kong, > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdkong.jd at gmail.com Fri Nov 27 13:18:49 2015 From: fdkong.jd at gmail.com (Fande Kong) Date: Fri, 27 Nov 2015 12:18:49 -0700 Subject: [petsc-users] parallel IO messages In-Reply-To: References: Message-ID: HI Matt, Thanks for your reply. I put my application data into PETSc Vec and IS that take advantage of HDF5 viewer (you implemented). In fact, I did not add any new output and input functions. Thanks, Fande, On Fri, Nov 27, 2015 at 12:08 PM, Matthew Knepley wrote: > On Fri, Nov 27, 2015 at 1:05 PM, Fande Kong wrote: > >> Hi all, >> >> I implemented a parallel IO based on the Vec and IS which uses HDF5. I am >> testing this loader on a supercomputer. I occasionally (not always) >> encounter the following errors (using 8192 cores): >> > > What is different from the current HDF5 output routines? > > Thanks, > > Matt > > >> [7689]PETSC ERROR: >> ------------------------------------------------------------------------ >> [7689]PETSC ERROR: Caught signal number 5 TRAP >> [7689]PETSC ERROR: Try option -start_in_debugger or >> -on_error_attach_debugger >> [7689]PETSC ERROR: or see >> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >> [7689]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac >> OS X to find memory corruption errors >> [7689]PETSC ERROR: configure using --with-debugging=yes, recompile, link, >> and run >> [7689]PETSC ERROR: to get more information on the crash. >> [7689]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [7689]PETSC ERROR: Signal received >> [7689]PETSC ERROR: See >> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>> [7689]PETSC ERROR: Petsc Release Version 3.6.2, unknown >> [7689]PETSC ERROR: ./fsi on a arch-linux2-cxx-opt named ys6103 by fandek >> Fri Nov 27 11:26:30 2015 >> [7689]PETSC ERROR: Configure options --with-clanguage=cxx >> --with-shared-libraries=1 --download-fblaslapack=1 --with-mpi=1 >> --download-parmetis=1 --download-metis=1 --with-netcdf=1 >> --download-exodusii=1 >> --with-hdf5-dir=/glade/apps/opt/hdf5-mpi/1.8.12/intel/12.1.5 >> --with-debugging=no --with-c2html=0 --with-64-bit-indices=1 >> [7689]PETSC ERROR: #1 User provided function() line 0 in unknown file >> Abort(59) on node 7689 (rank 7689 in comm 1140850688): application called >> MPI_Abort(MPI_COMM_WORLD, 59) - process 7689 >> ERROR: 0031-300 Forcing all remote tasks to exit due to exit code 1 in >> task 7689 >> >> Make and configure logs are attached. >> >> Thanks, >> >> Fande Kong, >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdkong.jd at gmail.com Fri Nov 27 13:21:58 2015 From: fdkong.jd at gmail.com (Fande Kong) Date: Fri, 27 Nov 2015 12:21:58 -0700 Subject: [petsc-users] parallel IO messages In-Reply-To: References: Message-ID: Hi Dave, This not always happens. I am trying to get performance measurement so that the PETSc is compiled with --with-debugging=no. I will try later. Thanks, Fande, On Fri, Nov 27, 2015 at 12:08 PM, Dave May wrote: > There is little information in this stack trace. > You would get more information if you use a debug build of petsc. > e.g. configure with --with-debugging=yes > It is recommended to always debug problems using a debug build of petsc > and a debug build of your application. > > Thanks, > Dave > > On 27 November 2015 at 20:05, Fande Kong wrote: > >> Hi all, >> >> I implemented a parallel IO based on the Vec and IS which uses HDF5. I am >> testing this loader on a supercomputer. I occasionally (not always) >> encounter the following errors (using 8192 cores): >> >> [7689]PETSC ERROR: >> ------------------------------------------------------------------------ >> [7689]PETSC ERROR: Caught signal number 5 TRAP >> [7689]PETSC ERROR: Try option -start_in_debugger or >> -on_error_attach_debugger >> [7689]PETSC ERROR: or see >> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >> [7689]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac >> OS X to find memory corruption errors >> [7689]PETSC ERROR: configure using --with-debugging=yes, recompile, link, >> and run >> [7689]PETSC ERROR: to get more information on the crash. >> [7689]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [7689]PETSC ERROR: Signal received >> [7689]PETSC ERROR: See >> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>> [7689]PETSC ERROR: Petsc Release Version 3.6.2, unknown >> [7689]PETSC ERROR: ./fsi on a arch-linux2-cxx-opt named ys6103 by fandek >> Fri Nov 27 11:26:30 2015 >> [7689]PETSC ERROR: Configure options --with-clanguage=cxx >> --with-shared-libraries=1 --download-fblaslapack=1 --with-mpi=1 >> --download-parmetis=1 --download-metis=1 --with-netcdf=1 >> --download-exodusii=1 >> --with-hdf5-dir=/glade/apps/opt/hdf5-mpi/1.8.12/intel/12.1.5 >> --with-debugging=no --with-c2html=0 --with-64-bit-indices=1 >> [7689]PETSC ERROR: #1 User provided function() line 0 in unknown file >> Abort(59) on node 7689 (rank 7689 in comm 1140850688): application called >> MPI_Abort(MPI_COMM_WORLD, 59) - process 7689 >> ERROR: 0031-300 Forcing all remote tasks to exit due to exit code 1 in >> task 7689 >> >> Make and configure logs are attached. >> >> Thanks, >> >> Fande Kong, >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Nov 27 14:20:18 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 27 Nov 2015 14:20:18 -0600 Subject: [petsc-users] parallel IO messages In-Reply-To: References: Message-ID: <209FDD1A-53A4-4531-B1D6-F0F1D1B112F9@mcs.anl.gov> Edit PETSC_ARCH/include/petscconf.h and add #if !defined(PETSC_MISSING_SIGTRAP) #define PETSC_MISSING_SIGTRAP #endif then do make gnumake It is possible that they system you are using uses SIGTRAP in managing the IO; by making the change above you are telling PETSc to ignore SIGTRAPS. Let us know how this works out. Barry > On Nov 27, 2015, at 1:05 PM, Fande Kong wrote: > > Hi all, > > I implemented a parallel IO based on the Vec and IS which uses HDF5. I am testing this loader on a supercomputer. I occasionally (not always) encounter the following errors (using 8192 cores): > > [7689]PETSC ERROR: ------------------------------------------------------------------------ > [7689]PETSC ERROR: Caught signal number 5 TRAP > [7689]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [7689]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [7689]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > [7689]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run > [7689]PETSC ERROR: to get more information on the crash. > [7689]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [7689]PETSC ERROR: Signal received > [7689]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [7689]PETSC ERROR: Petsc Release Version 3.6.2, unknown > [7689]PETSC ERROR: ./fsi on a arch-linux2-cxx-opt named ys6103 by fandek Fri Nov 27 11:26:30 2015 > [7689]PETSC ERROR: Configure options --with-clanguage=cxx --with-shared-libraries=1 --download-fblaslapack=1 --with-mpi=1 --download-parmetis=1 --download-metis=1 --with-netcdf=1 --download-exodusii=1 --with-hdf5-dir=/glade/apps/opt/hdf5-mpi/1.8.12/intel/12.1.5 --with-debugging=no --with-c2html=0 --with-64-bit-indices=1 > [7689]PETSC ERROR: #1 User provided function() line 0 in unknown file > Abort(59) on node 7689 (rank 7689 in comm 1140850688): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 7689 > ERROR: 0031-300 Forcing all remote tasks to exit due to exit code 1 in task 7689 > > Make and configure logs are attached. 
> > Thanks, > > Fande Kong, > > From elbueler at alaska.edu Fri Nov 27 14:24:12 2015 From: elbueler at alaska.edu (Ed Bueler) Date: Fri, 27 Nov 2015 11:24:12 -0900 Subject: [petsc-users] master branch option "-snes_monitor_solution" In-Reply-To: References: <49DD4FA4-8C55-43D8-A6E3-9266231110EC@mcs.anl.gov> Message-ID: Barry -- Works great for me in next and master. Having value "draw" is perfectly natural, as is default behavior. Ed On Wed, Nov 25, 2015 at 4:54 PM, Barry Smith wrote: > > Ed, > > I have fixed the error in the branch barry/update-monitors now in next > for testing. > > There is one API change associated with the fix. To graphically > visualize the solution one now needs > > -ksp/snes/ts_monitor_solution draw > > the default behavior is now to ASCII print the solution to the screen. > > Barry > > Further work is needed to unify and simplified the various monitor options. > > > On Nov 22, 2015, at 8:55 PM, Ed Bueler wrote: > > > > Barry -- > > > > That is reassuring, actually. That is, knowing that occasionally ya'll > botch something, and that the problem is not entirely on this leaf of the > internets. > > > > Ed > > > > > > On Sun, Nov 22, 2015 at 5:51 PM, Barry Smith wrote: > > > > I totally botched that update; looks like I broke a lot of the > command line monitor options in master. > > > > Fixing it properly will take some work but also enhance the command > line monitor and reduce the code a bit. > > > > Thanks for letting us know. > > > > > > Barry > > > > > On Nov 22, 2015, at 1:40 PM, Ed Bueler wrote: > > > > > > Dear PETSc -- > > > > > > When I use option -snes_monitor_solution in master branch I get the > error below. I have a sense that this is related to the change listed at > http://www.mcs.anl.gov/petsc/documentation/changes/dev.html, namely > > > > > > "SNESSetMonitor(SNESMonitorXXX, calls now require passing a viewer as > the final argument, you can no longer pass a NULL)" > > > > > > but the error message below is not informative enough to tell me what > to do at the command line. > > > > > > Note that my X11 windows do work, as other options successfully give > line graphs etc. > > > > > > Do I need > > > > > > -snes_monitor_solution Z > > > > > > with some value for Z? If so, where are the possibilities documented? > > > > > > Thanks! > > > > > > Ed > > > > > > > > > > > > $ ./ex5 -snes_monitor_solution > > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > > [0]PETSC ERROR: Null argument, when expecting valid pointer > > > [0]PETSC ERROR: Null Object: Parameter # 4 > > > [0]PETSC ERROR: See > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> > > [0]PETSC ERROR: Petsc Development GIT revision: v3.6.2-1635-g5e95a8a > GIT Date: 2015-11-21 16:14:08 -0600 > > > [0]PETSC ERROR: ./ex5 on a linux-c-dbg named bueler-leopard by ed Sun > Nov 22 10:31:33 2015 > > > [0]PETSC ERROR: Configure options --download-mpich --download-triangle > --with-debugging=1 > > > [0]PETSC ERROR: #1 SNESMonitorSolution() line 33 in > /home/ed/petsc/src/snes/interface/snesut.c > > > [0]PETSC ERROR: #2 SNESMonitor() line 3383 in > /home/ed/petsc/src/snes/interface/snes.c > > > [0]PETSC ERROR: #3 SNESSolve_NEWTONLS() line 191 in > /home/ed/petsc/src/snes/impls/ls/ls.c > > > [0]PETSC ERROR: #4 SNESSolve() line 3984 in > /home/ed/petsc/src/snes/interface/snes.c > > > [0]PETSC ERROR: #5 main() line 171 in > /home/ed/petsc/src/snes/examples/tutorials/ex5.c > > > [0]PETSC ERROR: PETSc Option Table entries: > > > [0]PETSC ERROR: -snes_monitor_solution > > > [0]PETSC ERROR: ----------------End of Error Message -------send > entire error message to petsc-maint at mcs.anl.gov---------- > > > application called MPI_Abort(MPI_COMM_WORLD, 85) - process 0 > > > [unset]: aborting job: > > > application called MPI_Abort(MPI_COMM_WORLD, 85) - process 0 > > > > > > > > > > > > -- > > > Ed Bueler > > > Dept of Math and Stat and Geophysical Institute > > > University of Alaska Fairbanks > > > Fairbanks, AK 99775-6660 > > > 301C Chapman and 410D Elvey > > > 907 474-7693 and 907 474-7199 (fax 907 474-5394) > > > > > > > > > > -- > > Ed Bueler > > Dept of Math and Stat and Geophysical Institute > > University of Alaska Fairbanks > > Fairbanks, AK 99775-6660 > > 301C Chapman and 410D Elvey > > 907 474-7693 and 907 474-7199 (fax 907 474-5394) > > -- Ed Bueler Dept of Math and Stat and Geophysical Institute University of Alaska Fairbanks Fairbanks, AK 99775-6660 301C Chapman and 410D Elvey 907 474-7693 and 907 474-7199 (fax 907 474-5394) -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdkong.jd at gmail.com Fri Nov 27 14:27:42 2015 From: fdkong.jd at gmail.com (Fande Kong) Date: Fri, 27 Nov 2015 13:27:42 -0700 Subject: [petsc-users] parallel IO messages In-Reply-To: <209FDD1A-53A4-4531-B1D6-F0F1D1B112F9@mcs.anl.gov> References: <209FDD1A-53A4-4531-B1D6-F0F1D1B112F9@mcs.anl.gov> Message-ID: Thanks, Barry, I also was wondering why this happens randomly? Any explanations? If this is something in PETSc, that should happen always? Thanks, Fande Kong, On Fri, Nov 27, 2015 at 1:20 PM, Barry Smith wrote: > > Edit PETSC_ARCH/include/petscconf.h and add > > #if !defined(PETSC_MISSING_SIGTRAP) > #define PETSC_MISSING_SIGTRAP > #endif > > then do > > make gnumake > > It is possible that they system you are using uses SIGTRAP in managing the > IO; by making the change above you are telling PETSc to ignore SIGTRAPS. > Let us know how this works out. > > Barry > > > > On Nov 27, 2015, at 1:05 PM, Fande Kong wrote: > > > > Hi all, > > > > I implemented a parallel IO based on the Vec and IS which uses HDF5. I > am testing this loader on a supercomputer. 
I occasionally (not always) > encounter the following errors (using 8192 cores): > > > > [7689]PETSC ERROR: > ------------------------------------------------------------------------ > > [7689]PETSC ERROR: Caught signal number 5 TRAP > > [7689]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > > [7689]PETSC ERROR: or see > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > [7689]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple > Mac OS X to find memory corruption errors > > [7689]PETSC ERROR: configure using --with-debugging=yes, recompile, > link, and run > > [7689]PETSC ERROR: to get more information on the crash. > > [7689]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [7689]PETSC ERROR: Signal received > > [7689]PETSC ERROR: See > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > [7689]PETSC ERROR: Petsc Release Version 3.6.2, unknown > > [7689]PETSC ERROR: ./fsi on a arch-linux2-cxx-opt named ys6103 by fandek > Fri Nov 27 11:26:30 2015 > > [7689]PETSC ERROR: Configure options --with-clanguage=cxx > --with-shared-libraries=1 --download-fblaslapack=1 --with-mpi=1 > --download-parmetis=1 --download-metis=1 --with-netcdf=1 > --download-exodusii=1 > --with-hdf5-dir=/glade/apps/opt/hdf5-mpi/1.8.12/intel/12.1.5 > --with-debugging=no --with-c2html=0 --with-64-bit-indices=1 > > [7689]PETSC ERROR: #1 User provided function() line 0 in unknown file > > Abort(59) on node 7689 (rank 7689 in comm 1140850688): application > called MPI_Abort(MPI_COMM_WORLD, 59) - process 7689 > > ERROR: 0031-300 Forcing all remote tasks to exit due to exit code 1 in > task 7689 > > > > Make and configure logs are attached. > > > > Thanks, > > > > Fande Kong, > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Nov 27 15:29:14 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 27 Nov 2015 15:29:14 -0600 Subject: [petsc-users] parallel IO messages In-Reply-To: References: <209FDD1A-53A4-4531-B1D6-F0F1D1B112F9@mcs.anl.gov> Message-ID: <3479F21F-93AF-49B9-A399-D33CF19E9442@mcs.anl.gov> SIGTRAP is a way a process can interact with itself or another process asynchronously. It is possible that in all the mess of HDF5/MPI IO/OS code that manages getting the data in parallel from the MPI process memory to the hard disk some of the code uses SIGTRAP. PETSc, by default, always traps the SIGTRAP; thinking that it is indicating an error condition. The "randomness" could come from the fact that depending on how quickly the data is moving from the MPI processes to the disk only sometimes will the mess of code actually use a SIGTRAP. I could also be totally wrong and the SIGTRAP may just be triggered by errors in the IO system. Anyways give my suggestion a try and see if it helps, there is nothing else you can do. Barry > On Nov 27, 2015, at 2:27 PM, Fande Kong wrote: > > Thanks, Barry, > > I also was wondering why this happens randomly? Any explanations? If this is something in PETSc, that should happen always? > > Thanks, > > Fande Kong, > > On Fri, Nov 27, 2015 at 1:20 PM, Barry Smith wrote: > > Edit PETSC_ARCH/include/petscconf.h and add > > #if !defined(PETSC_MISSING_SIGTRAP) > #define PETSC_MISSING_SIGTRAP > #endif > > then do > > make gnumake > > It is possible that they system you are using uses SIGTRAP in managing the IO; by making the change above you are telling PETSc to ignore SIGTRAPS. 
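To illustrate the mechanism in plain POSIX terms (generic C, not PETSc internals): whichever disposition a process installs for SIGTRAP decides whether the signal is fatal, informational, or ignored, so a library that traps it and an I/O stack that raises it can easily collide.

    /* Illustration only -- generic POSIX signal handling, not PETSc code. */
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static void on_trap(int sig)
    {
      (void)sig;
      write(1, "caught SIGTRAP\n", 15);   /* async-signal-safe */
    }

    int main(void)
    {
      signal(SIGTRAP, on_trap);   /* treat SIGTRAP as informational */
      raise(SIGTRAP);
      signal(SIGTRAP, SIG_IGN);   /* or ignore it outright */
      raise(SIGTRAP);
      puts("still running");      /* the default action would have terminated the process */
      return 0;
    }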
Let us know how this works out. > > Barry > > > > On Nov 27, 2015, at 1:05 PM, Fande Kong wrote: > > > > Hi all, > > > > I implemented a parallel IO based on the Vec and IS which uses HDF5. I am testing this loader on a supercomputer. I occasionally (not always) encounter the following errors (using 8192 cores): > > > > [7689]PETSC ERROR: ------------------------------------------------------------------------ > > [7689]PETSC ERROR: Caught signal number 5 TRAP > > [7689]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > > [7689]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > [7689]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > > [7689]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run > > [7689]PETSC ERROR: to get more information on the crash. > > [7689]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > [7689]PETSC ERROR: Signal received > > [7689]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > [7689]PETSC ERROR: Petsc Release Version 3.6.2, unknown > > [7689]PETSC ERROR: ./fsi on a arch-linux2-cxx-opt named ys6103 by fandek Fri Nov 27 11:26:30 2015 > > [7689]PETSC ERROR: Configure options --with-clanguage=cxx --with-shared-libraries=1 --download-fblaslapack=1 --with-mpi=1 --download-parmetis=1 --download-metis=1 --with-netcdf=1 --download-exodusii=1 --with-hdf5-dir=/glade/apps/opt/hdf5-mpi/1.8.12/intel/12.1.5 --with-debugging=no --with-c2html=0 --with-64-bit-indices=1 > > [7689]PETSC ERROR: #1 User provided function() line 0 in unknown file > > Abort(59) on node 7689 (rank 7689 in comm 1140850688): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 7689 > > ERROR: 0031-300 Forcing all remote tasks to exit due to exit code 1 in task 7689 > > > > Make and configure logs are attached. > > > > Thanks, > > > > Fande Kong, > > > > > > From rlmackie862 at gmail.com Fri Nov 27 16:47:48 2015 From: rlmackie862 at gmail.com (Randall Mackie) Date: Fri, 27 Nov 2015 14:47:48 -0800 Subject: [petsc-users] running applications with 64 bit indices Message-ID: <1F6AE67E-F43B-4844-8211-EBD2B60338CB@gmail.com> I?ve been struggling to get an application running, which was compiled with 64 bit indices. It runs fine locally on my laptop with a petsc-downloaded mpich (and is Valgrind clean). On our cluster, with Intel MPI, it crashes immediately. When I say immediately, I put a goto end of program right after PetscInitialize, and it still crashes immediately. The same program compiled without 64 bit indices runs fine with Intel MPI. Are there any special configuration options that must be set to use 64 bit indices in an application compiled with Intel MPI? Any suggestions appreciated. Thanks, Randy M. From bsmith at mcs.anl.gov Fri Nov 27 16:59:06 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 27 Nov 2015 16:59:06 -0600 Subject: [petsc-users] running applications with 64 bit indices In-Reply-To: <1F6AE67E-F43B-4844-8211-EBD2B60338CB@gmail.com> References: <1F6AE67E-F43B-4844-8211-EBD2B60338CB@gmail.com> Message-ID: <11133A73-22A6-4674-ABE4-4B52B01B7D1C@mcs.anl.gov> > On Nov 27, 2015, at 4:47 PM, Randall Mackie wrote: > > I?ve been struggling to get an application running, which was compiled with 64 bit indices. > > It runs fine locally on my laptop with a petsc-downloaded mpich (and is Valgrind clean). 
> > On our cluster, with Intel MPI, it crashes immediately. When I say immediately, I put a goto end of program right after PetscInitialize, and it still crashes immediately. > > The same program compiled without 64 bit indices runs fine with Intel MPI. > > Are there any special configuration options that must be set to use 64 bit indices in an application compiled with Intel MPI? Nope. Try on the cluster with valgrind? Try on the cluster with a debugger? Does it crash with one process? Do the PETSc examples run with 64 bit integers on the cluster? Petsc Fortran examples? Barry > > Any suggestions appreciated. > > Thanks, > > Randy M. From fdkong.jd at gmail.com Fri Nov 27 18:24:38 2015 From: fdkong.jd at gmail.com (Fande Kong) Date: Fri, 27 Nov 2015 17:24:38 -0700 Subject: [petsc-users] parallel IO messages In-Reply-To: <3479F21F-93AF-49B9-A399-D33CF19E9442@mcs.anl.gov> References: <209FDD1A-53A4-4531-B1D6-F0F1D1B112F9@mcs.anl.gov> <3479F21F-93AF-49B9-A399-D33CF19E9442@mcs.anl.gov> Message-ID: Hi Barry, You are highly possibly right. Not 100% because this happens randomly. I have tried several tests, and all of them passed. Any reason to put SIGTRAP into IO system? Thanks, Fande, On Fri, Nov 27, 2015 at 2:29 PM, Barry Smith wrote: > > SIGTRAP is a way a process can interact with itself or another process > asynchronously. It is possible that in all the mess of HDF5/MPI IO/OS code > that manages getting the data in parallel from the MPI process memory to > the hard disk some of the code uses SIGTRAP. PETSc, by default, always > traps the SIGTRAP; thinking that it is indicating an error condition. The > "randomness" could come from the fact that depending on how quickly the > data is moving from the MPI processes to the disk only sometimes will the > mess of code actually use a SIGTRAP. I could also be totally wrong and > the SIGTRAP may just be triggered by errors in the IO system. Anyways give > my suggestion a try and see if it helps, there is nothing else you can do. > > Barry > > > > > > On Nov 27, 2015, at 2:27 PM, Fande Kong wrote: > > > > Thanks, Barry, > > > > I also was wondering why this happens randomly? Any explanations? If > this is something in PETSc, that should happen always? > > > > Thanks, > > > > Fande Kong, > > > > On Fri, Nov 27, 2015 at 1:20 PM, Barry Smith wrote: > > > > Edit PETSC_ARCH/include/petscconf.h and add > > > > #if !defined(PETSC_MISSING_SIGTRAP) > > #define PETSC_MISSING_SIGTRAP > > #endif > > > > then do > > > > make gnumake > > > > It is possible that they system you are using uses SIGTRAP in managing > the IO; by making the change above you are telling PETSc to ignore > SIGTRAPS. Let us know how this works out. > > > > Barry > > > > > > > On Nov 27, 2015, at 1:05 PM, Fande Kong wrote: > > > > > > Hi all, > > > > > > I implemented a parallel IO based on the Vec and IS which uses HDF5. I > am testing this loader on a supercomputer. 
I occasionally (not always) > encounter the following errors (using 8192 cores): > > > > > > [7689]PETSC ERROR: > ------------------------------------------------------------------------ > > > [7689]PETSC ERROR: Caught signal number 5 TRAP > > > [7689]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > > > [7689]PETSC ERROR: or see > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > > [7689]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple > Mac OS X to find memory corruption errors > > > [7689]PETSC ERROR: configure using --with-debugging=yes, recompile, > link, and run > > > [7689]PETSC ERROR: to get more information on the crash. > > > [7689]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > > [7689]PETSC ERROR: Signal received > > > [7689]PETSC ERROR: See > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > > [7689]PETSC ERROR: Petsc Release Version 3.6.2, unknown > > > [7689]PETSC ERROR: ./fsi on a arch-linux2-cxx-opt named ys6103 by > fandek Fri Nov 27 11:26:30 2015 > > > [7689]PETSC ERROR: Configure options --with-clanguage=cxx > --with-shared-libraries=1 --download-fblaslapack=1 --with-mpi=1 > --download-parmetis=1 --download-metis=1 --with-netcdf=1 > --download-exodusii=1 > --with-hdf5-dir=/glade/apps/opt/hdf5-mpi/1.8.12/intel/12.1.5 > --with-debugging=no --with-c2html=0 --with-64-bit-indices=1 > > > [7689]PETSC ERROR: #1 User provided function() line 0 in unknown file > > > Abort(59) on node 7689 (rank 7689 in comm 1140850688): application > called MPI_Abort(MPI_COMM_WORLD, 59) - process 7689 > > > ERROR: 0031-300 Forcing all remote tasks to exit due to exit code 1 > in task 7689 > > > > > > Make and configure logs are attached. > > > > > > Thanks, > > > > > > Fande Kong, > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Sat Nov 28 00:10:16 2015 From: jychang48 at gmail.com (Justin Chang) Date: Sat, 28 Nov 2015 00:10:16 -0600 Subject: [petsc-users] Solving/creating SPD systems Message-ID: Hi all, Say I have a saddle-point system for the mixed-poisson equation: [I -grad] [u] = [0] [-div 0 ] [p] [-f] The above is symmetric but indefinite. I have heard that one could make the above symmetric and positive definite (SPD). How would I do that? And if that's the case, would this allow me to use CG instead of GMRES? Thanks, Justin -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.ghadam at gmail.com Sat Nov 28 00:40:09 2015 From: m.ghadam at gmail.com (Mostafa Ghadamyari) Date: Sat, 28 Nov 2015 10:10:09 +0330 Subject: [petsc-users] KSP Set Operators to be calculated before RHS Message-ID: <23133dcfd61083294d70dfc33eb44d38@imap.gmail.com> Hi all, I've used PETSc to develop my SIMPLE algorithm CFD code. SIMPLE algorithm has its own way to handle non-linearity of Navier-Stokes equation's so I only used PETSc's KSP solvers. In the SIMPLE algorithm, the diagonal coefficient of the matrix is used in the right hand side for implicit calculation of relaxation factor, so it has to be calculated before the RHS. The linear matrix has also to be calculated in each iteration as the algorithm is iterative in nature. 
I first used the KSPSetComputeOperators and KSPSetComputeRHS to do this and call KSPSetComputeOperators in each iteration so that the operator matrix is calculated but the problem was that PETSc always calculates RHS before the operator matrix. So I had to manually call the functions to fill operator matrix and rhs vector and then use KSPSolve to solve them. Now I want to use multigrid solvers and I guess I have to use KSPSetComputeOperators and KSPSetComputeRHS for this purpose, right? If so, is it possible to set PETSc to compute operators before RHS? Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Nov 28 06:28:45 2015 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 28 Nov 2015 06:28:45 -0600 Subject: [petsc-users] KSP Set Operators to be calculated before RHS In-Reply-To: <23133dcfd61083294d70dfc33eb44d38@imap.gmail.com> References: <23133dcfd61083294d70dfc33eb44d38@imap.gmail.com> Message-ID: On Sat, Nov 28, 2015 at 12:40 AM, Mostafa Ghadamyari wrote: > Hi all, > > I've used PETSc to develop my SIMPLE algorithm CFD code. SIMPLE algorithm > has its own way to handle non-linearity of Navier-Stokes equation's so I > only used PETSc's KSP solvers. > > In the SIMPLE algorithm, the diagonal coefficient of the matrix is used in > the right hand side for implicit calculation of relaxation factor, so it > has to be calculated before the RHS. The linear matrix has also to be > calculated in each iteration as the algorithm is iterative in nature. > > I first used the KSPSetComputeOperators and KSPSetComputeRHS to do this > and call KSPSetComputeOperators in each iteration so that the operator > matrix is calculated but the problem was that PETSc always calculates RHS > before the operator matrix. > > So I had to manually call the functions to fill operator matrix and rhs > vector and then use KSPSolve to solve them. > > Now I want to use multigrid solvers and I guess I have to use > KSPSetComputeOperators and KSPSetComputeRHS for this purpose, right? If so, > is it possible to set PETSc to compute operators before RHS? > If there is a computation that is shared between the RHS and system matrix and you with to reuse it, put the result in a context that is shared between the two operations. Thanks, Matt > Thanks > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Nov 28 06:31:31 2015 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 28 Nov 2015 06:31:31 -0600 Subject: [petsc-users] Solving/creating SPD systems In-Reply-To: References: Message-ID: On Sat, Nov 28, 2015 at 12:10 AM, Justin Chang wrote: > Hi all, > > Say I have a saddle-point system for the mixed-poisson equation: > > [I -grad] [u] = [0] > [-div 0 ] [p] [-f] > > The above is symmetric but indefinite. I have heard that one could make > the above symmetric and positive definite (SPD). How would I do that? And > if that's the case, would this allow me to use CG instead of GMRES? > I believe you just multiply the bottom row by -1. You can use CG for an SPD system, but you can use MINRES for symmetric indefinite. Matt > Thanks, > Justin > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
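Returning to the SIMPLE/KSP question above, a minimal sketch of the shared-context pattern Matt suggests might look like the following. The AppCtx struct, its field names, and the BuildCoefficientsIfNeeded helper are made up for illustration; only KSPSetComputeOperators/KSPSetComputeRHS and the Mat calls are actual PETSc API. Whichever callback PETSc invokes first builds the shared coefficients once per outer iteration, so the calling order stops mattering:

    #include <petscksp.h>

    /* Hypothetical shared context: the same pointer is registered with both
       callbacks, so whatever one computes the other can reuse. */
    typedef struct {
      PetscInt  iter;       /* current SIMPLE outer iteration                  */
      PetscInt  built_for;  /* iteration for which the coefficients were built */
      Vec       aP;         /* diagonal (a_P) coefficients shared by A and b   */
      PetscReal alpha;      /* under-relaxation factor                         */
    } AppCtx;

    /* Build the coefficients once per outer iteration, on first use. */
    static PetscErrorCode BuildCoefficientsIfNeeded(AppCtx *user)
    {
      PetscFunctionBeginUser;
      if (user->built_for != user->iter) {
        /* ... discretize and fill user->aP for this iteration ... */
        user->built_for = user->iter;
      }
      PetscFunctionReturn(0);
    }

    static PetscErrorCode ComputeOperators(KSP ksp, Mat A, Mat P, void *ctx)
    {
      AppCtx        *user = (AppCtx*)ctx;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = BuildCoefficientsIfNeeded(user);CHKERRQ(ierr);
      /* ... MatSetValues() on A using user->aP and the off-diagonal terms ... */
      ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

    static PetscErrorCode ComputeRHS(KSP ksp, Vec b, void *ctx)
    {
      AppCtx        *user = (AppCtx*)ctx;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = BuildCoefficientsIfNeeded(user);CHKERRQ(ierr);  /* works even if called first */
      /* ... assemble b, using user->aP and user->alpha for the implicit
             under-relaxation contribution ... */
      PetscFunctionReturn(0);
    }

    /* Registration -- note the same context goes to both callbacks:
         KSPSetComputeOperators(ksp,ComputeOperators,&user);
         KSPSetComputeRHS(ksp,ComputeRHS,&user);              */

Nothing here changes the order in which PETSc invokes the callbacks; it just makes that order irrelevant.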
-- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrick.sanan at gmail.com Sat Nov 28 10:35:26 2015 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Sat, 28 Nov 2015 17:35:26 +0100 Subject: [petsc-users] Solving/creating SPD systems In-Reply-To: References: Message-ID: <20151128163526.GA4917@Patricks-MBP-4.railnet.train> On Sat, Nov 28, 2015 at 06:31:31AM -0600, Matthew Knepley wrote: > On Sat, Nov 28, 2015 at 12:10 AM, Justin Chang wrote: > > > Hi all, > > > > Say I have a saddle-point system for the mixed-poisson equation: > > > > [I -grad] [u] = [0] > > [-div 0 ] [p] [-f] > > > > The above is symmetric but indefinite. I have heard that one could make > > the above symmetric and positive definite (SPD). How would I do that? And > > if that's the case, would this allow me to use CG instead of GMRES? > > > > I believe you just multiply the bottom row by -1. You can use CG for an SPD > system, but you can > use MINRES for symmetric indefinite. If I'm remembering correctly, flipping that sign lets you make your system alternately P.D. or symmetric, but not both. Maybe you were hearing about the Bramble-Pasciak preconditioner or a related approach? > > Matt > > > > Thanks, > > Justin > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 473 bytes Desc: not available URL: From knepley at gmail.com Sat Nov 28 10:52:30 2015 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 28 Nov 2015 10:52:30 -0600 Subject: [petsc-users] Solving/creating SPD systems In-Reply-To: <20151128163526.GA4917@Patricks-MBP-4.railnet.train> References: <20151128163526.GA4917@Patricks-MBP-4.railnet.train> Message-ID: On Sat, Nov 28, 2015 at 10:35 AM, Patrick Sanan wrote: > On Sat, Nov 28, 2015 at 06:31:31AM -0600, Matthew Knepley wrote: > > On Sat, Nov 28, 2015 at 12:10 AM, Justin Chang > wrote: > > > > > Hi all, > > > > > > Say I have a saddle-point system for the mixed-poisson equation: > > > > > > [I -grad] [u] = [0] > > > [-div 0 ] [p] [-f] > > > > > > The above is symmetric but indefinite. I have heard that one could make > > > the above symmetric and positive definite (SPD). How would I do that? > And > > > if that's the case, would this allow me to use CG instead of GMRES? > > > > > > > I believe you just multiply the bottom row by -1. You can use CG for an > SPD > > system, but you can > > use MINRES for symmetric indefinite. > If I'm remembering correctly, flipping that sign lets you make your system > alternately P.D. or > symmetric, but not both. Maybe you were hearing about the Bramble-Pasciak > preconditioner or a related approach? > Its possible that my Thanksgiving was too happy, however I was pretty sure that div was the transpose of -grad. Is this wrong? Thanks, Matt > > Matt > > > > > > > Thanks, > > > Justin > > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > their > > experiments lead. > > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
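The trade-off Patrick mentions can be written down directly. As a sketch in generic notation, with a generic divergence matrix B and leaving aside the exact sign conventions in Justin's block layout:

    \[
    K = \begin{bmatrix} M & B^{T} \\ B & 0 \end{bmatrix}
    \quad\text{is symmetric but indefinite,}
    \qquad
    \widetilde{K} = \begin{bmatrix} M & B^{T} \\ -B & 0 \end{bmatrix}
    \quad\text{(second block row scaled by } -1\text{)},
    \]
    \[
    \tfrac{1}{2}\bigl(\widetilde{K}+\widetilde{K}^{T}\bigr)
    = \begin{bmatrix} M & 0 \\ 0 & 0 \end{bmatrix} \succeq 0,
    \qquad
    x^{T}\widetilde{K}x = u^{T}Mu \ge 0 \quad\text{for } x=(u,p).
    \]

So the sign flip buys a nonnegative quadratic form but gives up symmetry: CG needs both, MINRES needs only symmetry, and recovering both at once is what Bramble-Pasciak-type transformations are about.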
-- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Sat Nov 28 10:56:31 2015 From: jychang48 at gmail.com (Justin Chang) Date: Sat, 28 Nov 2015 10:56:31 -0600 Subject: [petsc-users] Solving/creating SPD systems In-Reply-To: <20151128163526.GA4917@Patricks-MBP-4.railnet.train> References: <20151128163526.GA4917@Patricks-MBP-4.railnet.train> Message-ID: So I am wanting to compare the performance of various FEM discretization with their respective "best" possible solver/pre conditioner. There are saddle-point systems which HDiv formulations like RT0 work, but then there are others like LSFEM that are naturally SPD and so the CG solver can be used (though finding a good preconditioner is still an open problem). I have read and learned that the advantage of LSFEM is that it will always give you an SPD system, even for non-linear problems (because what you do is linearize the problem first and then minimize/take the Gateaux derivative to get the weak form). But after talking to some people and reading some stuff online, it seems one could also make non SPD systems SPD (hence eliminating what may be the only advantage of LSFEM). Two of said people happen to be PETSc developers but I forgot to ask them how one would achieve that. Or if one really only can achieve S or PD and not both :) On Saturday, November 28, 2015, Patrick Sanan wrote: > On Sat, Nov 28, 2015 at 06:31:31AM -0600, Matthew Knepley wrote: > > On Sat, Nov 28, 2015 at 12:10 AM, Justin Chang > wrote: > > > > > Hi all, > > > > > > Say I have a saddle-point system for the mixed-poisson equation: > > > > > > [I -grad] [u] = [0] > > > [-div 0 ] [p] [-f] > > > > > > The above is symmetric but indefinite. I have heard that one could make > > > the above symmetric and positive definite (SPD). How would I do that? > And > > > if that's the case, would this allow me to use CG instead of GMRES? > > > > > > > I believe you just multiply the bottom row by -1. You can use CG for an > SPD > > system, but you can > > use MINRES for symmetric indefinite. > If I'm remembering correctly, flipping that sign lets you make your system > alternately P.D. or > symmetric, but not both. Maybe you were hearing about the Bramble-Pasciak > preconditioner or a related approach? > > > > Matt > > > > > > > Thanks, > > > Justin > > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > their > > experiments lead. > > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Sat Nov 28 10:57:48 2015 From: jychang48 at gmail.com (Justin Chang) Date: Sat, 28 Nov 2015 10:57:48 -0600 Subject: [petsc-users] Solving/creating SPD systems In-Reply-To: References: <20151128163526.GA4917@Patricks-MBP-4.railnet.train> Message-ID: Yes you are correct matt On Saturday, November 28, 2015, Justin Chang wrote: > So I am wanting to compare the performance of various FEM discretization > with their respective "best" possible solver/pre conditioner. There > are saddle-point systems which HDiv formulations like RT0 work, but then > there are others like LSFEM that are naturally SPD and so the CG solver can > be used (though finding a good preconditioner is still an open problem). 
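Where the SPD structure of LSFEM comes from, sketched in one line for a generic first-order operator \(\mathcal{L}\) (applied after linearization in the nonlinear case):

    \[
    \min_{u_h} \tfrac12\,\|\mathcal{L}u_h - f\|_{0}^{2}
    \;\Longrightarrow\;
    (\mathcal{L}u_h,\,\mathcal{L}v_h) = (f,\,\mathcal{L}v_h)\ \ \forall v_h,
    \qquad
    A_{ij} = (\mathcal{L}\phi_j,\,\mathcal{L}\phi_i),
    \]

a Gram matrix, hence symmetric by construction and positive definite whenever \(\mathcal{L}\) is injective on the discrete space.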
> > I have read and learned that the advantage of LSFEM is that it will always > give you an SPD system, even for non-linear problems (because what you do > is linearize the problem first and then minimize/take the Gateaux > derivative to get the weak form). But after talking to some people and > reading some stuff online, it seems one could also make non SPD systems SPD > (hence eliminating what may be the only advantage of LSFEM). > > Two of said people happen to be PETSc developers but I forgot to ask them > how one would achieve that. Or if one really only can achieve S or PD and > not both :) > > On Saturday, November 28, 2015, Patrick Sanan > wrote: > >> On Sat, Nov 28, 2015 at 06:31:31AM -0600, Matthew Knepley wrote: >> > On Sat, Nov 28, 2015 at 12:10 AM, Justin Chang >> wrote: >> > >> > > Hi all, >> > > >> > > Say I have a saddle-point system for the mixed-poisson equation: >> > > >> > > [I -grad] [u] = [0] >> > > [-div 0 ] [p] [-f] >> > > >> > > The above is symmetric but indefinite. I have heard that one could >> make >> > > the above symmetric and positive definite (SPD). How would I do that? >> And >> > > if that's the case, would this allow me to use CG instead of GMRES? >> > > >> > >> > I believe you just multiply the bottom row by -1. You can use CG for an >> SPD >> > system, but you can >> > use MINRES for symmetric indefinite. >> If I'm remembering correctly, flipping that sign lets you make your >> system alternately P.D. or >> symmetric, but not both. Maybe you were hearing about the Bramble-Pasciak >> preconditioner or a related approach? >> > >> > Matt >> > >> > >> > > Thanks, >> > > Justin >> > > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> > experiments is infinitely more interesting than any results to which >> their >> > experiments lead. >> > -- Norbert Wiener >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From namu.patel7 at gmail.com Sun Nov 29 11:42:11 2015 From: namu.patel7 at gmail.com (namu patel) Date: Sun, 29 Nov 2015 11:42:11 -0600 Subject: [petsc-users] Error configuring PETSc with SUNDIALS (to solve stiff PDE) Message-ID: Hello All, I was trying to configure PETSc with SUNDIALS so that I may use PVODE to solve a stiff hyperbolic PDE of the form A(x) u_tt = K [B(x) u_x - F(x, u, u_x, t)]_x , t > 0, 0 < x < L , u_x(0, t) = u_x(L, t) = 0 , u(x, 0) = u_t(x, 0) = 0 , where K >> 1. I was reading around to see what may be a good numerical implementation for such a problem and it can be tricky here because the stiffness is both in the linear part and the nonlinear forcing term. I want to try PVODE availble in the SUNDIALS package. When I try to configure PETSc with SUNDIALS, I get the message: Downloaded sundials could not be used. 
Please check install in /Users/namupatel/Softwares/PETSc/3.6.2/linux-dbg ******************************************************************************* File "./config/configure.py", line 363, in petsc_configure framework.configure(out = sys.stdout) File "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", line 1081, in configure self.processChildren() File "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", line 1070, in processChildren self.serialEvaluation(self.childGraph) File "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", line 1051, in serialEvaluation child.configure() File "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", line 677, in configure self.executeTest(self.configureLibrary) File "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/base.py", line 126, in executeTest ret = test(*args,**kargs) File "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", line 592, in configureLibrary for location, directory, lib, incl in self.generateGuesses(): File "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", line 332, in generateGuesses raise RuntimeError('Downloaded '+self.package+' could not be used. Please check install in '+d+'\n') Two questions: 1. How can I resolve the above error? 2. Are there any recommendations to solving the stiff PDE stated above so that I can experiment to see what may be an efficient implementation? Thank you, Namu -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Nov 29 11:48:01 2015 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 29 Nov 2015 11:48:01 -0600 Subject: [petsc-users] Error configuring PETSc with SUNDIALS (to solve stiff PDE) In-Reply-To: References: Message-ID: You must send configure.log Thanks, Matt On Sun, Nov 29, 2015 at 11:42 AM, namu patel wrote: > Hello All, > > I was trying to configure PETSc with SUNDIALS so that I may use PVODE to > solve a stiff hyperbolic PDE of the form > > A(x) u_tt = K [B(x) u_x - F(x, u, u_x, t)]_x , t > 0, 0 < x < L , > u_x(0, t) = u_x(L, t) = 0 , > u(x, 0) = u_t(x, 0) = 0 , > > where K >> 1. I was reading around to see what may be a good numerical > implementation for such a problem and it can be tricky here because the > stiffness is both in the linear part and the nonlinear forcing term. > > I want to try PVODE availble in the SUNDIALS package. When I try to > configure PETSc with SUNDIALS, I get the message: > > Downloaded sundials could not be used. 
Please check install in > /Users/namupatel/Softwares/PETSc/3.6.2/linux-dbg > > > ******************************************************************************* > > File "./config/configure.py", line 363, in petsc_configure > > framework.configure(out = sys.stdout) > > File > "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", > line 1081, in configure > > self.processChildren() > > File > "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", > line 1070, in processChildren > > self.serialEvaluation(self.childGraph) > > File > "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", > line 1051, in serialEvaluation > > child.configure() > > File > "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", > line 677, in configure > > self.executeTest(self.configureLibrary) > > File > "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/base.py", > line 126, in executeTest > > ret = test(*args,**kargs) > > File > "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", > line 592, in configureLibrary > > for location, directory, lib, incl in self.generateGuesses(): > > File > "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", > line 332, in generateGuesses > > raise RuntimeError('Downloaded '+self.package+' could not be used. > Please check install in '+d+'\n') > > Two questions: > > 1. How can I resolve the above error? > > 2. Are there any recommendations to solving the stiff PDE stated above so > that I can experiment to see what may be an efficient implementation? > > Thank you, > > Namu > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From namu.patel7 at gmail.com Sun Nov 29 11:50:57 2015 From: namu.patel7 at gmail.com (namu patel) Date: Sun, 29 Nov 2015 11:50:57 -0600 Subject: [petsc-users] Error configuring PETSc with SUNDIALS (to solve stiff PDE) In-Reply-To: References: Message-ID: Attached is my configuration log file. Thanks, Namu On Sun, Nov 29, 2015 at 11:48 AM, Matthew Knepley wrote: > You must send configure.log > > Thanks, > > Matt > > On Sun, Nov 29, 2015 at 11:42 AM, namu patel > wrote: > >> Hello All, >> >> I was trying to configure PETSc with SUNDIALS so that I may use PVODE to >> solve a stiff hyperbolic PDE of the form >> >> A(x) u_tt = K [B(x) u_x - F(x, u, u_x, t)]_x , t > 0, 0 < x < L , >> u_x(0, t) = u_x(L, t) = 0 , >> u(x, 0) = u_t(x, 0) = 0 , >> >> where K >> 1. I was reading around to see what may be a good numerical >> implementation for such a problem and it can be tricky here because the >> stiffness is both in the linear part and the nonlinear forcing term. >> >> I want to try PVODE availble in the SUNDIALS package. When I try to >> configure PETSc with SUNDIALS, I get the message: >> >> Downloaded sundials could not be used. 
Please check install in >> /Users/namupatel/Softwares/PETSc/3.6.2/linux-dbg >> >> >> ******************************************************************************* >> >> File "./config/configure.py", line 363, in petsc_configure >> >> framework.configure(out = sys.stdout) >> >> File >> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", >> line 1081, in configure >> >> self.processChildren() >> >> File >> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", >> line 1070, in processChildren >> >> self.serialEvaluation(self.childGraph) >> >> File >> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", >> line 1051, in serialEvaluation >> >> child.configure() >> >> File >> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", >> line 677, in configure >> >> self.executeTest(self.configureLibrary) >> >> File >> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/base.py", >> line 126, in executeTest >> >> ret = test(*args,**kargs) >> >> File >> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", >> line 592, in configureLibrary >> >> for location, directory, lib, incl in self.generateGuesses(): >> >> File >> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", >> line 332, in generateGuesses >> >> raise RuntimeError('Downloaded '+self.package+' could not be used. >> Please check install in '+d+'\n') >> >> Two questions: >> >> 1. How can I resolve the above error? >> >> 2. Are there any recommendations to solving the stiff PDE stated above so >> that I can experiment to see what may be an efficient implementation? >> >> Thank you, >> >> Namu >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 2594676 bytes Desc: not available URL: From knepley at gmail.com Sun Nov 29 11:56:52 2015 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 29 Nov 2015 11:56:52 -0600 Subject: [petsc-users] Error configuring PETSc with SUNDIALS (to solve stiff PDE) In-Reply-To: References: Message-ID: On Sun, Nov 29, 2015 at 11:50 AM, namu patel wrote: > Attached is my configuration log file. > Satish, the idiotic Sundials configure cannot find mpicc even when its passed in as the C compiler: MPI-C Settings -------------- checking if using MPI-C script... yes checking if absolute path to mpicc was given... no checking for mpicc... none Unable to find a functional MPI-C compiler. Try using --with-mpicc to specify a MPI-C compiler script, --with-mpi-incdir, --with-mpi-libdir and --with-mpi-libs to specify the locations of all relevant MPI files, or --with-mpi-root to specify the base installation directory of the MPI implementation to be used. Disabling the parallel NVECTOR module and all parallel examples... Do we have to give -with-mpicc as an argument? 
Thanks, Matt > Thanks, > Namu > > On Sun, Nov 29, 2015 at 11:48 AM, Matthew Knepley > wrote: > >> You must send configure.log >> >> Thanks, >> >> Matt >> >> On Sun, Nov 29, 2015 at 11:42 AM, namu patel >> wrote: >> >>> Hello All, >>> >>> I was trying to configure PETSc with SUNDIALS so that I may use PVODE to >>> solve a stiff hyperbolic PDE of the form >>> >>> A(x) u_tt = K [B(x) u_x - F(x, u, u_x, t)]_x , t > 0, 0 < x < L , >>> u_x(0, t) = u_x(L, t) = 0 , >>> u(x, 0) = u_t(x, 0) = 0 , >>> >>> where K >> 1. I was reading around to see what may be a good numerical >>> implementation for such a problem and it can be tricky here because the >>> stiffness is both in the linear part and the nonlinear forcing term. >>> >>> I want to try PVODE availble in the SUNDIALS package. When I try to >>> configure PETSc with SUNDIALS, I get the message: >>> >>> Downloaded sundials could not be used. Please check install in >>> /Users/namupatel/Softwares/PETSc/3.6.2/linux-dbg >>> >>> >>> ******************************************************************************* >>> >>> File "./config/configure.py", line 363, in petsc_configure >>> >>> framework.configure(out = sys.stdout) >>> >>> File >>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", >>> line 1081, in configure >>> >>> self.processChildren() >>> >>> File >>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", >>> line 1070, in processChildren >>> >>> self.serialEvaluation(self.childGraph) >>> >>> File >>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", >>> line 1051, in serialEvaluation >>> >>> child.configure() >>> >>> File >>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", >>> line 677, in configure >>> >>> self.executeTest(self.configureLibrary) >>> >>> File >>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/base.py", >>> line 126, in executeTest >>> >>> ret = test(*args,**kargs) >>> >>> File >>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", >>> line 592, in configureLibrary >>> >>> for location, directory, lib, incl in self.generateGuesses(): >>> >>> File >>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", >>> line 332, in generateGuesses >>> >>> raise RuntimeError('Downloaded '+self.package+' could not be used. >>> Please check install in '+d+'\n') >>> >>> Two questions: >>> >>> 1. How can I resolve the above error? >>> >>> 2. Are there any recommendations to solving the stiff PDE stated above >>> so that I can experiment to see what may be an efficient implementation? >>> >>> Thank you, >>> >>> Namu >>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From namu.patel7 at gmail.com Sun Nov 29 12:25:45 2015 From: namu.patel7 at gmail.com (namu patel) Date: Sun, 29 Nov 2015 12:25:45 -0600 Subject: [petsc-users] Error configuring PETSc with SUNDIALS (to solve stiff PDE) In-Reply-To: References: Message-ID: I didn't try the --with-mpicc flag, however, the configuration is successful for PETSc 3.5.4. 
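For reference, the combination that eventually works further down this thread is to point configure at the MPI installation directory rather than at the individual compiler wrappers; a sketch of such an invocation, with the paths used in this thread and all other options elided:

    # Sketch only; add back whatever other configure options you normally use.
    ./configure --with-mpi-dir=/Users/namupatel/Softwares/OpenMPI/1.10.0 \
                --download-sundials=1

With --with-mpi-dir, PETSc's configure can hand the MPI root through to SUNDIALS' own configure, which, per Satish's diagnosis below, is what it could not determine from the wrapper compilers alone.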
On Sun, Nov 29, 2015 at 11:56 AM, Matthew Knepley wrote: > On Sun, Nov 29, 2015 at 11:50 AM, namu patel > wrote: > >> Attached is my configuration log file. >> > > Satish, the idiotic Sundials configure cannot find mpicc even when its > passed in as the C compiler: > > MPI-C Settings > -------------- > checking if using MPI-C script... yes > checking if absolute path to mpicc was given... no > checking for mpicc... none > Unable to find a functional MPI-C compiler. > Try using --with-mpicc to specify a MPI-C compiler script, > --with-mpi-incdir, --with-mpi-libdir and --with-mpi-libs > to specify the locations of all relevant MPI files, or > --with-mpi-root to specify the base installation directory > of the MPI implementation to be used. > Disabling the parallel NVECTOR module and all parallel examples... > > Do we have to give -with-mpicc as an argument? > > Thanks, > > Matt > > >> Thanks, >> Namu >> >> On Sun, Nov 29, 2015 at 11:48 AM, Matthew Knepley >> wrote: >> >>> You must send configure.log >>> >>> Thanks, >>> >>> Matt >>> >>> On Sun, Nov 29, 2015 at 11:42 AM, namu patel >>> wrote: >>> >>>> Hello All, >>>> >>>> I was trying to configure PETSc with SUNDIALS so that I may use PVODE >>>> to solve a stiff hyperbolic PDE of the form >>>> >>>> A(x) u_tt = K [B(x) u_x - F(x, u, u_x, t)]_x , t > 0, 0 < x < L , >>>> u_x(0, t) = u_x(L, t) = 0 , >>>> u(x, 0) = u_t(x, 0) = 0 , >>>> >>>> where K >> 1. I was reading around to see what may be a good numerical >>>> implementation for such a problem and it can be tricky here because the >>>> stiffness is both in the linear part and the nonlinear forcing term. >>>> >>>> I want to try PVODE availble in the SUNDIALS package. When I try to >>>> configure PETSc with SUNDIALS, I get the message: >>>> >>>> Downloaded sundials could not be used. Please check install in >>>> /Users/namupatel/Softwares/PETSc/3.6.2/linux-dbg >>>> >>>> >>>> ******************************************************************************* >>>> >>>> File "./config/configure.py", line 363, in petsc_configure >>>> >>>> framework.configure(out = sys.stdout) >>>> >>>> File >>>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", >>>> line 1081, in configure >>>> >>>> self.processChildren() >>>> >>>> File >>>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", >>>> line 1070, in processChildren >>>> >>>> self.serialEvaluation(self.childGraph) >>>> >>>> File >>>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", >>>> line 1051, in serialEvaluation >>>> >>>> child.configure() >>>> >>>> File >>>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", >>>> line 677, in configure >>>> >>>> self.executeTest(self.configureLibrary) >>>> >>>> File >>>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/base.py", >>>> line 126, in executeTest >>>> >>>> ret = test(*args,**kargs) >>>> >>>> File >>>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", >>>> line 592, in configureLibrary >>>> >>>> for location, directory, lib, incl in self.generateGuesses(): >>>> >>>> File >>>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", >>>> line 332, in generateGuesses >>>> >>>> raise RuntimeError('Downloaded '+self.package+' could not be used. >>>> Please check install in '+d+'\n') >>>> >>>> Two questions: >>>> >>>> 1. How can I resolve the above error? >>>> >>>> 2. 
Are there any recommendations to solving the stiff PDE stated above >>>> so that I can experiment to see what may be an efficient implementation? >>>> >>>> Thank you, >>>> >>>> Namu >>>> >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Sun Nov 29 12:29:26 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Sun, 29 Nov 2015 12:29:26 -0600 Subject: [petsc-users] Error configuring PETSc with SUNDIALS (to solve stiff PDE) In-Reply-To: References: Message-ID: > --LDFLAGS="-L/Users/namupatel/Softwares/OpenMPI/1.10.0/lib -Wl,-rpath,/Users/namupatel/Softwares/OpenMPI/1.10.0/lib" Try: export LD_LIBRARY_PATH=/Users/namupatel/Softwares/OpenMPI/1.10.0/lib And then rerun configure Satish On Sun, 29 Nov 2015, namu patel wrote: > I didn't try the --with-mpicc flag, however, the configuration is > successful for PETSc 3.5.4. > > On Sun, Nov 29, 2015 at 11:56 AM, Matthew Knepley wrote: > > > On Sun, Nov 29, 2015 at 11:50 AM, namu patel > > wrote: > > > >> Attached is my configuration log file. > >> > > > > Satish, the idiotic Sundials configure cannot find mpicc even when its > > passed in as the C compiler: > > > > MPI-C Settings > > -------------- > > checking if using MPI-C script... yes > > checking if absolute path to mpicc was given... no > > checking for mpicc... none > > Unable to find a functional MPI-C compiler. > > Try using --with-mpicc to specify a MPI-C compiler script, > > --with-mpi-incdir, --with-mpi-libdir and --with-mpi-libs > > to specify the locations of all relevant MPI files, or > > --with-mpi-root to specify the base installation directory > > of the MPI implementation to be used. > > Disabling the parallel NVECTOR module and all parallel examples... > > > > Do we have to give -with-mpicc as an argument? > > > > Thanks, > > > > Matt > > > > > >> Thanks, > >> Namu > >> > >> On Sun, Nov 29, 2015 at 11:48 AM, Matthew Knepley > >> wrote: > >> > >>> You must send configure.log > >>> > >>> Thanks, > >>> > >>> Matt > >>> > >>> On Sun, Nov 29, 2015 at 11:42 AM, namu patel > >>> wrote: > >>> > >>>> Hello All, > >>>> > >>>> I was trying to configure PETSc with SUNDIALS so that I may use PVODE > >>>> to solve a stiff hyperbolic PDE of the form > >>>> > >>>> A(x) u_tt = K [B(x) u_x - F(x, u, u_x, t)]_x , t > 0, 0 < x < L , > >>>> u_x(0, t) = u_x(L, t) = 0 , > >>>> u(x, 0) = u_t(x, 0) = 0 , > >>>> > >>>> where K >> 1. I was reading around to see what may be a good numerical > >>>> implementation for such a problem and it can be tricky here because the > >>>> stiffness is both in the linear part and the nonlinear forcing term. > >>>> > >>>> I want to try PVODE availble in the SUNDIALS package. When I try to > >>>> configure PETSc with SUNDIALS, I get the message: > >>>> > >>>> Downloaded sundials could not be used. 
Please check install in > >>>> /Users/namupatel/Softwares/PETSc/3.6.2/linux-dbg > >>>> > >>>> > >>>> ******************************************************************************* > >>>> > >>>> File "./config/configure.py", line 363, in petsc_configure > >>>> > >>>> framework.configure(out = sys.stdout) > >>>> > >>>> File > >>>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", > >>>> line 1081, in configure > >>>> > >>>> self.processChildren() > >>>> > >>>> File > >>>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", > >>>> line 1070, in processChildren > >>>> > >>>> self.serialEvaluation(self.childGraph) > >>>> > >>>> File > >>>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", > >>>> line 1051, in serialEvaluation > >>>> > >>>> child.configure() > >>>> > >>>> File > >>>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", > >>>> line 677, in configure > >>>> > >>>> self.executeTest(self.configureLibrary) > >>>> > >>>> File > >>>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/base.py", > >>>> line 126, in executeTest > >>>> > >>>> ret = test(*args,**kargs) > >>>> > >>>> File > >>>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", > >>>> line 592, in configureLibrary > >>>> > >>>> for location, directory, lib, incl in self.generateGuesses(): > >>>> > >>>> File > >>>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", > >>>> line 332, in generateGuesses > >>>> > >>>> raise RuntimeError('Downloaded '+self.package+' could not be used. > >>>> Please check install in '+d+'\n') > >>>> > >>>> Two questions: > >>>> > >>>> 1. How can I resolve the above error? > >>>> > >>>> 2. Are there any recommendations to solving the stiff PDE stated above > >>>> so that I can experiment to see what may be an efficient implementation? > >>>> > >>>> Thank you, > >>>> > >>>> Namu > >>>> > >>>> > >>>> > >>> > >>> > >>> -- > >>> What most experimenters take for granted before they begin their > >>> experiments is infinitely more interesting than any results to which their > >>> experiments lead. > >>> -- Norbert Wiener > >>> > >> > >> > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which their > > experiments lead. > > -- Norbert Wiener > > > From namu.patel7 at gmail.com Sun Nov 29 13:27:41 2015 From: namu.patel7 at gmail.com (namu patel) Date: Sun, 29 Nov 2015 13:27:41 -0600 Subject: [petsc-users] Error configuring PETSc with SUNDIALS (to solve stiff PDE) In-Reply-To: References: Message-ID: Same error with PETSc 3.6.2 when I use export LD_LIBRARY_PATH=/Users/namupatel/Softwares/OpenMPI/1.10.0/lib On Sun, Nov 29, 2015 at 12:29 PM, Satish Balay wrote: > > --LDFLAGS="-L/Users/namupatel/Softwares/OpenMPI/1.10.0/lib > -Wl,-rpath,/Users/namupatel/Softwares/OpenMPI/1.10.0/lib" > > Try: > > export LD_LIBRARY_PATH=/Users/namupatel/Softwares/OpenMPI/1.10.0/lib > > And then rerun configure > > Satish > > On Sun, 29 Nov 2015, namu patel wrote: > > > I didn't try the --with-mpicc flag, however, the configuration is > > successful for PETSc 3.5.4. > > > > On Sun, Nov 29, 2015 at 11:56 AM, Matthew Knepley > wrote: > > > > > On Sun, Nov 29, 2015 at 11:50 AM, namu patel > > > wrote: > > > > > >> Attached is my configuration log file. 
> > >> > > > > > > Satish, the idiotic Sundials configure cannot find mpicc even when its > > > passed in as the C compiler: > > > > > > MPI-C Settings > > > -------------- > > > checking if using MPI-C script... yes > > > checking if absolute path to mpicc was given... no > > > checking for mpicc... none > > > Unable to find a functional MPI-C compiler. > > > Try using --with-mpicc to specify a MPI-C compiler script, > > > --with-mpi-incdir, --with-mpi-libdir and --with-mpi-libs > > > to specify the locations of all relevant MPI files, or > > > --with-mpi-root to specify the base installation directory > > > of the MPI implementation to be used. > > > Disabling the parallel NVECTOR module and all parallel examples... > > > > > > Do we have to give -with-mpicc as an argument? > > > > > > Thanks, > > > > > > Matt > > > > > > > > >> Thanks, > > >> Namu > > >> > > >> On Sun, Nov 29, 2015 at 11:48 AM, Matthew Knepley > > >> wrote: > > >> > > >>> You must send configure.log > > >>> > > >>> Thanks, > > >>> > > >>> Matt > > >>> > > >>> On Sun, Nov 29, 2015 at 11:42 AM, namu patel > > >>> wrote: > > >>> > > >>>> Hello All, > > >>>> > > >>>> I was trying to configure PETSc with SUNDIALS so that I may use > PVODE > > >>>> to solve a stiff hyperbolic PDE of the form > > >>>> > > >>>> A(x) u_tt = K [B(x) u_x - F(x, u, u_x, t)]_x , t > 0, 0 < x < L , > > >>>> u_x(0, t) = u_x(L, t) = 0 , > > >>>> u(x, 0) = u_t(x, 0) = 0 , > > >>>> > > >>>> where K >> 1. I was reading around to see what may be a good > numerical > > >>>> implementation for such a problem and it can be tricky here because > the > > >>>> stiffness is both in the linear part and the nonlinear forcing term. > > >>>> > > >>>> I want to try PVODE availble in the SUNDIALS package. When I try to > > >>>> configure PETSc with SUNDIALS, I get the message: > > >>>> > > >>>> Downloaded sundials could not be used. 
Please check install in > > >>>> /Users/namupatel/Softwares/PETSc/3.6.2/linux-dbg > > >>>> > > >>>> > > >>>> > ******************************************************************************* > > >>>> > > >>>> File "./config/configure.py", line 363, in petsc_configure > > >>>> > > >>>> framework.configure(out = sys.stdout) > > >>>> > > >>>> File > > >>>> > "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", > > >>>> line 1081, in configure > > >>>> > > >>>> self.processChildren() > > >>>> > > >>>> File > > >>>> > "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", > > >>>> line 1070, in processChildren > > >>>> > > >>>> self.serialEvaluation(self.childGraph) > > >>>> > > >>>> File > > >>>> > "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", > > >>>> line 1051, in serialEvaluation > > >>>> > > >>>> child.configure() > > >>>> > > >>>> File > > >>>> > "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", > > >>>> line 677, in configure > > >>>> > > >>>> self.executeTest(self.configureLibrary) > > >>>> > > >>>> File > > >>>> > "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/base.py", > > >>>> line 126, in executeTest > > >>>> > > >>>> ret = test(*args,**kargs) > > >>>> > > >>>> File > > >>>> > "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", > > >>>> line 592, in configureLibrary > > >>>> > > >>>> for location, directory, lib, incl in self.generateGuesses(): > > >>>> > > >>>> File > > >>>> > "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", > > >>>> line 332, in generateGuesses > > >>>> > > >>>> raise RuntimeError('Downloaded '+self.package+' could not be > used. > > >>>> Please check install in '+d+'\n') > > >>>> > > >>>> Two questions: > > >>>> > > >>>> 1. How can I resolve the above error? > > >>>> > > >>>> 2. Are there any recommendations to solving the stiff PDE stated > above > > >>>> so that I can experiment to see what may be an efficient > implementation? > > >>>> > > >>>> Thank you, > > >>>> > > >>>> Namu > > >>>> > > >>>> > > >>>> > > >>> > > >>> > > >>> -- > > >>> What most experimenters take for granted before they begin their > > >>> experiments is infinitely more interesting than any results to which > their > > >>> experiments lead. > > >>> -- Norbert Wiener > > >>> > > >> > > >> > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to which > their > > > experiments lead. > > > -- Norbert Wiener > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 2519305 bytes Desc: not available URL: From balay at mcs.anl.gov Sun Nov 29 14:09:56 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Sun, 29 Nov 2015 14:09:56 -0600 Subject: [petsc-users] Error configuring PETSc with SUNDIALS (to solve stiff PDE) In-Reply-To: References: Message-ID: Ok - I see the issue now. We are using --with-mpi-root option with sundials. However - configure is not able to determine this - when configured with CC=/path/to/mpicc sundials.py looks complicated enough - just to work arround this mpi/non-mpi compiler issue of sundials. But perhaps we should always add -with-mpicc - as you suggest [and get rid of --with-mpi-root stuff? 
- I'm not sure if that would work] Namu, Can you try using: --with-mpi-dir=/Users/namupatel/Softwares/OpenMPI/1.10.0 [in place of options: "--CC=/Users/namupatel/Softwares/OpenMPI/1.10.0/bin/mpicc --CXX=/Users/namupatel/Softwares/OpenMPI/1.10.0/bin/mpicxx --FC=/Users/namupatel/Softwares/OpenMPI/1.10.0/bin/mpif90" ] Satish On Sun, 29 Nov 2015, Matthew Knepley wrote: > On Sun, Nov 29, 2015 at 11:50 AM, namu patel wrote: > > > Attached is my configuration log file. > > > > Satish, the idiotic Sundials configure cannot find mpicc even when its > passed in as the C compiler: > > MPI-C Settings > -------------- > checking if using MPI-C script... yes > checking if absolute path to mpicc was given... no > checking for mpicc... none > Unable to find a functional MPI-C compiler. > Try using --with-mpicc to specify a MPI-C compiler script, > --with-mpi-incdir, --with-mpi-libdir and --with-mpi-libs > to specify the locations of all relevant MPI files, or > --with-mpi-root to specify the base installation directory > of the MPI implementation to be used. > Disabling the parallel NVECTOR module and all parallel examples... > > Do we have to give -with-mpicc as an argument? > > Thanks, > > Matt > > > > Thanks, > > Namu > > > > On Sun, Nov 29, 2015 at 11:48 AM, Matthew Knepley > > wrote: > > > >> You must send configure.log > >> > >> Thanks, > >> > >> Matt > >> > >> On Sun, Nov 29, 2015 at 11:42 AM, namu patel > >> wrote: > >> > >>> Hello All, > >>> > >>> I was trying to configure PETSc with SUNDIALS so that I may use PVODE to > >>> solve a stiff hyperbolic PDE of the form > >>> > >>> A(x) u_tt = K [B(x) u_x - F(x, u, u_x, t)]_x , t > 0, 0 < x < L , > >>> u_x(0, t) = u_x(L, t) = 0 , > >>> u(x, 0) = u_t(x, 0) = 0 , > >>> > >>> where K >> 1. I was reading around to see what may be a good numerical > >>> implementation for such a problem and it can be tricky here because the > >>> stiffness is both in the linear part and the nonlinear forcing term. > >>> > >>> I want to try PVODE availble in the SUNDIALS package. When I try to > >>> configure PETSc with SUNDIALS, I get the message: > >>> > >>> Downloaded sundials could not be used. 
Please check install in > >>> /Users/namupatel/Softwares/PETSc/3.6.2/linux-dbg > >>> > >>> > >>> ******************************************************************************* > >>> > >>> File "./config/configure.py", line 363, in petsc_configure > >>> > >>> framework.configure(out = sys.stdout) > >>> > >>> File > >>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", > >>> line 1081, in configure > >>> > >>> self.processChildren() > >>> > >>> File > >>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", > >>> line 1070, in processChildren > >>> > >>> self.serialEvaluation(self.childGraph) > >>> > >>> File > >>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", > >>> line 1051, in serialEvaluation > >>> > >>> child.configure() > >>> > >>> File > >>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", > >>> line 677, in configure > >>> > >>> self.executeTest(self.configureLibrary) > >>> > >>> File > >>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/base.py", > >>> line 126, in executeTest > >>> > >>> ret = test(*args,**kargs) > >>> > >>> File > >>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", > >>> line 592, in configureLibrary > >>> > >>> for location, directory, lib, incl in self.generateGuesses(): > >>> > >>> File > >>> "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", > >>> line 332, in generateGuesses > >>> > >>> raise RuntimeError('Downloaded '+self.package+' could not be used. > >>> Please check install in '+d+'\n') > >>> > >>> Two questions: > >>> > >>> 1. How can I resolve the above error? > >>> > >>> 2. Are there any recommendations to solving the stiff PDE stated above > >>> so that I can experiment to see what may be an efficient implementation? > >>> > >>> Thank you, > >>> > >>> Namu > >>> > >>> > >>> > >> > >> > >> -- > >> What most experimenters take for granted before they begin their > >> experiments is infinitely more interesting than any results to which their > >> experiments lead. > >> -- Norbert Wiener > >> > > > > > > > From namu.patel7 at gmail.com Sun Nov 29 14:19:02 2015 From: namu.patel7 at gmail.com (namu patel) Date: Sun, 29 Nov 2015 14:19:02 -0600 Subject: [petsc-users] Error configuring PETSc with SUNDIALS (to solve stiff PDE) In-Reply-To: References: Message-ID: On Sun, Nov 29, 2015 at 2:09 PM, Satish Balay wrote: > > > Namu, Can you try using: > > --with-mpi-dir=/Users/namupatel/Softwares/OpenMPI/1.10.0 > > [in place of options: > "--CC=/Users/namupatel/Softwares/OpenMPI/1.10.0/bin/mpicc > --CXX=/Users/namupatel/Softwares/OpenMPI/1.10.0/bin/mpicxx > --FC=/Users/namupatel/Softwares/OpenMPI/1.10.0/bin/mpif90" ] This works, thanks. On Sun, Nov 29, 2015 at 2:09 PM, Satish Balay wrote: > Ok - I see the issue now. > > We are using --with-mpi-root option with sundials. > > However - configure is not able to determine this - when configured > with CC=/path/to/mpicc > > sundials.py looks complicated enough - just to work arround this > mpi/non-mpi compiler issue of sundials. > > But perhaps we should always add -with-mpicc - as you suggest > [and get rid of --with-mpi-root stuff? 
- I'm not sure if that would work] > > Namu, Can you try using: > > --with-mpi-dir=/Users/namupatel/Softwares/OpenMPI/1.10.0 > > [in place of options: > "--CC=/Users/namupatel/Softwares/OpenMPI/1.10.0/bin/mpicc > --CXX=/Users/namupatel/Softwares/OpenMPI/1.10.0/bin/mpicxx > --FC=/Users/namupatel/Softwares/OpenMPI/1.10.0/bin/mpif90" ] > > Satish > > On Sun, 29 Nov 2015, Matthew Knepley wrote: > > > On Sun, Nov 29, 2015 at 11:50 AM, namu patel > wrote: > > > > > Attached is my configuration log file. > > > > > > > Satish, the idiotic Sundials configure cannot find mpicc even when its > > passed in as the C compiler: > > > > MPI-C Settings > > -------------- > > checking if using MPI-C script... yes > > checking if absolute path to mpicc was given... no > > checking for mpicc... none > > Unable to find a functional MPI-C compiler. > > Try using --with-mpicc to specify a MPI-C compiler script, > > --with-mpi-incdir, --with-mpi-libdir and --with-mpi-libs > > to specify the locations of all relevant MPI files, or > > --with-mpi-root to specify the base installation directory > > of the MPI implementation to be used. > > Disabling the parallel NVECTOR module and all parallel examples... > > > > Do we have to give -with-mpicc as an argument? > > > > Thanks, > > > > Matt > > > > > > > Thanks, > > > Namu > > > > > > On Sun, Nov 29, 2015 at 11:48 AM, Matthew Knepley > > > wrote: > > > > > >> You must send configure.log > > >> > > >> Thanks, > > >> > > >> Matt > > >> > > >> On Sun, Nov 29, 2015 at 11:42 AM, namu patel > > >> wrote: > > >> > > >>> Hello All, > > >>> > > >>> I was trying to configure PETSc with SUNDIALS so that I may use > PVODE to > > >>> solve a stiff hyperbolic PDE of the form > > >>> > > >>> A(x) u_tt = K [B(x) u_x - F(x, u, u_x, t)]_x , t > 0, 0 < x < L , > > >>> u_x(0, t) = u_x(L, t) = 0 , > > >>> u(x, 0) = u_t(x, 0) = 0 , > > >>> > > >>> where K >> 1. I was reading around to see what may be a good > numerical > > >>> implementation for such a problem and it can be tricky here because > the > > >>> stiffness is both in the linear part and the nonlinear forcing term. > > >>> > > >>> I want to try PVODE availble in the SUNDIALS package. When I try to > > >>> configure PETSc with SUNDIALS, I get the message: > > >>> > > >>> Downloaded sundials could not be used. 
Please check install in > > >>> /Users/namupatel/Softwares/PETSc/3.6.2/linux-dbg > > >>> > > >>> > > >>> > ******************************************************************************* > > >>> > > >>> File "./config/configure.py", line 363, in petsc_configure > > >>> > > >>> framework.configure(out = sys.stdout) > > >>> > > >>> File > > >>> > "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", > > >>> line 1081, in configure > > >>> > > >>> self.processChildren() > > >>> > > >>> File > > >>> > "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", > > >>> line 1070, in processChildren > > >>> > > >>> self.serialEvaluation(self.childGraph) > > >>> > > >>> File > > >>> > "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/framework.py", > > >>> line 1051, in serialEvaluation > > >>> > > >>> child.configure() > > >>> > > >>> File > > >>> > "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", > > >>> line 677, in configure > > >>> > > >>> self.executeTest(self.configureLibrary) > > >>> > > >>> File > > >>> > "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/base.py", > > >>> line 126, in executeTest > > >>> > > >>> ret = test(*args,**kargs) > > >>> > > >>> File > > >>> > "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", > > >>> line 592, in configureLibrary > > >>> > > >>> for location, directory, lib, incl in self.generateGuesses(): > > >>> > > >>> File > > >>> > "/Users/namupatel/Softwares/PETSc/3.6.2/config/BuildSystem/config/package.py", > > >>> line 332, in generateGuesses > > >>> > > >>> raise RuntimeError('Downloaded '+self.package+' could not be > used. > > >>> Please check install in '+d+'\n') > > >>> > > >>> Two questions: > > >>> > > >>> 1. How can I resolve the above error? > > >>> > > >>> 2. Are there any recommendations to solving the stiff PDE stated > above > > >>> so that I can experiment to see what may be an efficient > implementation? > > >>> > > >>> Thank you, > > >>> > > >>> Namu > > >>> > > >>> > > >>> > > >> > > >> > > >> -- > > >> What most experimenters take for granted before they begin their > > >> experiments is infinitely more interesting than any results to which > their > > >> experiments lead. > > >> -- Norbert Wiener > > >> > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mono at mek.dtu.dk Mon Nov 30 07:01:49 2015 From: mono at mek.dtu.dk (=?utf-8?B?TW9ydGVuIE5vYmVsLUrDuHJnZW5zZW4=?=) Date: Mon, 30 Nov 2015 13:01:49 +0000 Subject: [petsc-users] DMPlex: Ghost points after DMRefine Message-ID: I have a very simple unstructured mesh composed of two triangles (four vertices) with one shared edge using a DMPlex: /|\ / | \ \ | / \|/ After distributing this mesh to two processes, each process owns a triangle. However one process owns tree vertices, while the last vertex is owned by the other process. The problem occurs after uniformly refining the dm. The mesh now looks like this: /|\ /\|/\ \/|\/ \|/ The new center vertex is now not listed as a ghost vertex but instead exists as two individual points. Is there any way that this new center vertex could be created as a ghost vertex during refinement? Kind regards, Morten Ps. 
Here are some code snippets for getting global point index and test of point is a ghost point: int localToGlobal(DM dm, PetscInt point){ const PetscInt* array; ISLocalToGlobalMapping ltogm; DMGetLocalToGlobalMapping(dm,<ogm); ISLocalToGlobalMappingGetIndices(ltogm, &array); PetscInt res = array[point]; if (res < 0){ // if ghost res = -res +1; } return res; } bool isGhost(DM dm, PetscInt point){ const PetscInt* array; ISLocalToGlobalMapping ltogm; DMGetLocalToGlobalMapping(dm,<ogm); ISLocalToGlobalMappingGetIndices(ltogm, &array); return array[point]<0; } -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Nov 30 07:08:48 2015 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 30 Nov 2015 07:08:48 -0600 Subject: [petsc-users] DMPlex: Ghost points after DMRefine In-Reply-To: References: Message-ID: On Mon, Nov 30, 2015 at 7:01 AM, Morten Nobel-J?rgensen wrote: > I have a very simple unstructured mesh composed of two triangles (four > vertices) with one shared edge using a DMPlex: > > /|\ > / | \ > \ | / > \|/ > > After distributing this mesh to two processes, each process owns a > triangle. However one process owns tree vertices, while the last vertex is > owned by the other process. > > The problem occurs after uniformly refining the dm. The mesh now looks > like this: > > /|\ > /\|/\ > \/|\/ > \|/ > > The new center vertex is now not listed as a ghost vertex but instead > exists as two individual points. > > Is there any way that this new center vertex could be created as a ghost > vertex during refinement? > This could be a bug with the l2g mapping. I do not recreate it when refining, only the SF defining the mapping. Here is an experiment: do not retrieve the mapping until after the refinement. Do you get what you want? If so, I can easily fix this by destroying the map when I refine. Thanks, Matt > Kind regards, > Morten > > Ps. Here are some code snippets for getting global point index and test of > point is a ghost point: > > int localToGlobal(DM dm, PetscInt point){ > const PetscInt* array; > ISLocalToGlobalMapping ltogm; > DMGetLocalToGlobalMapping(dm,<ogm); > ISLocalToGlobalMappingGetIndices(ltogm, &array); > PetscInt res = array[point]; > if (res < 0){ // if ghost > res = -res +1; > } > return res; > } > > bool isGhost(DM dm, PetscInt point){ > const PetscInt* array; > ISLocalToGlobalMapping ltogm; > DMGetLocalToGlobalMapping(dm,<ogm); > ISLocalToGlobalMappingGetIndices(ltogm, &array); > return array[point]<0; > } > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From soumyamechanics at gmail.com Mon Nov 30 07:59:17 2015 From: soumyamechanics at gmail.com (Soumya Mukherjee) Date: Mon, 30 Nov 2015 08:59:17 -0500 Subject: [petsc-users] PETSC error: Caught signal number 8 FPE In-Reply-To: References: <44C0070A-C473-4218-85E1-A4451218C250@dsic.upv.es> Message-ID: It is a PETSc error. And I just wanted to know if runs without an error in your machine. On Nov 30, 2015 4:34 AM, "Jose E. Roman" wrote: > > I am not going to run your code. We are not a free debugging service. You have to debug the code yourself, and let us know only if the issue is related to the SLEPc library. Start adding error checking code with the CHKERRQ macro to all PETSc/SLEPc calls. This will catch most errors. 
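As a minimal sketch of the CHKERRQ pattern Jose describes (the eps, A and nconv names here are illustrative, not taken from the poster's code):

  PetscErrorCode ierr;                    /* every PETSc/SLEPc call returns an error code */
  ierr = EPSSetOperators(eps,A,NULL);CHKERRQ(ierr);
  ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);
  ierr = EPSSolve(eps);CHKERRQ(ierr);     /* on failure, CHKERRQ prints file/line and the stack */
  ierr = EPSGetConverged(eps,&nconv);CHKERRQ(ierr);

With every call wrapped this way, the first routine that fails is reported directly instead of surfacing later as a crash or a cascade of follow-on errors.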
It no errors are detected, then run with a debugger such as gdb or valgrind to determine the exact point where the program fails. > > Jose > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Nov 30 08:08:03 2015 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 30 Nov 2015 08:08:03 -0600 Subject: [petsc-users] PETSC error: Caught signal number 8 FPE In-Reply-To: References: <44C0070A-C473-4218-85E1-A4451218C250@dsic.upv.es> Message-ID: On Mon, Nov 30, 2015 at 7:59 AM, Soumya Mukherjee wrote: > It is a PETSc error. And I just wanted to know if runs without an error in > your machine. > This is not a PETSc error, as such. PETSc installs a signal handler so that we can try and get more information about signals. However, it is likely that you have a Floating Point Exception, like a divide by zero, in your user code. Thanks, Matt > On Nov 30, 2015 4:34 AM, "Jose E. Roman" wrote: > > > > I am not going to run your code. We are not a free debugging service. > You have to debug the code yourself, and let us know only if the issue is > related to the SLEPc library. Start adding error checking code with the > CHKERRQ macro to all PETSc/SLEPc calls. This will catch most errors. It no > errors are detected, then run with a debugger such as gdb or valgrind to > determine the exact point where the program fails. > > > > Jose > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From Eric.Chamberland at giref.ulaval.ca Mon Nov 30 10:14:51 2015 From: Eric.Chamberland at giref.ulaval.ca (Eric Chamberland) Date: Mon, 30 Nov 2015 11:14:51 -0500 Subject: [petsc-users] mpi_aij MatGetSubMatrix with mat_block_size!=1 Message-ID: <565C75FB.6060701@giref.ulaval.ca> Hi, Using PETSc 3.5.3. We have a "A" matrix, mpi_aij with block_size=3. We create a IS with ISCreateStride, then extract A_00 with MatGetSubMatrix(..., MAT_INITIAL_MATRIX,...). We know that A_00 is block_size = 3 and mpi_aij, however the matrix created by PETSc doesn't have the information... How can I have the block_size=3 option into the extracted matrix so the further PC we configure (gamg) can work with it? I tried: MatSetBlockSizes(A_00,3, 3); after the MatGetSubMatrix, but it doesn't change it... Thanks, Eric From lawrence.mitchell at imperial.ac.uk Mon Nov 30 10:18:20 2015 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Mon, 30 Nov 2015 16:18:20 +0000 Subject: [petsc-users] mpi_aij MatGetSubMatrix with mat_block_size!=1 In-Reply-To: <565C75FB.6060701@giref.ulaval.ca> References: <565C75FB.6060701@giref.ulaval.ca> Message-ID: <565C76CC.9000709@imperial.ac.uk> On 30/11/15 16:14, Eric Chamberland wrote: > Hi, > > Using PETSc 3.5.3. > > We have a "A" matrix, mpi_aij with block_size=3. > > We create a IS with ISCreateStride, then extract A_00 with > MatGetSubMatrix(..., MAT_INITIAL_MATRIX,...). > > We know that A_00 is block_size = 3 and mpi_aij, however the matrix > created by PETSc doesn't have the information... > > How can I have the block_size=3 option into the extracted matrix so the > further PC we configure (gamg) can work with it? > > I tried: > > MatSetBlockSizes(A_00,3, 3); > after the MatGetSubMatrix, but it doesn't change it... The block size of the submatrix comes from the block size that lives on the IS used to define it. 
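A minimal sketch of that pattern for the A_00 extraction described above (nLocalRows and firstRow are placeholders; the local stride length must be a multiple of the block size):

  IS  is00;
  Mat A00;
  ierr = ISCreateStride(PETSC_COMM_WORLD,nLocalRows,firstRow,1,&is00);CHKERRQ(ierr);
  ierr = ISSetBlockSize(is00,3);CHKERRQ(ierr);      /* carry bs=3 onto the index set */
  ierr = MatGetSubMatrix(A,is00,is00,MAT_INITIAL_MATRIX,&A00);CHKERRQ(ierr);  /* A00 inherits bs=3 */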
So set a block size on the IS you make (ISSetBlockSize). Cheers, LAwrence -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 490 bytes Desc: OpenPGP digital signature URL: From mono at mek.dtu.dk Mon Nov 30 10:24:40 2015 From: mono at mek.dtu.dk (=?utf-8?B?TW9ydGVuIE5vYmVsLUrDuHJnZW5zZW4=?=) Date: Mon, 30 Nov 2015 16:24:40 +0000 Subject: [petsc-users] DMPlex: Ghost points after DMRefine In-Reply-To: References: Message-ID: <619D3787-40B8-4E37-B143-22A49C595D28@dtu.dk> Hi Matt I don?t think the problem is within Petsc - rather somewhere in my code. When I dump the DMPlex using DMView (ascii-info?detail) the ghost mapping seems to be setup correctly. Is there a better way to determine if a local point is a ghost point? The way I iterate the DMPlex is like this: void iterateDMPlex(DM dm){ Vec coordinates; DMGetCoordinatesLocal(dm, &coordinates); PetscSection defaultSection; DMGetDefaultSection(dm, &defaultSection); PetscSection coordSection; DMGetCoordinateSection(dm, &coordSection); PetscScalar *coords; VecGetArray(coordinates, &coords); DM cdm; DMGetCoordinateDM(dm, &cdm); // iterate (local) mesh PetscInt cellsFrom, cellsTo; std::string s = ""; DMPlexGetHeightStratum(dm, 0, &cellsFrom, &cellsTo); for (PetscInt i=cellsFrom;i> Date: Monday 30 November 2015 at 14:08 To: Morten Nobel-J?rgensen > Cc: "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] DMPlex: Ghost points after DMRefine On Mon, Nov 30, 2015 at 7:01 AM, Morten Nobel-J?rgensen > wrote: I have a very simple unstructured mesh composed of two triangles (four vertices) with one shared edge using a DMPlex: /|\ / | \ \ | / \|/ After distributing this mesh to two processes, each process owns a triangle. However one process owns tree vertices, while the last vertex is owned by the other process. The problem occurs after uniformly refining the dm. The mesh now looks like this: /|\ /\|/\ \/|\/ \|/ The new center vertex is now not listed as a ghost vertex but instead exists as two individual points. Is there any way that this new center vertex could be created as a ghost vertex during refinement? This could be a bug with the l2g mapping. I do not recreate it when refining, only the SF defining the mapping. Here is an experiment: do not retrieve the mapping until after the refinement. Do you get what you want? If so, I can easily fix this by destroying the map when I refine. Thanks, Matt Kind regards, Morten Ps. Here are some code snippets for getting global point index and test of point is a ghost point: int localToGlobal(DM dm, PetscInt point){ const PetscInt* array; ISLocalToGlobalMapping ltogm; DMGetLocalToGlobalMapping(dm,<ogm); ISLocalToGlobalMappingGetIndices(ltogm, &array); PetscInt res = array[point]; if (res < 0){ // if ghost res = -res +1; } return res; } bool isGhost(DM dm, PetscInt point){ const PetscInt* array; ISLocalToGlobalMapping ltogm; DMGetLocalToGlobalMapping(dm,<ogm); ISLocalToGlobalMappingGetIndices(ltogm, &array); return array[point]<0; } -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
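One way to answer the ghost-point question above without going through the local-to-global mapping is to test membership in the point SF, whose leaves are exactly the plex points owned by another rank. This is a sketch of that idea, not the reply given in the thread; it assumes the point SF has been set up (as it is after DMPlexDistribute), and error codes are dropped for brevity, as in the snippets above:

  PetscBool pointIsGhost(DM dm, PetscInt point)
  {
    PetscSF            sf;
    PetscInt           nroots, nleaves, i;
    const PetscInt    *ilocal;
    const PetscSFNode *iremote;

    DMGetPointSF(dm, &sf);
    PetscSFGetGraph(sf, &nroots, &nleaves, &ilocal, &iremote);
    for (i = 0; i < nleaves; ++i) {
      /* ilocal may be NULL, meaning the leaves are simply 0..nleaves-1 */
      if ((ilocal ? ilocal[i] : i) == point) return PETSC_TRUE;
    }
    return PETSC_FALSE;
  }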
URL: From Eric.Chamberland at giref.ulaval.ca Mon Nov 30 10:28:30 2015 From: Eric.Chamberland at giref.ulaval.ca (Eric Chamberland) Date: Mon, 30 Nov 2015 11:28:30 -0500 Subject: [petsc-users] mpi_aij MatGetSubMatrix with mat_block_size!=1 In-Reply-To: <565C76CC.9000709@imperial.ac.uk> References: <565C75FB.6060701@giref.ulaval.ca> <565C76CC.9000709@imperial.ac.uk> Message-ID: <565C792E.2090901@giref.ulaval.ca> Le 2015-11-30 11:18, Lawrence Mitchell a ?crit : > > The block size of the submatrix comes from the block size that lives > on the IS used to define it. So set a block size on the IS you make > (ISSetBlockSize). > Great! It works! :) Thanks! Eric > Cheers, > > LAwrence > From adlinds3 at ncsu.edu Mon Nov 30 14:19:55 2015 From: adlinds3 at ncsu.edu (Alex Lindsay) Date: Mon, 30 Nov 2015 15:19:55 -0500 Subject: [petsc-users] Output newton step Message-ID: <565CAF6B.50209@ncsu.edu> Is there an option for outputting the Newton step after my linear solve? Alex From soumyamechanics at gmail.com Mon Nov 30 14:43:35 2015 From: soumyamechanics at gmail.com (Soumya Mukherjee) Date: Mon, 30 Nov 2015 15:43:35 -0500 Subject: [petsc-users] PETSC error: Caught signal number 8 FPE In-Reply-To: References: <44C0070A-C473-4218-85E1-A4451218C250@dsic.upv.es> Message-ID: Thanks for the reply. The error message shows [0]PETSC ERROR: Invalid argument [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 [0]PETSC ERROR: ./main on a arch-linux2-cxx-debug named soumya-OptiPlex-9010 by soumya Mon Nov 30 12:30:28 2015 [0]PETSC ERROR: Configure options --with-cc=gcc --with-fc=gfortran --with-cxx=g++ --with-clanguage=cxx --download-fblaslapack --download-mpich --with-scalar-type=complex [0]PETSC ERROR: #1016 BVScaleColumn() line 380 in /home/soumya/slepc-3.6.0/src/sys/classes/bv/interface/bvops.c [0]PETSC ERROR: #1017 EPSBasicArnoldi() line 65 in /home/soumya/slepc-3.6.0/src/eps/impls/krylov/epskrylov.c [0]PETSC ERROR: #1018 EPSSolve_KrylovSchur_Default() line 201 in /home/soumya/slepc-3.6.0/src/eps/impls/krylov/krylovschur/krylovschur.c [0]PETSC ERROR: #1019 EPSSolve() line 101 in /home/soumya/slepc-3.6.0/src/eps/interface/epssolve.c [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Must call EPSSolve() first: Parameter #1 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 [0]PETSC ERROR: ./main on a arch-linux2-cxx-debug named soumya-OptiPlex-9010 by soumya Mon Nov 30 12:30:28 2015 [0]PETSC ERROR: Configure options --with-cc=gcc --with-fc=gfortran --with-cxx=g++ --with-clanguage=cxx --download-fblaslapack --download-mpich --with-scalar-type=complex [0]PETSC ERROR: #1020 EPSGetConverged() line 236 in /home/soumya/slepc-3.6.0/src/eps/interface/epssolve.c [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Must call EPSSolve() first: Parameter #1 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 [0]PETSC ERROR: ./main on a arch-linux2-cxx-debug named soumya-OptiPlex-9010 by soumya Mon Nov 30 12:30:28 2015 [0]PETSC ERROR: Configure options --with-cc=gcc --with-fc=gfortran --with-cxx=g++ --with-clanguage=cxx --download-fblaslapack --download-mpich --with-scalar-type=complex [0]PETSC ERROR: #1021 EPSGetEigenpair() line 378 in /home/soumya/slepc-3.6.0/src/eps/interface/epssolve.c [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Must call EPSSolve() first: Parameter #1 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 [0]PETSC ERROR: ./main on a arch-linux2-cxx-debug named soumya-OptiPlex-9010 by soumya Mon Nov 30 12:30:28 2015 [0]PETSC ERROR: Configure options --with-cc=gcc --with-fc=gfortran --with-cxx=g++ --with-clanguage=cxx --download-fblaslapack --download-mpich --with-scalar-type=complex [0]PETSC ERROR: #1022 EPSGetEigenpair() line 378 in /home/soumya/slepc-3.6.0/src/eps/interface/epssolve.c On Mon, Nov 30, 2015 at 9:08 AM, Matthew Knepley wrote: > On Mon, Nov 30, 2015 at 7:59 AM, Soumya Mukherjee < > soumyamechanics at gmail.com> wrote: > >> It is a PETSc error. And I just wanted to know if runs without an error >> in your machine. >> > This is not a PETSc error, as such. PETSc installs a signal handler so > that we can try and get more > information about signals. However, it is likely that you have a Floating > Point Exception, like a divide > by zero, in your user code. > > Thanks, > > Matt > >> On Nov 30, 2015 4:34 AM, "Jose E. Roman" wrote: >> > >> > I am not going to run your code. We are not a free debugging service. >> You have to debug the code yourself, and let us know only if the issue is >> related to the SLEPc library. Start adding error checking code with the >> CHKERRQ macro to all PETSc/SLEPc calls. This will catch most errors. It no >> errors are detected, then run with a debugger such as gdb or valgrind to >> determine the exact point where the program fails. >> > >> > Jose >> > >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Nov 30 16:56:50 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 30 Nov 2015 16:56:50 -0600 Subject: [petsc-users] Output newton step In-Reply-To: <565CAF6B.50209@ncsu.edu> References: <565CAF6B.50209@ncsu.edu> Message-ID: <7F088E0C-413B-4311-8CF1-87E33643BE42@mcs.anl.gov> > On Nov 30, 2015, at 2:19 PM, Alex Lindsay wrote: > > Is there an option for outputting the Newton step after my linear solve? > > Alex Do you want the solution of the linear system before the line search (line search may shrink the vector) use -ksp_view_solution or the actual update selected by Newton -snes_monitor_solution_update If you use the master branch of PETSc then both of these flags take the option [ascii or binary or draw][:filename][:viewer format] allowing printing as ascii, binary or drawing the solution in a window (Drawing only works for DMDA 1d or 2d). 
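Concretely, those suggestions might look like this on the command line (./myapp is a placeholder for the user's executable; the binary:filename form is the master-branch viewer syntax described here):

  ./myapp -snes_monitor -snes_monitor_solution_update binary:newton_update.bin
  ./myapp -snes_monitor -ksp_view_solution ascii:linear_solution.txt

The first run writes the update actually selected by Newton at each iteration to newton_update.bin; the second prints the solution of each linear solve, before the line search, as text.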
Barry For the release version of PETSc it saves the vectors in a binary file called binaryoutput From aovsyannikov at lbl.gov Mon Nov 30 17:20:37 2015 From: aovsyannikov at lbl.gov (Andrey Ovsyannikov) Date: Mon, 30 Nov 2015 15:20:37 -0800 Subject: [petsc-users] Memory usage function: output for all ranks Message-ID: Dear PETSc team, I am working on optimization of Chombo-Crunch CFD code for next-generation supercomputer architectures at NERSC (Berkeley Lab) and we use PETSc AMG solver. During memory analysis study I faced with a difficulty to get memory usage data from PETSc for all MPI ranks. I am looking for memory dump function to get a detailed information on memory usage (not only resident size and virtual memory but allso allocation by Vec, Mat, etc). There is PetscMallocDumpLog() function but it is a collective function and it always provides a log for 0 rank. I am wondering if it is possible to include in PETSc a modification of PetscMallocDumpLog() which dumps the similar log but for all MPI ranks. I am attaching an example of my own memory function which uses PETSc non-collective functions and it provides a resident set size and virtual memory for all ranks. Perhaps in a similar way it is possible to modify PetscMallocDumpLog. Thank you, void petscMemoryLog(const char prefix[]) { FILE* fd; char fname[PETSC_MAX_PATH_LEN]; PetscMPIInt rank; MPI_Comm_rank(Chombo_MPI::comm,&rank); PetscLogDouble allocated; PetscLogDouble resident; PetscMallocGetCurrentUsage(&allocated); PetscMemoryGetCurrentUsage(&resident); PetscSNPrintf(fname,sizeof(fname),"%s.%d",prefix,rank); PetscFOpen(PETSC_COMM_SELF,fname,"a",&fd); PetscFPrintf(PETSC_COMM_SELF,fd,"### PETSc memory footprint for rank %d \n",rank); PetscFPrintf(PETSC_COMM_SELF,fd,"[%d] Memory allocated by PetscMalloc() %.0f bytes\n",rank,allocated); PetscFPrintf(PETSC_COMM_SELF,fd,"[%d] RSS usage by entire process %.0f KB\n",rank,resident); PetscFClose(PETSC_COMM_SELF,fd); } Best regards, Andrey Ovsyannikov, Ph.D. Postdoctoral Fellow NERSC Division Lawrence Berkeley National Laboratory 510-486-7880 aovsyannikov at lbl.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Nov 30 17:31:12 2015 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 30 Nov 2015 17:31:12 -0600 Subject: [petsc-users] [petsc-maint] Memory usage function: output for all ranks In-Reply-To: References: Message-ID: On Mon, Nov 30, 2015 at 5:20 PM, Andrey Ovsyannikov wrote: > Dear PETSc team, > > I am working on optimization of Chombo-Crunch CFD code for next-generation > supercomputer architectures at NERSC (Berkeley Lab) and we use PETSc AMG > solver. During memory analysis study I faced with a difficulty to get > memory usage data from PETSc for all MPI ranks. I am looking for memory > dump function to get a detailed information on memory usage (not only > resident size and virtual memory but allso allocation by Vec, Mat, etc). > There is PetscMallocDumpLog() function but it is a collective function and > it always provides a log for 0 rank. I am wondering if it is possible to > include in PETSc a modification of PetscMallocDumpLog() which dumps the > similar log but for all MPI ranks. > > I am attaching an example of my own memory function which uses PETSc > non-collective functions and it provides a resident set size and virtual > memory for all ranks. Perhaps in a similar way it is possible to modify > PetscMallocDumpLog. > You could walk the heap if you use the debugging malloc infrastructure in PETSc. 
However, I would really recommend trying out Massif from the valgrind toolset. Its designed for this and really nice. Thanks, Matt > Thank you, > > void petscMemoryLog(const char prefix[]) > { > FILE* fd; > char fname[PETSC_MAX_PATH_LEN]; > PetscMPIInt rank; > > MPI_Comm_rank(Chombo_MPI::comm,&rank); > > PetscLogDouble allocated; > PetscLogDouble resident; > PetscMallocGetCurrentUsage(&allocated); > PetscMemoryGetCurrentUsage(&resident); > PetscSNPrintf(fname,sizeof(fname),"%s.%d",prefix,rank); > PetscFOpen(PETSC_COMM_SELF,fname,"a",&fd); > > PetscFPrintf(PETSC_COMM_SELF,fd,"### PETSc memory footprint for rank %d > \n",rank); > PetscFPrintf(PETSC_COMM_SELF,fd,"[%d] Memory allocated by PetscMalloc() > %.0f bytes\n",rank,allocated); > PetscFPrintf(PETSC_COMM_SELF,fd,"[%d] RSS usage by entire process %.0f > KB\n",rank,resident); > PetscFClose(PETSC_COMM_SELF,fd); > } > > Best regards, > Andrey Ovsyannikov, Ph.D. > Postdoctoral Fellow > NERSC Division > Lawrence Berkeley National Laboratory > 510-486-7880 > aovsyannikov at lbl.gov > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From aovsyannikov at lbl.gov Mon Nov 30 17:42:57 2015 From: aovsyannikov at lbl.gov (Andrey Ovsyannikov) Date: Mon, 30 Nov 2015 15:42:57 -0800 Subject: [petsc-users] [petsc-maint] Memory usage function: output for all ranks In-Reply-To: References: Message-ID: Hi Matt, Thanks for your quick response. I like Massif tool and I have been using it recently. However, I was not able to run Valgrind for large jobs. I am interested in memory analysis of large scale runs with more than 1000 MPI ranks. PetscMemoryGetCurrentUsage() works fine for this puprpose but it does not provide details where I allocate memory. Maybe it would beneficial for PETSc community to have some tool/function from PETSc itself. Anyway, thanks very much for your suggestion! Andrey On Mon, Nov 30, 2015 at 3:31 PM, Matthew Knepley wrote: > On Mon, Nov 30, 2015 at 5:20 PM, Andrey Ovsyannikov > wrote: > >> Dear PETSc team, >> >> I am working on optimization of Chombo-Crunch CFD code for >> next-generation supercomputer architectures at NERSC (Berkeley Lab) and we >> use PETSc AMG solver. During memory analysis study I faced with a >> difficulty to get memory usage data from PETSc for all MPI ranks. I am >> looking for memory dump function to get a detailed information on memory >> usage (not only resident size and virtual memory but allso allocation by >> Vec, Mat, etc). There is PetscMallocDumpLog() function but it is a >> collective function and it always provides a log for 0 rank. I am wondering >> if it is possible to include in PETSc a modification of >> PetscMallocDumpLog() which dumps the similar log but for all MPI ranks. >> >> I am attaching an example of my own memory function which uses PETSc >> non-collective functions and it provides a resident set size and virtual >> memory for all ranks. Perhaps in a similar way it is possible to modify >> PetscMallocDumpLog. >> > > You could walk the heap if you use the debugging malloc infrastructure in > PETSc. However, I would really recommend > trying out Massif from the valgrind toolset. Its designed for this and > really nice. 
> > Thanks, > > Matt > > >> Thank you, >> >> void petscMemoryLog(const char prefix[]) >> { >> FILE* fd; >> char fname[PETSC_MAX_PATH_LEN]; >> PetscMPIInt rank; >> >> MPI_Comm_rank(Chombo_MPI::comm,&rank); >> >> PetscLogDouble allocated; >> PetscLogDouble resident; >> PetscMallocGetCurrentUsage(&allocated); >> PetscMemoryGetCurrentUsage(&resident); >> PetscSNPrintf(fname,sizeof(fname),"%s.%d",prefix,rank); >> PetscFOpen(PETSC_COMM_SELF,fname,"a",&fd); >> >> PetscFPrintf(PETSC_COMM_SELF,fd,"### PETSc memory footprint for rank %d >> \n",rank); >> PetscFPrintf(PETSC_COMM_SELF,fd,"[%d] Memory allocated by PetscMalloc() >> %.0f bytes\n",rank,allocated); >> PetscFPrintf(PETSC_COMM_SELF,fd,"[%d] RSS usage by entire process %.0f >> KB\n",rank,resident); >> PetscFClose(PETSC_COMM_SELF,fd); >> } >> >> Best regards, >> Andrey Ovsyannikov, Ph.D. >> Postdoctoral Fellow >> NERSC Division >> Lawrence Berkeley National Laboratory >> 510-486-7880 >> aovsyannikov at lbl.gov >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -- Andrey Ovsyannikov, Ph.D. Postdoctoral Fellow NERSC Division Lawrence Berkeley National Laboratory 510-486-7880 aovsyannikov at lbl.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From richardtmills at gmail.com Mon Nov 30 19:03:50 2015 From: richardtmills at gmail.com (Richard Mills) Date: Mon, 30 Nov 2015 17:03:50 -0800 Subject: [petsc-users] [petsc-maint] Memory usage function: output for all ranks In-Reply-To: References: Message-ID: Andrey, Maybe this is what you tried, but did you try running only a handful of MPI ranks (out of your 1000) with Massif? I've had success doing things that way. You won't know what every rank is doing, but you may be able to get a good idea from your sample. --Richard On Mon, Nov 30, 2015 at 3:42 PM, Andrey Ovsyannikov wrote: > Hi Matt, > > Thanks for your quick response. I like Massif tool and I have been using > it recently. However, I was not able to run Valgrind for large jobs. I am > interested in memory analysis of large scale runs with more than 1000 MPI > ranks. PetscMemoryGetCurrentUsage() works fine for this puprpose but it > does not provide details where I allocate memory. Maybe it would beneficial > for PETSc community to have some tool/function from PETSc itself. > > Anyway, thanks very much for your suggestion! > > Andrey > > On Mon, Nov 30, 2015 at 3:31 PM, Matthew Knepley > wrote: > >> On Mon, Nov 30, 2015 at 5:20 PM, Andrey Ovsyannikov > > wrote: >> >>> Dear PETSc team, >>> >>> I am working on optimization of Chombo-Crunch CFD code for >>> next-generation supercomputer architectures at NERSC (Berkeley Lab) and we >>> use PETSc AMG solver. During memory analysis study I faced with a >>> difficulty to get memory usage data from PETSc for all MPI ranks. I am >>> looking for memory dump function to get a detailed information on memory >>> usage (not only resident size and virtual memory but allso allocation by >>> Vec, Mat, etc). There is PetscMallocDumpLog() function but it is a >>> collective function and it always provides a log for 0 rank. I am wondering >>> if it is possible to include in PETSc a modification of >>> PetscMallocDumpLog() which dumps the similar log but for all MPI ranks. 
>>> >>> I am attaching an example of my own memory function which uses PETSc >>> non-collective functions and it provides a resident set size and virtual >>> memory for all ranks. Perhaps in a similar way it is possible to modify >>> PetscMallocDumpLog. >>> >> >> You could walk the heap if you use the debugging malloc infrastructure in >> PETSc. However, I would really recommend >> trying out Massif from the valgrind toolset. Its designed for this and >> really nice. >> >> Thanks, >> >> Matt >> >> >>> Thank you, >>> >>> void petscMemoryLog(const char prefix[]) >>> { >>> FILE* fd; >>> char fname[PETSC_MAX_PATH_LEN]; >>> PetscMPIInt rank; >>> >>> MPI_Comm_rank(Chombo_MPI::comm,&rank); >>> >>> PetscLogDouble allocated; >>> PetscLogDouble resident; >>> PetscMallocGetCurrentUsage(&allocated); >>> PetscMemoryGetCurrentUsage(&resident); >>> PetscSNPrintf(fname,sizeof(fname),"%s.%d",prefix,rank); >>> PetscFOpen(PETSC_COMM_SELF,fname,"a",&fd); >>> >>> PetscFPrintf(PETSC_COMM_SELF,fd,"### PETSc memory footprint for rank >>> %d \n",rank); >>> PetscFPrintf(PETSC_COMM_SELF,fd,"[%d] Memory allocated by >>> PetscMalloc() %.0f bytes\n",rank,allocated); >>> PetscFPrintf(PETSC_COMM_SELF,fd,"[%d] RSS usage by entire process %.0f >>> KB\n",rank,resident); >>> PetscFClose(PETSC_COMM_SELF,fd); >>> } >>> >>> Best regards, >>> Andrey Ovsyannikov, Ph.D. >>> Postdoctoral Fellow >>> NERSC Division >>> Lawrence Berkeley National Laboratory >>> 510-486-7880 >>> aovsyannikov at lbl.gov >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > > > -- > Andrey Ovsyannikov, Ph.D. > Postdoctoral Fellow > NERSC Division > Lawrence Berkeley National Laboratory > 510-486-7880 > aovsyannikov at lbl.gov > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Nov 30 19:47:08 2015 From: jed at jedbrown.org (Jed Brown) Date: Mon, 30 Nov 2015 18:47:08 -0700 Subject: [petsc-users] [petsc-maint] Memory usage function: output for all ranks In-Reply-To: References: Message-ID: <87610jym77.fsf@jedbrown.org> Andrey Ovsyannikov writes: > Thanks for your quick response. I like Massif tool and I have been using it > recently. However, I was not able to run Valgrind for large jobs. I am > interested in memory analysis of large scale runs with more than 1000 MPI > ranks. PetscMemoryGetCurrentUsage() works fine for this puprpose but it > does not provide details where I allocate memory. Maybe it would beneficial > for PETSc community to have some tool/function from PETSc itself. Why do you want data from every rank? PETSc usually tries to avoid diagnostic output that scales with number of processes because it sneaks up on people and causes crashes or huge IO costs that wastes their time. That which is okay at 1k ranks may be unacceptable at 1M ranks. Would it be sufficient to compute some statistics, max/min, or the like? -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From bsmith at mcs.anl.gov Mon Nov 30 20:09:50 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 30 Nov 2015 20:09:50 -0600 Subject: [petsc-users] [petsc-maint] Memory usage function: output for all ranks In-Reply-To: <87610jym77.fsf@jedbrown.org> References: <87610jym77.fsf@jedbrown.org> Message-ID: <4D5AA86F-C82E-47A0-9E0C-6A4255D94EB2@mcs.anl.gov> PETSc reporting of memory usage for objects is unfortunately not that great; for example distinguishing between temporary work space allocation vs memory that is kept for the life of the object is not always clear. Associating memory with particular objects requires the PETSc source code to mark each allocation appropriately and that is tedious and prone to error so we don't always do it right. I'm with Rich (and Matt), use valgrind on a handful of nodes, including node 0, 1,2, 4, 8, 16 that will provide the information you need. (the other processes will have very similar information that is not worth saving). Barry Making PETSc track its own memory usage is actually a pretty hard problem we don't have the resources to do properly. > On Nov 30, 2015, at 7:47 PM, Jed Brown wrote: > > Andrey Ovsyannikov writes: >> Thanks for your quick response. I like Massif tool and I have been using it >> recently. However, I was not able to run Valgrind for large jobs. I am >> interested in memory analysis of large scale runs with more than 1000 MPI >> ranks. PetscMemoryGetCurrentUsage() works fine for this puprpose but it >> does not provide details where I allocate memory. Maybe it would beneficial >> for PETSc community to have some tool/function from PETSc itself. > > Why do you want data from every rank? PETSc usually tries to avoid > diagnostic output that scales with number of processes because it sneaks > up on people and causes crashes or huge IO costs that wastes their time. > That which is okay at 1k ranks may be unacceptable at 1M ranks. Would > it be sufficient to compute some statistics, max/min, or the like?
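For reference, a minimal sketch of the max/min statistics suggested above, built on the same calls as the petscMemoryLog() helper earlier in the thread (the function name and communicator are placeholders; PetscLogDouble is a plain double, so MPI_DOUBLE is used in the reductions, and error codes are dropped for brevity):

  void petscMemoryStats(MPI_Comm comm)
  {
    PetscLogDouble local[2], mx[2], mn[2], sum[2];
    PetscMPIInt    rank, size;

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    PetscMallocGetCurrentUsage(&local[0]);   /* bytes currently allocated through PetscMalloc() */
    PetscMemoryGetCurrentUsage(&local[1]);   /* resident set size of this process, in bytes */
    MPI_Reduce(local, mx,  2, MPI_DOUBLE, MPI_MAX, 0, comm);
    MPI_Reduce(local, mn,  2, MPI_DOUBLE, MPI_MIN, 0, comm);
    MPI_Reduce(local, sum, 2, MPI_DOUBLE, MPI_SUM, 0, comm);
    if (!rank) {
      PetscPrintf(PETSC_COMM_SELF, "PetscMalloc: max %.0f  min %.0f  avg %.0f bytes\n",
                  mx[0], mn[0], sum[0]/size);
      PetscPrintf(PETSC_COMM_SELF, "RSS:         max %.0f  min %.0f  avg %.0f bytes\n",
                  mx[1], mn[1], sum[1]/size);
    }
  }

The output stays a few lines long regardless of the number of ranks, which sidesteps the per-rank I/O concern raised above while still exposing load imbalance in memory use.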