[petsc-users] Need help improving working PETSC code
Ganesh Vijayakumar
ganesh.iitm at gmail.com
Thu Jul 9 14:56:34 CDT 2015
Thanks. I apologize for any stupid assumptions/mistakes. One thing that
became clear to me was to get the machinery working first and improve later.
I respond to each of your questions below.
On Wed, Jul 8, 2015 at 4:27 PM Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> Please provide a bit more detail about the operator. Is it a pressure
> solver for CFD? Cell centered? Does the matrix has a null space of the
> constant functions?
The operator: yes, it's a pressure solver for a PISO/SIMPLE type algorithm.
The underlying code is OpenFOAM.
Regarding the null space of constant functions, let me verify: do you mean
that A applied to a constant vector is zero, so that if A x = b then
A (x + c) = b for any constant vector c, i.e. the solution is only determined
up to an additive constant? The basic equation is a Poisson equation and
contains only second derivatives, so offsetting the entire solution by a
constant would still satisfy it. However, the problem does have Dirichlet
boundary conditions on one boundary, so I'm thinking there is no null space.
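In case the problem does turn out to have the constant null space (e.g. if I
later switch to all-Neumann pressure boundaries), my understanding is that it
could be attached like this (just a sketch, using the same A as in the code
quoted below):

  MatNullSpace nullsp;
  /* PETSC_TRUE: the null space is spanned by the constant vector */
  MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nullsp);
  MatSetNullSpace(A, nullsp);  /* the KSP then removes this component */
  MatNullSpaceDestroy(&nullsp);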
> Is it the same linear system for each "time-step?" in the CFD solver or
> different?
>
Different. The structure of the matrix doesn't change from time step to time
step (for now); however, the values do.
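Since only the values change, one option (a sketch of what I have in mind,
not what the code does today) is to keep the same Mat and KSP across the
three corrector solves and across time steps, refreshing only the values:

  /* inside the time/corrector loop; A, ksp, b, x created once outside it */
  MatZeroEntries(A);
  /* ... MatSetValues() with the new coefficients, same sparsity pattern ... */
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
  KSPSetOperators(ksp, A, A);  /* values changed; setup is redone as needed */
  KSPSolve(ksp, b, x);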
> How many iterations is hypre BoomerAMG typically taking?
>
Around 5 after I relaxed the tolerances (it used to be around 10-13 when I
had the tighter tolerances). I solve the system of equations 3 times per
"time-step", as shown here.
Mat Object: 256 MPI processes
type: mpiaij
rows=12254823, cols=12254823
total: nonzeros=7.65502e+07, allocated nonzeros=1.83822e+08
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
simple.corrNonOrtho() = 1 simple.nNonOrthCorr() = 2
1 KSP Residual norm 1.000000000000e+00
End residual = 0.000816181009378
Number of iterations = 4
simple.corrNonOrtho() = 2 simple.nNonOrthCorr() = 2
1 KSP Residual norm 1.000000000000e+00
End residual = 0.00118562245298
Number of iterations = 3
simple.corrNonOrtho() = 3 simple.nNonOrthCorr() = 2
1 KSP Residual norm 1.000000000000e+00
End residual = 1.15194716042e-06
Number of iterations = 5
> This is an extremely tight tolerance; why do you set it so small?
> KSPSetTolerances(ksp, 1e-13, 1e-13, PETSC_DEFAULT, PETSC_DEFAULT) ;
>
Well, that was just a first start; I wanted to make sure it worked. But
reducing the required tolerances to 1e-5, 1e-5 also doesn't change the run
time by much: 5 "time-steps" now take 250s compared to 260s earlier. As a
comparison, I can use OpenFOAM's built-in PCG solver + DIC preconditioner to
do the same in about 13s (thirteen). Sure, the convergence tolerances are not
quite the same, but I don't think that's the issue here. I'm trying to do one
better than OpenFOAM's existing solver suite, and I think a good first step
would be to match it. Technically I'm comparing apples and oranges by
comparing PCG + DIC vs. FGMRES + AMG; however, I was hoping I could do better.
I'm open to changing pretty much anything.
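Since A is symmetric, the closest PETSc analogue to OpenFOAM's PCG + DIC that
I can think of is CG with block Jacobi and ICC(0) on each block. A sketch of
what I plan to try, using the same ksp/pc handles as in the code quoted below
(the sub-preconditioner and BoomerAMG settings go through the options
database, so KSPSetFromOptions() would have to be called after the PC type is
set for them to be picked up):

  /* Option A: mimic OpenFOAM's PCG + DIC with CG + block Jacobi / ICC(0) */
  KSPSetType(ksp, KSPCG);
  PCSetType(pc, PCBJACOBI);
  PetscOptionsSetValue("-sub_pc_type", "icc");
  KSPSetTolerances(ksp, 1e-5, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT);

  /* Option B: keep FGMRES + BoomerAMG, but try the coarsening settings
     often suggested for 3-D problems */
  PetscOptionsSetValue("-pc_hypre_boomeramg_strong_threshold", "0.5");
  PetscOptionsSetValue("-pc_hypre_boomeramg_agg_nl", "1");

  KSPSetFromOptions(ksp);  /* after PCSetType so the options reach the PC */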
>
> Send the output from a run with -log_summary so we can see where the
> time is being spent.
>
Thanks for this. This is below.
************************************************************************************************************************
***        WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document        ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/scratch/02504/ganesh10/OpenFOAM/ganesh10-2.1.x/platforms/linux64IccDPOpt/bin/SRFSimpleFoamPETSC on a sandybridge named
c415-502.stampede.tacc.utexas.edu with 256 processors, by ganesh10 Thu Jul 9 13:29:19 2015
Using Petsc Release Version 3.5.3, Jan, 31, 2015
                         Max       Max/Min        Avg      Total
Time (sec):           2.473e+02      1.00143   2.472e+02
Objects:              5.210e+02      1.00000   5.210e+02
Flops:                1.256e+08      1.17291   1.184e+08  3.031e+10
Flops/sec:            5.086e+05      1.17366   4.790e+05  1.226e+08
MPI Messages:         3.956e+03     21.50000   1.167e+03  2.986e+05
MPI Message Lengths:  6.769e+06      4.61934   3.787e+03  1.131e+09
MPI Reductions:       3.680e+02      1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 2.4720e+02 100.0%  3.0311e+10 100.0%  2.986e+05 100.0%  3.787e+03      100.0%  3.670e+02  99.7%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type          Creations   Destructions      Memory   Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
             Vector        484            482   184328584   0
     Vector Scatter          1              1        1060   0
             Matrix          3              3    10175808   0
          Index Set          2              2       38884   0
             Viewer          1              0           0   0
      Krylov Solver         15             15      284400   0
     Preconditioner         15             15       16440   0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 1.64032e-05
Average time for zero size MPI_Send(): 0.000232594
#PETSc Option Table entries:
-info blah
-log_summary
-mat_view ::ascii_info
-parallel
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-x=0 -with-pic
--with-external-packages-dir=/opt/apps/intel13/mvapich2_1_9/petsc/3.5/externalpackages
--with-mpi-compilers=1 --with-mpi-dir=/opt/apps/intel13/mvapich2/1.9
--with-scalar-type=real --with-shared-libraries=1 --with-precision=double
--with-hypre=1 --download-hypre --with-ml=1 --download-ml --with-ml=1
--download-ml --with-superlu_dist=1 --download-superlu_dist
--with-superlu=1 --download-superlu --with-parmetis=1 --download-parmetis
--with-metis=1 --download-metis --with-spai=1 --download-spai
--with-mumps=1 --download-mumps --with-parmetis=1 --download-parmetis
--with-metis=1 --download-metis --with-scalapack=1 --download-scalapack
--with-blacs=1 --download-blacs --with-spooles=1 --download-spooles
--with-hdf5=1 --with-hdf5-dir=/opt/apps/intel13/mvapich2_1_9/phdf5/1.8.9
--with-debugging=no
--with-blas-lapack-dir=/opt/apps/intel/13/composer_xe_2013.2.146/mkl
--with-mpiexec=mpirun_rsh --COPTFLAGS= --FOPTFLAGS= --CXXOPTFLAGS=
-----------------------------------------
Libraries compiled on Thu Apr 2 10:06:57 2015 on staff.stampede.tacc.utexas.edu
Machine characteristics: Linux-2.6.32-431.17.1.el6.x86_64-x86_64-with-centos-6.6-Final
Using PETSc directory: /opt/apps/intel13/mvapich2_1_9/petsc/3.5
Using PETSc arch: sandybridge
-----------------------------------------
Using C compiler: /opt/apps/intel13/mvapich2/1.9/bin/mpicc -fPIC -wd1572
${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /opt/apps/intel13/mvapich2/1.9/bin/mpif90 -fPIC
${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths:
-I/opt/apps/intel13/mvapich2_1_9/petsc/3.5/sandybridge/include
-I/opt/apps/intel13/mvapich2_1_9/petsc/3.5/include
-I/opt/apps/intel13/mvapich2_1_9/petsc/3.5/include
-I/opt/apps/intel13/mvapich2_1_9/petsc/3.5/sandybridge/include
-I/opt/apps/intel13/mvapich2_1_9/phdf5/1.8.9/include
-I/opt/apps/intel13/mvapich2/1.9/include
ganesh
>
> Barry
>
>
>
> > On Jul 8, 2015, at 11:11 AM, Ganesh Vijayakumar <ganesh.iitm at gmail.com> wrote:
> >
> > Hello,
> >
> > First of all, thanks to the PETSc developers and everyone else
> > contributing supporting material on the web.
> >
> > I need to solve a system of equations Ax = b, where A is symmetric,
> > sparse, and unstructured, in parallel, as part of a finite volume solver
> > for CFD. Decomposition is already done. I initialize the matrix as MPIAIJ
> > even though it's symmetric, as that's the only thing I could get to work.
> >
> > MatCreate(PETSC_COMM_WORLD,&A);
> > MatSetType(A,MATMPIAIJ);
> > MatSetSizes(A,nCellsCurProc,nCellsCurProc,nTotalCells,nTotalCells);
> > MatMPIAIJSetPreallocation(A,10,PETSC_NULL,5,PETSC_NULL);
> > MatSetUp(A) ;
> > // Assemble matrix using MatSetValue.. assemble full matrix even though
> it's symmetric.
> > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY) ;
> > MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY) ;
> > //Create solver
> > KSPCreate(PETSC_COMM_WORLD,&ksp);
> > KSPSetType(ksp, KSPFGMRES);
> > KSPGetPC(ksp, &pc);
> > PCFactorSetShiftType(pc, MAT_SHIFT_POSITIVE_DEFINITE);
> >
> > //PCSetType(pc, PCML); //Trilinos - Multilevel
> >
> > //PCSetType(pc, PCBJACOBI) ; //Block Jacobi (ILU on each block via the option below)
> > //PetscOptionsSetValue("-sub_pc_type", "ilu") ;
> > KSPSetFromOptions(ksp);
> >
> > // PCSetType(pc, PCASM); //Additive Schwarz Methods
> > // PCASMSetOverlap(pc,overlap);
> >
> > PCSetType(pc, PCHYPRE); //HYPRE
> > PCHYPRESetType(pc, "boomeramg");
> >
> > //PCHYPRESetType(pc, "parasails");
> >
> > KSPSetPC(ksp, pc);
> > KSPSetTolerances(ksp, 1e-13, 1e-13, PETSC_DEFAULT, PETSC_DEFAULT) ;
> > KSPSetInitialGuessNonzero(ksp, PETSC_TRUE);
> > KSPSetOperators(ksp, A, A);
> > KSPMonitorDefault(ksp, 1, 1, pVoid) ;
> >
> > KSPSolve(ksp,b,x);
> > VecGetArray(x, &getResultArray) ;
> > KSPGetResidualNorm(ksp, &endNorm) ;
> > KSPGetIterationNumber(ksp, &nIterations);
> >
> > KSPDestroy(&ksp);
> >
> >
> > I know that this works and gives me the correct result. However,
> > boomerAMG is proving to be too slow, at least the way I use it at 256
> > processors with around 12 million unknowns. I need your advice on the
> > preconditioner. Am I using it right? Is there anything else I could be
> > doing better?
> >
> > ganesh
>
>