[petsc-users] Need help improving working PETSC code
Barry Smith
bsmith at mcs.anl.gov
Thu Jul 9 18:32:25 CDT 2015
> On Jul 9, 2015, at 6:04 PM, Ganesh Vijayakumar <ganesh.iitm at gmail.com> wrote:
>
> Thanks.
>
> On Thu, Jul 9, 2015 at 6:46 PM Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> > On Jul 9, 2015, at 2:56 PM, Ganesh Vijayakumar <ganesh.iitm at gmail.com> wrote:
> >
> > Thanks. I apologize for any stupid assumptions/mistakes.
>
> You have not made any stupid assumptions/mistakes
>
> > One thing that became clear to me was to get the machinery working first and improve later. I respond to each of your questions below them.
> >
> > On Wed, Jul 8, 2015 at 4:27 PM Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> > Please provide a bit more detail about the operator. Is it a pressure solver for CFD? Cell centered? Does the matrix have a null space of the constant functions?
> >
> > The operator.. yes it's a pressure solver for a PISO/SIMPLE type algorithm. The underlying code is OpenFOAM.
>
> Make sure to read all the way down.
>
> Ok. I understand
>
> >
> > Reg. the null space of constant functions, let me verify... Do you mean Ax = b becomes A(x_1 + x_2) = c + d, where A x_1 = c and A x_2 = d with d constant or zero? The basic equation is a Poisson equation and contains only the second derivative, so I suppose if the entire solution were offset by a constant it would still satisfy the equations. However, the problem does have Dirichlet boundary conditions on one boundary. So I'm thinking no null space.
>
> Yes, no null space
> >
> > Is it the same linear system for each "time-step?" in the CFD solver or different?
> >
> > Different. The structure of the matrix doesn't change every time step (for now), however the values do.
> >
> > How many iterations is hypre BoomerAMG typically taking?
> >
> > Around 5 after I relaxed the tolerances (used to be around 10-13 with the tighter tolerances). I solve the system of equations 3 times per "time-step", as shown here.
> >
> > Mat Object: 256 MPI processes
> > type: mpiaij
> > rows=12254823, cols=12254823
> > total: nonzeros=7.65502e+07, allocated nonzeros=1.83822e+08
> > total number of mallocs used during MatSetValues calls =0
> > not using I-node (on process 0) routines
> > simple.corrNonOrtho() = 1 simple.nNonOrthCorr() = 2
> > 1 KSP Residual norm 1.000000000000e+00
> > End residual = 0.000816181009378
> > Number of iterations = 4
> > simple.corrNonOrtho() = 2 simple.nNonOrthCorr() = 2
> > 1 KSP Residual norm 1.000000000000e+00
> > End residual = 0.00118562245298
> > Number of iterations = 3
> > simple.corrNonOrtho() = 3 simple.nNonOrthCorr() = 2
> > 1 KSP Residual norm 1.000000000000e+00
> > End residual = 1.15194716042e-06
> > Number of iterations = 5
> >
> > This is an extremely tight tolerance; why do you set it so small? KSPSetTolerances(ksp, 1e-13, 1e-13, PETSC_DEFAULT, PETSC_DEFAULT);
> >
> > Well, that was just a first attempt.. I wanted to make sure it worked. But reducing the required tolerance to 1e-5, 1e-5 doesn't change the run time by much either: 5 "time-steps" now take 250s compared to 260s earlier. As a comparison, I can use OpenFOAM's built-in PCG solver + DIC preconditioner
>
> What is DIC?
>
> OpenFOAM says it's Diagonal Incomplete Cholesky. Quite frankly I've never understood what the diagonal part is. However, I know that the Incomplete Cholesky is a no-fill Cholesky that operates within a processor, i.e., each processor won't try to access elements outside its own partition. And by "know", I mean I've worked the math out.
Ok, it is block Jacobi with ICC on each block (one per process) so -ksp_type cg -pc_type bjacobi -sub_pc_type icc with PETSc should give similar results to what they get.
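Since passing command-line options is giving you trouble, here is a rough (untested) sketch of setting the same solver up directly in code; it assumes the ksp, pc, and A you already create in your code:

   PetscInt i, nlocal, first;
   KSP      *subksp;
   PC       subpc;

   KSPSetType(ksp, KSPCG);
   KSPGetPC(ksp, &pc);
   PCSetType(pc, PCBJACOBI);
   KSPSetOperators(ksp, A, A);
   KSPSetUp(ksp);                    /* the local blocks exist only after set up */

   PCBJacobiGetSubKSP(pc, &nlocal, &first, &subksp);
   for (i = 0; i < nlocal; i++) {
     KSPGetPC(subksp[i], &subpc);
     PCSetType(subpc, PCICC);        /* no-fill ICC on each local block */
   }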
>
> Where is all the data? It should list all the events and the time it spends in each. Did you use PetscOptionsSetValue() to provide -log_summary? That won't work; you need to pass it on the command line, in the PETSC_OPTIONS environment variable, or in a file called petscrc
>
> Ah.. see, the main solver barfs if I pass these options on the command line, so I have to use one of the other two methods. Running this now. Will keep you posted soon.
>
> Well regardless, it is likely that the matrix is well conditioned enough that the large set up time of hypre BoomerAMG is killing performance. How many iterations is the PCG plus DIC taking?
>
> Around 100 each. OpenFOAM has a different method of normalizing its residuals. I could figure out the equivalent for PETSc, but that's for a later date.
>
> FDICPCG: Solving for p, Initial residual = 1, Final residual = 0.000986548198903, No Iterations 80
> FDICPCG: Solving for p, Initial residual = 0.11287119259, Final residual = 0.000111241202221, No Iterations 98
> FDICPCG: Solving for p, Initial residual = 0.0204335526765, Final residual = 1.98222786415e-05, No Iterations 118
>
>
> BTW: you can also run with for example -ksp_type cg -pc_type jacobi or -ksp_type cg -pc_type bjacobi -sub_pc_type icc to get simpler preconditioners with much lower set up times. What do you get for those?
>
> Thanks. I ran into some trouble with the KSPCG part. If you didn't notice, I had commented these options out in my first email. So I was at least on the right track.
>
> 1. How do I tell PETSc that my matrix is symmetric? I tried setting my matrix up as follows... but am apprehensive of it.
>
> MatCreateSBAIJ(PETSC_COMM_WORLD, 1, nCellsCurProc, nCellsCurProc, nTotalCells, nTotalCells, 10, NULL, 5, NULL, &A);
>
> Could I still use MatSetValue on both the upper and lower triangular parts of the matrix? Will PETSc understand that it's redundant?
Yes, run with -mat_ignore_lower_triangular or call MatSetOption(mat,MAT_IGNORE_LOWER_TRIANGULAR,PETSC_TRUE)
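For example (an untested sketch using the same variable names as in your code):

   MatCreateSBAIJ(PETSC_COMM_WORLD, 1, nCellsCurProc, nCellsCurProc,
                  nTotalCells, nTotalCells, 10, NULL, 5, NULL, &A);
   MatSetOption(A, MAT_IGNORE_LOWER_TRIANGULAR, PETSC_TRUE);
   /* MatSetValue() calls in the lower triangular part are now silently
      dropped, so you can keep assembling the full matrix as you do now */
   MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
   MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);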
>
> 2. Do I need PCFactorSetShiftType(pc,MAT_SHIFT_POSITIVE_DEFINITE); ?
I hope not. But you might.
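If the ICC factorization does hit a zero or negative pivot, you can try adding -sub_pc_factor_shift_type positive_definite (the shift is then applied to the factorization on each block); otherwise leave it out.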
>
> 3. What does KSPSetReusePreconditioner(ksp, PETSC_TRUE) do? Should I use it?
Not at first. What it does is reuse the existing preconditioner instead of building a new one for each solve. If the matrix is changing "slowly" you can often get away with setting this for some number of linear solves, then setting it back to false for the next solve, then setting it to true again for some number of solves. You could try it with hypre, say keeping the preconditioner the same for 10, 50, or 100 solves, and see what happens time-wise.
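A rough sketch of that pattern (untested; timeStep and rebuildEvery are placeholder names, not from your code):

   /* rebuild the preconditioner only every rebuildEvery time steps,
      reuse it for all the solves in between */
   PetscBool reuse = (timeStep % rebuildEvery == 0) ? PETSC_FALSE : PETSC_TRUE;
   KSPSetReusePreconditioner(ksp, reuse);
   KSPSetOperators(ksp, A, A);   /* new values, same nonzero structure */
   KSPSolve(ksp, b, x);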
>
> ganesh
>
>
>
> Barry
>
>
>
> > ------------------------------------------------------------------------------------------------------------------------
> >
> > Memory usage is given in bytes:
> >
> > Object Type Creations Destructions Memory Descendants' Mem.
> > Reports information only for process 0.
> >
> > --- Event Stage 0: Main Stage
> >
> > Vector 484 482 184328584 0
> > Vector Scatter 1 1 1060 0
> > Matrix 3 3 10175808 0
> > Index Set 2 2 38884 0
> > Viewer 1 0 0 0
> > Krylov Solver 15 15 284400 0
> > Preconditioner 15 15 16440 0
> > ========================================================================================================================
> > Average time to get PetscTime(): 9.53674e-08
> > Average time for MPI_Barrier(): 1.64032e-05
> > Average time for zero size MPI_Send(): 0.000232594
> > #PETSc Option Table entries:
> > -info blah
> > -log_summary
> > -mat_view ::ascii_info
> > -parallel
> > #End of PETSc Option Table entries
> > Compiled without FORTRAN kernels
> > Compiled with full precision matrices (default)
> > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> > Configure options: --with-x=0 -with-pic --with-external-packages-dir=/opt/apps/intel13/mvapich2_1_9/petsc/3.5/externalpackages --with-mpi-compilers=1 --with-mpi-dir=/opt/apps/intel13/mvapich2/1.9 --with-scalar-type=real --with-shared-libraries=1 --with-precision=double --with-hypre=1 --download-hypre --with-ml=1 --download-ml --with-ml=1 --download-ml --with-superlu_dist=1 --download-superlu_dist --with-superlu=1 --download-superlu --with-parmetis=1 --download-parmetis --with-metis=1 --download-metis --with-spai=1 --download-spai --with-mumps=1 --download-mumps --with-parmetis=1 --download-parmetis --with-metis=1 --download-metis --with-scalapack=1 --download-scalapack --with-blacs=1 --download-blacs --with-spooles=1 --download-spooles --with-hdf5=1 --with-hdf5-dir=/opt/apps/intel13/mvapich2_1_9/phdf5/1.8.9 --with-debugging=no --with-blas-lapack-dir=/opt/apps/intel/13/composer_xe_2013.2.146/mkl --with-mpiexec=mpirun_rsh --COPTFLAGS= --FOPTFLAGS= --CXXOPTFLAGS=
> > -----------------------------------------
> > Libraries compiled on Thu Apr 2 10:06:57 2015 on staff.stampede.tacc.utexas.edu
> > Machine characteristics: Linux-2.6.32-431.17.1.el6.x86_64-x86_64-with-centos-6.6-Final
> > Using PETSc directory: /opt/apps/intel13/mvapich2_1_9/petsc/3.5
> > Using PETSc arch: sandybridge
> > -----------------------------------------
> >
> > Using C compiler: /opt/apps/intel13/mvapich2/1.9/bin/mpicc -fPIC -wd1572 ${COPTFLAGS} ${CFLAGS}
> > Using Fortran compiler: /opt/apps/intel13/mvapich2/1.9/bin/mpif90 -fPIC ${FOPTFLAGS} ${FFLAGS}
> > -----------------------------------------
> >
> > Using include paths: -I/opt/apps/intel13/mvapich2_1_9/petsc/3.5/sandybridge/include -I/opt/apps/intel13/mvapich2_1_9/petsc/3.5/include -I/opt/apps/intel13/mvapich2_1_9/petsc/3.5/include -I/opt/apps/intel13/mvapich2_1_9/petsc/3.5/sandybridge/include -I/opt/apps/intel13/mvapich2_1_9/phdf5/1.8.9/include -I/opt/apps/intel13/mvapich2/1.9/include
> >
> >
> > ganesh
> >
> >
> > Barry
> >
> >
> >
> > > On Jul 8, 2015, at 11:11 AM, Ganesh Vijayakumar <ganesh.iitm at gmail.com> wrote:
> > >
> > > Hello,
> > >
> > > First of all..thanks to the PETSC developers and everyone else contributing supporting material on the web.
> > >
> > > I need to solve a system of equations Ax =b, where A is symmetric, sparse, unstructured and in parallel as a part of a finite volume solver for CFD. Decomposition is already done. I initialize the matrix as MPIAIJ even though it's symmetric as that's the only thing I could get to work.
> > >
> > > MatCreate(PETSC_COMM_WORLD,&A);
> > > MatSetType(A,MATMPIAIJ);
> > > MatSetSizes(A,nCellsCurProc,nCellsCurProc,nTotalCells,nTotalCells);
> > > MatMPIAIJSetPreallocation(A,10,PETSC_NULL,5,PETSC_NULL);
> > > MatSetUp(A) ;
> > > // Assemble matrix using MatSetValue.. assemble full matrix even though it's symmetric.
> > > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY) ;
> > > MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY) ;
> > > //Create solver
> > > KSPCreate(PETSC_COMM_WORLD,&ksp);
> > > KSPSetType(ksp, KSPFGMRES);
> > > KSPGetPC(ksp, &pc);
> > > PCFactorSetShiftType(pc, MAT_SHIFT_POSITIVE_DEFINITE);
> > >
> > > //PCSetType(pc, PCML); //Trilinos - Multilevel
> > >
> > > //PCSetType(pc, PCBJACOBI) ; //Block Jacobi with ILU on each block
> > > //PetscOptionsSetValue("-sub_pc_type", "ilu") ;
> > > KSPSetFromOptions(ksp);
> > >
> > > // PCSetType(pc, PCASM); //Additive Schwarz Methods
> > > // PCASMSetOverlap(pc,overlap);
> > >
> > > PCSetType(pc, PCHYPRE); //HYPRE
> > > PCHYPRESetType(pc, "boomeramg");
> > >
> > > //PCHYPRESetType(pc, "parasails");
> > >
> > > KSPSetPC(ksp, pc);
> > > KSPSetTolerances(ksp, 1e-13, 1e-13, PETSC_DEFAULT, PETSC_DEFAULT) ;
> > > KSPSetInitialGuessNonzero(ksp, PETSC_TRUE);
> > > KSPSetOperators(ksp, A, A);
> > > KSPMonitorDefault(ksp, 1, 1, pVoid) ;
> > >
> > > KSPSolve(ksp,b,x);
> > > VecGetArray(x, &getResultArray) ;
> > > KSPGetResidualNorm(ksp, &endNorm) ;
> > > KSPGetIterationNumber(ksp, &nIterations);
> > >
> > > KSPDestroy(&ksp);
> > >
> > >
> > > I know that this works and gives me the correct result. However, BoomerAMG is proving to be too slow.. at least the way I use it at 256 processors with around 12 million unknowns. I need your advice on the preconditioner. Am I using it right? Is there anything else I could be doing better?
> > >
> > > ganesh
> >