[petsc-users] KSP changes for successive solver
Barry Smith
bsmith at mcs.anl.gov
Wed Jul 29 13:25:14 CDT 2015
For the default PCMG using PCREDUNDANT on the coarse level with LU
VecScatterBegin 2356 1.0 1.6390e+01 4.7 0.00e+00 0.0 5.9e+09 1.7e+01 0.0e+00 2 0 98 99 0 2 0 98 99 0 0
VecScatterEnd 2356 1.0 4.1647e+02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 69 0 0 0 0 69 0 0 0 0 0
so the gathering together of the vector values from all processes to the one process is killing performance (presumably it is just using the "default" VecScatter so sending individual messages to all processes. Terribly slow. If the VecScatter were smart enough to switch to an alltoall here it would actually help an enormous amount).
---------
For the DMDAREPART the two sets of VecScatter is still killing you
VecScatterBegin 2907 1.0 2.7453e-02 2.1 0.00e+00 0.0 9.2e+07 7.6e+02 0.0e+00 0 0 92 99 0 0 0 98 99 0 0
VecScatterEnd 2907 1.0 1.8748e-01 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
VecScatterBegin 1393 3.0 3.2119e-02112.6 0.00e+00 0.0 5.9e+06 3.2e+01 0.0e+00 0 0 6 0 0 0 0 99 99 0 0
VecScatterEnd 1393 3.0 3.2946e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 99 0 0 0 0 0
The first scatter is from the 8192 to the 128 and the second is from the 128 to the 1. Fixing the first scatter would require a little thought but fixing the second (from 128 to 1) is the same fix needed for the default PCMG using PCREDUNDANT.
The code in PCSetUp_Redundant() is
if (!red->scatterin) { /* efficiency of scatterin is independent from psubcomm_type! */
IS is1,is2;
PetscInt *idx1,*idx2,i,j,k;
ierr = MatCreateVecs(pc->pmat,&x,0);CHKERRQ(ierr);
ierr = VecGetSize(x,&M);CHKERRQ(ierr);
ierr = VecGetOwnershipRange(x,&mstart,&mend);CHKERRQ(ierr);
mlocal = mend - mstart;
ierr = PetscMalloc2(red->psubcomm->n*mlocal,&idx1,red->psubcomm->n*mlocal,&idx2);CHKERRQ(ierr);
j = 0;
for (k=0; k<red->psubcomm->n; k++) {
for (i=mstart; i<mend; i++) {
idx1[j] = i;
idx2[j++] = i + M*k;
}
}
ierr = ISCreateGeneral(comm,red->psubcomm->n*mlocal,idx1,PETSC_COPY_VALUES,&is1);CHKERRQ(ierr);
ierr = ISCreateGeneral(comm,red->psubcomm->n*mlocal,idx2,PETSC_COPY_VALUES,&is2);CHKERRQ(ierr);
ierr = VecScatterCreate(x,is1,red->xdup,is2,&red->scatterin);CHKERRQ(ierr);
ierr = ISDestroy(&is1);CHKERRQ(ierr);
ierr = ISDestroy(&is2);CHKERRQ(ierr);
/* Impl below is good for PETSC_SUBCOMM_INTERLACED (no inter-process communication) and PETSC_SUBCOMM_CONTIGUOUS (communication within subcomm) */
ierr = ISCreateStride(comm,mlocal,mstart+ red->psubcomm->color*M,1,&is1);CHKERRQ(ierr);
ierr = ISCreateStride(comm,mlocal,mstart,1,&is2);CHKERRQ(ierr);
ierr = VecScatterCreate(red->xdup,is1,x,is2,&red->scatterout);CHKERRQ(ierr);
ierr = ISDestroy(&is1);CHKERRQ(ierr);
ierr = ISDestroy(&is2);CHKERRQ(ierr);
ierr = PetscFree2(idx1,idx2);CHKERRQ(ierr);
ierr = VecDestroy(&x);CHKERRQ(ierr);
The simplest way to fix the code, but it is hacky and not PETSc style, is to not use the VecScatter here at all but instead use custom calls directly to MPI collectives. Or make VecScatterCreate() smart enough to detect these special cases and use them.
Ideas on how to proceed? From anyone?
Barry
> On Jul 29, 2015, at 12:45 PM, Michele Rosso <mrosso at uci.edu> wrote:
>
> Hi Barry,
>
> I tried what you suggested:
>
> 1) 5 levels of MG + defaults at the coarse level (PCREDUNDANT)
> 2) 5 levels of MG + 2 levels of MG via DMDAREPART + defaults at the coarse level (PCREDUNDANT)
>
> I attached ksp_view and log_summary for both cases.
> The use of PCREDUNDAND halves the time for case 1 ( from ~ 20 sec per solve to ~ 10 sec per solve ), while it seems not having much effect on case 2.
> Any thoughts on this?
>
> Thanks,
> Michele
>
>
> On Sat, 2015-07-25 at 22:18 -0500, Barry Smith wrote:
>> This dmdarepart business, which I am guessing is running PCMG on smaller sets of processes with a DMDDA on that smaller set of processes for a coarse problem is a fine idea but you should keep in mind the rule of thumb that that parallel iterative (and even more direct) solvers don't do well we there is roughly 10,000 or fewer degrees of freedom per processor. So you should definitely not be using SuperLU_DIST in parallel to solve a problem with 1048 degrees of freedom on 128 processes, just use PCREDUNDANT and its default (sequential) LU. That should be faster.
>>
>> Barry
>>
>>
>> > On Jul 25, 2015, at 10:09 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>> >
>> >
>> > Don't use
>> >
>> > -mg_coarse_pc_factor_mat_solver_package superlu_dist
>> > -mg_coarse_pc_type lu
>> >
>> > with 8000+ processes and 1 degree of freedom per process SuperLU_DIST will be terrible. Just leave the defaults for this and send the -log_summary
>> >
>> > Barry
>> >
>> >> On Jul 24, 2015, at 2:44 PM, Michele Rosso <mrosso at uci.edu> wrote:
>> >>
>> >> Barry,
>> >>
>> >> I attached ksp_view and log_summary for two different setups:
>> >>
>> >> 1) Plain MG on 5 levels + LU at the coarse level (files ending in mg5)
>> >> 2) Plain MG on 5 levels + custom PC + LU at the coarse level (files ending in mg7)
>> >>
>> >> The custom PC works on a subset of processes, thus allowing to use two more levels of MG, for a total of 7.
>> >> Case 1) is extremely slow ( ~ 20 sec per solve ) and converges in 21 iterations.
>> >> Case 2) is way faster ( ~ 0.25 sec per solve ) and converges in 29 iterations.
>> >>
>> >> Thanks for your help!
>> >>
>> >> Michele
>> >>
>> >>
>> >> On Fri, 2015-07-24 at 13:56 -0500, Barry Smith wrote:
>> >>> The coarse problem for the PCMG (geometric multigrid) is
>> >>>
>> >>> Mat Object: 8192 MPI processes
>> >>> type: mpiaij
>> >>> rows=8192, cols=8192
>> >>>
>> >>> then it tries to solve it with algebraic multigrid on 8192 processes (which is completely insane). A lot of the time is spent in setting up the algebraic multigrid (not surprisingly).
>> >>>
>> >>> 8192 is kind of small to parallelize. Please run the same code but with the default coarse grid problem instead of PCGAMG and send us the -log_summary again
>> >>>
>> >>> Barry
>> >>>
>> >>>
>> >>>> On Jul 24, 2015, at 1:35 PM, Michele Rosso <mrosso at uci.edu> wrote:
>> >>>>
>> >>>> Hi Mark and Barry,
>> >>>>
>> >>>> I am sorry for my late reply: it was a busy week!
>> >>>> I run a test case for a larger problem with as many levels (i.e. 5) of MG I could and GAMG as PC at the coarse level. I attached the output of info ( after grep for "gmag"), ksp_view and log_summary.
>> >>>> The solve takes about 2 seconds on 8192 cores, which is way too much. The number of iterations to convergence is 24.
>> >>>> I hope there is a way to speed it up.
>> >>>>
>> >>>> Thanks,
>> >>>> Michele
>> >>>>
>> >>>>
>> >>>> On Fri, 2015-07-17 at 09:38 -0400, Mark Adams wrote:
>> >>>>>
>> >>>>>
>> >>>>> On Thu, Jul 16, 2015 at 8:18 PM, Michele Rosso <mrosso at uci.edu> wrote:
>> >>>>> Barry,
>> >>>>>
>> >>>>> thank you very much for the detailed answer. I tried what you suggested and it works.
>> >>>>> So far I tried on a small system but the final goal is to use it for very large runs. How does PCGAMG compares to PCMG as far as performances and scalability are concerned?
>> >>>>> Also, could you help me to tune the GAMG part ( my current setup is in the attached ksp_view.txt file )?
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> I am going to add this to the document today but you can run with -info. This is very noisy so you might want to do the next step at run time. Then grep on GAMG. This will be about 20 lines. Send that to us and we can go from there.
>> >>>>>
>> >>>>>
>> >>>>> Mark
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> I also tried to use superlu_dist for the LU decomposition on mg_coarse_mg_sub_
>> >>>>> -mg_coarse_mg_coarse_sub_pc_type lu
>> >>>>> -mg_coarse_mg_coarse_sub_pc_factor_mat_solver_package superlu_dist
>> >>>>>
>> >>>>> but I got an error:
>> >>>>>
>> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
>> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
>> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
>> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
>> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
>> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
>> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
>> >>>>> symbfact() error returns 0
>> >>>>> symbfact() error returns 0
>> >>>>> symbfact() error returns 0
>> >>>>> symbfact() error returns 0
>> >>>>> symbfact() error returns 0
>> >>>>> symbfact() error returns 0
>> >>>>> symbfact() error returns 0
>> >>>>>
>> >>>>>
>> >>>>> Thank you,
>> >>>>> Michele
>> >>>>>
>> >>>>>
>> >>>>> On Thu, 2015-07-16 at 18:07 -0500, Barry Smith wrote:
>> >>>>>>
>> >>>>>>> On Jul 16, 2015, at 5:42 PM, Michele Rosso <mrosso at uci.edu> wrote:
>> >>>>>>>
>> >>>>>>> Barry,
>> >>>>>>>
>> >>>>>>> thanks for your reply. So if I want it fixed, I will have to use the master branch, correct?
>> >>>>>>
>> >>>>>>
>> >>>>>> Yes, or edit mg.c and remove the offending lines of code (easy enough).
>> >>>>>>
>> >>>>>>>
>> >>>>>>> On a side note, what I am trying to achieve is to be able to use how many levels of MG I want, despite the limitation imposed by the local number of grid nodes.
>> >>>>>>
>> >>>>>>
>> >>>>>> I assume you are talking about with DMDA? There is no generic limitation for PETSc's multigrid, it is only with the way the DMDA code figures out the interpolation that causes a restriction.
>> >>>>>>
>> >>>>>>
>> >>>>>>> So far I am using a borrowed code that implements a PC that creates a sub communicator and perform MG on it.
>> >>>>>>> While reading the documentation I found out that PCMGSetLevels takes in an optional array of communicators. How does this work?
>> >>>>>>
>> >>>>>>
>> >>>>>> It doesn't work. It was an idea that never got pursued.
>> >>>>>>
>> >>>>>>
>> >>>>>>> Can I can simply define my matrix and rhs on the fine grid as I would do normally ( I do not use kspsetoperators and kspsetrhs ) and KSP would take care of it by using the correct communicator for each level?
>> >>>>>>
>> >>>>>>
>> >>>>>> No.
>> >>>>>>
>> >>>>>> You can use the PCMG geometric multigrid with DMDA for as many levels as it works and then use PCGAMG as the coarse grid solver. PCGAMG automatically uses fewer processes for the coarse level matrices and vectors. You could do this all from the command line without writing code.
>> >>>>>>
>> >>>>>> For example if your code uses a DMDA and calls KSPSetDM() use for example -da_refine 3 -pc_type mg -pc_mg_galerkin -mg_coarse_pc_type gamg -ksp_view
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> Barry
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>> Michele
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Thu, 2015-07-16 at 17:30 -0500, Barry Smith wrote:
>> >>>>>>>> Michel,
>> >>>>>>>>
>> >>>>>>>> This is a very annoying feature that has been fixed in master
>> >>>>>>>> http://www.mcs.anl.gov/petsc/developers/index.html
>> >>>>>>>> I would like to have changed it in maint but Jed would have a shit-fit :-) since it changes behavior.
>> >>>>>>>>
>> >>>>>>>> Barry
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>> On Jul 16, 2015, at 4:53 PM, Michele Rosso <mrosso at uci.edu> wrote:
>> >>>>>>>>>
>> >>>>>>>>> Hi,
>> >>>>>>>>>
>> >>>>>>>>> I am performing a series of solves inside a loop. The matrix for each solve changes but not enough to justify a rebuilt of the PC at each solve.
>> >>>>>>>>> Therefore I am using KSPSetReusePreconditioner to avoid rebuilding unless necessary. The solver is CG + MG with a custom PC at the coarse level.
>> >>>>>>>>> If KSP is not updated each time, everything works as it is supposed to.
>> >>>>>>>>> When instead I allow the default PETSc behavior, i.e. updating PC every time the matrix changes, the coarse level KSP , initially set to PREONLY, is changed into GMRES
>> >>>>>>>>> after the first solve. I am not sure where the problem lies (my PC or PETSc), so I would like to have your opinion on this.
>> >>>>>>>>> I attached the ksp_view for the 2 successive solve and the options stack.
>> >>>>>>>>>
>> >>>>>>>>> Thanks for your help,
>> >>>>>>>>> Michel
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> <ksp_view.txt><petsc_options.txt>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>> <info.txt><ksp_view.txt><log_gamg.txt>
>> >>>
>> >>>
>> >>>
>> >>
>> >> <ksp_view_mg5.txt><ksp_view_mg7.txt><log_mg5.txt><log_mg7.txt>
>> >
>>
>>
>>
>
> <ksp_view_dmdarepart.txt><ksp_view_mg.txt><log_dmdarepart.txt><log_mg.txt>
More information about the petsc-users
mailing list