[petsc-users] PCASMType
Barry Smith
bsmith at mcs.anl.gov
Fri Aug 5 15:27:10 CDT 2016
I looked at the code (and read the manual page more carefully):
PC_ASM_BASIC - full interpolation and restriction
PC_ASM_RESTRICT - full restriction, local processor interpolation
PC_ASM_INTERPOLATE - full interpolation, local processor restriction
PC_ASM_NONE - local processor restriction and interpolation
It is not doing what you and I assumed it is doing. The restrict and interpolate are only short-circuited (skipped) across processes; any restriction and interpolation within an MPI process is always done. Thus in sequential runs the different variants make no difference. I don't think I would have written it this way.
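(For reference, a minimal sketch of how the variant is selected programmatically; the command-line equivalent is -pc_asm_type basic|restrict|interpolate|none, and this assumes a KSP named ksp already exists:)

    PC pc;
    ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCASM);CHKERRQ(ierr);
    /* pick one of PC_ASM_BASIC, PC_ASM_RESTRICT, PC_ASM_INTERPOLATE, PC_ASM_NONE */
    ierr = PCASMSetType(pc, PC_ASM_RESTRICT);CHKERRQ(ierr);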
Sorry I wasted your time, but it doesn't look like there is anything useful for you with PCASM; it needs to be completely refactored.
Barry
> On Aug 5, 2016, at 1:26 AM, Boyce Griffith <griffith at cims.nyu.edu> wrote:
>
>
>> On Aug 4, 2016, at 9:52 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>
>>
>> The magic handling of _1_ etc is all done in PetscOptionsFindPair_Private() so you need to put a break point in that routine and see why the requested value is not located.
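>> For example (a sketch; the executable name and options below are placeholders):
>>
>>    gdb ./your_app
>>    (gdb) break PetscOptionsFindPair_Private
>>    (gdb) run -stokes_ib_pc_level_pc_asm_type basic <your other options>
>>
>> or run with -start_in_debugger and set the breakpoint in the debugger window that pops up.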
>
> I haven’t tracked down the source of the problem with using _1_ etc, but I have checked to see what happens if I switch between basic/restrict/interpolate/none “manually” on each level, and I still see the same results for all choices.
>
> I’ve checked the IS’es and am reasonably confident that they are being generated correctly for the “overlap” and “non-overlap” regions. It is definitely the case that the overlap region contains the non-overlap region, and the overlap region is bigger (by the proper amount) than the non-overlap region.
>
> It looks like ksp/ksp/examples/tutorials/ex8.c uses PCASMSetLocalSubdomains to set up the subdomains for ASM. If I run this example using, e.g.,
>
> ./ex8 -m 100 -n 100 -Mdomains 8 -Ndomains 8 -user_set_subdomains -ksp_rtol 1.0e-3 -ksp_monitor -pc_asm_type XXXX
>
> I get exactly the same results for all of the different ASM types. I checked (using -ksp_view) that the ASM type settings were being honored. Are these subdomains not being set up to include overlaps (in which case I guess all ASM versions would yield the same results)?
>
> Thanks,
>
> — Boyce
>
>>
>> Barry
>>
>>
>>> On Aug 4, 2016, at 9:46 PM, Boyce Griffith <griffith at cims.nyu.edu> wrote:
>>>
>>>
>>>> On Aug 4, 2016, at 9:41 PM, Boyce Griffith <griffith at cims.nyu.edu> wrote:
>>>>
>>>>
>>>>> On Aug 4, 2016, at 9:26 PM, Boyce Griffith <griffith at cims.nyu.edu> wrote:
>>>>>
>>>>>>
>>>>>> On Aug 4, 2016, at 9:01 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>>>>
>>>>>>>
>>>>>>> On Aug 4, 2016, at 8:51 PM, Boyce Griffith <griffith at cims.nyu.edu> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Aug 4, 2016, at 8:42 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> History,
>>>>>>>>
>>>>>>>> 1) I originally implemented the ASM with one subdomain per process
>>>>>>>> 2) easily extended to support multiple domains per process
>>>>>>>> 3) added -pc_asm_type restrict etc but it only worked for one subdomain per process because it took advantage of the fact that
>>>>>>>> restrict etc could be achieved by simply dropping the parallel communication in the vector scatters
>>>>>>>> 4) Matt didn't like the restriction to one process per subdomain so he added an additional argument to PCASMSetLocalSubdomains() that allowed passing in the overlapping and non-overlapping regions of each domain (foolishly calling the non-overlapping index set is_local even though local has nothing to do with it), so that the restrict etc could be handled.
>>>>>>>>
>>>>>>>> Unfortunately IMHO Matt made a mess of things because if you use things like -pc_asm_blocks n or -pc_asm_overlap 1 etc it does not handle -pc_asm_type restrict since it cannot track the is vs is_local. The code needs to be refactored so that things like -pc_asm_blocks and -pc_asm_overlap 1 can track the is vs is_local index sets properly when -pc_asm_type is set. Also the name is_local needs to be changed to something meaningful like is_nonoverlapping. This refactoring would also result in easier, cleaner code than is currently there.
>>>>>>>>
>>>>>>>> So basically until PCASM is refactored properly to handle restrict etc you are stuck with being able to use restrict etc ONLY if you specifically supply the overlapping and non-overlapping domains yourself with PCASMSetLocalSubdomains and curse at Matt every day like we all do.
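>>>>>>>> Roughly, that usage looks like the following sketch (the variable names are placeholders; is[i] is the overlapping index set for subdomain i and is_local[i] is the non-overlapping one):
>>>>>>>>
>>>>>>>>    ierr = PCASMSetLocalSubdomains(pc, n, is, is_local);CHKERRQ(ierr);
>>>>>>>>    ierr = PCASMSetType(pc, PC_ASM_RESTRICT);CHKERRQ(ierr);   /* or basic/interpolate/none */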
>>>>>>>
>>>>>>> OK, got it. The reason I’m asking is that we are using PCASM in a custom smoother, and I noticed that basic/restrict/interpolate/none all give identical results. We are using PCASMSetLocalSubdomains to set up the subdomains.
>>>>>>
>>>>>> But are you setting different is and is_local (stupid name) and not having PETSc compute the overlap in your custom code? If you are setting them differently and not having PETSc compute overlap but getting identical convergence then something is wrong and you likely have to run in the debugger to ensure that restrict etc is properly being set and used.
>>>>>
>>>>> Yes we are computing overlapping and non-overlapping IS’es.
>>>>>
>>>>> I just double-checked, and somehow the ASMType setting is not making it from the command line into the solver configuration — sorry, I should have checked this more carefully before emailing the list. (I thought that the command line options were being captured correctly, since I am able to control the PC type and all of the sub-KSP/sub-PC settings.)
>>>>
>>>> OK, so here is what appears to be happening. These solvers are named things like “stokes_pc_level_0_”, “stokes_pc_level_1_”, … . If I use the command-line argument
>>>>
>>>> -stokes_ib_pc_level_0_pc_asm_type basic
>>>>
>>>> then the ASM settings are used, but if I do:
>>>>
>>>> -stokes_ib_pc_level_pc_asm_type basic
>>>>
>>>> they are ignored. Any ideas? :-)
>>>
>>> I should have said: we are playing around with a lot of different command line options that are being collectively applied to all of the level solvers, and these options for ASM are the only ones I’ve encountered so far that have to include the level number to have an effect.
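>>> (For reference, a minimal sketch of how numbered per-level prefixes get assigned; the names nlevels and level_ksp here are placeholders:)
>>>
>>>    PetscInt l;
>>>    char     prefix[64];
>>>    for (l = 0; l < nlevels; ++l) {
>>>      ierr = PetscSNPrintf(prefix, sizeof(prefix), "stokes_ib_pc_level_%d_", (int)l);CHKERRQ(ierr);
>>>      ierr = KSPSetOptionsPrefix(level_ksp[l], prefix);CHKERRQ(ierr);
>>>      ierr = KSPSetFromOptions(level_ksp[l]);CHKERRQ(ierr);
>>>    }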
>>>
>>> Thanks,
>>>
>>> — Boyce
>>>
>>>>
>>>> Thanks,
>>>>
>>>> — Boyce
>>>>
>>>>>>> BTW, there is also this bit (which was easy to overlook in all of the repetitive convergence histories):
>>>>>>
>>>>>> Yeah, better one question per email or we will miss them.
>>>>>>
>>>>>> There is nothing that says that multiplicative will ALWAYS beat additive, though intuitively you expect it to.
>>>>>
>>>>> OK, so similar story as above: we have a custom MSM that, when used as a MG smoother, gives convergence rates that are about 2x PCASM, whereas when we use PCASM with MULTIPLICATIVE, it doesn’t seem to help.
>>>>>
>>>>> However, now I am questioning whether the settings are getting propagated into PCASM… I’ll need to take another look.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> — Boyce
>>>>>
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>>>
>>>>>>>>> Also, the MULTIPLICATIVE variant does not seem to behave as I would expect --- for this same example, if you switch from ADDITIVE to MULTIPLICATIVE, the solver converges slightly more slowly:
>>>>>>>>>
>>>>>>>>> $ ./ex2 -m 32 -n 32 -pc_type asm -pc_asm_blocks 8 -ksp_view -ksp_monitor_true_residual -pc_asm_local_type MULTIPLICATIVE
>>>>>>>>> 0 KSP preconditioned resid norm 7.467363913958e+00 true resid norm 1.166190378969e+01 ||r(i)||/||b|| 1.000000000000e+00
>>>>>>>>> 1 KSP preconditioned resid norm 2.878371937592e+00 true resid norm 3.646367718253e+00 ||r(i)||/||b|| 3.126734522949e-01
>>>>>>>>> 2 KSP preconditioned resid norm 1.666575161021e+00 true resid norm 1.940699059619e+00 ||r(i)||/||b|| 1.664135714560e-01
>>>>>>>>> 3 KSP preconditioned resid norm 1.086140238220e+00 true resid norm 1.191473615464e+00 ||r(i)||/||b|| 1.021680196433e-01
>>>>>>>>> 4 KSP preconditioned resid norm 7.939217314942e-01 true resid norm 8.059317628307e-01 ||r(i)||/||b|| 6.910807852344e-02
>>>>>>>>> 5 KSP preconditioned resid norm 6.265169154675e-01 true resid norm 5.942294290555e-01 ||r(i)||/||b|| 5.095475316653e-02
>>>>>>>>> 6 KSP preconditioned resid norm 5.164999302721e-01 true resid norm 4.585844476718e-01 ||r(i)||/||b|| 3.932329197203e-02
>>>>>>>>> 7 KSP preconditioned resid norm 4.472399844370e-01 true resid norm 3.884049472908e-01 ||r(i)||/||b|| 3.330544946136e-02
>>>>>>>>> 8 KSP preconditioned resid norm 3.445446366213e-01 true resid norm 4.008290378967e-01 ||r(i)||/||b|| 3.437080644166e-02
>>>>>>>>> 9 KSP preconditioned resid norm 1.987509894375e-01 true resid norm 2.619628925380e-01 ||r(i)||/||b|| 2.246313271505e-02
>>>>>>>>> 10 KSP preconditioned resid norm 1.084551743751e-01 true resid norm 1.354891040098e-01 ||r(i)||/||b|| 1.161809481995e-02
>>>>>>>>> 11 KSP preconditioned resid norm 6.108303419460e-02 true resid norm 7.252267103275e-02 ||r(i)||/||b|| 6.218767736436e-03
>>>>>>>>> 12 KSP preconditioned resid norm 3.641579250431e-02 true resid norm 4.069996187932e-02 ||r(i)||/||b|| 3.489992938829e-03
>>>>>>>>> 13 KSP preconditioned resid norm 2.424898818735e-02 true resid norm 2.469590201945e-02 ||r(i)||/||b|| 2.117656127577e-03
>>>>>>>>> 14 KSP preconditioned resid norm 1.792399391125e-02 true resid norm 1.622090905110e-02 ||r(i)||/||b|| 1.390931475995e-03
>>>>>>>>> 15 KSP preconditioned resid norm 1.320657155648e-02 true resid norm 1.336753101147e-02 ||r(i)||/||b|| 1.146256327657e-03
>>>>>>>>> 16 KSP preconditioned resid norm 7.398524571182e-03 true resid norm 9.747691680405e-03 ||r(i)||/||b|| 8.358576657974e-04
>>>>>>>>> 17 KSP preconditioned resid norm 3.043993613039e-03 true resid norm 3.848714422908e-03 ||r(i)||/||b|| 3.300245390731e-04
>>>>>>>>> 18 KSP preconditioned resid norm 1.767867968946e-03 true resid norm 1.736586340170e-03 ||r(i)||/||b|| 1.489110501585e-04
>>>>>>>>> 19 KSP preconditioned resid norm 1.088792656005e-03 true resid norm 1.307506936484e-03 ||r(i)||/||b|| 1.121177948355e-04
>>>>>>>>> 20 KSP preconditioned resid norm 4.622653682144e-04 true resid norm 5.718427718734e-04 ||r(i)||/||b|| 4.903511315013e-05
>>>>>>>>> 21 KSP preconditioned resid norm 2.591703287585e-04 true resid norm 2.690982547548e-04 ||r(i)||/||b|| 2.307498497738e-05
>>>>>>>>> 22 KSP preconditioned resid norm 1.596527181997e-04 true resid norm 1.715846687846e-04 ||r(i)||/||b|| 1.471326396435e-05
>>>>>>>>> 23 KSP preconditioned resid norm 1.006766623019e-04 true resid norm 1.044525361282e-04 ||r(i)||/||b|| 8.956731080268e-06
>>>>>>>>> 24 KSP preconditioned resid norm 5.349814270060e-05 true resid norm 6.598682341705e-05 ||r(i)||/||b|| 5.658323427037e-06
>>>>>>>>> KSP Object: 1 MPI processes
>>>>>>>>> type: gmres
>>>>>>>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>>>>>> GMRES: happy breakdown tolerance 1e-30
>>>>>>>>> maximum iterations=10000, initial guess is zero
>>>>>>>>> tolerances: relative=9.18274e-06, absolute=1e-50, divergence=10000.
>>>>>>>>> left preconditioning
>>>>>>>>> using PRECONDITIONED norm type for convergence test
>>>>>>>>> PC Object: 1 MPI processes
>>>>>>>>> type: asm
>>>>>>>>> Additive Schwarz: total subdomain blocks = 8, amount of overlap = 1
>>>>>>>>> Additive Schwarz: restriction/interpolation type - BASIC
>>>>>>>>> Additive Schwarz: local solve composition type - MULTIPLICATIVE
>>>>>>>>> Local solve is same for all blocks, in the following KSP and PC objects:
>>>>>>>>> KSP Object: (sub_) 1 MPI processes
>>>>>>>>> type: preonly
>>>>>>>>> maximum iterations=10000, initial guess is zero
>>>>>>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>>>>>>> left preconditioning
>>>>>>>>> using NONE norm type for convergence test
>>>>>>>>> PC Object: (sub_) 1 MPI processes
>>>>>>>>> type: icc
>>>>>>>>> 0 levels of fill
>>>>>>>>> tolerance for zero pivot 2.22045e-14
>>>>>>>>> using Manteuffel shift [POSITIVE_DEFINITE]
>>>>>>>>> matrix ordering: natural
>>>>>>>>> factor fill ratio given 1., needed 1.
>>>>>>>>> Factored matrix follows:
>>>>>>>>> Mat Object: 1 MPI processes
>>>>>>>>> type: seqsbaij
>>>>>>>>> rows=160, cols=160
>>>>>>>>> package used to perform factorization: petsc
>>>>>>>>> total: nonzeros=443, allocated nonzeros=443
>>>>>>>>> total number of mallocs used during MatSetValues calls =0
>>>>>>>>> block size is 1
>>>>>>>>> linear system matrix = precond matrix:
>>>>>>>>> Mat Object: 1 MPI processes
>>>>>>>>> type: seqaij
>>>>>>>>> rows=160, cols=160
>>>>>>>>> total: nonzeros=726, allocated nonzeros=726
>>>>>>>>> total number of mallocs used during MatSetValues calls =0
>>>>>>>>> not using I-node routines
>>>>>>>>> linear system matrix = precond matrix:
>>>>>>>>> Mat Object: 1 MPI processes
>>>>>>>>> type: seqaij
>>>>>>>>> rows=1024, cols=1024
>>>>>>>>> total: nonzeros=4992, allocated nonzeros=5120
>>>>>>>>> total number of mallocs used during MatSetValues calls =0
>>>>>>>>> not using I-node routines
>>>>>>>>> Norm of error 0.000292304 iterations 24
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> -- Boyce
>>>>
>>>
>