<div dir="ltr"><br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
Message: 5<br>
Date: Fri, 24 Jun 2016 08:59:57 -0500<br>
From: Barry Smith <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>><br>
To: Mark Adams <<a href="mailto:mfadams@lbl.gov">mfadams@lbl.gov</a>><br>
Cc: For users of the development version of PETSc<br>
<<a href="mailto:petsc-dev@mcs.anl.gov">petsc-dev@mcs.anl.gov</a>><br>
Subject: Re: [petsc-dev] asm / gasm<br>
Message-ID: <<a href="mailto:55FF6E27-46B0-4729-9A52-37C5ABBCC0CF@mcs.anl.gov">55FF6E27-46B0-4729-9A52-37C5ABBCC0CF@mcs.anl.gov</a>><br>
Content-Type: text/plain; charset="us-ascii"<br>
<br>
<br>
> On Jun 24, 2016, at 1:35 AM, Mark Adams <<a href="mailto:mfadams@lbl.gov">mfadams@lbl.gov</a>> wrote:<br>
><br>
><br>
><br>
> On Thu, Jun 23, 2016 at 11:46 PM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br>
><br>
> Mark,<br>
><br>
> It is not as simple as this to convert to ASM. It will take a little bit of work to use ASM here instead of GASM.<br>
><br>
><br>
> Just to be clear: ASM used to work. Did the semantics of ASM change?<br></blockquote><div><br></div><div>Hi Mark,</div><div><br></div><div>I assume you meant that GASM used to work and no longer does.</div><div><br></div><div>GASM was originally written by Dmitry. The basic idea is to allow multi-rank blocks; that is, a multi-rank subdomain problem can be solved in parallel on a small number of processor cores. This is different from ASM, where each block is solved on a single process. </div><div><br></div><div>I was involved in the development of GASM last summer. There were some changes:</div><div><br></div><div>(1) Added a function to increase the overlap of the multi-rank subdomains. GASM calls this function by default.</div><div><br></div><div>(2) Added a hierarchical partitioning to optimize data exchange. It ensures that the small subdomains within a multi-rank subdomain are geometrically connected. GASM does not use this functionality by default. </div><div><br></div><div>Anyway, if you have an example (as Barry asked) that shows the broken GASM, I will debug it (if Barry does not mind, of course).</div><div><br></div><div>Fande Kong,</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
Show me a commit where ASM worked! Do you mean that the GASM worked? The code has GASM calls in it, not ASM so how could ASM have previously worked? It is possible that something changed in GASM that broke GAMG's usage of GASM. Once you tell me how to reproduce the problem with GASM I can try to track down the problem.<br>
<br>
><br>
> But before that please please tell me the command line argument and example you use where the GASM crashes so I can get that fixed. Then I will look at using ASM instead after I have the current GASM code running again.<br>
><br>
> In branch mark/gamg-agg-asm in ksp ex56, 'make runex56':<br>
<br>
I don't care about this! This is where you have tried to change from GASM to ASM which I told you is non-trivial. Give me the example and command line where the GASM version in master (or maint) doesn't work where the error message includes ** Max-trans not allowed because matrix is distributed<br>
<br>
We are not communicating very well, you jumped from stating GASM crashed to monkeying with ASM and now refuse to tell me how to reproduce the GASM crash. We have to start by fixing the current code to work with GASM (if it ever worked) and then move on to using ASM (which is just an optimization of the GASM usage.)<br>
<br>
<br>
Barry<br>
<br>
<br>
><br>
> 14:12 nid00495 ~/petsc/src/ksp/ksp/examples/tutorials$ make runex56<br>
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------<br>
> [0]PETSC ERROR: Petsc has generated inconsistent data<br>
> [0]PETSC ERROR: MPI_Allreduce() called in different locations (code lines) on different processors<br>
> [0]PETSC ERROR: See <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html" rel="noreferrer" target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html</a> for trouble shooting.<br>
> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.2-633-g4f88208 GIT Date: 2016-06-23 18:53:31 +0200<br>
> [0]PETSC ERROR: /global/u2/m/madams/petsc/src/ksp/ksp/examples/tutorials/./ex56 on a arch-xc30-dbg64-intel named nid00495 by madams Thu Jun 23 14:12:57 2016<br>
> [0]PETSC ERROR: Configure options --COPTFLAGS="-no-ipo -g -O0" --CXXOPTFLAGS="-no-ipo -g -O0" --FOPTFLAGS="-fast -no-ipo -g -O0" --download-parmetis --download-metis --with-ssl=0 --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=1 --with-fc=0 --with-shared-libraries=0 --with-x=0 --with-mpiexec=srun LIBS=-lstdc++ --with-64-bit-indices PETSC_ARCH=arch-xc30-dbg64-intel<br>
> [0]PETSC ERROR: #1 MatGetSubMatrices_MPIAIJ() li<br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
> Barry<br>
><br>
><br>
> > On Jun 23, 2016, at 4:19 PM, Mark Adams <<a href="mailto:mfadams@lbl.gov">mfadams@lbl.gov</a>> wrote:<br>
> ><br>
> > The question boils down to, for empty processors do we:<br>
> ><br>
> > ierr = ISCreateGeneral(PETSC_COMM_SELF, 0, NULL, PETSC_COPY_VALUES, &is);CHKERRQ(ierr);<br>
> > ierr = PCASMSetLocalSubdomains(subpc, 1, &is, NULL);CHKERRQ(ierr);<br>
> > ierr = ISDestroy(&is);CHKERRQ(ierr);<br>
> ><br>
> > or<br>
> ><br>
> > PCASMSetLocalSubdomains(subpc, 0, NULL, NULL);<br>
> ><br>
> > The latter gives an error that one domain is needed, and the former gives an error (appended).<br>
> ><br>
> > I've checked in the code for this second error in ksp (make runex56)<br>
> ><br>
> > Thanks,<br>
> ><br>
> > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------<br>
> > [0]PETSC ERROR: Petsc has generated inconsistent data<br>
> > [0]PETSC ERROR: MPI_Allreduce() called in different locations (code lines) on different processors<br>
> > [0]PETSC ERROR: See <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html" rel="noreferrer" target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html</a> for trouble shooting.<br>
> > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.2-633-g4f88208 GIT Date: 2016-06-23 18:53:31 +0200<br>
> > [0]PETSC ERROR: /global/u2/m/madams/petsc/src/ksp/ksp/examples/tutorials/./ex56 on a arch-xc30-dbg64-intel named nid00495 by madams Thu Jun 23 14:12:57 2016<br>
> > [0]PETSC ERROR: Configure options --COPTFLAGS="-no-ipo -g -O0" --CXXOPTFLAGS="-no-ipo -g -O0" --FOPTFLAGS="-fast -no-ipo -g -O0" --download-parmetis --download-metis --with-ssl=0 --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=1 --with-fc=0 --with-shared-libraries=0 --with-x=0 --with-mpiexec=srun LIBS=-lstdc++ --with-64-bit-indices PETSC_ARCH=arch-xc30-dbg64-intel<br>
> > [0]PETSC ERROR: #1 MatGetSubMatrices_MPIAIJ() line 1147 in /global/u2/m/madams/petsc/src/mat/impls/aij/mpi/mpiov.c<br>
> > [0]PETSC ERROR: #2 MatGetSubMatrices_MPIAIJ() line 1147 in /global/u2/m/madams/petsc/src/mat/impls/aij/mpi/mpiov.c<br>
> > [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------<br>
> ><br>
> > On Thu, Jun 23, 2016 at 8:05 PM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br>
> ><br>
> > Where is the command line that generates the error?<br>
> ><br>
> ><br>
> > > On Jun 23, 2016, at 12:08 AM, Mark Adams <<a href="mailto:mfadams@lbl.gov">mfadams@lbl.gov</a>> wrote:<br>
> > ><br>
> > > [adding Garth]<br>
> > ><br>
> > > On Thu, Jun 23, 2016 at 12:52 AM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br>
> > ><br>
> > > Mark,<br>
> > ><br>
> > > I think there is a misunderstanding here. With GASM an individual block problem is __solved__ (via a parallel KSP) in parallel by several processes, with ASM each block is "owned" by and solved on a single process.<br>
> > ><br>
> > > Ah, OK, so this is for multiple processors in a block. Yes, we are looking at small, smoother, blocks.<br>
> > ><br>
> > ><br>
> > > With both the "block" can come from any unknowns on any processes. You can have, for example a block that comes from a region snaking across several processes if you like (or it makes sense due to coupling in the matrix).<br>
> > ><br>
> > > By default if you use ASM it will create one non-overlapping block defined by all unknowns owned by a single process and then extend it by "one level" (defined by the nonzero structure of the matrix) to get overlapping.<br>
> > ><br>
> > > The default in ASM is one level of overlap? That is new. (OK, I have not looked at ASM in like over 10 years)<br>
> > ><br>
> > > If you use multiple blocks per process it defines the non-overlapping blocks within a single process's unknowns<br>
> > ><br>
> > > I assume this still chops the matrix and does not call a partitioner.<br>
> > ><br>
> > > and extends each of them to have overlap (again by the non-zero structure of the matrix). The default is simple because the user only needs to indicate the number of blocks per process; the drawback is of course that it does depend on the process layout, number of processes, etc., and does not take into account particular "coupling information" that the user may know about their problem.<br>
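To make "one level" of overlap concrete, here is a minimal sketch in plain C, assuming the nonzero structure of the matrix is held in CSR arrays. The name extend_overlap_one_level and the array layout are illustrative only and are not PETSc API; PETSc's own overlap routine is not reproduced here.<br>

```c
#include <assert.h>
#include <stddef.h>

/* Extend a block by one level of overlap using the CSR nonzero pattern:
 * every column that has a nonzero in some row of the block joins the block.
 * in_block[i] != 0 marks row/unknown i as a member of the original block.
 * out_block must be a separate array of length n; it receives the result.
 * Returns the number of unknowns in the extended block. */
static int extend_overlap_one_level(int n, const int *rowptr, const int *colidx,
                                    const unsigned char *in_block,
                                    unsigned char *out_block)
{
  int count = 0;

  assert(n >= 0 && rowptr != NULL && colidx != NULL);
  assert(in_block != NULL && out_block != NULL && in_block != out_block);

  /* Start from the non-overlapping block. */
  for (int i = 0; i < n; i++) out_block[i] = in_block[i] ? 1 : 0;

  /* Add every neighbour reachable through one nonzero entry. */
  for (int i = 0; i < n; i++) {
    if (!in_block[i]) continue;
    for (int k = rowptr[i]; k < rowptr[i + 1]; k++) out_block[colidx[k]] = 1;
  }

  for (int i = 0; i < n; i++) count += out_block[i];
  return count;
}
```

For a 6x6 tridiagonal pattern and the block {2, 3}, one level of overlap gives {1, 2, 3, 4}. Only the original members are used as seeds, so calling the function again on its output adds the next level.<br>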
> > ><br>
> > > If the user wishes to define the blocks themselves that is also possible with PCASMSetLocalSubdomains(). Each process provides 1 or more index sets for the subdomains it will solve on. Note that the index sets can contain any unknowns in the entire problem so the blocks do not have to "line up" with the parallel decomposition at all.<br>
> > ><br>
> > > Oh, OK, this is what I want. (I thought this worked).<br>
> > ><br>
> > > Of course determining and providing good such subdomains may not always be clear.<br>
> > ><br>
> > > In smoothed aggregation there is an argument that the aggregates are good, but the scale is fixed obviously. On a regular grid smoothed aggregation wants 3^D sized aggregates, which is obviously wonderful for ASM. And for anisotropy you want your ASM blocks to be on strongly connected components, which is what smoothed aggregation wants (not that I do this very well).<br>
> > ><br>
> > ><br>
> > > I see in GAMG you have PCGAMGSetUseASMAggs<br>
> > ><br>
> > > But the code calls PCGASMSetSubdomains and the command line is -pc_gamg_use_agg_gasm, so this is all messed up. (more below)<br>
> > ><br>
> > > which sadly does not have an explanation in the users manual and sadly does not have a matching options database name (it is -pc_gamg_use_agg_gasm). Following the rule of drop the word set, all lower case, and put _ between words, the option should be -pc_gamg_use_asm_aggs.<br>
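The naming rule quoted above (drop the word "Set", lower-case everything, put _ between words) can be sketched in plain C. This is an illustration only: option_name is a hypothetical helper, not a PETSc function, and it assumes the caller supplies the function name already split into words, since acronyms such as PC and GAMG cannot be split reliably by case alone.<br>

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Build an options-database name from the words of a setter's name:
 * drop the word "Set", lower-case every character, join the remaining
 * words with '_', and prefix the result with '-'.
 * Output is truncated (and always NUL-terminated) if it does not fit. */
static void option_name(const char *const words[], int nwords,
                        char *out, size_t outlen)
{
  size_t pos   = 0;
  int    first = 1;

  assert(words != NULL && out != NULL && outlen > 1);

  out[pos++] = '-';
  for (int w = 0; w < nwords; w++) {
    if (strcmp(words[w], "Set") == 0) continue;        /* drop "Set" */
    if (!first && pos + 1 < outlen) out[pos++] = '_';  /* '_' between words */
    first = 0;
    for (const char *c = words[w]; *c != '\0' && pos + 1 < outlen; c++)
      out[pos++] = (char)tolower((unsigned char)*c);   /* all lower case */
  }
  out[pos] = '\0';
}
```

Passing the words PC, GAMG, Set, Use, ASM, Aggs yields -pc_gamg_use_asm_aggs, which is the option name the rule predicts for PCGAMGSetUseASMAggs.<br>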
> > ><br>
> > > BUT, THIS IS THE WAY IT WAS! It looks like someone hijacked this code and made it gasm. I never did this.<br>
> > ><br>
> > > Barry: you did this apparently in 2013.<br>
> > ><br>
> > ><br>
> > > In addition to this one you could also have one that uses the aggs but use the PCASM to manage the solves instead of GASM, it would likely be less buggy and more efficient.<br>
> > ><br>
> > > yes<br>
> > ><br>
> > ><br>
> > > Please tell me exactly what example you tried to run with what options and I will debug it.<br>
> > ><br>
> > > We got an error message:<br>
> > ><br>
> > > ** Max-trans not allowed because matrix is distributed<br>
> > ><br>
> > > Garth: is this from your code perhaps? I don't see it in PETSc.<br>
> > ><br>
> > > Note that ALL functionality that is included in PETSc should have tests that test that functionality; then we will find out immediately when it is broken instead of two years later when it is much harder to debug. If this -pc_gamg_use_agg_gasm had had a test we wouldn't be in this mess now. (Jed's damn code reviews sure don't pick up this stuff).<br>
> > ><br>
> > > First we need to change gasm to asm.<br>
> > ><br>
> > > We could add this argument pc_gamg_use_agg_asm to ksp/ex56 (runex56 or make a new test). The SNES version (also ex56) is my current test that I like to refer to as recommended parameters for elasticity. So I'd like to keep that clean, but we can add junk to ksp/ex56.<br>
> > ><br>
> > > I've done this in a branch mark/gamg-agg-asm. I get an error (appended). It looks like the second coarsest grid, which has 36 dof on one processor has an index 36 in the block on every processor. Strange. I can take a look at it later.<br>
> > ><br>
> > > Mark<br>
> > ><br>
> > > > [3]PETSC ERROR: [4]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------<br>
> > > > [4]PETSC ERROR: Petsc has generated inconsistent data<br>
> > > > [4]PETSC ERROR: ith 0 block entry 36 not owned by any process, upper bound 36<br>
> > > > [4]PETSC ERROR: See <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html" rel="noreferrer" target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html</a> for trouble shooting.<br>
> > > > [4]PETSC ERROR: Petsc Development GIT revision: v3.7.2-630-g96e0c40 GIT Date: 2016-06-22 10:03:02 -0500<br>
> > > > [4]PETSC ERROR: ./ex56 on a arch-macosx-gnu-g named MarksMac-3.local by markadams Thu Jun 23 06:53:27 2016<br>
> > > > [4]PETSC ERROR: Configure options COPTFLAGS="-g -O0" CXXOPTFLAGS="-g -O0" FOPTFLAGS="-g -O0" --download-hypre=1 --download-parmetis=1 --download-metis=1 --download-ml=1 --download-p4est=1 --download-exodus=1 --download-triangle=1 --with-hdf5-dir=/Users/markadams/Codes/hdf5 --with-x=0 --with-debugging=1 PETSC_ARCH=arch-macosx-gnu-g --download-chaco<br>
> > > > [4]PETSC ERROR: #1 VecScatterCreate_PtoS() line 2348 in /Users/markadams/Codes/petsc/src/vec/vec/utils/vpscat.c<br>
> > > > [4]PETSC ERROR: #2 VecScatterCreate() line 1552 in /Users/markadams/Codes/petsc/src/vec/vec/utils/vscat.c<br>
> > > > [4]PETSC ERROR: Petsc has generated inconsistent data<br>
> > > > [3]PETSC ERROR: ith 0 block entry 36 not owned by any process, upper bound 36<br>
> > > > [3]PETSC ERROR: See <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html" rel="noreferrer" target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html</a> for trouble shooting.<br>
> > > > [3]PETSC ERROR: Petsc Development GIT revision: v3.7.2-630-g96e0c40 GIT Date: 2016-06-22 10:03:02 -0500<br>
> > > > [3]PETSC ERROR: ./ex56 on a arch-macosx-gnu-g named MarksMac-3.local by markadams Thu Jun 23 06:53:27 2016<br>
> > > > [3]PETSC ERROR: Configure options COPTFLAGS="-g -O0" CXXOPTFLAGS="-g -O0" FOPTFLAGS="-g -O0" --download-hypre=1 --download-parmetis=1 --download-metis=1 --download-ml=1 --download-p4est=1 --download-exodus=1 --download-triangle=1 --with-hdf5-dir=/Users/markadams/Codes/hdf5 --with-x=0 --with-debugging=1 PETSC_ARCH=arch-macosx-gnu-g --download-chaco<br>
> > > > [3]PETSC ERROR: #1 VecScatterCreate_PtoS() line 2348 in /Users/markadams/Codes/petsc/src/vec/vec/utils/vpscat.c<br>
> > > > [3]PETSC ERROR: #2 VecScatterCreate() line 1552 in /Users/markadams/Codes/petsc/src/vec/vec/utils/vscat.c<br>
> > > > [3]PETSC ERROR: #3 PCSetUp_ASM() line 279 in /Users/markadams/Codes/petsc/src/ksp/pc/impls/asm/asm.c<br>
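The message "block entry 36 not owned by any process, upper bound 36" is consistent with a zero-based index that equals the global size: with 36 dof, the valid indices are 0 through 35. A minimal sketch of such a range check in plain C follows; first_out_of_range is a hypothetical helper for inspecting an index list before it is handed to the preconditioner, not the check PETSc itself performs.<br>

```c
#include <assert.h>
#include <stddef.h>

/* Return the position of the first index outside [0, n_global),
 * or -1 if every index in the list is a valid zero-based global index. */
static int first_out_of_range(const int *idx, int len, int n_global)
{
  assert(len >= 0 && (len == 0 || idx != NULL));

  for (int i = 0; i < len; i++) {
    if (idx[i] < 0 || idx[i] >= n_global) return i;
  }
  return -1;
}
```

With n_global = 36, the list {34, 35, 36} fails at position 2, while {0, 35} passes.<br>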
> > ><br>
> > ><br>
> > ><br>
> > ><br>
> > ><br>
> > ><br>
> > ><br>
> > > Barry<br>
> > ><br>
> > ><br>
> > ><br>
> > ><br>
> > ><br>
> > ><br>
> > > > On Jun 22, 2016, at 5:20 PM, Mark Adams <<a href="mailto:mfadams@lbl.gov">mfadams@lbl.gov</a>> wrote:<br>
> > > ><br>
> > > ><br>
> > > ><br>
> > > > On Wed, Jun 22, 2016 at 8:06 PM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br>
> > > ><br>
> > > > I suggest focusing on asm.<br>
> > > ><br>
> > > > OK, I will switch gasm to asm, this does not work anyway.<br>
> > > ><br>
> > > > Having blocks that span multiple processes seems like overkill for a smoother?<br>
> > > ><br>
> > > > No, because it is a pain to have the math convolved with the parallel decomposition strategy (i.e., I can't tell an application how to partition their problem). If an aggregate spans processor boundaries, which is fine and needed, and let's say we have a pretty uniform problem, then if the block gets split up, H is small in part of the domain and convergence could suffer along processor boundaries. And having the math change as the parallel decomposition changes is annoying.<br>
> > > ><br>
> > > > (Major league overkill) in fact doesn't one want multiple blocks per process, ie. pretty small blocks.<br>
> > > ><br>
> > > > No, it is just doing what would be done in serial. If the cost of moving the data across the processor is a problem then that is a tradeoff to consider.<br>
> > > ><br>
> > > > And I think you are misunderstanding me. There are lots of blocks per process (the aggregates are say 3^D in size). And many of the aggregates/blocks along the processor boundary will be split between processors, resulting in small blocks and a weak ASM PC on processor boundaries.<br>
> > > ><br>
> > > > I can understand ASM not being general and not letting blocks span processor boundaries, but I don't think the extra matrix communication costs are a big deal (done just once) and the vector communication costs are not bad, it probably does not include (too many) new processors to communicate with.<br>
> > > ><br>
> > > ><br>
> > > > Barry<br>
> > > ><br>
> > > > > On Jun 22, 2016, at 7:51 AM, Mark Adams <<a href="mailto:mfadams@lbl.gov">mfadams@lbl.gov</a>> wrote:<br>
> > > > ><br>
> > > > > I'm trying to get block smoothers to work for gamg. We (Garth) tried this and got this error:<br>
> > > > ><br>
> > > > ><br>
> > > > > - Another option is use '-pc_gamg_use_agg_gasm true' and use '-mg_levels_pc_type gasm'.<br>
> > > > ><br>
> > > > ><br>
> > > > > Running in parallel, I get<br>
> > > > ><br>
> > > > > ** Max-trans not allowed because matrix is distributed<br>
> > > > > ----<br>
> > > > ><br>
> > > > > First, what is the difference between asm and gasm?<br>
> > > > ><br>
> > > > > Second, I need to fix this to get block smoothers. This used to work. Did we lose the capability to have blocks that span processor subdomains?<br>
> > > > ><br>
> > > > > gamg only aggregates across processor subdomains within one layer, so maybe I could use one layer of overlap in some way?<br>
> > > > ><br>
> > > > > Thanks,<br>
> > > > > Mark<br>
> > > > ><br>
> > > ><br>
> > > ><br>
> > ><br>
> > ><br>
> ><br>
> ><br>
><br>
><br>
<br>
<br>
<br>
<br>
<br>
End of petsc-dev Digest, Vol 90, Issue 30<br>
*****************************************<br>
</blockquote></div><br></div></div>