[petsc-users] Why use MATMPIBAIJ?

Thu Jan 14 12:50:50 CST 2016

> On Jan 14, 2016, at 5:04 AM, Hoang Giang Bui <hgbk2008 at gmail.com> wrote:
> 
> This is a very interesting thread because use of block matrix improves the performance of AMG a lot. In my case is the elasticity problem.
> 
> One more question I like to ask, which is more on the performance of the solver. That if I have a coupled problem, says the point block is [u_x u_y u_z p] in which entries of p block in stiffness matrix is in a much smaller scale than u (p~1e-6, u~1e+8), then AMG with hypre in PETSc still scale? Also, is there a utility in PETSc which does automatic scaling of variables?

   We highly recommend scaling in your MODEL (as much as possible) to have similar scaling of the various variables see https://en.wikipedia.org/wiki/Nondimensionalization.

   The problem with trying to do the scaling numerically after you have discretized your model is that the effect of the finite arithmetic as you "rescale" means that you lose possibly all the accuracy during the rescaling. For example say your "badly scaled" matrix J has a condition number of 1.e15; now you apply a numerical algorithm to "rescale" the variables to get a much better conditioned matrix. Since the accuracy of the numerical algorithm depends on the conditioning of J it will give you essentially no digits correct (due to the finite arithmetic) in your new J prime and in the transformation between your new and old variables.

   Barry

> 
> Giang
> 
> On Thu, Jan 14, 2016 at 7:13 AM, Justin Chang <jychang48 at gmail.com> wrote:
> Okay that makes sense, thanks
> 
> On Wed, Jan 13, 2016 at 10:12 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
> > On Jan 13, 2016, at 10:24 PM, Justin Chang <jychang48 at gmail.com> wrote:
> >
> > Thanks Barry,
> >
> > 1) So for block matrices, the ja array is smaller. But what's the "hardware" explanation for this performance improvement? Does it have to do with spatial locality where you are more likely to reuse data in that ja array, or does it have to do with the fact that loading/storing smaller arrays are less likely to invoke a cache miss, thus reducing the amount of bandwidth?
> 
> There are two distinct reasons for the improvement:
> 
> 1) For 5 by 5 blocks the ja array is 1/25th the size. The "hardware" savings is that you have to load something that is much smaller than before. Cache/spatial locality have nothing to do with this particular improvement.
> 
> 2) The other improvement comes from the reuse of each x[j] value multiplied by 5 values (a column) of the little block. The hardware explanation is that x[j] can be reused in a register for the 5 multiplies (while otherwise it would have to come from cache to register 5 times and sometimes might even have been flushed from the cache so would have to come from memory). This is why we have code like
> 
>     for (j=0; j<n; j++) {
>       xb    = x + 5*(*idx++);
>       x1    = xb[0]; x2 = xb[1]; x3 = xb[2]; x4 = xb[3]; x5 = xb[4];
>       sum1 += v[0]*x1 + v[5]*x2 + v[10]*x3  + v[15]*x4 + v[20]*x5;
>       sum2 += v[1]*x1 + v[6]*x2 + v[11]*x3  + v[16]*x4 + v[21]*x5;
>       sum3 += v[2]*x1 + v[7]*x2 + v[12]*x3  + v[17]*x4 + v[22]*x5;
>       sum4 += v[3]*x1 + v[8]*x2 + v[13]*x3  + v[18]*x4 + v[23]*x5;
>       sum5 += v[4]*x1 + v[9]*x2 + v[14]*x3  + v[19]*x4 + v[24]*x5;
>       v    += 25;
>     }
> 
> to do the block multiple.
> 
> >
> > 2) So if one wants to assemble a monolithic matrix (i.e., aggregation of more than one dof per point) then using the BAIJ format is highly advisable. But if I want to form a nested matrix, say I am solving Stokes equation, then each "submatrix" is of AIJ format? Can these sub matrices also be BAIJ?
> 
>    Sure, but if you have separated all the variables of pressure, velocity_x, velocity_y, etc into there own regions of the vector then the block size for the sub matrices would be 1 so BAIJ does not help.
> 
>    There are Stokes solvers that use Vanka smoothing that keep the variables interlaced and hence would use BAIJ and NOT use fieldsplit
> 
> 
> >
> > Thanks,
> > Justin
> >
> > On Wed, Jan 13, 2016 at 9:12 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> > > On Jan 13, 2016, at 9:57 PM, Justin Chang <jychang48 at gmail.com> wrote:
> > >
> > > Hi all,
> > >
> > > 1) I am guessing MATMPIBAIJ could theoretically have better performance than simply using MATMPIAIJ. Why is that? Is it similar to the reasoning that block (dense) matrix-vector multiply is "faster" than simple matrix-vector?
> >
> >   See for example table 1 in http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.7668&rep=rep1&type=pdf
> >
> > >
> > > 2) I am looking through the manual and online documentation and it seems the term "block" used everywhere. In the section on "block matrices" (3.1.3 of the manual), it refers to field splitting, where you could either have a monolithic matrix or a nested matrix. Does that concept have anything to do with MATMPIBAIJ?
> >
> >    Unfortunately the numerical analysis literature uses the term block in multiple ways. For small blocks, sometimes called "point-block" with BAIJ and for very large blocks (where the blocks are sparse themselves). I used fieldsplit for big sparse blocks to try to avoid confusion in PETSc.
> > >
> > > It makes sense to me that one could create a BAIJ where if you have 5 dofs of the same type of physics (e.g., five different primary species of a geochemical reaction) per grid point, you could create a block size of 5. And if you have different physics (e.g., velocity and pressure) you would ideally want to separate them out (i.e., nested matrices) for better preconditioning.
> >
> >    Sometimes you put them together with BAIJ and sometimes you keep them separate with nested matrices.
> >
> > >
> > > Thanks,
> > > Justin
> >
> >
> 
> 
>