[petsc-users] Why use MATMPIBAIJ?

Thu Jan 14 10:20:30 CST 2016

On Thu, Jan 14, 2016 at 5:04 AM, Hoang Giang Bui <hgbk2008 at gmail.com> wrote:

> This is a very interesting thread because use of block matrix improves the
> performance of AMG a lot. In my case is the elasticity problem.
>
> One more question I like to ask, which is more on the performance of the
> solver. That if I have a coupled problem, says the point block is [u_x u_y
> u_z p] in which entries of p block in stiffness matrix is in a much smaller
> scale than u (p~1e-6, u~1e+8), then AMG with hypre in PETSc still scale?
> Also, is there a utility in PETSc which does automatic scaling of variables?
>

You could use PC Jacobi, or perhaps

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPSetDiagonalScale.html

  Thanks,

     Matt

> Giang
>
> On Thu, Jan 14, 2016 at 7:13 AM, Justin Chang <jychang48 at gmail.com> wrote:
>
>> Okay that makes sense, thanks
>>
>> On Wed, Jan 13, 2016 at 10:12 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>
>>>
>>> > On Jan 13, 2016, at 10:24 PM, Justin Chang <jychang48 at gmail.com>
>>> wrote:
>>> >
>>> > Thanks Barry,
>>> >
>>> > 1) So for block matrices, the ja array is smaller. But what's the
>>> "hardware" explanation for this performance improvement? Does it have to do
>>> with spatial locality where you are more likely to reuse data in that ja
>>> array, or does it have to do with the fact that loading/storing smaller
>>> arrays are less likely to invoke a cache miss, thus reducing the amount of
>>> bandwidth?
>>>
>>> There are two distinct reasons for the improvement:
>>>
>>> 1) For 5 by 5 blocks the ja array is 1/25th the size. The "hardware"
>>> savings is that you have to load something that is much smaller than
>>> before. Cache/spatial locality have nothing to do with this particular
>>> improvement.
>>>
>>> 2) The other improvement comes from the reuse of each x[j] value
>>> multiplied by 5 values (a column) of the little block. The hardware
>>> explanation is that x[j] can be reused in a register for the 5 multiplies
>>> (while otherwise it would have to come from cache to register 5 times and
>>> sometimes might even have been flushed from the cache so would have to come
>>> from memory). This is why we have code like
>>>
>>>     for (j=0; j<n; j++) {
>>>       xb    = x + 5*(*idx++);
>>>       x1    = xb[0]; x2 = xb[1]; x3 = xb[2]; x4 = xb[3]; x5 = xb[4];
>>>       sum1 += v[0]*x1 + v[5]*x2 + v[10]*x3  + v[15]*x4 + v[20]*x5;
>>>       sum2 += v[1]*x1 + v[6]*x2 + v[11]*x3  + v[16]*x4 + v[21]*x5;
>>>       sum3 += v[2]*x1 + v[7]*x2 + v[12]*x3  + v[17]*x4 + v[22]*x5;
>>>       sum4 += v[3]*x1 + v[8]*x2 + v[13]*x3  + v[18]*x4 + v[23]*x5;
>>>       sum5 += v[4]*x1 + v[9]*x2 + v[14]*x3  + v[19]*x4 + v[24]*x5;
>>>       v    += 25;
>>>     }
>>>
>>> to do the block multiple.
>>>
>>> >
>>> > 2) So if one wants to assemble a monolithic matrix (i.e., aggregation
>>> of more than one dof per point) then using the BAIJ format is highly
>>> advisable. But if I want to form a nested matrix, say I am solving Stokes
>>> equation, then each "submatrix" is of AIJ format? Can these sub matrices
>>> also be BAIJ?
>>>
>>>    Sure, but if you have separated all the variables of pressure,
>>> velocity_x, velocity_y, etc into there own regions of the vector then the
>>> block size for the sub matrices would be 1 so BAIJ does not help.
>>>
>>>    There are Stokes solvers that use Vanka smoothing that keep the
>>> variables interlaced and hence would use BAIJ and NOT use fieldsplit
>>>
>>>
>>> >
>>> > Thanks,
>>> > Justin
>>> >
>>> > On Wed, Jan 13, 2016 at 9:12 PM, Barry Smith <bsmith at mcs.anl.gov>
>>> wrote:
>>> >
>>> > > On Jan 13, 2016, at 9:57 PM, Justin Chang <jychang48 at gmail.com>
>>> wrote:
>>> > >
>>> > > Hi all,
>>> > >
>>> > > 1) I am guessing MATMPIBAIJ could theoretically have better
>>> performance than simply using MATMPIAIJ. Why is that? Is it similar to the
>>> reasoning that block (dense) matrix-vector multiply is "faster" than simple
>>> matrix-vector?
>>> >
>>> >   See for example table 1 in
>>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.7668&rep=rep1&type=pdf
>>> >
>>> > >
>>> > > 2) I am looking through the manual and online documentation and it
>>> seems the term "block" used everywhere. In the section on "block matrices"
>>> (3.1.3 of the manual), it refers to field splitting, where you could either
>>> have a monolithic matrix or a nested matrix. Does that concept have
>>> anything to do with MATMPIBAIJ?
>>> >
>>> >    Unfortunately the numerical analysis literature uses the term block
>>> in multiple ways. For small blocks, sometimes called "point-block" with
>>> BAIJ and for very large blocks (where the blocks are sparse themselves). I
>>> used fieldsplit for big sparse blocks to try to avoid confusion in PETSc.
>>> > >
>>> > > It makes sense to me that one could create a BAIJ where if you have
>>> 5 dofs of the same type of physics (e.g., five different primary species of
>>> a geochemical reaction) per grid point, you could create a block size of 5.
>>> And if you have different physics (e.g., velocity and pressure) you would
>>> ideally want to separate them out (i.e., nested matrices) for better
>>> preconditioning.
>>> >
>>> >    Sometimes you put them together with BAIJ and sometimes you keep
>>> them separate with nested matrices.
>>> >
>>> > >
>>> > > Thanks,
>>> > > Justin
>>> >
>>> >
>>>
>>>
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20160114/d862627f/attachment.html>