In addition to the assembly advantages that Satish pointed out, BAIJ
requires less storage for the column indices, effectively improving the
arithmetic intensity of many kernels, and speeding up matrix
factorization (e.g. symbolic factorization only needs to compute fill in
terms of blocks instead of individual elements).  The use of inodes with
AIJ (default when applicable) reduces the memory bandwidth requirements
of the column indices, turns point relaxation smoothers (SOR) into
stronger block relaxation, and allows a certain amount of unrolling.
BAIJ requires even less metadata, provides more regular memory access,
and does more unrolling.

If your matrix is truly blocked, BAIJ should provide better performance
with all preconditioners that support it.  Many third-party
preconditioners will not work with BAIJ, so it is useful to give your
matrix a prefix (or check the options database if you are getting your
matrix from a DA or similar) so that you can set its type with
-foo_mat_type when using a preconditioner that requires it.


