<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family:Calibri,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
</div>
<div style="font-family:Calibri,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<div style="font-family:Helvetica; font-size:12px"><span style="font-family:Courier">Thanks a lot for your suggestions, Hong and Barry - </span></div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier"><br>
</font></div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier">As you suggested, I first tried the LU direct solvers (built-in and MUMPs) out this morning, which work perfectly, albeit slow. </font><span style="font-family: Courier; font-size: 12px; background: var(--white);">As
 my problem itself is a part of a PDE based optimization, the exact solution in the searching procedure is not necessary (I often set a relative tolerance of 1E-7/1E-8, etc. for Krylov subspace methods). Also tried BJACOBI with exact LU, the KSP just converges
 in one or two iterations, as expected. </span></div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier"><br>
</font></div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier">I added -kspview option for the D-ILU test (still with Block Jacobi as preconditioner and bcgs as KSP solver). The KSPview output from one of the examples in a toy model looks like: </font></div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier"><br>
</font></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">KSP Object: 1 MPI processes</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures"> <span> </span>type: bcgs</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures"> <span> </span>maximum iterations=120,
 nonzero initial guess</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures"> <span> </span>tolerances: <span> </span>relative=1e-07,
 absolute=1e-50, divergence=10000.</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures"> <span> </span>left preconditioning</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures"> <span> </span>using PRECONDITIONED
 norm type for convergence test</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">PC Object: 1 MPI processes</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures"> <span> </span>type: bjacobi</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">   <span> </span>number of blocks
 = 3</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">   <span> </span>Local solve is
 same for all blocks, in the following KSP and PC objects:</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">   <span> </span>KSP Object: (sub_)
 1 MPI processes</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">     <span> </span>type: preonly</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">     <span> </span>maximum iterations=10000,
 initial guess is zero</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">     <span> </span>tolerances: <span> </span>relative=1e-05,
 absolute=1e-50, divergence=10000.</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">     <span> </span>left preconditioning</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">     <span> </span>using NONE norm
 type for convergence test</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">   <span> </span>PC Object: (sub_)
 1 MPI processes</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">     <span> </span>type: ilu</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">       <span> </span>out-of-place
 factorization</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">       <span> </span>0 levels of
 fill</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">       <span> </span>tolerance for
 zero pivot 2.22045e-14</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">       <span> </span>matrix ordering:
 natural</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">       <span> </span>factor fill
 ratio given 1., needed 1.</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">         <span> </span>Factored
 matrix follows:</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">           <span> </span>Mat Object:
 1 MPI processes</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">             <span> </span>type:
 seqaij</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">             <span> </span>rows=11294,
 cols=11294</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">             <span> </span>package
 used to perform factorization: petsc</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">             <span> </span>total:
 nonzeros=76008, allocated nonzeros=76008</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">             <span> </span>total
 number of mallocs used during MatSetValues calls=0</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">               <span> </span>not
 using I-node routines</span></div>
<div style="font-family:Helvetica; font-size:12px"><span style="font-family:Menlo; font-size:11px; font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-family:Menlo; font-size:11px; font-variant-ligatures:no-common-ligatures"> 
     linear system matrix = precond matrix:</span></div>
<div style="font-family:Helvetica; font-size:12px">
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">       Mat Object: 1 MPI processes</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">       <span> </span>type: seqaij</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">       <span> </span>rows=11294,
 cols=11294</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">       <span> </span>total: nonzeros=76008,
 allocated nonzeros=76008</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">       <span> </span>total number
 of mallocs used during MatSetValues calls=0</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">         <span> </span>not using
 I-node routines</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures"> <span> </span>linear system matrix
 = precond matrix:</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures"> <span> </span>Mat Object: 1 MPI
 processes</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">   <span> </span>type: mpiaij</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">   <span> </span>rows=33880, cols=33880</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">   <span> </span>total: nonzeros=436968,
 allocated nonzeros=436968</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">   <span> </span>total number of
 mallocs used during MatSetValues calls=0</span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures; color:rgb(117,117,117)"> </span><span style="font-variant-ligatures:no-common-ligatures">      not using I-node (on process
 0) routines</span></div>
</div>
<div style="font-family:Helvetica; font-size:12px"><span style="font-family:Courier"><br>
</span></div>
<div style="font-family:Helvetica; font-size:12px"><span style="font-family:Courier">do you see something wrong with my setup?</span></div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier"><br>
</font></div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier">I also tried a quick performance test with a small </font><span style="font-family:Menlo; font-size:11px">278906 by </span><span style="font-family:Menlo; font-size:11px">278906 matrix
 (</span><span style="font-family:Menlo; font-size:11px">3850990 nnz) with</span><span style="font-family:Courier"> the following parameters: </span></div>
<div style="font-family:Helvetica; font-size:12px"><span style="font-family:Courier"><br>
</span></div>
<div style="font-family:Helvetica; font-size:12px">
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><span style="color:rgb(200,20,201); font-variant-ligatures:no-common-ligatures">-ksp_type</span><span style="font-variant-ligatures:no-common-ligatures"><span> </span>bcgs<span> </span></span><span style="color:rgb(200,20,201); font-variant-ligatures:no-common-ligatures">-pc_type</span><span style="font-variant-ligatures:no-common-ligatures"><span> </span>bjacobi<span> </span></span><span style="color:rgb(200,20,201); font-variant-ligatures:no-common-ligatures">-</span><span style="color:rgb(200,20,201); font-variant-ligatures:no-common-ligatures">pc_bjacobi_local_blocks</span><span style="font-variant-ligatures:no-common-ligatures"> </span><span style="font-variant-ligatures:no-common-ligatures"><font color="#c1651c">3</font></span><span style="font-variant-ligatures:no-common-ligatures"> </span><span style="color:rgb(200,20,201); font-variant-ligatures:no-common-ligatures">-pc_sub_type</span><span style="font-variant-ligatures:no-common-ligatures"><span> </span>ilu<span> </span></span><span style="color:rgb(200,20,201); font-variant-ligatures:no-common-ligatures">-ksp_view</span></div>
</div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><br>
</div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo">Reducing the relative residual to 1E-7 </div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><br>
</div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo">Took 4.08s with 41 bcgs iterations. </div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><br>
</div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo">Merely changing the <span style="color:rgb(200,20,201); font-variant-ligatures:no-common-ligatures">-</span><span style="color:rgb(200,20,201); font-variant-ligatures:no-common-ligatures">pc_bjacobi_local_blocks
 to 6 </span></div>
<div style="margin:0px; font-size:11px; line-height:normal; font-family:Menlo"><br>
</div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier">Took 7.02s with 73 bcgs iterations. 9 blocks would further take 9.45s with 101 bcgs iterations.</font></div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier"><br>
</font></div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier">As a reference, my home-brew Fortran code solves the same problem (3-block D-ILU0) in </font></div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier"><br>
</font></div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier">1.84s with 24 bcgs iterations (the bcgs code is also a home-brew one)…</font></div>
<div style="font-family:Helvetica; font-size:12px"><br>
</div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier"><br>
</font></div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier"><br>
</font></div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier">Well, by saying “use explicit L/U matrix as preconditioner”, I wonder if a user is allowed to provide his own (separate) L and U Mat for preconditioning (like how it works in Matlab solvers),
 e.g.</font></div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier"><br>
</font></div>
<div style="font-family:Helvetica; font-size:12px">
<div style="margin:0px; font-size:10px; line-height:normal">x = qmr(A,b,Tol,MaxIter,L,U,x)</div>
</div>
<div style="font-family:Helvetica; font-size:12px"> </div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier">As I already explicitly constructed the L and U matrices in Fortran, it is not hard to convert them to Mat in Petsc to test Petsc and my Fortran code head-to-head. In that case, the A,
 b, x, and L/U are all identical, it would be easier to see where the problem came from. </font></div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier"><br>
</font></div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier"><br>
</font></div>
<div style="font-family:Helvetica; font-size:12px"><br>
</div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier">BTW, there is another thing I wondered - is there a way to output residual in </font><span style="font-family:Courier">unpreconditioned norm</span><span style="font-family:Courier">? I
 tried to</span></div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier"><br>
</font></div>
<div style="font-family:Helvetica; font-size:12px"><span style="font-family:Menlo; font-size:11px; font-variant-ligatures:no-common-ligatures; color:rgb(46,174,187)"><b>call</b></span><span style="font-family:Menlo; font-size:11px; font-variant-ligatures:no-common-ligatures"> KSPSetNormType(ksp_local,
 KSP_NORM_UNPRECONDITIONED, ierr)</span></div>
<div style="font-family:Helvetica; font-size:12px"><span style="font-family:Menlo; font-size:11px; font-variant-ligatures:no-common-ligatures"><br>
</span></div>
<div style="font-family:Helvetica; font-size:12px"><span style="font-family:Menlo; font-size:11px; font-variant-ligatures:no-common-ligatures">But always get an error that current ksp does not support unpreconditioned in LEFT/RIGHT (either way). Is it possible
 to do that (output unpreconditioned residual) in PETSc at all?</span></div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier"><br>
</font></div>
<div style="font-family:Helvetica; font-size:12px"><span style="font-family:Courier">Cheers, </span></div>
<div style="font-family:Helvetica; font-size:12px"><font face="Courier">Hao</font></div>
<br>
</div>
<div>
<div id="appendonsend"></div>
<div style="font-family:Calibri,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Smith, Barry F. <bsmith@mcs.anl.gov><br>
<b>Sent:</b> Tuesday, February 4, 2020 8:27 PM<br>
<b>To:</b> Hao DONG <dong-hao@outlook.com><br>
<b>Cc:</b> petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov><br>
<b>Subject:</b> Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC?</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt">
<div class="PlainText"><br>
<br>
> On Feb 4, 2020, at 12:41 PM, Hao DONG <dong-hao@outlook.com> wrote:<br>
> <br>
> Dear all, <br>
> <br>
> <br>
> I have a few questions about the implementation of diagonal ILU PC in PETSc. I want to solve a very simple system with KSP (in parallel), the nature of the system (finite difference time-harmonic Maxwell) is probably not important to the question itself.
 Long story short, I just need to solve a set of equations of Ax = b with a block diagonal system matrix, like (not sure if the mono font works):
<br>
> <br>
>    |X    |  <br>
> A =|  Y  |  <br>
>    |    Z| <br>
> <br>
> Note that A is not really block-diagonal, it’s just a multi-diagonal matrix determined by the FD mesh, where most elements are close to diagonal. So instead of a full ILU decomposition, a D-ILU is a good approximation as a preconditioner, and the number of
 blocks should not matter too much: <br>
> <br>
>     |Lx      |         |Ux      |<br>
> L = |   Ly   | and U = |   Uy   |<br>
>     |      Lz|         |      Uz|<br>
> <br>
> Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in my sequential Fortran/Matlab code.
<br>
> <br>
> So like most users, I am looking for a parallel implement with this problem in PETSc. After looking through the manual, it seems that the most straightforward way to do it is through PCBJACOBI. Not sure I understand it right, I just setup a 3-block PCJACOBI
 and give each of the block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little block of fortran code would look like:
<br>
> ...<br>
>       call PCBJacobiSetTotalBlocks(pc_local,Ntotal,                   &<br>
>      &     isubs,ierr)<br>
>       call PCBJacobiSetLocalBlocks(pc_local, Nsub,                    &<br>
>      &    isubs(istart:iend),ierr)<br>
>       ! set up the block jacobi structure<br>
>       call KSPSetup(ksp_local,ierr)<br>
>       ! allocate sub ksps<br>
>       allocate(ksp_sub(Nsub))<br>
>       call PCBJacobiGetSubKSP(pc_local,Nsub,istart,                   &<br>
>      &     ksp_sub,ierr)<br>
>       do i=1,Nsub<br>
>           call KSPGetPC(ksp_sub(i),pc_sub,ierr)<br>
>           !ILU preconditioner<br>
>           call PCSetType(pc_sub,ptype,ierr)<br>
>           call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here<br>
>           call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)]<br>
>       end do<br>
>       call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, &<br>
>      &     PETSC_DEFAULT_REAL,KSSiter%maxit,ierr)<br>
> … <br>
<br>
     This code looks essentially right. You should call with -ksp_view to see exactly what PETSc is using for a solver.
<br>
<br>
> <br>
> I understand that the parallel performance may not be comparable, so I first set up a one-process test (with MPIAij, but all the rows are local since there is only one process). The system is solved without any problem (identical results within error). But
 the performance is actually a lot worse (code built without debugging flags in performance tests) than my own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), which is hard to believe. I suspect the difference is from
 the PC as the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to the same relative tol.
<br>
<br>
   PETSc uses GMRES by default with a restart of 30 and left preconditioning. It could be different exact choices in the solver (which is why -ksp_view is so useful) can explain the differences in the runs between your code and PETSc's<br>
> <br>
> This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks instead of 3). While my Fortran/Matlab codes see minimal performance difference (<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks from 3 to 9 caused
 the ksp setup with PCBJACOBI to suffer a performance decrease of more than 50% in sequential test.<br>
<br>
   This is odd, the more blocks the smaller each block so the quicker the ILU set up should be. You can run various cases with -log_view and send us the output to see what is happening at each part of the computation in time.<br>
 <br>
> So my implementation IS somewhat different in PETSc. Do I miss something in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how PCBJACOBI works in PETSc?
<br>
<br>
   Probably not.<br>
> <br>
> If this is not the right way to implement a block diagonal ILU as (parallel) PC, please kindly point me to the right direction. I searched through the mail list to find some answers, only to find a couple of similar questions... An example would be nice.<br>
<br>
   You approach seems fundamentally right but I cannot be sure of possible bugs.<br>
> <br>
> On the other hand, does PETSc support a simple way to use explicit L/U matrix as a preconditioner? I can import the  D-ILU matrix (I already converted my A matrix into Mat) constructed in my Fortran code to make a better comparison. Or do I have to construct
 my own PC using PCshell? If so, is there a good tutorial/example to learn about how to use PCSHELL (in a more advanced sense, like how to setup pc side and transpose)?
<br>
<br>
   Not sure what you mean by explicit L/U matrix as a preconditioner. As Hong said, yes you can use a parallel LU from MUMPS or SuperLU_DIST or Pastix as the solver. You do not need any shell code. You simply need to set the PCType to lu
<br>
<br>
   You can also set all this options from the command line and don't need to write the code specifically. So call KSPSetFromOptions() and then for example<br>
<br>
    -pc_type bjacobi  -pc_bjacobi_local_blocks 3 -pc_sub_type ilu (this last one is applied to each block so you could use -pc_type lu and it would use lu on each block.)
<br>
<br>
   -ksp_type_none  -pc_type lu -pc_factor_mat_solver_type mumps  (do parallel LU with mumps)<br>
<br>
By not hardwiring in the code and just using options you can test out different cases much quicker<br>
<br>
Use -ksp_view to make sure that is using the solver the way you expect.<br>
<br>
Barry<br>
<br>
<br>
<br>
   Barry<br>
<br>
> <br>
> Thanks in advance, <br>
> <br>
> Hao<br>
<br>
</div>
</span></font></div>
</div>
</body>
</html>