[petsc-users] [External] Re: request to add an option similar to use_omp_threads for mumps to cusparse solver

Barry Smith bsmith at petsc.dev
Wed Oct 20 18:56:19 CDT 2021


   Hmm. A fix should work (almost exactly the same) with or without the block Jacobi on subdomains level, I had assumed that Junchao's branch would handle this. Have you looked at it?

  Barry


> On Oct 20, 2021, at 6:14 PM, Chang Liu <cliu at pppl.gov> wrote:
> 
> Hi Barry,
> 
> Wait, by "branch" are you talking about the MR Junchao submitted?
> 
> That fix (proposed by me) is only to fix the issue for telescope to work on mpiaijcusparse, when using outside bjacobi. It has nothing to do with the issue for telescope inside bjacobi. It does not help in my tests.
> 
> If my emails made you think the other way, I apologize for that.
> 
> Regards,
> 
> Chang
> 
> On 10/20/21 4:40 PM, Barry Smith wrote:
>>   Yes, but the branch can be used to do telescoping inside the bjacobi as needed.
>>> On Oct 20, 2021, at 2:59 PM, Junchao Zhang <junchao.zhang at gmail.com <mailto:junchao.zhang at gmail.com>> wrote:
>>> 
>>> The MR https://gitlab.com/petsc/petsc/-/merge_requests/4471 <https://gitlab.com/petsc/petsc/-/merge_requests/4471> has not been merged yet.
>>> 
>>> --Junchao Zhang
>>> 
>>> 
>>> On Wed, Oct 20, 2021 at 1:47 PM Chang Liu via petsc-users <petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>> wrote:
>>> 
>>>    Hi Barry,
>>> 
>>>    Are the fixes merged in the master? I was using bjacobi as a
>>>    preconditioner. Using the latest version of petsc, I found that by
>>>    calling
>>> 
>>>    mpiexec -n 32 --oversubscribe ./ex7 -m 1000 -ksp_view
>>>    -ksp_monitor_true_residual -ksp_type fgmres -pc_type bjacobi
>>>    -pc_bjacobi
>>>    _blocks 4 -sub_ksp_type preonly -sub_pc_type telescope
>>>    -sub_pc_telescope_reduction_factor 8 -sub_pc_telescope_subcomm_type
>>>    contiguous -sub_telescope_pc_type lu -sub_telescope_ksp_type preonly
>>>    -sub_telescope_pc_factor_mat_solver_type mumps -ksp_max_it 2000
>>>    -ksp_rtol 1.e-30 -ksp_atol 1.e-30
>>> 
>>>    The code is calling PCApply_BJacobi_Multiproc. If I use
>>> 
>>>    mpiexec -n 32 --oversubscribe ./ex7 -m 1000 -ksp_view
>>>    -ksp_monitor_true_residual -telescope_ksp_monitor_true_residual
>>>    -ksp_type preonly -pc_type telescope -pc_telescope_reduction_factor 8
>>>    -pc_telescope_subcomm_type contiguous -telescope_pc_type bjacobi
>>>    -telescope_ksp_type fgmres -telescope_pc_bjacobi_blocks 4
>>>    -telescope_sub_ksp_type preonly -telescope_sub_pc_type lu
>>>    -telescope_sub_pc_factor_mat_solver_type mumps -telescope_ksp_max_it
>>>    2000 -telescope_ksp_rtol 1.e-30 -telescope_ksp_atol 1.e-30
>>> 
>>>    The code is calling PCApply_BJacobi_Singleblock. You can test it
>>>    yourself.
>>> 
>>>    Regards,
>>> 
>>>    Chang
>>> 
>>>    On 10/20/21 1:14 PM, Barry Smith wrote:
>>>    >
>>>    >
>>>    >> On Oct 20, 2021, at 12:48 PM, Chang Liu <cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov>> wrote:
>>>    >>
>>>    >> Hi Pierre,
>>>    >>
>>>    >> I have another suggestion for telescope. I have achieved my
>>>    goal by putting telescope outside bjacobi. But the code still does
>>>    not work if I use telescope as a pc for subblock. I think the
>>>    reason is that I want to use cusparse as the solver, which can
>>>    only deal with seqaij matrix and not mpiaij matrix.
>>>    >
>>>    >
>>>    >      This is suppose to work with the recent fixes. The
>>>    telescope should produce a seq matrix and for each solve map the
>>>    parallel vector (over the subdomain) automatically down to the one
>>>    rank with the GPU to solve it on the GPU. It is not clear to me
>>>    where the process is going wrong.
>>>    >
>>>    >    Barry
>>>    >
>>>    >
>>>    >
>>>    >> However, for telescope pc, it can put the matrix into one mpi
>>>    rank, thus making it a seqaij for factorization stage, but then
>>>    after factorization it will give the data back to the original
>>>    comminicator. This will make the matrix back to mpiaij, and then
>>>    cusparse cannot solve it.
>>>    >>
>>>    >> I think a better option is to do the factorization on CPU with
>>>    mpiaij, then then transform the preconditioner matrix to seqaij
>>>    and do the matsolve GPU. But I am not sure if it can be achieved
>>>    using telescope.
>>>    >>
>>>    >> Regads,
>>>    >>
>>>    >> Chang
>>>    >>
>>>    >> On 10/15/21 5:29 AM, Pierre Jolivet wrote:
>>>    >>> Hi Chang,
>>>    >>> The output you sent with MUMPS looks alright to me, you can
>>>    see that the MatType is properly set to seqaijcusparse (and not
>>>    mpiaijcusparse).
>>>    >>> I don’t know what is wrong with
>>>    -sub_telescope_pc_factor_mat_solver_type cusparse, I don’t have a
>>>    PETSc installation for testing this, hopefully Barry or Junchao
>>>    can confirm this wrong behavior and get this fixed.
>>>    >>> As for permuting PCTELESCOPE and PCBJACOBI, in your case, the
>>>    outer PC will be equivalent, yes.
>>>    >>> However, it would be more efficient to do PCBJACOBI and then
>>>    PCTELESCOPE.
>>>    >>> PCBJACOBI prunes the operator by basically removing all
>>>    coefficients outside of the diagonal blocks.
>>>    >>> Then, PCTELESCOPE "groups everything together”.
>>>    >>> If you do it the other way around, PCTELESCOPE will “group
>>>    everything together” and then PCBJACOBI will prune the operator.
>>>    >>> So the PCTELESCOPE SetUp will be costly for nothing since some
>>>    coefficients will be thrown out afterwards in the PCBJACOBI SetUp.
>>>    >>> I hope I’m clear enough, otherwise I can try do draw some
>>>    pictures.
>>>    >>> Thanks,
>>>    >>> Pierre
>>>    >>>> On 15 Oct 2021, at 4:39 AM, Chang Liu <cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov>> wrote:
>>>    >>>>
>>>    >>>> Hi Pierre and Barry,
>>>    >>>>
>>>    >>>> I think maybe I should use telescope outside bjacobi? like this
>>>    >>>>
>>>    >>>> mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7 -m
>>>    400 -ksp_view -ksp_monitor_true_residual -pc_type telescope
>>>    -pc_telescope_reduction_factor 4 -t
>>>    >>>> elescope_pc_type bjacobi -telescope_ksp_type fgmres
>>>    -telescope_pc_bjacobi_blocks 4 -mat_type aijcusparse
>>>    -telescope_sub_ksp_type preonly -telescope_sub_pc_type lu
>>>    -telescope_sub_pc_factor_mat_solve
>>>    >>>> r_type cusparse -ksp_max_it 2000 -ksp_rtol 1.e-20 -ksp_atol 1.e-9
>>>    >>>>
>>>    >>>> But then I got an error that
>>>    >>>>
>>>    >>>> [0]PETSC ERROR: MatSolverType cusparse does not support
>>>    matrix type seqaij
>>>    >>>>
>>>    >>>> But the mat type should be aijcusparse. I think telescope
>>>    change the mat type.
>>>    >>>>
>>>    >>>> Chang
>>>    >>>>
>>>    >>>> On 10/14/21 10:11 PM, Chang Liu wrote:
>>>    >>>>> For comparison, here is the output using mumps instead of
>>>    cusparse
>>>    >>>>> $ mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7 -m
>>>    400 -ksp_view -ksp_monitor_true_residual -pc_type bjacobi
>>>    -pc_bjacobi_blocks 4 -ksp_type fgmres -mat_type aijcusparse
>>>    -sub_pc_type telescope -sub_ksp_type preonly
>>>    -sub_telescope_ksp_type preonly -sub_telescope_pc_type lu
>>>    -sub_telescope_pc_factor_mat_solver_type mumps
>>>    -sub_pc_telescope_reduction_factor 4
>>>    -sub_pc_telescope_subcomm_type contiguous -ksp_max_it 2000
>>>    -ksp_rtol 1.e-20 -ksp_atol 1.e-9
>>>    >>>>>    0 KSP unpreconditioned resid norm 4.014971979977e+01 true
>>>    resid norm 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00
>>>    >>>>>    1 KSP unpreconditioned resid norm 2.439995191694e+00 true
>>>    resid norm 2.439995191694e+00 ||r(i)||/||b|| 6.077240896978e-02
>>>    >>>>>    2 KSP unpreconditioned resid norm 1.280694102588e+00 true
>>>    resid norm 1.280694102588e+00 ||r(i)||/||b|| 3.189795866509e-02
>>>    >>>>>    3 KSP unpreconditioned resid norm 1.041100266810e+00 true
>>>    resid norm 1.041100266810e+00 ||r(i)||/||b|| 2.593044912896e-02
>>>    >>>>>    4 KSP unpreconditioned resid norm 7.274347137268e-01 true
>>>    resid norm 7.274347137268e-01 ||r(i)||/||b|| 1.811805206499e-02
>>>    >>>>>    5 KSP unpreconditioned resid norm 5.429229329787e-01 true
>>>    resid norm 5.429229329787e-01 ||r(i)||/||b|| 1.352245882876e-02
>>>    >>>>>    6 KSP unpreconditioned resid norm 4.332970410353e-01 true
>>>    resid norm 4.332970410353e-01 ||r(i)||/||b|| 1.079203150598e-02
>>>    >>>>>    7 KSP unpreconditioned resid norm 3.948206050950e-01 true
>>>    resid norm 3.948206050950e-01 ||r(i)||/||b|| 9.833707609019e-03
>>>    >>>>>    8 KSP unpreconditioned resid norm 3.379580577269e-01 true
>>>    resid norm 3.379580577269e-01 ||r(i)||/||b|| 8.417444988714e-03
>>>    >>>>>    9 KSP unpreconditioned resid norm 2.875593971410e-01 true
>>>    resid norm 2.875593971410e-01 ||r(i)||/||b|| 7.162176936105e-03
>>>    >>>>>   10 KSP unpreconditioned resid norm 2.533983363244e-01 true
>>>    resid norm 2.533983363244e-01 ||r(i)||/||b|| 6.311335112378e-03
>>>    >>>>>   11 KSP unpreconditioned resid norm 2.389169921094e-01 true
>>>    resid norm 2.389169921094e-01 ||r(i)||/||b|| 5.950651543793e-03
>>>    >>>>>   12 KSP unpreconditioned resid norm 2.118961639089e-01 true
>>>    resid norm 2.118961639089e-01 ||r(i)||/||b|| 5.277649880637e-03
>>>    >>>>>   13 KSP unpreconditioned resid norm 1.885892030223e-01 true
>>>    resid norm 1.885892030223e-01 ||r(i)||/||b|| 4.697148671593e-03
>>>    >>>>>   14 KSP unpreconditioned resid norm 1.763510666948e-01 true
>>>    resid norm 1.763510666948e-01 ||r(i)||/||b|| 4.392336175055e-03
>>>    >>>>>   15 KSP unpreconditioned resid norm 1.638219366731e-01 true
>>>    resid norm 1.638219366731e-01 ||r(i)||/||b|| 4.080275964317e-03
>>>    >>>>>   16 KSP unpreconditioned resid norm 1.476792766432e-01 true
>>>    resid norm 1.476792766432e-01 ||r(i)||/||b|| 3.678214378076e-03
>>>    >>>>>   17 KSP unpreconditioned resid norm 1.349906937321e-01 true
>>>    resid norm 1.349906937321e-01 ||r(i)||/||b|| 3.362182710248e-03
>>>    >>>>>   18 KSP unpreconditioned resid norm 1.289673236836e-01 true
>>>    resid norm 1.289673236836e-01 ||r(i)||/||b|| 3.212159993314e-03
>>>    >>>>>   19 KSP unpreconditioned resid norm 1.167505658153e-01 true
>>>    resid norm 1.167505658153e-01 ||r(i)||/||b|| 2.907879965230e-03
>>>    >>>>>   20 KSP unpreconditioned resid norm 1.046037988999e-01 true
>>>    resid norm 1.046037988999e-01 ||r(i)||/||b|| 2.605343185995e-03
>>>    >>>>>   21 KSP unpreconditioned resid norm 9.832660514331e-02 true
>>>    resid norm 9.832660514331e-02 ||r(i)||/||b|| 2.448998539309e-03
>>>    >>>>>   22 KSP unpreconditioned resid norm 8.835618950141e-02 true
>>>    resid norm 8.835618950142e-02 ||r(i)||/||b|| 2.200667649539e-03
>>>    >>>>>   23 KSP unpreconditioned resid norm 7.563496650115e-02 true
>>>    resid norm 7.563496650116e-02 ||r(i)||/||b|| 1.883823022386e-03
>>>    >>>>>   24 KSP unpreconditioned resid norm 6.651291376834e-02 true
>>>    resid norm 6.651291376834e-02 ||r(i)||/||b|| 1.656622115921e-03
>>>    >>>>>   25 KSP unpreconditioned resid norm 5.890393227906e-02 true
>>>    resid norm 5.890393227906e-02 ||r(i)||/||b|| 1.467106933070e-03
>>>    >>>>>   26 KSP unpreconditioned resid norm 4.661992782780e-02 true
>>>    resid norm 4.661992782780e-02 ||r(i)||/||b|| 1.161152009536e-03
>>>    >>>>>   27 KSP unpreconditioned resid norm 3.690705358716e-02 true
>>>    resid norm 3.690705358716e-02 ||r(i)||/||b|| 9.192356452602e-04
>>>    >>>>>   28 KSP unpreconditioned resid norm 3.209680460188e-02 true
>>>    resid norm 3.209680460188e-02 ||r(i)||/||b|| 7.994278605666e-04
>>>    >>>>>   29 KSP unpreconditioned resid norm 2.354337626000e-02 true
>>>    resid norm 2.354337626001e-02 ||r(i)||/||b|| 5.863895533373e-04
>>>    >>>>>   30 KSP unpreconditioned resid norm 1.701296561785e-02 true
>>>    resid norm 1.701296561785e-02 ||r(i)||/||b|| 4.237380908932e-04
>>>    >>>>>   31 KSP unpreconditioned resid norm 1.509942937258e-02 true
>>>    resid norm 1.509942937258e-02 ||r(i)||/||b|| 3.760780759588e-04
>>>    >>>>>   32 KSP unpreconditioned resid norm 1.258274688515e-02 true
>>>    resid norm 1.258274688515e-02 ||r(i)||/||b|| 3.133956338402e-04
>>>    >>>>>   33 KSP unpreconditioned resid norm 9.805748771638e-03 true
>>>    resid norm 9.805748771638e-03 ||r(i)||/||b|| 2.442295692359e-04
>>>    >>>>>   34 KSP unpreconditioned resid norm 8.596552678160e-03 true
>>>    resid norm 8.596552678160e-03 ||r(i)||/||b|| 2.141123953301e-04
>>>    >>>>>   35 KSP unpreconditioned resid norm 6.936406707500e-03 true
>>>    resid norm 6.936406707500e-03 ||r(i)||/||b|| 1.727635147167e-04
>>>    >>>>>   36 KSP unpreconditioned resid norm 5.533741607932e-03 true
>>>    resid norm 5.533741607932e-03 ||r(i)||/||b|| 1.378276519869e-04
>>>    >>>>>   37 KSP unpreconditioned resid norm 4.982347757923e-03 true
>>>    resid norm 4.982347757923e-03 ||r(i)||/||b|| 1.240942099414e-04
>>>    >>>>>   38 KSP unpreconditioned resid norm 4.309608348059e-03 true
>>>    resid norm 4.309608348059e-03 ||r(i)||/||b|| 1.073384414524e-04
>>>    >>>>>   39 KSP unpreconditioned resid norm 3.729408303186e-03 true
>>>    resid norm 3.729408303185e-03 ||r(i)||/||b|| 9.288753001974e-05
>>>    >>>>>   40 KSP unpreconditioned resid norm 3.490003351128e-03 true
>>>    resid norm 3.490003351128e-03 ||r(i)||/||b|| 8.692472496776e-05
>>>    >>>>>   41 KSP unpreconditioned resid norm 3.069012426454e-03 true
>>>    resid norm 3.069012426453e-03 ||r(i)||/||b|| 7.643919912166e-05
>>>    >>>>>   42 KSP unpreconditioned resid norm 2.772928845284e-03 true
>>>    resid norm 2.772928845284e-03 ||r(i)||/||b|| 6.906471225983e-05
>>>    >>>>>   43 KSP unpreconditioned resid norm 2.561454192399e-03 true
>>>    resid norm 2.561454192398e-03 ||r(i)||/||b|| 6.379756085902e-05
>>>    >>>>>   44 KSP unpreconditioned resid norm 2.253662762802e-03 true
>>>    resid norm 2.253662762802e-03 ||r(i)||/||b|| 5.613146926159e-05
>>>    >>>>>   45 KSP unpreconditioned resid norm 2.086800523919e-03 true
>>>    resid norm 2.086800523919e-03 ||r(i)||/||b|| 5.197546917701e-05
>>>    >>>>>   46 KSP unpreconditioned resid norm 1.926028182896e-03 true
>>>    resid norm 1.926028182896e-03 ||r(i)||/||b|| 4.797114880257e-05
>>>    >>>>>   47 KSP unpreconditioned resid norm 1.769243808622e-03 true
>>>    resid norm 1.769243808622e-03 ||r(i)||/||b|| 4.406615581492e-05
>>>    >>>>>   48 KSP unpreconditioned resid norm 1.656654905964e-03 true
>>>    resid norm 1.656654905964e-03 ||r(i)||/||b|| 4.126192945371e-05
>>>    >>>>>   49 KSP unpreconditioned resid norm 1.572052627273e-03 true
>>>    resid norm 1.572052627273e-03 ||r(i)||/||b|| 3.915475961260e-05
>>>    >>>>>   50 KSP unpreconditioned resid norm 1.454960682355e-03 true
>>>    resid norm 1.454960682355e-03 ||r(i)||/||b|| 3.623837699518e-05
>>>    >>>>>   51 KSP unpreconditioned resid norm 1.375985053014e-03 true
>>>    resid norm 1.375985053014e-03 ||r(i)||/||b|| 3.427134883820e-05
>>>    >>>>>   52 KSP unpreconditioned resid norm 1.269325501087e-03 true
>>>    resid norm 1.269325501087e-03 ||r(i)||/||b|| 3.161480347603e-05
>>>    >>>>>   53 KSP unpreconditioned resid norm 1.184791772965e-03 true
>>>    resid norm 1.184791772965e-03 ||r(i)||/||b|| 2.950934100844e-05
>>>    >>>>>   54 KSP unpreconditioned resid norm 1.064535156080e-03 true
>>>    resid norm 1.064535156080e-03 ||r(i)||/||b|| 2.651413662135e-05
>>>    >>>>>   55 KSP unpreconditioned resid norm 9.639036688120e-04 true
>>>    resid norm 9.639036688117e-04 ||r(i)||/||b|| 2.400773090370e-05
>>>    >>>>>   56 KSP unpreconditioned resid norm 8.632359780260e-04 true
>>>    resid norm 8.632359780260e-04 ||r(i)||/||b|| 2.150042347322e-05
>>>    >>>>>   57 KSP unpreconditioned resid norm 7.613605783850e-04 true
>>>    resid norm 7.613605783850e-04 ||r(i)||/||b|| 1.896303591113e-05
>>>    >>>>>   58 KSP unpreconditioned resid norm 6.681073248348e-04 true
>>>    resid norm 6.681073248349e-04 ||r(i)||/||b|| 1.664039819373e-05
>>>    >>>>>   59 KSP unpreconditioned resid norm 5.656127908544e-04 true
>>>    resid norm 5.656127908545e-04 ||r(i)||/||b|| 1.408758999254e-05
>>>    >>>>>   60 KSP unpreconditioned resid norm 4.850863370767e-04 true
>>>    resid norm 4.850863370767e-04 ||r(i)||/||b|| 1.208193580169e-05
>>>    >>>>>   61 KSP unpreconditioned resid norm 4.374055762320e-04 true
>>>    resid norm 4.374055762316e-04 ||r(i)||/||b|| 1.089436186387e-05
>>>    >>>>>   62 KSP unpreconditioned resid norm 3.874398257079e-04 true
>>>    resid norm 3.874398257077e-04 ||r(i)||/||b|| 9.649876204364e-06
>>>    >>>>>   63 KSP unpreconditioned resid norm 3.364908694427e-04 true
>>>    resid norm 3.364908694429e-04 ||r(i)||/||b|| 8.380902061609e-06
>>>    >>>>>   64 KSP unpreconditioned resid norm 2.961034697265e-04 true
>>>    resid norm 2.961034697268e-04 ||r(i)||/||b|| 7.374982221632e-06
>>>    >>>>>   65 KSP unpreconditioned resid norm 2.640593092764e-04 true
>>>    resid norm 2.640593092767e-04 ||r(i)||/||b|| 6.576865557059e-06
>>>    >>>>>   66 KSP unpreconditioned resid norm 2.423231125743e-04 true
>>>    resid norm 2.423231125745e-04 ||r(i)||/||b|| 6.035487016671e-06
>>>    >>>>>   67 KSP unpreconditioned resid norm 2.182349471179e-04 true
>>>    resid norm 2.182349471179e-04 ||r(i)||/||b|| 5.435528521898e-06
>>>    >>>>>   68 KSP unpreconditioned resid norm 2.008438265031e-04 true
>>>    resid norm 2.008438265028e-04 ||r(i)||/||b|| 5.002371809927e-06
>>>    >>>>>   69 KSP unpreconditioned resid norm 1.838732863386e-04 true
>>>    resid norm 1.838732863388e-04 ||r(i)||/||b|| 4.579690400226e-06
>>>    >>>>>   70 KSP unpreconditioned resid norm 1.723786027645e-04 true
>>>    resid norm 1.723786027645e-04 ||r(i)||/||b|| 4.293394913444e-06
>>>    >>>>>   71 KSP unpreconditioned resid norm 1.580945192204e-04 true
>>>    resid norm 1.580945192205e-04 ||r(i)||/||b|| 3.937624471826e-06
>>>    >>>>>   72 KSP unpreconditioned resid norm 1.476687469671e-04 true
>>>    resid norm 1.476687469671e-04 ||r(i)||/||b|| 3.677952117812e-06
>>>    >>>>>   73 KSP unpreconditioned resid norm 1.385018526182e-04 true
>>>    resid norm 1.385018526184e-04 ||r(i)||/||b|| 3.449634351350e-06
>>>    >>>>>   74 KSP unpreconditioned resid norm 1.279712893541e-04 true
>>>    resid norm 1.279712893541e-04 ||r(i)||/||b|| 3.187351991305e-06
>>>    >>>>>   75 KSP unpreconditioned resid norm 1.202010411772e-04 true
>>>    resid norm 1.202010411774e-04 ||r(i)||/||b|| 2.993820175504e-06
>>>    >>>>>   76 KSP unpreconditioned resid norm 1.113459414198e-04 true
>>>    resid norm 1.113459414200e-04 ||r(i)||/||b|| 2.773268206485e-06
>>>    >>>>>   77 KSP unpreconditioned resid norm 1.042523036036e-04 true
>>>    resid norm 1.042523036037e-04 ||r(i)||/||b|| 2.596588572066e-06
>>>    >>>>>   78 KSP unpreconditioned resid norm 9.565176453232e-05 true
>>>    resid norm 9.565176453227e-05 ||r(i)||/||b|| 2.382376888539e-06
>>>    >>>>>   79 KSP unpreconditioned resid norm 8.896901670359e-05 true
>>>    resid norm 8.896901670365e-05 ||r(i)||/||b|| 2.215931198209e-06
>>>    >>>>>   80 KSP unpreconditioned resid norm 8.119298425803e-05 true
>>>    resid norm 8.119298425824e-05 ||r(i)||/||b|| 2.022255314935e-06
>>>    >>>>>   81 KSP unpreconditioned resid norm 7.544528309154e-05 true
>>>    resid norm 7.544528309154e-05 ||r(i)||/||b|| 1.879098620558e-06
>>>    >>>>>   82 KSP unpreconditioned resid norm 6.755385041138e-05 true
>>>    resid norm 6.755385041176e-05 ||r(i)||/||b|| 1.682548489719e-06
>>>    >>>>>   83 KSP unpreconditioned resid norm 6.158629300870e-05 true
>>>    resid norm 6.158629300835e-05 ||r(i)||/||b|| 1.533915885727e-06
>>>    >>>>>   84 KSP unpreconditioned resid norm 5.358756885754e-05 true
>>>    resid norm 5.358756885765e-05 ||r(i)||/||b|| 1.334693470462e-06
>>>    >>>>>   85 KSP unpreconditioned resid norm 4.774852370380e-05 true
>>>    resid norm 4.774852370387e-05 ||r(i)||/||b|| 1.189261692037e-06
>>>    >>>>>   86 KSP unpreconditioned resid norm 3.919358737908e-05 true
>>>    resid norm 3.919358737930e-05 ||r(i)||/||b|| 9.761858258229e-07
>>>    >>>>>   87 KSP unpreconditioned resid norm 3.434042319950e-05 true
>>>    resid norm 3.434042319947e-05 ||r(i)||/||b|| 8.553091620745e-07
>>>    >>>>>   88 KSP unpreconditioned resid norm 2.813699436281e-05 true
>>>    resid norm 2.813699436302e-05 ||r(i)||/||b|| 7.008017615898e-07
>>>    >>>>>   89 KSP unpreconditioned resid norm 2.462248069068e-05 true
>>>    resid norm 2.462248069051e-05 ||r(i)||/||b|| 6.132665635851e-07
>>>    >>>>>   90 KSP unpreconditioned resid norm 2.040558789626e-05 true
>>>    resid norm 2.040558789626e-05 ||r(i)||/||b|| 5.082373674841e-07
>>>    >>>>>   91 KSP unpreconditioned resid norm 1.888523204468e-05 true
>>>    resid norm 1.888523204470e-05 ||r(i)||/||b|| 4.703702077842e-07
>>>    >>>>>   92 KSP unpreconditioned resid norm 1.707071292484e-05 true
>>>    resid norm 1.707071292474e-05 ||r(i)||/||b|| 4.251763900191e-07
>>>    >>>>>   93 KSP unpreconditioned resid norm 1.498636454665e-05 true
>>>    resid norm 1.498636454672e-05 ||r(i)||/||b|| 3.732619958859e-07
>>>    >>>>>   94 KSP unpreconditioned resid norm 1.219393542993e-05 true
>>>    resid norm 1.219393543006e-05 ||r(i)||/||b|| 3.037115947725e-07
>>>    >>>>>   95 KSP unpreconditioned resid norm 1.059996963300e-05 true
>>>    resid norm 1.059996963303e-05 ||r(i)||/||b|| 2.640110487917e-07
>>>    >>>>>   96 KSP unpreconditioned resid norm 9.099659872548e-06 true
>>>    resid norm 9.099659873214e-06 ||r(i)||/||b|| 2.266431725699e-07
>>>    >>>>>   97 KSP unpreconditioned resid norm 8.147347587295e-06 true
>>>    resid norm 8.147347587584e-06 ||r(i)||/||b|| 2.029241456283e-07
>>>    >>>>>   98 KSP unpreconditioned resid norm 7.167226146744e-06 true
>>>    resid norm 7.167226146783e-06 ||r(i)||/||b|| 1.785124823418e-07
>>>    >>>>>   99 KSP unpreconditioned resid norm 6.552540209538e-06 true
>>>    resid norm 6.552540209577e-06 ||r(i)||/||b|| 1.632026385802e-07
>>>    >>>>> 100 KSP unpreconditioned resid norm 5.767783600111e-06 true
>>>    resid norm 5.767783600320e-06 ||r(i)||/||b|| 1.436568830140e-07
>>>    >>>>> 101 KSP unpreconditioned resid norm 5.261057430584e-06 true
>>>    resid norm 5.261057431144e-06 ||r(i)||/||b|| 1.310359688033e-07
>>>    >>>>> 102 KSP unpreconditioned resid norm 4.715498525786e-06 true
>>>    resid norm 4.715498525947e-06 ||r(i)||/||b|| 1.174478564100e-07
>>>    >>>>> 103 KSP unpreconditioned resid norm 4.380052669622e-06 true
>>>    resid norm 4.380052669825e-06 ||r(i)||/||b|| 1.090929822591e-07
>>>    >>>>> 104 KSP unpreconditioned resid norm 3.911664470060e-06 true
>>>    resid norm 3.911664470226e-06 ||r(i)||/||b|| 9.742694319496e-08
>>>    >>>>> 105 KSP unpreconditioned resid norm 3.652211458315e-06 true
>>>    resid norm 3.652211458259e-06 ||r(i)||/||b|| 9.096480564430e-08
>>>    >>>>> 106 KSP unpreconditioned resid norm 3.387532128049e-06 true
>>>    resid norm 3.387532128358e-06 ||r(i)||/||b|| 8.437249737363e-08
>>>    >>>>> 107 KSP unpreconditioned resid norm 3.234218880987e-06 true
>>>    resid norm 3.234218880798e-06 ||r(i)||/||b|| 8.055395895481e-08
>>>    >>>>> 108 KSP unpreconditioned resid norm 3.016905196388e-06 true
>>>    resid norm 3.016905196492e-06 ||r(i)||/||b|| 7.514137611763e-08
>>>    >>>>> 109 KSP unpreconditioned resid norm 2.858246441921e-06 true
>>>    resid norm 2.858246441975e-06 ||r(i)||/||b|| 7.118969836476e-08
>>>    >>>>> 110 KSP unpreconditioned resid norm 2.637118810847e-06 true
>>>    resid norm 2.637118810750e-06 ||r(i)||/||b|| 6.568212241336e-08
>>>    >>>>> 111 KSP unpreconditioned resid norm 2.494976088717e-06 true
>>>    resid norm 2.494976088700e-06 ||r(i)||/||b|| 6.214180574966e-08
>>>    >>>>> 112 KSP unpreconditioned resid norm 2.270639574272e-06 true
>>>    resid norm 2.270639574200e-06 ||r(i)||/||b|| 5.655430686750e-08
>>>    >>>>> 113 KSP unpreconditioned resid norm 2.104988663813e-06 true
>>>    resid norm 2.104988664169e-06 ||r(i)||/||b|| 5.242847707696e-08
>>>    >>>>> 114 KSP unpreconditioned resid norm 1.889361127301e-06 true
>>>    resid norm 1.889361127526e-06 ||r(i)||/||b|| 4.705789073868e-08
>>>    >>>>> 115 KSP unpreconditioned resid norm 1.732367008052e-06 true
>>>    resid norm 1.732367007971e-06 ||r(i)||/||b|| 4.314767367271e-08
>>>    >>>>> 116 KSP unpreconditioned resid norm 1.509288268391e-06 true
>>>    resid norm 1.509288268645e-06 ||r(i)||/||b|| 3.759150191264e-08
>>>    >>>>> 117 KSP unpreconditioned resid norm 1.359169217644e-06 true
>>>    resid norm 1.359169217445e-06 ||r(i)||/||b|| 3.385252062089e-08
>>>    >>>>> 118 KSP unpreconditioned resid norm 1.180146337735e-06 true
>>>    resid norm 1.180146337908e-06 ||r(i)||/||b|| 2.939363820703e-08
>>>    >>>>> 119 KSP unpreconditioned resid norm 1.067757039683e-06 true
>>>    resid norm 1.067757039924e-06 ||r(i)||/||b|| 2.659438335433e-08
>>>    >>>>> 120 KSP unpreconditioned resid norm 9.435833073736e-07 true
>>>    resid norm 9.435833073736e-07 ||r(i)||/||b|| 2.350161625235e-08
>>>    >>>>> 121 KSP unpreconditioned resid norm 8.749457237613e-07 true
>>>    resid norm 8.749457236791e-07 ||r(i)||/||b|| 2.179207546261e-08
>>>    >>>>> 122 KSP unpreconditioned resid norm 7.945760150897e-07 true
>>>    resid norm 7.945760150444e-07 ||r(i)||/||b|| 1.979032528762e-08
>>>    >>>>> 123 KSP unpreconditioned resid norm 7.141240839013e-07 true
>>>    resid norm 7.141240838682e-07 ||r(i)||/||b|| 1.778652721438e-08
>>>    >>>>> 124 KSP unpreconditioned resid norm 6.300566936733e-07 true
>>>    resid norm 6.300566936607e-07 ||r(i)||/||b|| 1.569267971988e-08
>>>    >>>>> 125 KSP unpreconditioned resid norm 5.628986997544e-07 true
>>>    resid norm 5.628986995849e-07 ||r(i)||/||b|| 1.401999073448e-08
>>>    >>>>> 126 KSP unpreconditioned resid norm 5.119018951602e-07 true
>>>    resid norm 5.119018951837e-07 ||r(i)||/||b|| 1.274982484900e-08
>>>    >>>>> 127 KSP unpreconditioned resid norm 4.664670343748e-07 true
>>>    resid norm 4.664670344042e-07 ||r(i)||/||b|| 1.161818903670e-08
>>>    >>>>> 128 KSP unpreconditioned resid norm 4.253264691112e-07 true
>>>    resid norm 4.253264691948e-07 ||r(i)||/||b|| 1.059351027394e-08
>>>    >>>>> 129 KSP unpreconditioned resid norm 3.868921150516e-07 true
>>>    resid norm 3.868921150517e-07 ||r(i)||/||b|| 9.636234498800e-09
>>>    >>>>> 130 KSP unpreconditioned resid norm 3.558445658540e-07 true
>>>    resid norm 3.558445660061e-07 ||r(i)||/||b|| 8.862940209315e-09
>>>    >>>>> 131 KSP unpreconditioned resid norm 3.268710273840e-07 true
>>>    resid norm 3.268710272455e-07 ||r(i)||/||b|| 8.141302825416e-09
>>>    >>>>> 132 KSP unpreconditioned resid norm 3.041273897592e-07 true
>>>    resid norm 3.041273896694e-07 ||r(i)||/||b|| 7.574832182794e-09
>>>    >>>>> 133 KSP unpreconditioned resid norm 2.851926677922e-07 true
>>>    resid norm 2.851926674248e-07 ||r(i)||/||b|| 7.103229333782e-09
>>>    >>>>> 134 KSP unpreconditioned resid norm 2.694708315072e-07 true
>>>    resid norm 2.694708309500e-07 ||r(i)||/||b|| 6.711649104748e-09
>>>    >>>>> 135 KSP unpreconditioned resid norm 2.534825559099e-07 true
>>>    resid norm 2.534825557469e-07 ||r(i)||/||b|| 6.313432746507e-09
>>>    >>>>> 136 KSP unpreconditioned resid norm 2.387342352458e-07 true
>>>    resid norm 2.387342351804e-07 ||r(i)||/||b|| 5.946099658254e-09
>>>    >>>>> 137 KSP unpreconditioned resid norm 2.200861667617e-07 true
>>>    resid norm 2.200861665255e-07 ||r(i)||/||b|| 5.481636425438e-09
>>>    >>>>> 138 KSP unpreconditioned resid norm 2.051415370616e-07 true
>>>    resid norm 2.051415370614e-07 ||r(i)||/||b|| 5.109413915824e-09
>>>    >>>>> 139 KSP unpreconditioned resid norm 1.887376429396e-07 true
>>>    resid norm 1.887376426682e-07 ||r(i)||/||b|| 4.700845824315e-09
>>>    >>>>> 140 KSP unpreconditioned resid norm 1.729743133005e-07 true
>>>    resid norm 1.729743128342e-07 ||r(i)||/||b|| 4.308232129561e-09
>>>    >>>>> 141 KSP unpreconditioned resid norm 1.541021130781e-07 true
>>>    resid norm 1.541021128364e-07 ||r(i)||/||b|| 3.838186508023e-09
>>>    >>>>> 142 KSP unpreconditioned resid norm 1.384631628565e-07 true
>>>    resid norm 1.384631627735e-07 ||r(i)||/||b|| 3.448670712125e-09
>>>    >>>>> 143 KSP unpreconditioned resid norm 1.223114405626e-07 true
>>>    resid norm 1.223114403883e-07 ||r(i)||/||b|| 3.046383411846e-09
>>>    >>>>> 144 KSP unpreconditioned resid norm 1.087313066223e-07 true
>>>    resid norm 1.087313065117e-07 ||r(i)||/||b|| 2.708146085550e-09
>>>    >>>>> 145 KSP unpreconditioned resid norm 9.181901998734e-08 true
>>>    resid norm 9.181901984268e-08 ||r(i)||/||b|| 2.286915582489e-09
>>>    >>>>> 146 KSP unpreconditioned resid norm 7.885850510808e-08 true
>>>    resid norm 7.885850531446e-08 ||r(i)||/||b|| 1.964110975313e-09
>>>    >>>>> 147 KSP unpreconditioned resid norm 6.483393946950e-08 true
>>>    resid norm 6.483393931383e-08 ||r(i)||/||b|| 1.614804278515e-09
>>>    >>>>> 148 KSP unpreconditioned resid norm 5.690132597004e-08 true
>>>    resid norm 5.690132577518e-08 ||r(i)||/||b|| 1.417228465328e-09
>>>    >>>>> 149 KSP unpreconditioned resid norm 5.023671521579e-08 true
>>>    resid norm 5.023671502186e-08 ||r(i)||/||b|| 1.251234511035e-09
>>>    >>>>> 150 KSP unpreconditioned resid norm 4.625371062660e-08 true
>>>    resid norm 4.625371062660e-08 ||r(i)||/||b|| 1.152030720445e-09
>>>    >>>>> 151 KSP unpreconditioned resid norm 4.349049084805e-08 true
>>>    resid norm 4.349049089337e-08 ||r(i)||/||b|| 1.083207830846e-09
>>>    >>>>> 152 KSP unpreconditioned resid norm 3.932593324498e-08 true
>>>    resid norm 3.932593376918e-08 ||r(i)||/||b|| 9.794821474546e-10
>>>    >>>>> 153 KSP unpreconditioned resid norm 3.504167649202e-08 true
>>>    resid norm 3.504167638113e-08 ||r(i)||/||b|| 8.727751166356e-10
>>>    >>>>> 154 KSP unpreconditioned resid norm 2.892726347747e-08 true
>>>    resid norm 2.892726348583e-08 ||r(i)||/||b|| 7.204848160858e-10
>>>    >>>>> 155 KSP unpreconditioned resid norm 2.477647033202e-08 true
>>>    resid norm 2.477647041570e-08 ||r(i)||/||b|| 6.171019508795e-10
>>>    >>>>> 156 KSP unpreconditioned resid norm 2.128504065757e-08 true
>>>    resid norm 2.128504067423e-08 ||r(i)||/||b|| 5.301416991298e-10
>>>    >>>>> 157 KSP unpreconditioned resid norm 1.879248809429e-08 true
>>>    resid norm 1.879248818928e-08 ||r(i)||/||b|| 4.680602575310e-10
>>>    >>>>> 158 KSP unpreconditioned resid norm 1.673649140073e-08 true
>>>    resid norm 1.673649134005e-08 ||r(i)||/||b|| 4.168520085200e-10
>>>    >>>>> 159 KSP unpreconditioned resid norm 1.497123388109e-08 true
>>>    resid norm 1.497123365569e-08 ||r(i)||/||b|| 3.728851342016e-10
>>>    >>>>> 160 KSP unpreconditioned resid norm 1.315982130162e-08 true
>>>    resid norm 1.315982149329e-08 ||r(i)||/||b|| 3.277687007261e-10
>>>    >>>>> 161 KSP unpreconditioned resid norm 1.182395864938e-08 true
>>>    resid norm 1.182395868430e-08 ||r(i)||/||b|| 2.944966675550e-10
>>>    >>>>> 162 KSP unpreconditioned resid norm 1.070204481679e-08 true
>>>    resid norm 1.070204466432e-08 ||r(i)||/||b|| 2.665534085342e-10
>>>    >>>>> 163 KSP unpreconditioned resid norm 9.969290307649e-09 true
>>>    resid norm 9.969290432333e-09 ||r(i)||/||b|| 2.483028644297e-10
>>>    >>>>> 164 KSP unpreconditioned resid norm 9.134440883306e-09 true
>>>    resid norm 9.134440980976e-09 ||r(i)||/||b|| 2.275094577628e-10
>>>    >>>>> 165 KSP unpreconditioned resid norm 8.593316427292e-09 true
>>>    resid norm 8.593316413360e-09 ||r(i)||/||b|| 2.140317904139e-10
>>>    >>>>> 166 KSP unpreconditioned resid norm 8.042173048464e-09 true
>>>    resid norm 8.042173332848e-09 ||r(i)||/||b|| 2.003045942277e-10
>>>    >>>>> 167 KSP unpreconditioned resid norm 7.655518522782e-09 true
>>>    resid norm 7.655518879144e-09 ||r(i)||/||b|| 1.906742791064e-10
>>>    >>>>> 168 KSP unpreconditioned resid norm 7.210283391815e-09 true
>>>    resid norm 7.210283220312e-09 ||r(i)||/||b|| 1.795848951442e-10
>>>    >>>>> 169 KSP unpreconditioned resid norm 6.793967416271e-09 true
>>>    resid norm 6.793967448832e-09 ||r(i)||/||b|| 1.692158122825e-10
>>>    >>>>> 170 KSP unpreconditioned resid norm 6.249160304588e-09 true
>>>    resid norm 6.249160382647e-09 ||r(i)||/||b|| 1.556464257736e-10
>>>    >>>>> 171 KSP unpreconditioned resid norm 5.794936438798e-09 true
>>>    resid norm 5.794936332552e-09 ||r(i)||/||b|| 1.443331699811e-10
>>>    >>>>> 172 KSP unpreconditioned resid norm 5.222337397128e-09 true
>>>    resid norm 5.222337443277e-09 ||r(i)||/||b|| 1.300715788135e-10
>>>    >>>>> 173 KSP unpreconditioned resid norm 4.755359110447e-09 true
>>>    resid norm 4.755358888996e-09 ||r(i)||/||b|| 1.184406494668e-10
>>>    >>>>> 174 KSP unpreconditioned resid norm 4.317537007873e-09 true
>>>    resid norm 4.317537267718e-09 ||r(i)||/||b|| 1.075359252630e-10
>>>    >>>>> 175 KSP unpreconditioned resid norm 3.924177535665e-09 true
>>>    resid norm 3.924177629720e-09 ||r(i)||/||b|| 9.773860563138e-11
>>>    >>>>> 176 KSP unpreconditioned resid norm 3.502843065115e-09 true
>>>    resid norm 3.502843126359e-09 ||r(i)||/||b|| 8.724452234855e-11
>>>    >>>>> 177 KSP unpreconditioned resid norm 3.083873232869e-09 true
>>>    resid norm 3.083873352938e-09 ||r(i)||/||b|| 7.680933686007e-11
>>>    >>>>> 178 KSP unpreconditioned resid norm 2.758980676473e-09 true
>>>    resid norm 2.758980618096e-09 ||r(i)||/||b|| 6.871730691658e-11
>>>    >>>>> 179 KSP unpreconditioned resid norm 2.510978240429e-09 true
>>>    resid norm 2.510978327392e-09 ||r(i)||/||b|| 6.254036989334e-11
>>>    >>>>> 180 KSP unpreconditioned resid norm 2.323000193205e-09 true
>>>    resid norm 2.323000193205e-09 ||r(i)||/||b|| 5.785844097519e-11
>>>    >>>>> 181 KSP unpreconditioned resid norm 2.167480159274e-09 true
>>>    resid norm 2.167480113693e-09 ||r(i)||/||b|| 5.398493749153e-11
>>>    >>>>> 182 KSP unpreconditioned resid norm 1.983545827983e-09 true
>>>    resid norm 1.983546404840e-09 ||r(i)||/||b|| 4.940374216139e-11
>>>    >>>>> 183 KSP unpreconditioned resid norm 1.794576286774e-09 true
>>>    resid norm 1.794576224361e-09 ||r(i)||/||b|| 4.469710457036e-11
>>>    >>>>> 184 KSP unpreconditioned resid norm 1.583490590644e-09 true
>>>    resid norm 1.583490380603e-09 ||r(i)||/||b|| 3.943963715064e-11
>>>    >>>>> 185 KSP unpreconditioned resid norm 1.412659866247e-09 true
>>>    resid norm 1.412659832191e-09 ||r(i)||/||b|| 3.518479927722e-11
>>>    >>>>> 186 KSP unpreconditioned resid norm 1.285613344939e-09 true
>>>    resid norm 1.285612984761e-09 ||r(i)||/||b|| 3.202047215205e-11
>>>    >>>>> 187 KSP unpreconditioned resid norm 1.168115133929e-09 true
>>>    resid norm 1.168114766904e-09 ||r(i)||/||b|| 2.909397058634e-11
>>>    >>>>> 188 KSP unpreconditioned resid norm 1.063377926053e-09 true
>>>    resid norm 1.063377647554e-09 ||r(i)||/||b|| 2.648530681802e-11
>>>    >>>>> 189 KSP unpreconditioned resid norm 9.548967728122e-10 true
>>>    resid norm 9.548964523410e-10 ||r(i)||/||b|| 2.378339019807e-11
>>>    >>>>> KSP Object: 16 MPI processes
>>>    >>>>>    type: fgmres
>>>    >>>>>      restart=30, using Classical (unmodified) Gram-Schmidt
>>>    Orthogonalization with no iterative refinement
>>>    >>>>>      happy breakdown tolerance 1e-30
>>>    >>>>>    maximum iterations=2000, initial guess is zero
>>>    >>>>>    tolerances:  relative=1e-20, absolute=1e-09,
>>>    divergence=10000.
>>>    >>>>>    right preconditioning
>>>    >>>>>    using UNPRECONDITIONED norm type for convergence test
>>>    >>>>> PC Object: 16 MPI processes
>>>    >>>>>    type: bjacobi
>>>    >>>>>      number of blocks = 4
>>>    >>>>>      Local solver information for first block is in the
>>>    following KSP and PC objects on rank 0:
>>>    >>>>>      Use -ksp_view ::ascii_info_detail to display
>>>    information for all blocks
>>>    >>>>>    KSP Object: (sub_) 4 MPI processes
>>>    >>>>>      type: preonly
>>>    >>>>>      maximum iterations=10000, initial guess is zero
>>>    >>>>>      tolerances:  relative=1e-05, absolute=1e-50,
>>>    divergence=10000.
>>>    >>>>>      left preconditioning
>>>    >>>>>      using NONE norm type for convergence test
>>>    >>>>>    PC Object: (sub_) 4 MPI processes
>>>    >>>>>      type: telescope
>>>    >>>>>        petsc subcomm: parent comm size reduction factor = 4
>>>    >>>>>        petsc subcomm: parent_size = 4 , subcomm_size = 1
>>>    >>>>>        petsc subcomm type = contiguous
>>>    >>>>>      linear system matrix = precond matrix:
>>>    >>>>>      Mat Object: (sub_) 4 MPI processes
>>>    >>>>>        type: mpiaij
>>>    >>>>>        rows=40200, cols=40200
>>>    >>>>>        total: nonzeros=199996, allocated nonzeros=203412
>>>    >>>>>        total number of mallocs used during MatSetValues calls=0
>>>    >>>>>          not using I-node (on process 0) routines
>>>    >>>>>          setup type: default
>>>    >>>>>          Parent DM object: NULL
>>>    >>>>>          Sub DM object: NULL
>>>    >>>>>          KSP Object:   (sub_telescope_)   1 MPI processes
>>>    >>>>>            type: preonly
>>>    >>>>>            maximum iterations=10000, initial guess is zero
>>>    >>>>>            tolerances:  relative=1e-05, absolute=1e-50,
>>>    divergence=10000.
>>>    >>>>>            left preconditioning
>>>    >>>>>            using NONE norm type for convergence test
>>>    >>>>>          PC Object:   (sub_telescope_)   1 MPI processes
>>>    >>>>>            type: lu
>>>    >>>>>              out-of-place factorization
>>>    >>>>>              tolerance for zero pivot 2.22045e-14
>>>    >>>>>              matrix ordering: external
>>>    >>>>>              factor fill ratio given 0., needed 0.
>>>    >>>>>                Factored matrix follows:
>>>    >>>>>                  Mat Object:   1 MPI processes
>>>    >>>>>                    type: mumps
>>>    >>>>>                    rows=40200, cols=40200
>>>    >>>>>                    package used to perform factorization: mumps
>>>    >>>>>                    total: nonzeros=1849788, allocated
>>>    nonzeros=1849788
>>>    >>>>>                      MUMPS run parameters:
>>>    >>>>>                        SYM (matrix type):                   0
>>>    >>>>>                        PAR (host participation):            1
>>>    >>>>>                        ICNTL(1) (output for error):         6
>>>    >>>>>                        ICNTL(2) (output of diagnostic msg): 0
>>>    >>>>>                        ICNTL(3) (output for global info):   0
>>>    >>>>>                        ICNTL(4) (level of printing):        0
>>>    >>>>>                        ICNTL(5) (input mat struct):         0
>>>    >>>>>                        ICNTL(6) (matrix prescaling):        7
>>>    >>>>>                        ICNTL(7) (sequential matrix ordering):7
>>>    >>>>>                        ICNTL(8) (scaling strategy):        77
>>>    >>>>>                        ICNTL(10) (max num of refinements):  0
>>>    >>>>>                        ICNTL(11) (error analysis):          0
>>>    >>>>>                        ICNTL(12) (efficiency control):        1
>>>    >>>>>                        ICNTL(13) (sequential factorization
>>>    of the root node):  0
>>>    >>>>>                        ICNTL(14) (percentage of estimated
>>>    workspace increase): 20
>>>    >>>>>                        ICNTL(18) (input mat struct):        0
>>>    >>>>>                        ICNTL(19) (Schur complement info):           0
>>>    >>>>>                        ICNTL(20) (RHS sparse pattern):        0
>>>    >>>>>                        ICNTL(21) (solution struct):        0
>>>    >>>>>                        ICNTL(22) (in-core/out-of-core
>>>    facility):        0
>>>    >>>>>                        ICNTL(23) (max size of memory can be
>>>    allocated locally):0
>>>    >>>>>                        ICNTL(24) (detection of null pivot
>>>    rows):        0
>>>    >>>>>                        ICNTL(25) (computation of a null
>>>    space basis):         0
>>>    >>>>>                        ICNTL(26) (Schur options for RHS or
>>>    solution):         0
>>>    >>>>>                        ICNTL(27) (blocking size for multiple
>>>    RHS):         -32
>>>    >>>>>                        ICNTL(28) (use parallel or sequential
>>>    ordering):         1
>>>    >>>>>                        ICNTL(29) (parallel ordering):        0
>>>    >>>>>                        ICNTL(30) (user-specified set of
>>>    entries in inv(A)):    0
>>>    >>>>>                        ICNTL(31) (factors is discarded in
>>>    the solve phase):    0
>>>    >>>>>                        ICNTL(33) (compute determinant):        0
>>>    >>>>>                        ICNTL(35) (activate BLR based
>>>    factorization):         0
>>>    >>>>>                        ICNTL(36) (choice of BLR
>>>    factorization variant):         0
>>>    >>>>>                        ICNTL(38) (estimated compression rate
>>>    of LU factors):   333
>>>    >>>>>                        CNTL(1) (relative pivoting
>>>    threshold):      0.01
>>>    >>>>>                        CNTL(2) (stopping criterion of
>>>    refinement): 1.49012e-08
>>>    >>>>>                        CNTL(3) (absolute pivoting
>>>    threshold):      0.
>>>    >>>>>                        CNTL(4) (value of static pivoting):            -1.
>>>    >>>>>                        CNTL(5) (fixation for null pivots):            0.
>>>    >>>>>                        CNTL(7) (dropping parameter for
>>>    BLR):       0.
>>>    >>>>>                        RINFO(1) (local estimated flops for
>>>    the elimination after analysis):
>>>    >>>>>                          [0] 1.45525e+08
>>>    >>>>>                        RINFO(2) (local estimated flops for
>>>    the assembly after factorization):
>>>    >>>>>                          [0]  2.89397e+06
>>>    >>>>>                        RINFO(3) (local estimated flops for
>>>    the elimination after factorization):
>>>    >>>>>                          [0]  1.45525e+08
>>>    >>>>>                        INFO(15) (estimated size of (in MB)
>>>    MUMPS internal data for running numerical factorization):
>>>    >>>>>                        [0] 29
>>>    >>>>>                        INFO(16) (size of (in MB) MUMPS
>>>    internal data used during numerical factorization):
>>>    >>>>>                          [0] 29
>>>    >>>>>                        INFO(23) (num of pivots eliminated on
>>>    this processor after factorization):
>>>    >>>>>                          [0] 40200
>>>    >>>>>                        RINFOG(1) (global estimated flops for
>>>    the elimination after analysis): 1.45525e+08
>>>    >>>>>                        RINFOG(2) (global estimated flops for
>>>    the assembly after factorization): 2.89397e+06
>>>    >>>>>                        RINFOG(3) (global estimated flops for
>>>    the elimination after factorization): 1.45525e+08
>>>    >>>>>                        (RINFOG(12) RINFOG(13))*2^INFOG(34)
>>>    (determinant): (0.,0.)*(2^0)
>>>    >>>>>                        INFOG(3) (estimated real workspace
>>>    for factors on all processors after analysis): 1849788
>>>    >>>>>                        INFOG(4) (estimated integer workspace
>>>    for factors on all processors after analysis): 879986
>>>    >>>>>                        INFOG(5) (estimated maximum front
>>>    size in the complete tree): 282
>>>    >>>>>                        INFOG(6) (number of nodes in the
>>>    complete tree): 23709
>>>    >>>>>                        INFOG(7) (ordering option effectively
>>>    used after analysis): 5
>>>    >>>>>                        INFOG(8) (structural symmetry in
>>>    percent of the permuted matrix after analysis): 100
>>>    >>>>>                        INFOG(9) (total real/complex
>>>    workspace to store the matrix factors after factorization): 1849788
>>>    >>>>>                        INFOG(10) (total integer space store
>>>    the matrix factors after factorization): 879986
>>>    >>>>>                        INFOG(11) (order of largest frontal
>>>    matrix after factorization): 282
>>>    >>>>>                        INFOG(12) (number of off-diagonal
>>>    pivots): 0
>>>    >>>>>                        INFOG(13) (number of delayed pivots
>>>    after factorization): 0
>>>    >>>>>                        INFOG(14) (number of memory compress
>>>    after factorization): 0
>>>    >>>>>                        INFOG(15) (number of steps of
>>>    iterative refinement after solution): 0
>>>    >>>>>                        INFOG(16) (estimated size (in MB) of
>>>    all MUMPS internal data for factorization after analysis: value on
>>>    the most memory consuming processor): 29
>>>    >>>>>                        INFOG(17) (estimated size of all
>>>    MUMPS internal data for factorization after analysis: sum over all
>>>    processors): 29
>>>    >>>>>                        INFOG(18) (size of all MUMPS internal
>>>    data allocated during factorization: value on the most memory
>>>    consuming processor): 29
>>>    >>>>>                        INFOG(19) (size of all MUMPS internal
>>>    data allocated during factorization: sum over all processors): 29
>>>    >>>>>                        INFOG(20) (estimated number of
>>>    entries in the factors): 1849788
>>>    >>>>>                        INFOG(21) (size in MB of memory
>>>    effectively used during factorization - value on the most memory
>>>    consuming processor): 26
>>>    >>>>>                        INFOG(22) (size in MB of memory
>>>    effectively used during factorization - sum over all processors): 26
>>>    >>>>>                        INFOG(23) (after analysis: value of
>>>    ICNTL(6) effectively used): 0
>>>    >>>>>                        INFOG(24) (after analysis: value of
>>>    ICNTL(12) effectively used): 1
>>>    >>>>>                        INFOG(25) (after factorization:
>>>    number of pivots modified by static pivoting): 0
>>>    >>>>>                        INFOG(28) (after factorization:
>>>    number of null pivots encountered): 0
>>>    >>>>>                        INFOG(29) (after factorization:
>>>    effective number of entries in the factors (sum over all
>>>    processors)): 1849788
>>>    >>>>>                        INFOG(30, 31) (after solution: size
>>>    in Mbytes of memory used during solution phase): 29, 29
>>>    >>>>>                        INFOG(32) (after analysis: type of
>>>    analysis done): 1
>>>    >>>>>                        INFOG(33) (value used for ICNTL(8)): 7
>>>    >>>>>                        INFOG(34) (exponent of the
>>>    determinant if determinant is requested): 0
>>>    >>>>>                        INFOG(35) (after factorization:
>>>    number of entries taking into account BLR factor compression - sum
>>>    over all processors): 1849788
>>>    >>>>>                        INFOG(36) (after analysis: estimated
>>>    size of all MUMPS internal data for running BLR in-core - value on
>>>    the most memory consuming processor): 0
>>>    >>>>>                        INFOG(37) (after analysis: estimated
>>>    size of all MUMPS internal data for running BLR in-core - sum over
>>>    all processors): 0
>>>    >>>>>                        INFOG(38) (after analysis: estimated
>>>    size of all MUMPS internal data for running BLR out-of-core -
>>>    value on the most memory consuming processor): 0
>>>    >>>>>                        INFOG(39) (after analysis: estimated
>>>    size of all MUMPS internal data for running BLR out-of-core - sum
>>>    over all processors): 0
>>>    >>>>>            linear system matrix = precond matrix:
>>>    >>>>>            Mat Object:   1 MPI processes
>>>    >>>>>              type: seqaijcusparse
>>>    >>>>>              rows=40200, cols=40200
>>>    >>>>>              total: nonzeros=199996, allocated nonzeros=199996
>>>    >>>>>              total number of mallocs used during
>>>    MatSetValues calls=0
>>>    >>>>>                not using I-node routines
>>>    >>>>>    linear system matrix = precond matrix:
>>>    >>>>>    Mat Object: 16 MPI processes
>>>    >>>>>      type: mpiaijcusparse
>>>    >>>>>      rows=160800, cols=160800
>>>    >>>>>      total: nonzeros=802396, allocated nonzeros=1608000
>>>    >>>>>      total number of mallocs used during MatSetValues calls=0
>>>    >>>>>        not using I-node (on process 0) routines
>>>    >>>>> Norm of error 9.11684e-07 iterations 189
>>>    >>>>> Chang
>>>    >>>>> On 10/14/21 10:10 PM, Chang Liu wrote:
>>>    >>>>>> Hi Barry,
>>>    >>>>>>
>>>    >>>>>> No problem. Here is the output. It seems that the resid
>>>    norm calculation is incorrect.
>>>    >>>>>>
>>>    >>>>>> $ mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7
>>>    -m 400 -ksp_view -ksp_monitor_true_residual -pc_type bjacobi
>>>    -pc_bjacobi_blocks 4 -ksp_type fgmres -mat_type aijcusparse
>>>    -sub_pc_type telescope -sub_ksp_type preonly
>>>    -sub_telescope_ksp_type preonly -sub_telescope_pc_type lu
>>>    -sub_telescope_pc_factor_mat_solver_type cusparse
>>>    -sub_pc_telescope_reduction_factor 4
>>>    -sub_pc_telescope_subcomm_type contiguous -ksp_max_it 2000
>>>    -ksp_rtol 1.e-20 -ksp_atol 1.e-9
>>>    >>>>>>     0 KSP unpreconditioned resid norm 4.014971979977e+01
>>>    true resid norm 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00
>>>    >>>>>>     1 KSP unpreconditioned resid norm 0.000000000000e+00
>>>    true resid norm 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00
>>>    >>>>>> KSP Object: 16 MPI processes
>>>    >>>>>>     type: fgmres
>>>    >>>>>>       restart=30, using Classical (unmodified) Gram-Schmidt
>>>    Orthogonalization with no iterative refinement
>>>    >>>>>>       happy breakdown tolerance 1e-30
>>>    >>>>>>     maximum iterations=2000, initial guess is zero
>>>    >>>>>>     tolerances:  relative=1e-20, absolute=1e-09,
>>>    divergence=10000.
>>>    >>>>>>     right preconditioning
>>>    >>>>>>     using UNPRECONDITIONED norm type for convergence test
>>>    >>>>>> PC Object: 16 MPI processes
>>>    >>>>>>     type: bjacobi
>>>    >>>>>>       number of blocks = 4
>>>    >>>>>>       Local solver information for first block is in the
>>>    following KSP and PC objects on rank 0:
>>>    >>>>>>       Use -ksp_view ::ascii_info_detail to display
>>>    information for all blocks
>>>    >>>>>>     KSP Object: (sub_) 4 MPI processes
>>>    >>>>>>       type: preonly
>>>    >>>>>>       maximum iterations=10000, initial guess is zero
>>>    >>>>>>       tolerances:  relative=1e-05, absolute=1e-50,
>>>    divergence=10000.
>>>    >>>>>>       left preconditioning
>>>    >>>>>>       using NONE norm type for convergence test
>>>    >>>>>>     PC Object: (sub_) 4 MPI processes
>>>    >>>>>>       type: telescope
>>>    >>>>>>         petsc subcomm: parent comm size reduction factor = 4
>>>    >>>>>>         petsc subcomm: parent_size = 4 , subcomm_size = 1
>>>    >>>>>>         petsc subcomm type = contiguous
>>>    >>>>>>       linear system matrix = precond matrix:
>>>    >>>>>>       Mat Object: (sub_) 4 MPI processes
>>>    >>>>>>         type: mpiaij
>>>    >>>>>>         rows=40200, cols=40200
>>>    >>>>>>         total: nonzeros=199996, allocated nonzeros=203412
>>>    >>>>>>         total number of mallocs used during MatSetValues
>>>    calls=0
>>>    >>>>>>           not using I-node (on process 0) routines
>>>    >>>>>>           setup type: default
>>>    >>>>>>           Parent DM object: NULL
>>>    >>>>>>           Sub DM object: NULL
>>>    >>>>>>           KSP Object:   (sub_telescope_)   1 MPI processes
>>>    >>>>>>             type: preonly
>>>    >>>>>>             maximum iterations=10000, initial guess is zero
>>>    >>>>>>             tolerances:  relative=1e-05, absolute=1e-50,
>>>    divergence=10000.
>>>    >>>>>>             left preconditioning
>>>    >>>>>>             using NONE norm type for convergence test
>>>    >>>>>>           PC Object:   (sub_telescope_)   1 MPI processes
>>>    >>>>>>             type: lu
>>>    >>>>>>               out-of-place factorization
>>>    >>>>>>               tolerance for zero pivot 2.22045e-14
>>>    >>>>>>               matrix ordering: nd
>>>    >>>>>>               factor fill ratio given 5., needed 8.62558
>>>    >>>>>>                 Factored matrix follows:
>>>    >>>>>>                   Mat Object:   1 MPI processes
>>>    >>>>>>                     type: seqaijcusparse
>>>    >>>>>>                     rows=40200, cols=40200
>>>    >>>>>>                     package used to perform factorization:
>>>    cusparse
>>>    >>>>>>                     total: nonzeros=1725082, allocated
>>>    nonzeros=1725082
>>>    >>>>>>                       not using I-node routines
>>>    >>>>>>             linear system matrix = precond matrix:
>>>    >>>>>>             Mat Object:   1 MPI processes
>>>    >>>>>>               type: seqaijcusparse
>>>    >>>>>>               rows=40200, cols=40200
>>>    >>>>>>               total: nonzeros=199996, allocated nonzeros=199996
>>>    >>>>>>               total number of mallocs used during
>>>    MatSetValues calls=0
>>>    >>>>>>                 not using I-node routines
>>>    >>>>>>     linear system matrix = precond matrix:
>>>    >>>>>>     Mat Object: 16 MPI processes
>>>    >>>>>>       type: mpiaijcusparse
>>>    >>>>>>       rows=160800, cols=160800
>>>    >>>>>>       total: nonzeros=802396, allocated nonzeros=1608000
>>>    >>>>>>       total number of mallocs used during MatSetValues calls=0
>>>    >>>>>>         not using I-node (on process 0) routines
>>>    >>>>>> Norm of error 400.999 iterations 1
>>>    >>>>>>
>>>    >>>>>> Chang
>>>    >>>>>>
>>>    >>>>>>
>>>    >>>>>> On 10/14/21 9:47 PM, Barry Smith wrote:
>>>    >>>>>>>
>>>    >>>>>>>     Chang,
>>>    >>>>>>>
>>>    >>>>>>>      Sorry I did not notice that one. Please run that with
>>>    -ksp_view -ksp_monitor_true_residual so we can see exactly how
>>>    options are interpreted and solver used. At a glance it looks ok
>>>    but something must be wrong to get the wrong answer.
>>>    >>>>>>>
>>>    >>>>>>>     Barry
>>>    >>>>>>>
>>>    >>>>>>>> On Oct 14, 2021, at 6:02 PM, Chang Liu <cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov>> wrote:
>>>    >>>>>>>>
>>>    >>>>>>>> Hi Barry,
>>>    >>>>>>>>
>>>    >>>>>>>> That is exactly what I was doing in the second example,
>>>    in which the preconditioner works but the GMRES does not.
>>>    >>>>>>>>
>>>    >>>>>>>> Chang
>>>    >>>>>>>>
>>>    >>>>>>>> On 10/14/21 5:15 PM, Barry Smith wrote:
>>>    >>>>>>>>>     You need to use the PCTELESCOPE inside the block
>>>    Jacobi, not outside it. So something like -pc_type bjacobi
>>>    -sub_pc_type telescope -sub_telescope_pc_type lu
>>>    >>>>>>>>>> On Oct 14, 2021, at 4:14 PM, Chang Liu <cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov>> wrote:
>>>    >>>>>>>>>>
>>>    >>>>>>>>>> Hi Pierre,
>>>    >>>>>>>>>>
>>>    >>>>>>>>>> I wonder if the trick of PCTELESCOPE only works for
>>>    preconditioner and not for the solver. I have done some tests, and
>>>    find that for solving a small matrix using -telescope_ksp_type
>>>    preonly, it does work for GPU with multiple MPI processes.
>>>    However, for bjacobi and gmres, it does not work.
>>>    >>>>>>>>>>
>>>    >>>>>>>>>> The command line options I used for small matrix is like
>>>    >>>>>>>>>>
>>>    >>>>>>>>>> mpiexec -n 4 --oversubscribe ./ex7 -m 100
>>>    -ksp_monitor_short -pc_type telescope -mat_type aijcusparse
>>>    -telescope_pc_type lu -telescope_pc_factor_mat_solver_type
>>>    cusparse -telescope_ksp_type preonly -pc_telescope_reduction_factor 4
>>>    >>>>>>>>>>
>>>    >>>>>>>>>> which gives the correct output. For iterative solver, I
>>>    tried
>>>    >>>>>>>>>>
>>>    >>>>>>>>>> mpiexec -n 16 --oversubscribe ./ex7 -m 400
>>>    -ksp_monitor_short -pc_type bjacobi -pc_bjacobi_blocks 4 -ksp_type
>>>    fgmres -mat_type aijcusparse -sub_pc_type telescope -sub_ksp_type
>>>    preonly -sub_telescope_ksp_type preonly -sub_telescope_pc_type lu
>>>    -sub_telescope_pc_factor_mat_solver_type cusparse
>>>    -sub_pc_telescope_reduction_factor 4 -ksp_max_it 2000 -ksp_rtol
>>>    1.e-9 -ksp_atol 1.e-20
>>>    >>>>>>>>>>
>>>    >>>>>>>>>> for large matrix. The output is like
>>>    >>>>>>>>>>
>>>    >>>>>>>>>>    0 KSP Residual norm 40.1497
>>>    >>>>>>>>>>    1 KSP Residual norm < 1.e-11
>>>    >>>>>>>>>> Norm of error 400.999 iterations 1
>>>    >>>>>>>>>>
>>>    >>>>>>>>>> So it seems to call a direct solver instead of an
>>>    iterative one.
>>>    >>>>>>>>>>
>>>    >>>>>>>>>> Can you please help check these options?
>>>    >>>>>>>>>>
>>>    >>>>>>>>>> Chang
>>>    >>>>>>>>>>
>>>    >>>>>>>>>> On 10/14/21 10:04 AM, Pierre Jolivet wrote:
>>>    >>>>>>>>>>>> On 14 Oct 2021, at 3:50 PM, Chang Liu <cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov>> wrote:
>>>    >>>>>>>>>>>>
>>>    >>>>>>>>>>>> Thank you Pierre. I was not aware of PCTELESCOPE
>>>    before. This sounds exactly what I need. I wonder if PCTELESCOPE
>>>    can transform a mpiaijcusparse to seqaircusparse? Or I have to do
>>>    it manually?
>>>    >>>>>>>>>>> PCTELESCOPE uses MatCreateMPIMatConcatenateSeqMat().
>>>    >>>>>>>>>>> 1) I’m not sure this is implemented for cuSparse
>>>    matrices, but it should be;
>>>    >>>>>>>>>>> 2) at least for the implementations
>>>    MatCreateMPIMatConcatenateSeqMat_MPIBAIJ() and
>>>    MatCreateMPIMatConcatenateSeqMat_MPIAIJ(), the resulting MatType
>>>    is MATBAIJ (resp. MATAIJ). Constructors are usually “smart” enough
>>>    to detect if the MPI communicator on which the Mat lives is of
>>>    size 1 (your case), and then the resulting Mat is of type MatSeqX
>>>    instead of MatMPIX, so you would not need to worry about the
>>>    transformation you are mentioning.
>>>    >>>>>>>>>>> If you try this out and this does not work, please
>>>    provide the backtrace (probably something like “Operation XYZ not
>>>    implemented for MatType ABC”), and hopefully someone can add the
>>>    missing plumbing.
>>>    >>>>>>>>>>> I do not claim that this will be efficient, but I
>>>    think this goes in the direction of what you want to achieve.
>>>    >>>>>>>>>>> Thanks,
>>>    >>>>>>>>>>> Pierre
>>>    >>>>>>>>>>>> Chang
>>>    >>>>>>>>>>>>
>>>    >>>>>>>>>>>> On 10/14/21 1:35 AM, Pierre Jolivet wrote:
>>>    >>>>>>>>>>>>> Maybe I’m missing something, but can’t you use
>>>    PCTELESCOPE as a subdomain solver, with a reduction factor equal
>>>    to the number of MPI processes you have per block?
>>>    >>>>>>>>>>>>> -sub_pc_type telescope
>>>    -sub_pc_telescope_reduction_factor X -sub_telescope_pc_type lu
>>>    >>>>>>>>>>>>> This does not work with MUMPS
>>>    -mat_mumps_use_omp_threads because not only do the Mat needs to be
>>>    redistributed, the secondary processes also need to be “converted”
>>>    to OpenMP threads.
>>>    >>>>>>>>>>>>> Thus the need for specific code in mumps.c.
>>>    >>>>>>>>>>>>> Thanks,
>>>    >>>>>>>>>>>>> Pierre
>>>    >>>>>>>>>>>>>> On 14 Oct 2021, at 6:00 AM, Chang Liu via
>>>    petsc-users <petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>> wrote:
>>>    >>>>>>>>>>>>>>
>>>    >>>>>>>>>>>>>> Hi Junchao,
>>>    >>>>>>>>>>>>>>
>>>    >>>>>>>>>>>>>> Yes that is what I want.
>>>    >>>>>>>>>>>>>>
>>>    >>>>>>>>>>>>>> Chang
>>>    >>>>>>>>>>>>>>
>>>    >>>>>>>>>>>>>> On 10/13/21 11:42 PM, Junchao Zhang wrote:
>>>    >>>>>>>>>>>>>>> On Wed, Oct 13, 2021 at 8:58 PM Barry Smith
>>>    <bsmith at petsc.dev <mailto:bsmith at petsc.dev>
>>>    <mailto:bsmith at petsc.dev <mailto:bsmith at petsc.dev>>> wrote:
>>>    >>>>>>>>>>>>>>>         Junchao,
>>>    >>>>>>>>>>>>>>>            If I understand correctly Chang is
>>>    using the block Jacobi
>>>    >>>>>>>>>>>>>>>      method with a single block for a number of
>>>    MPI ranks and a direct
>>>    >>>>>>>>>>>>>>>      solver for each block so it uses
>>>    PCSetUp_BJacobi_Multiproc() which
>>>    >>>>>>>>>>>>>>>      is code Hong Zhang wrote a number of years
>>>    ago for CPUs. For their
>>>    >>>>>>>>>>>>>>>      particular problems this preconditioner works
>>>    well, but using an
>>>    >>>>>>>>>>>>>>>      iterative solver on the blocks does not work
>>>    well.
>>>    >>>>>>>>>>>>>>>            If we had complete MPI-GPU direct
>>>    solvers he could just use
>>>    >>>>>>>>>>>>>>>      the current code with MPIAIJCUSPARSE on each
>>>    block but since we do
>>>    >>>>>>>>>>>>>>>      not he would like to use a single GPU for
>>>    each block, this means
>>>    >>>>>>>>>>>>>>>      that diagonal blocks of  the global parallel
>>>    MPI matrix needs to be
>>>    >>>>>>>>>>>>>>>      sent to a subset of the GPUs (one GPU per
>>>    block, which has multiple
>>>    >>>>>>>>>>>>>>>      MPI ranks associated with the blocks).
>>>    Similarly for the triangular
>>>    >>>>>>>>>>>>>>>      solves the blocks of the right hand side
>>>    needs to be shipped to the
>>>    >>>>>>>>>>>>>>>      appropriate GPU and the resulting solution
>>>    shipped back to the
>>>    >>>>>>>>>>>>>>>      multiple GPUs. So Chang is absolutely
>>>    correct, this is somewhat like
>>>    >>>>>>>>>>>>>>>      your code for MUMPS with OpenMP. OK, I now
>>>    understand the background..
>>>    >>>>>>>>>>>>>>>      One could use PCSetUp_BJacobi_Multiproc() and
>>>    get the blocks on the
>>>    >>>>>>>>>>>>>>>      MPI ranks and then shrink each block down to
>>>    a single GPU but this
>>>    >>>>>>>>>>>>>>>      would be pretty inefficient, ideally one
>>>    would go directly from the
>>>    >>>>>>>>>>>>>>>      big MPI matrix on all the GPUs to the sub
>>>    matrices on the subset of
>>>    >>>>>>>>>>>>>>>      GPUs. But this may be a large coding project.
>>>    >>>>>>>>>>>>>>> I don't understand these sentences. Why do you say
>>>    "shrink"? In my mind, we just need to move each block (submatrix)
>>>    living over multiple MPI ranks to one of them and solve directly
>>>    there.  In other words, we keep blocks' size, no shrinking or
>>>    expanding.
>>>    >>>>>>>>>>>>>>> As mentioned before, cusparse does not provide LU
>>>    factorization. So the LU factorization would be done on CPU, and
>>>    the solve be done on GPU. I assume Chang wants to gain from the
>>>    (potential) faster solve (instead of factorization) on GPU.
>>>    >>>>>>>>>>>>>>>         Barry
>>>    >>>>>>>>>>>>>>>      Since the matrices being factored and solved
>>>    directly are relatively
>>>    >>>>>>>>>>>>>>>      large it is possible that the cusparse code
>>>    could be reasonably
>>>    >>>>>>>>>>>>>>>      efficient (they are not the tiny problems one
>>>    gets at the coarse
>>>    >>>>>>>>>>>>>>>      level of multigrid). Of course, this is
>>>    speculation, I don't
>>>    >>>>>>>>>>>>>>>      actually know how much better the cusparse
>>>    code would be on the
>>>    >>>>>>>>>>>>>>>      direct solver than a good CPU direct sparse
>>>    solver.
>>>    >>>>>>>>>>>>>>>       > On Oct 13, 2021, at 9:32 PM, Chang Liu
>>>    <cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov>>> wrote:
>>>    >>>>>>>>>>>>>>>       >
>>>    >>>>>>>>>>>>>>>       > Sorry I am not familiar with the details
>>>    either. Can you please
>>>    >>>>>>>>>>>>>>>      check the code in
>>>    MatMumpsGatherNonzerosOnMaster in mumps.c?
>>>    >>>>>>>>>>>>>>>       >
>>>    >>>>>>>>>>>>>>>       > Chang
>>>    >>>>>>>>>>>>>>>       >
>>>    >>>>>>>>>>>>>>>       > On 10/13/21 9:24 PM, Junchao Zhang wrote:
>>>    >>>>>>>>>>>>>>>       >> Hi Chang,
>>>    >>>>>>>>>>>>>>>       >>   I did the work in mumps. It is easy for
>>>    me to understand
>>>    >>>>>>>>>>>>>>>      gathering matrix rows to one process.
>>>    >>>>>>>>>>>>>>>       >>   But how to gather blocks (submatrices)
>>>    to form a large block?     Can you draw a picture of that?
>>>    >>>>>>>>>>>>>>>       >>   Thanks
>>>    >>>>>>>>>>>>>>>       >> --Junchao Zhang
>>>    >>>>>>>>>>>>>>>       >> On Wed, Oct 13, 2021 at 7:47 PM Chang Liu
>>>    via petsc-users
>>>    >>>>>>>>>>>>>>>      <petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov> <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov> <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>>>
>>>    >>>>>>>>>>>>>>>      wrote:
>>>    >>>>>>>>>>>>>>>       >>    Hi Barry,
>>>    >>>>>>>>>>>>>>>       >>    I think mumps solver in petsc does
>>>    support that. You can
>>>    >>>>>>>>>>>>>>>      check the
>>>    >>>>>>>>>>>>>>>       >>    documentation on
>>>    "-mat_mumps_use_omp_threads" at
>>>    >>>>>>>>>>>>>>>       >>
>>>    >>>>>>>>>>>>>>>
>>>    https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html
>>>    <https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html>
>>>    >>>>>>>>>>>>>>>
>>>    <https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html <https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html>>
>>>    >>>>>>>>>>>>>>>       >>
>>>    <https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html <https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html>
>>>    >>>>>>>>>>>>>>>
>>>    <https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html <https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html>>>
>>>    >>>>>>>>>>>>>>>       >>    and the code enclosed by #if
>>>    >>>>>>>>>>>>>>>      defined(PETSC_HAVE_OPENMP_SUPPORT) in
>>>    >>>>>>>>>>>>>>>       >>    functions MatMumpsSetUpDistRHSInfo and
>>>    >>>>>>>>>>>>>>>       >>    MatMumpsGatherNonzerosOnMaster in
>>>    >>>>>>>>>>>>>>>       >>    mumps.c
>>>    >>>>>>>>>>>>>>>       >>    1. I understand it is ideal to do one
>>>    MPI rank per GPU.
>>>    >>>>>>>>>>>>>>>      However, I am
>>>    >>>>>>>>>>>>>>>       >>    working on an existing code that was
>>>    developed based on MPI
>>>    >>>>>>>>>>>>>>>      and the the
>>>    >>>>>>>>>>>>>>>       >>    # of mpi ranks is typically equal to #
>>>    of cpu cores. We don't
>>>    >>>>>>>>>>>>>>>      want to
>>>    >>>>>>>>>>>>>>>       >>    change the whole structure of the code.
>>>    >>>>>>>>>>>>>>>       >>    2. What you have suggested has been
>>>    coded in mumps.c. See
>>>    >>>>>>>>>>>>>>>      function
>>>    >>>>>>>>>>>>>>>       >>    MatMumpsSetUpDistRHSInfo.
>>>    >>>>>>>>>>>>>>>       >>    Regards,
>>>    >>>>>>>>>>>>>>>       >>    Chang
>>>    >>>>>>>>>>>>>>>       >>    On 10/13/21 7:53 PM, Barry Smith wrote:
>>>    >>>>>>>>>>>>>>>       >>     >
>>>    >>>>>>>>>>>>>>>       >>     >
>>>    >>>>>>>>>>>>>>>       >>     >> On Oct 13, 2021, at 3:50 PM, Chang
>>>    Liu <cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov>>>> wrote:
>>>    >>>>>>>>>>>>>>>       >>     >>
>>>    >>>>>>>>>>>>>>>       >>     >> Hi Barry,
>>>    >>>>>>>>>>>>>>>       >>     >>
>>>    >>>>>>>>>>>>>>>       >>     >> That is exactly what I want.
>>>    >>>>>>>>>>>>>>>       >>     >>
>>>    >>>>>>>>>>>>>>>       >>     >> Back to my original question, I am
>>>    looking for an approach to
>>>    >>>>>>>>>>>>>>>       >>    transfer
>>>    >>>>>>>>>>>>>>>       >>     >> matrix
>>>    >>>>>>>>>>>>>>>       >>     >> data from many MPI processes to
>>>    "master" MPI
>>>    >>>>>>>>>>>>>>>       >>     >> processes, each of which taking
>>>    care of one GPU, and then
>>>    >>>>>>>>>>>>>>>      upload
>>>    >>>>>>>>>>>>>>>       >>    the data to GPU to
>>>    >>>>>>>>>>>>>>>       >>     >> solve.
>>>    >>>>>>>>>>>>>>>       >>     >> One can just grab some codes from
>>>    mumps.c to
>>>    >>>>>>>>>>>>>>> aijcusparse.cu <http://aijcusparse.cu/>
>>>    <http://aijcusparse.cu <http://aijcusparse.cu/>>
>>>    >>>>>>>>>>>>>>>       >>    <http://aijcusparse.cu
>>>    <http://aijcusparse.cu/> <http://aijcusparse.cu
>>>    <http://aijcusparse.cu/>>>.
>>>    >>>>>>>>>>>>>>>       >>     >
>>>    >>>>>>>>>>>>>>>       >>     >    mumps.c doesn't actually do
>>>    that. It never needs to
>>>    >>>>>>>>>>>>>>>      copy the
>>>    >>>>>>>>>>>>>>>       >>    entire matrix to a single MPI rank.
>>>    >>>>>>>>>>>>>>>       >>     >
>>>    >>>>>>>>>>>>>>>       >>     >    It would be possible to write
>>>    such a code that you
>>>    >>>>>>>>>>>>>>>      suggest but
>>>    >>>>>>>>>>>>>>>       >>    it is not clear that it makes sense
>>>    >>>>>>>>>>>>>>>       >>     >
>>>    >>>>>>>>>>>>>>>       >>     > 1)  For normal PETSc GPU usage
>>>    there is one GPU per MPI
>>>    >>>>>>>>>>>>>>>      rank, so
>>>    >>>>>>>>>>>>>>>       >>    while your one GPU per big domain is
>>>    solving its systems the
>>>    >>>>>>>>>>>>>>>      other
>>>    >>>>>>>>>>>>>>>       >>    GPUs (with the other MPI ranks that
>>>    share that domain) are doing
>>>    >>>>>>>>>>>>>>>       >>    nothing.
>>>    >>>>>>>>>>>>>>>       >>     >
>>>    >>>>>>>>>>>>>>>       >>     > 2) For each triangular solve you
>>>    would have to gather the
>>>    >>>>>>>>>>>>>>>      right
>>>    >>>>>>>>>>>>>>>       >>    hand side from the multiple ranks to
>>>    the single GPU to pass it to
>>>    >>>>>>>>>>>>>>>       >>    the GPU solver and then scatter the
>>>    resulting solution back
>>>    >>>>>>>>>>>>>>>      to all
>>>    >>>>>>>>>>>>>>>       >>    of its subdomain ranks.
>>>    >>>>>>>>>>>>>>>       >>     >
>>>    >>>>>>>>>>>>>>>       >>     >    What I was suggesting was assign
>>>    an entire subdomain to a
>>>    >>>>>>>>>>>>>>>       >>    single MPI rank, thus it does
>>>    everything on one GPU and can
>>>    >>>>>>>>>>>>>>>      use the
>>>    >>>>>>>>>>>>>>>       >>    GPU solver directly. If all the major
>>>    computations of a subdomain
>>>    >>>>>>>>>>>>>>>       >>    can fit and be done on a single GPU
>>>    then you would be
>>>    >>>>>>>>>>>>>>>      utilizing all
>>>    >>>>>>>>>>>>>>>       >>    the GPUs you are using effectively.
>>>    >>>>>>>>>>>>>>>       >>     >
>>>    >>>>>>>>>>>>>>>       >>     >    Barry
>>>    >>>>>>>>>>>>>>>       >>     >
>>>    >>>>>>>>>>>>>>>       >>     >
>>>    >>>>>>>>>>>>>>>       >>     >
>>>    >>>>>>>>>>>>>>>       >>     >>
>>>    >>>>>>>>>>>>>>>       >>     >> Chang
>>>    >>>>>>>>>>>>>>>       >>     >>
>>>    >>>>>>>>>>>>>>>       >>     >> On 10/13/21 1:53 PM, Barry Smith
>>>    wrote:
>>>    >>>>>>>>>>>>>>>       >>     >>>    Chang,
>>>    >>>>>>>>>>>>>>>       >>     >>>      You are correct there is no
>>>    MPI + GPU direct
>>>    >>>>>>>>>>>>>>>      solvers that
>>>    >>>>>>>>>>>>>>>       >>    currently do the triangular solves
>>>    with MPI + GPU parallelism
>>>    >>>>>>>>>>>>>>>      that I
>>>    >>>>>>>>>>>>>>>       >>    am aware of. You are limited that
>>>    individual triangular solves be
>>>    >>>>>>>>>>>>>>>       >>    done on a single GPU. I can only
>>>    suggest making each subdomain as
>>>    >>>>>>>>>>>>>>>       >>    big as possible to utilize each GPU as
>>>    much as possible for the
>>>    >>>>>>>>>>>>>>>       >>    direct triangular solves.
>>>    >>>>>>>>>>>>>>>       >>     >>>     Barry
>>>    >>>>>>>>>>>>>>>       >>     >>>> On Oct 13, 2021, at 12:16 PM,
>>>    Chang Liu via petsc-users
>>>    >>>>>>>>>>>>>>>       >>    <petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov> <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov> <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>>>
>>>    >>>>>>>>>>>>>>>      wrote:
>>>    >>>>>>>>>>>>>>>       >>     >>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>> Hi Mark,
>>>    >>>>>>>>>>>>>>>       >>     >>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>> '-mat_type aijcusparse' works
>>>    with mpiaijcusparse with
>>>    >>>>>>>>>>>>>>>      other
>>>    >>>>>>>>>>>>>>>       >>    solvers, but with
>>>    -pc_factor_mat_solver_type cusparse, it
>>>    >>>>>>>>>>>>>>>      will give
>>>    >>>>>>>>>>>>>>>       >>    an error.
>>>    >>>>>>>>>>>>>>>       >>     >>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>> Yes what I want is to have mumps
>>>    or superlu to do the
>>>    >>>>>>>>>>>>>>>       >>    factorization, and then do the rest,
>>>    including GMRES solver,
>>>    >>>>>>>>>>>>>>>      on gpu.
>>>    >>>>>>>>>>>>>>>       >>    Is that possible?
>>>    >>>>>>>>>>>>>>>       >>     >>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>> I have tried to use aijcusparse
>>>    with superlu_dist, it
>>>    >>>>>>>>>>>>>>>      runs but
>>>    >>>>>>>>>>>>>>>       >>    the iterative solver is still running
>>>    on CPUs. I have
>>>    >>>>>>>>>>>>>>>      contacted the
>>>    >>>>>>>>>>>>>>>       >>    superlu group and they confirmed that
>>>    is the case right now.
>>>    >>>>>>>>>>>>>>>      But if
>>>    >>>>>>>>>>>>>>>       >>    I set -pc_factor_mat_solver_type
>>>    cusparse, it seems that the
>>>    >>>>>>>>>>>>>>>       >>    iterative solver is running on GPU.
>>>    >>>>>>>>>>>>>>>       >>     >>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>> Chang
>>>    >>>>>>>>>>>>>>>       >>     >>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>> On 10/13/21 12:03 PM, Mark Adams
>>>    wrote:
>>>    >>>>>>>>>>>>>>>       >>     >>>>> On Wed, Oct 13, 2021 at 11:10
>>>    AM Chang Liu
>>>    >>>>>>>>>>>>>>>      <cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov>>>>> wrote:
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     Thank you Junchao for
>>>    explaining this. I guess in
>>>    >>>>>>>>>>>>>>>      my case
>>>    >>>>>>>>>>>>>>>       >>    the code is
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     just calling a seq solver
>>>    like superlu to do
>>>    >>>>>>>>>>>>>>>       >>    factorization on GPUs.
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     My idea is that I want to
>>>    have a traditional MPI
>>>    >>>>>>>>>>>>>>>      code to
>>>    >>>>>>>>>>>>>>>       >>    utilize GPUs
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     with cusparse. Right now
>>>    cusparse does not support
>>>    >>>>>>>>>>>>>>>      mpiaij
>>>    >>>>>>>>>>>>>>>       >>    matrix, Sure it does: '-mat_type
>>>    aijcusparse' will give you an
>>>    >>>>>>>>>>>>>>>       >>    mpiaijcusparse matrix with > 1 processes.
>>>    >>>>>>>>>>>>>>>       >>     >>>>> (-mat_type mpiaijcusparse might
>>>    also work with >1 proc).
>>>    >>>>>>>>>>>>>>>       >>     >>>>> However, I see in grepping the
>>>    repo that all the mumps and
>>>    >>>>>>>>>>>>>>>       >>    superlu tests use aij or sell matrix type.
>>>    >>>>>>>>>>>>>>>       >>     >>>>> MUMPS and SuperLU provide their
>>>    own solves, I assume
>>>    >>>>>>>>>>>>>>>      .... but
>>>    >>>>>>>>>>>>>>>       >>    you might want to do other matrix
>>>    operations on the GPU. Is
>>>    >>>>>>>>>>>>>>>      that the
>>>    >>>>>>>>>>>>>>>       >>    issue?
>>>    >>>>>>>>>>>>>>>       >>     >>>>> Did you try -mat_type
>>>    aijcusparse with MUMPS and/or
>>>    >>>>>>>>>>>>>>>      SuperLU
>>>    >>>>>>>>>>>>>>>       >>    have a problem? (no test with it so it
>>>    probably does not work)
>>>    >>>>>>>>>>>>>>>       >>     >>>>> Thanks,
>>>    >>>>>>>>>>>>>>>       >>     >>>>> Mark
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     so I
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     want the code to have a
>>>    mpiaij matrix when adding
>>>    >>>>>>>>>>>>>>>      all the
>>>    >>>>>>>>>>>>>>>       >>    matrix terms,
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     and then transform the
>>>    matrix to seqaij when doing the
>>>    >>>>>>>>>>>>>>>       >>    factorization
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     and
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     solve. This involves
>>>    sending the data to the master
>>>    >>>>>>>>>>>>>>>       >>    process, and I
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     think
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     the petsc mumps solver have
>>>    something similar already.
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     Chang
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     On 10/13/21 10:18 AM,
>>>    Junchao Zhang wrote:
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      > On Tue, Oct 12, 2021 at
>>>    1:07 PM Mark Adams
>>>    >>>>>>>>>>>>>>>       >>    <mfadams at lbl.gov
>>>    <mailto:mfadams at lbl.gov> <mailto:mfadams at lbl.gov
>>>    <mailto:mfadams at lbl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:mfadams at lbl.gov
>>>    <mailto:mfadams at lbl.gov> <mailto:mfadams at lbl.gov
>>>    <mailto:mfadams at lbl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     <mailto:mfadams at lbl.gov
>>>    <mailto:mfadams at lbl.gov> <mailto:mfadams at lbl.gov
>>>    <mailto:mfadams at lbl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:mfadams at lbl.gov
>>>    <mailto:mfadams at lbl.gov> <mailto:mfadams at lbl.gov
>>>    <mailto:mfadams at lbl.gov>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      > <mailto:mfadams at lbl.gov
>>>    <mailto:mfadams at lbl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:mfadams at lbl.gov
>>>    <mailto:mfadams at lbl.gov>> <mailto:mfadams at lbl.gov
>>>    <mailto:mfadams at lbl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:mfadams at lbl.gov
>>>    <mailto:mfadams at lbl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:mfadams at lbl.gov
>>>    <mailto:mfadams at lbl.gov> <mailto:mfadams at lbl.gov
>>>    <mailto:mfadams at lbl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:mfadams at lbl.gov
>>>    <mailto:mfadams at lbl.gov> <mailto:mfadams at lbl.gov
>>>    <mailto:mfadams at lbl.gov>>>>>> wrote:
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >     On Tue, Oct 12, 2021
>>>    at 1:45 PM Chang Liu
>>>    >>>>>>>>>>>>>>>       >>    <cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >        <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>>> wrote:
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         Hi Mark,
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         The option I use
>>>    is like
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         -pc_type bjacobi
>>>    -pc_bjacobi_blocks 16
>>>    >>>>>>>>>>>>>>>       >>    -ksp_type fgmres
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     -mat_type
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         aijcusparse
>>>    *-sub_pc_factor_mat_solver_type
>>>    >>>>>>>>>>>>>>>       >>    cusparse
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     *-sub_ksp_type
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         preonly
>>>    *-sub_pc_type lu* -ksp_max_it 2000
>>>    >>>>>>>>>>>>>>>       >>    -ksp_rtol 1.e-300
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         -ksp_atol 1.e-300
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >     Note, If you use
>>>    -log_view the last column
>>>    >>>>>>>>>>>>>>>      (rows
>>>    >>>>>>>>>>>>>>>       >>    are the
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     method like
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >     MatFactorNumeric)
>>>    has the percent of work
>>>    >>>>>>>>>>>>>>>      in the GPU.
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >     Junchao: *This*
>>>    implies that we have a
>>>    >>>>>>>>>>>>>>>      cuSparse LU
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     factorization. Is
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >     that correct? (I
>>>    don't think we do)
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      > No, we don't have
>>>    cuSparse LU factorization.     If you check
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >
>>>    MatLUFactorSymbolic_SeqAIJCUSPARSE(),you will
>>>    >>>>>>>>>>>>>>>      find it
>>>    >>>>>>>>>>>>>>>       >>    calls
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >
>>>    MatLUFactorSymbolic_SeqAIJ() instead.
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      > So I don't understand
>>>    Chang's idea. Do you want to
>>>    >>>>>>>>>>>>>>>       >>    make bigger
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     blocks?
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         I think this one
>>>    do both factorization and
>>>    >>>>>>>>>>>>>>>       >>    solve on gpu.
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         You can check the
>>>    >>>>>>>>>>>>>>>      runex72_aijcusparse.sh file
>>>    >>>>>>>>>>>>>>>       >>    in petsc
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     install
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         directory, and
>>>    try it your self (this
>>>    >>>>>>>>>>>>>>>      is only lu
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     factorization
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         without
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         iterative solve).
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         Chang
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         On 10/12/21 1:17
>>>    PM, Mark Adams wrote:
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          > On Tue, Oct
>>>    12, 2021 at 11:19 AM
>>>    >>>>>>>>>>>>>>>      Chang Liu
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     <cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >            <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>>>> wrote:
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     Hi Junchao,
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     No I only
>>>    needs it to be transferred
>>>    >>>>>>>>>>>>>>>       >>    within a
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     node. I use
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         block-Jacobi
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     method
>>>    and GMRES to solve the sparse
>>>    >>>>>>>>>>>>>>>       >>    matrix, so each
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         direct solver will
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     take care
>>>    of a sub-block of the
>>>    >>>>>>>>>>>>>>>      whole
>>>    >>>>>>>>>>>>>>>       >>    matrix. In this
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         way, I can use
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     one
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     GPU to
>>>    solve one sub-block, which is
>>>    >>>>>>>>>>>>>>>       >>    stored within
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     one node.
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     It was
>>>    stated in the
>>>    >>>>>>>>>>>>>>>      documentation that
>>>    >>>>>>>>>>>>>>>       >>    cusparse
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     solver
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         is slow.
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     However,
>>>    in my test using
>>>    >>>>>>>>>>>>>>>      ex72.c, the
>>>    >>>>>>>>>>>>>>>       >>    cusparse
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     solver is
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         faster than
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     mumps or
>>>    superlu_dist on CPUs.
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          > Are we
>>>    talking about the
>>>    >>>>>>>>>>>>>>>      factorization, the
>>>    >>>>>>>>>>>>>>>       >>    solve, or
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     both?
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          > We do not
>>>    have an interface to
>>>    >>>>>>>>>>>>>>>      cuSparse's LU
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     factorization (I
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         just
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          > learned that
>>>    it exists a few weeks ago).
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          > Perhaps your
>>>    fast "cusparse solver" is
>>>    >>>>>>>>>>>>>>>       >>    '-pc_type lu
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     -mat_type
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          > aijcusparse'
>>>    ? This would be the CPU
>>>    >>>>>>>>>>>>>>>       >>    factorization,
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     which is the
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          > dominant cost.
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     Chang
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     On
>>>    10/12/21 10:24 AM, Junchao
>>>    >>>>>>>>>>>>>>>      Zhang wrote:
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      > Hi, Chang,
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >        For the mumps solver, we
>>>    >>>>>>>>>>>>>>>      usually
>>>    >>>>>>>>>>>>>>>       >>    transfers
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     matrix
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         and vector
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     data
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      > within
>>>    a compute node.  For
>>>    >>>>>>>>>>>>>>>      the idea you
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     propose, it
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         looks like
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     we need
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      > to
>>>    gather data within
>>>    >>>>>>>>>>>>>>>       >>    MPI_COMM_WORLD, right?
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >        Mark, I remember you said
>>>    >>>>>>>>>>>>>>>       >>    cusparse solve is
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     slow
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         and you would
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      > rather
>>>    do it on CPU. Is it right?
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >
>>>    --Junchao Zhang
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      > On
>>>    Mon, Oct 11, 2021 at 10:25 PM
>>>    >>>>>>>>>>>>>>>       >>    Chang Liu via
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     petsc-users
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >
>>>    <petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov> <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>        <mailto:petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >            <mailto:petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov> <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>        <mailto:petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>>>>
>>>    <mailto:petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov> <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>        <mailto:petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >            <mailto:petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov> <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>        <mailto:petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >
>>>    <mailto:petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov> <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>        <mailto:petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >            <mailto:petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov> <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>        <mailto:petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>>>>
>>>    <mailto:petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov> <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>        <mailto:petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >            <mailto:petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov> <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>        <mailto:petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:petsc-users at mcs.anl.gov
>>>    <mailto:petsc-users at mcs.anl.gov>>>>>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     wrote:
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >     Hi,
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >        Currently, it is possible
>>>    >>>>>>>>>>>>>>>      to use
>>>    >>>>>>>>>>>>>>>       >>    mumps
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     solver in
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         PETSC with
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >
>>>    -mat_mumps_use_omp_threads
>>>    >>>>>>>>>>>>>>>       >>    option, so that
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         multiple MPI
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     processes
>>>    will
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >        transfer the matrix and
>>>    >>>>>>>>>>>>>>>      rhs data
>>>    >>>>>>>>>>>>>>>       >>    to the master
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         rank, and then
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     master
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >        rank will call mumps with
>>>    >>>>>>>>>>>>>>>      OpenMP
>>>    >>>>>>>>>>>>>>>       >>    to solve
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     the matrix.
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >     I
>>>    wonder if someone can
>>>    >>>>>>>>>>>>>>>      develop
>>>    >>>>>>>>>>>>>>>       >>    similar
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     option for
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         cusparse
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     solver.
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >        Right now, this solver
>>>    >>>>>>>>>>>>>>>      does not
>>>    >>>>>>>>>>>>>>>       >>    work with
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         mpiaijcusparse. I
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     think a
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >        possible workaround is to
>>>    >>>>>>>>>>>>>>>       >>    transfer all the
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     matrix
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         data to one MPI
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >        process, and then upload the
>>>    >>>>>>>>>>>>>>>       >>    data to GPU to
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     solve.
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         In this
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     way, one can
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >        use cusparse solver for a MPI
>>>    >>>>>>>>>>>>>>>       >>    program.
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >     Chang
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >     --
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >        Chang Liu
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >        Staff Research Physicist
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >     +1
>>>    609 243 3438
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >
>>>    cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >            <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >            <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >        <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >        Princeton Plasma Physics
>>>    >>>>>>>>>>>>>>>      Laboratory
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >        100 Stellarator Rd,
>>>    >>>>>>>>>>>>>>>      Princeton NJ
>>>    >>>>>>>>>>>>>>>       >>    08540, USA
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     --
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     Chang Liu
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     Staff
>>>    Research Physicist
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     +1 609
>>>    243 3438
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          > cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >            <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     Princeton
>>>    Plasma Physics Laboratory
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >     100
>>>    Stellarator Rd, Princeton NJ
>>>    >>>>>>>>>>>>>>>      08540, USA
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >          >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         --
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         Chang Liu
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         Staff Research
>>>    Physicist
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         +1 609 243 3438
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      > cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         Princeton Plasma
>>>    Physics Laboratory
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >         100 Stellarator
>>>    Rd, Princeton NJ 08540, USA
>>>    >>>>>>>>>>>>>>>       >>     >>>>>      >
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     --     Chang Liu
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     Staff Research Physicist
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     +1 609 243 3438
>>>    >>>>>>>>>>>>>>>       >>     >>>>> cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>       >>    <mailto:cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     Princeton Plasma Physics
>>>    Laboratory
>>>    >>>>>>>>>>>>>>>       >>     >>>>>     100 Stellarator Rd,
>>>    Princeton NJ 08540, USA
>>>    >>>>>>>>>>>>>>>       >>     >>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>> --
>>>    >>>>>>>>>>>>>>>       >>     >>>> Chang Liu
>>>    >>>>>>>>>>>>>>>       >>     >>>> Staff Research Physicist
>>>    >>>>>>>>>>>>>>>       >>     >>>> +1 609 243 3438
>>>    >>>>>>>>>>>>>>>       >>     >>>> cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>     >>>> Princeton Plasma Physics Laboratory
>>>    >>>>>>>>>>>>>>>       >>     >>>> 100 Stellarator Rd, Princeton NJ
>>>    08540, USA
>>>    >>>>>>>>>>>>>>>       >>     >>
>>>    >>>>>>>>>>>>>>>       >>     >> --
>>>    >>>>>>>>>>>>>>>       >>     >> Chang Liu
>>>    >>>>>>>>>>>>>>>       >>     >> Staff Research Physicist
>>>    >>>>>>>>>>>>>>>       >>     >> +1 609 243 3438
>>>    >>>>>>>>>>>>>>>       >>     >> cliu at pppl.gov
>>>    <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>     >> Princeton Plasma Physics Laboratory
>>>    >>>>>>>>>>>>>>>       >>     >> 100 Stellarator Rd, Princeton NJ
>>>    08540, USA
>>>    >>>>>>>>>>>>>>>       >>     >
>>>    >>>>>>>>>>>>>>>       >>    --     Chang Liu
>>>    >>>>>>>>>>>>>>>       >>    Staff Research Physicist
>>>    >>>>>>>>>>>>>>>       >>    +1 609 243 3438
>>>    >>>>>>>>>>>>>>>       >> cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    >>>>>>>>>>>>>>>      <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>
>>>    >>>>>>>>>>>>>>>       >>    Princeton Plasma Physics Laboratory
>>>    >>>>>>>>>>>>>>>       >>    100 Stellarator Rd, Princeton NJ
>>>    08540, USA
>>>    >>>>>>>>>>>>>>>       >
>>>    >>>>>>>>>>>>>>>       > --
>>>    >>>>>>>>>>>>>>>       > Chang Liu
>>>    >>>>>>>>>>>>>>>       > Staff Research Physicist
>>>    >>>>>>>>>>>>>>>       > +1 609 243 3438
>>>    >>>>>>>>>>>>>>>       > cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>    >>>>>>>>>>>>>>>       > Princeton Plasma Physics Laboratory
>>>    >>>>>>>>>>>>>>>       > 100 Stellarator Rd, Princeton NJ 08540, USA
>>>    >>>>>>>>>>>>>>
>>>    >>>>>>>>>>>>>> --
>>>    >>>>>>>>>>>>>> Chang Liu
>>>    >>>>>>>>>>>>>> Staff Research Physicist
>>>    >>>>>>>>>>>>>> +1 609 243 3438
>>>    >>>>>>>>>>>>>> cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    >>>>>>>>>>>>>> Princeton Plasma Physics Laboratory
>>>    >>>>>>>>>>>>>> 100 Stellarator Rd, Princeton NJ 08540, USA
>>>    >>>>>>>>>>>>
>>>    >>>>>>>>>>>> --
>>>    >>>>>>>>>>>> Chang Liu
>>>    >>>>>>>>>>>> Staff Research Physicist
>>>    >>>>>>>>>>>> +1 609 243 3438
>>>    >>>>>>>>>>>> cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    >>>>>>>>>>>> Princeton Plasma Physics Laboratory
>>>    >>>>>>>>>>>> 100 Stellarator Rd, Princeton NJ 08540, USA
>>>    >>>>>>>>>>
>>>    >>>>>>>>>> --
>>>    >>>>>>>>>> Chang Liu
>>>    >>>>>>>>>> Staff Research Physicist
>>>    >>>>>>>>>> +1 609 243 3438
>>>    >>>>>>>>>> cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    >>>>>>>>>> Princeton Plasma Physics Laboratory
>>>    >>>>>>>>>> 100 Stellarator Rd, Princeton NJ 08540, USA
>>>    >>>>>>>>
>>>    >>>>>>>> --
>>>    >>>>>>>> Chang Liu
>>>    >>>>>>>> Staff Research Physicist
>>>    >>>>>>>> +1 609 243 3438
>>>    >>>>>>>> cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    >>>>>>>> Princeton Plasma Physics Laboratory
>>>    >>>>>>>> 100 Stellarator Rd, Princeton NJ 08540, USA
>>>    >>>>>>>
>>>    >>>>>>
>>>    >>>>
>>>    >>>> --
>>>    >>>> Chang Liu
>>>    >>>> Staff Research Physicist
>>>    >>>> +1 609 243 3438
>>>    >>>> cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    >>>> Princeton Plasma Physics Laboratory
>>>    >>>> 100 Stellarator Rd, Princeton NJ 08540, USA
>>>    >>
>>>    >> --
>>>    >> Chang Liu
>>>    >> Staff Research Physicist
>>>    >> +1 609 243 3438
>>>    >> cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    >> Princeton Plasma Physics Laboratory
>>>    >> 100 Stellarator Rd, Princeton NJ 08540, USA
>>>    >
>>> 
>>>    --     Chang Liu
>>>    Staff Research Physicist
>>>    +1 609 243 3438
>>>    cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>    Princeton Plasma Physics Laboratory
>>>    100 Stellarator Rd, Princeton NJ 08540, USA
>>> 
> 
> -- 
> Chang Liu
> Staff Research Physicist
> +1 609 243 3438
> cliu at pppl.gov
> Princeton Plasma Physics Laboratory
> 100 Stellarator Rd, Princeton NJ 08540, USA



More information about the petsc-users mailing list