[petsc-users] [External] Re: request to add an option similar to use_omp_threads for mumps to cusparse solver
Junchao Zhang
junchao.zhang at gmail.com
Wed Oct 20 13:59:12 CDT 2021
The MR https://gitlab.com/petsc/petsc/-/merge_requests/4471 has not been
merged yet.
--Junchao Zhang
On Wed, Oct 20, 2021 at 1:47 PM Chang Liu via petsc-users <
petsc-users at mcs.anl.gov> wrote:
> Hi Barry,
>
> Are the fixes merged into master? I was using bjacobi as a
> preconditioner. Using the latest version of PETSc, I found that by calling
>
> mpiexec -n 32 --oversubscribe ./ex7 -m 1000 -ksp_view
> -ksp_monitor_true_residual -ksp_type fgmres -pc_type bjacobi
> -pc_bjacobi_blocks 4 -sub_ksp_type preonly -sub_pc_type telescope
> -sub_pc_telescope_reduction_factor 8 -sub_pc_telescope_subcomm_type
> contiguous -sub_telescope_pc_type lu -sub_telescope_ksp_type preonly
> -sub_telescope_pc_factor_mat_solver_type mumps -ksp_max_it 2000
> -ksp_rtol 1.e-30 -ksp_atol 1.e-30
>
> The code is calling PCApply_BJacobi_Multiproc. If I use
>
> mpiexec -n 32 --oversubscribe ./ex7 -m 1000 -ksp_view
> -ksp_monitor_true_residual -telescope_ksp_monitor_true_residual
> -ksp_type preonly -pc_type telescope -pc_telescope_reduction_factor 8
> -pc_telescope_subcomm_type contiguous -telescope_pc_type bjacobi
> -telescope_ksp_type fgmres -telescope_pc_bjacobi_blocks 4
> -telescope_sub_ksp_type preonly -telescope_sub_pc_type lu
> -telescope_sub_pc_factor_mat_solver_type mumps -telescope_ksp_max_it
> 2000 -telescope_ksp_rtol 1.e-30 -telescope_ksp_atol 1.e-30
>
> The code is calling PCApply_BJacobi_Singleblock. You can test it yourself.
>
> Regards,
>
> Chang
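
For anyone who would rather drive this setup from code than from the command line, here is a minimal, untested C sketch (not ex7 itself): it pushes the same bjacobi + telescope options from the first command line above into the options database and lets KSPSetFromOptions() wire up the nested solvers. The 1-D Laplacian is only a stand-in operator; run it with the same 32-rank mpiexec line to match the 4-block / reduction-factor-8 layout.

#include <petscksp.h>

int main(int argc, char **argv)
{
  KSP            ksp;
  Mat            A;
  Vec            b, x;
  PetscInt       i, n = 1000, rstart, rend;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* Small 1-D Laplacian as a stand-in for the operator assembled by ex7 */
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {
    if (i > 0)     { ierr = MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);CHKERRQ(ierr); }
    if (i < n - 1) { ierr = MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);CHKERRQ(ierr); }
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatCreateVecs(A, &x, &b);CHKERRQ(ierr);
  ierr = VecSet(b, 1.0);CHKERRQ(ierr);

  /* Same nesting as the first command line above: bjacobi outside, telescope
     on each block, LU (MUMPS) inside the telescope; the prefixes sub_ and
     sub_telescope_ address the block solver and the inner solver. */
  ierr = PetscOptionsSetValue(NULL, "-ksp_type", "fgmres");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_type", "bjacobi");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_bjacobi_blocks", "4");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-sub_ksp_type", "preonly");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-sub_pc_type", "telescope");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-sub_pc_telescope_reduction_factor", "8");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-sub_pc_telescope_subcomm_type", "contiguous");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-sub_telescope_ksp_type", "preonly");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-sub_telescope_pc_type", "lu");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-sub_telescope_pc_factor_mat_solver_type", "mumps");CHKERRQ(ierr);

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); /* picks up everything set above */
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}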
>
> On 10/20/21 1:14 PM, Barry Smith wrote:
> >
> >
> >> On Oct 20, 2021, at 12:48 PM, Chang Liu <cliu at pppl.gov> wrote:
> >>
> >> Hi Pierre,
> >>
> >> I have another suggestion for telescope. I have achieved my goal by
> putting telescope outside bjacobi. But the code still does not work if I
> use telescope as a PC for the subblocks. I think the reason is that I want
> to use cusparse as the solver, which can only deal with a seqaij matrix and
> not an mpiaij matrix.
> >
> >
> > This is supposed to work with the recent fixes. The telescope should
> produce a seq matrix and, for each solve, automatically map the parallel
> vector (over the subdomain) down to the one rank with the GPU to solve it
> on the GPU. It is not clear to me where the process is going wrong.
> >
> > Barry
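
One way to locate where it goes wrong is to drill down from the outer bjacobi PC to the telescope sub-PC after KSPSetUp() and print the type of the matrix the innermost solver actually receives. A rough, untested sketch, assuming the bjacobi + telescope nesting from the runs above; note that PCTelescopeGetKSP() returns the inner KSP only on ranks that are active in the telescope sub-communicator.

#include <petscksp.h>

/* Rough diagnostic sketch: call after KSPSetUp(ksp); walks bjacobi -> telescope
   and reports the type of the preconditioner matrix the innermost PC sees. */
static PetscErrorCode ReportInnerMatType(KSP ksp)
{
  PC             pc, subpc;
  KSP           *subksp, innerksp;
  Mat            Amat, Pmat;
  MatType        mtype;
  PetscInt       nlocal, first;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCBJacobiGetSubKSP(pc, &nlocal, &first, &subksp);CHKERRQ(ierr);
  ierr = KSPGetPC(subksp[0], &subpc);CHKERRQ(ierr);         /* the telescope PC on this block */
  ierr = PCTelescopeGetKSP(subpc, &innerksp);CHKERRQ(ierr); /* NULL off the sub-communicator */
  if (innerksp) {
    ierr = KSPGetOperators(innerksp, &Amat, &Pmat);CHKERRQ(ierr);
    ierr = MatGetType(Pmat, &mtype);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_SELF, "inner pmat type: %s\n", mtype);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}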
> >
> >
> >
> >> However, the telescope pc can put the matrix onto one MPI rank, thus
> making it a seqaij for the factorization stage, but then after
> factorization it will give the data back to the original communicator. This
> turns the matrix back into an mpiaij, and then cusparse cannot solve it.
> >>
> >> I think a better option is to do the factorization on the CPU with the
> mpiaij matrix, then transform the preconditioner matrix to seqaij and do
> the MatSolve on the GPU. But I am not sure if this can be achieved using telescope.
> >>
> >> Regards,
> >>
> >> Chang
> >>
> >> On 10/15/21 5:29 AM, Pierre Jolivet wrote:
> >>> Hi Chang,
> >>> The output you sent with MUMPS looks alright to me, you can see that
> the MatType is properly set to seqaijcusparse (and not mpiaijcusparse).
> >>> I don’t know what is wrong with
> -sub_telescope_pc_factor_mat_solver_type cusparse, I don’t have a PETSc
> installation for testing this, hopefully Barry or Junchao can confirm this
> wrong behavior and get this fixed.
> >>> As for permuting PCTELESCOPE and PCBJACOBI, in your case, the outer PC
> will be equivalent, yes.
> >>> However, it would be more efficient to do PCBJACOBI and then
> PCTELESCOPE.
> >>> PCBJACOBI prunes the operator by basically removing all coefficients
> outside of the diagonal blocks.
> >>> Then, PCTELESCOPE “groups everything together”.
> >>> If you do it the other way around, PCTELESCOPE will “group everything
> together” and then PCBJACOBI will prune the operator.
> >>> So the PCTELESCOPE SetUp will be costly for nothing since some
> coefficients will be thrown out afterwards in the PCBJACOBI SetUp.
> >>> I hope I’m clear enough, otherwise I can try to draw some pictures.
> >>> Thanks,
> >>> Pierre
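
A compact, untested C sketch of the recommended nesting (PCBJACOBI outside, PCTELESCOPE on the blocks), using the block count and reduction factor that appear earlier in this thread; the reversed nesting would simply swap the pc types and prefixes and, as explained above, would group everything before pruning.

#include <petscksp.h>

/* Sketch: apply the bjacobi-then-telescope ordering to a KSP whose operators
   are already set; one options string, consumed by KSPSetFromOptions(). */
static PetscErrorCode SetRecommendedNesting(KSP ksp)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PetscOptionsInsertString(NULL,
          "-pc_type bjacobi -pc_bjacobi_blocks 4 "
          "-sub_pc_type telescope -sub_pc_telescope_reduction_factor 4 "
          "-sub_telescope_pc_type lu");CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}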
> >>>> On 15 Oct 2021, at 4:39 AM, Chang Liu <cliu at pppl.gov> wrote:
> >>>>
> >>>> Hi Pierre and Barry,
> >>>>
> >>>> I think maybe I should use telescope outside bjacobi, like this:
> >>>>
> >>>> mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7 -m 400
> -ksp_view -ksp_monitor_true_residual -pc_type telescope
> -pc_telescope_reduction_factor 4 -telescope_pc_type bjacobi
> -telescope_ksp_type fgmres -telescope_pc_bjacobi_blocks 4
> -mat_type aijcusparse -telescope_sub_ksp_type preonly
> -telescope_sub_pc_type lu -telescope_sub_pc_factor_mat_solver_type cusparse
> -ksp_max_it 2000 -ksp_rtol 1.e-20 -ksp_atol 1.e-9
> >>>>
> >>>> But then I got an error that
> >>>>
> >>>> [0]PETSC ERROR: MatSolverType cusparse does not support matrix type
> seqaij
> >>>>
> >>>> But the mat type should be aijcusparse. I think telescope changes the
> mat type.
> >>>>
> >>>> Chang
> >>>>
> >>>> On 10/14/21 10:11 PM, Chang Liu wrote:
> >>>>> For comparison, here is the output using mumps instead of cusparse
> >>>>> $ mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7 -m 400
> -ksp_view -ksp_monitor_true_residual -pc_type bjacobi -pc_bjacobi_blocks 4
> -ksp_type fgmres -mat_type aijcusparse -sub_pc_type telescope -sub_ksp_type
> preonly -sub_telescope_ksp_type preonly -sub_telescope_pc_type lu
> -sub_telescope_pc_factor_mat_solver_type mumps
> -sub_pc_telescope_reduction_factor 4 -sub_pc_telescope_subcomm_type
> contiguous -ksp_max_it 2000 -ksp_rtol 1.e-20 -ksp_atol 1.e-9
> >>>>> 0 KSP unpreconditioned resid norm 4.014971979977e+01 true resid
> norm 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00
> >>>>> 1 KSP unpreconditioned resid norm 2.439995191694e+00 true resid
> norm 2.439995191694e+00 ||r(i)||/||b|| 6.077240896978e-02
> >>>>> 2 KSP unpreconditioned resid norm 1.280694102588e+00 true resid
> norm 1.280694102588e+00 ||r(i)||/||b|| 3.189795866509e-02
> >>>>> 3 KSP unpreconditioned resid norm 1.041100266810e+00 true resid
> norm 1.041100266810e+00 ||r(i)||/||b|| 2.593044912896e-02
> >>>>> 4 KSP unpreconditioned resid norm 7.274347137268e-01 true resid
> norm 7.274347137268e-01 ||r(i)||/||b|| 1.811805206499e-02
> >>>>> 5 KSP unpreconditioned resid norm 5.429229329787e-01 true resid
> norm 5.429229329787e-01 ||r(i)||/||b|| 1.352245882876e-02
> >>>>> 6 KSP unpreconditioned resid norm 4.332970410353e-01 true resid
> norm 4.332970410353e-01 ||r(i)||/||b|| 1.079203150598e-02
> >>>>> 7 KSP unpreconditioned resid norm 3.948206050950e-01 true resid
> norm 3.948206050950e-01 ||r(i)||/||b|| 9.833707609019e-03
> >>>>> 8 KSP unpreconditioned resid norm 3.379580577269e-01 true resid
> norm 3.379580577269e-01 ||r(i)||/||b|| 8.417444988714e-03
> >>>>> 9 KSP unpreconditioned resid norm 2.875593971410e-01 true resid
> norm 2.875593971410e-01 ||r(i)||/||b|| 7.162176936105e-03
> >>>>> 10 KSP unpreconditioned resid norm 2.533983363244e-01 true resid
> norm 2.533983363244e-01 ||r(i)||/||b|| 6.311335112378e-03
> >>>>> 11 KSP unpreconditioned resid norm 2.389169921094e-01 true resid
> norm 2.389169921094e-01 ||r(i)||/||b|| 5.950651543793e-03
> >>>>> 12 KSP unpreconditioned resid norm 2.118961639089e-01 true resid
> norm 2.118961639089e-01 ||r(i)||/||b|| 5.277649880637e-03
> >>>>> 13 KSP unpreconditioned resid norm 1.885892030223e-01 true resid
> norm 1.885892030223e-01 ||r(i)||/||b|| 4.697148671593e-03
> >>>>> 14 KSP unpreconditioned resid norm 1.763510666948e-01 true resid
> norm 1.763510666948e-01 ||r(i)||/||b|| 4.392336175055e-03
> >>>>> 15 KSP unpreconditioned resid norm 1.638219366731e-01 true resid
> norm 1.638219366731e-01 ||r(i)||/||b|| 4.080275964317e-03
> >>>>> 16 KSP unpreconditioned resid norm 1.476792766432e-01 true resid
> norm 1.476792766432e-01 ||r(i)||/||b|| 3.678214378076e-03
> >>>>> 17 KSP unpreconditioned resid norm 1.349906937321e-01 true resid
> norm 1.349906937321e-01 ||r(i)||/||b|| 3.362182710248e-03
> >>>>> 18 KSP unpreconditioned resid norm 1.289673236836e-01 true resid
> norm 1.289673236836e-01 ||r(i)||/||b|| 3.212159993314e-03
> >>>>> 19 KSP unpreconditioned resid norm 1.167505658153e-01 true resid
> norm 1.167505658153e-01 ||r(i)||/||b|| 2.907879965230e-03
> >>>>> 20 KSP unpreconditioned resid norm 1.046037988999e-01 true resid
> norm 1.046037988999e-01 ||r(i)||/||b|| 2.605343185995e-03
> >>>>> 21 KSP unpreconditioned resid norm 9.832660514331e-02 true resid
> norm 9.832660514331e-02 ||r(i)||/||b|| 2.448998539309e-03
> >>>>> 22 KSP unpreconditioned resid norm 8.835618950141e-02 true resid
> norm 8.835618950142e-02 ||r(i)||/||b|| 2.200667649539e-03
> >>>>> 23 KSP unpreconditioned resid norm 7.563496650115e-02 true resid
> norm 7.563496650116e-02 ||r(i)||/||b|| 1.883823022386e-03
> >>>>> 24 KSP unpreconditioned resid norm 6.651291376834e-02 true resid
> norm 6.651291376834e-02 ||r(i)||/||b|| 1.656622115921e-03
> >>>>> 25 KSP unpreconditioned resid norm 5.890393227906e-02 true resid
> norm 5.890393227906e-02 ||r(i)||/||b|| 1.467106933070e-03
> >>>>> 26 KSP unpreconditioned resid norm 4.661992782780e-02 true resid
> norm 4.661992782780e-02 ||r(i)||/||b|| 1.161152009536e-03
> >>>>> 27 KSP unpreconditioned resid norm 3.690705358716e-02 true resid
> norm 3.690705358716e-02 ||r(i)||/||b|| 9.192356452602e-04
> >>>>> 28 KSP unpreconditioned resid norm 3.209680460188e-02 true resid
> norm 3.209680460188e-02 ||r(i)||/||b|| 7.994278605666e-04
> >>>>> 29 KSP unpreconditioned resid norm 2.354337626000e-02 true resid
> norm 2.354337626001e-02 ||r(i)||/||b|| 5.863895533373e-04
> >>>>> 30 KSP unpreconditioned resid norm 1.701296561785e-02 true resid
> norm 1.701296561785e-02 ||r(i)||/||b|| 4.237380908932e-04
> >>>>> 31 KSP unpreconditioned resid norm 1.509942937258e-02 true resid
> norm 1.509942937258e-02 ||r(i)||/||b|| 3.760780759588e-04
> >>>>> 32 KSP unpreconditioned resid norm 1.258274688515e-02 true resid
> norm 1.258274688515e-02 ||r(i)||/||b|| 3.133956338402e-04
> >>>>> 33 KSP unpreconditioned resid norm 9.805748771638e-03 true resid
> norm 9.805748771638e-03 ||r(i)||/||b|| 2.442295692359e-04
> >>>>> 34 KSP unpreconditioned resid norm 8.596552678160e-03 true resid
> norm 8.596552678160e-03 ||r(i)||/||b|| 2.141123953301e-04
> >>>>> 35 KSP unpreconditioned resid norm 6.936406707500e-03 true resid
> norm 6.936406707500e-03 ||r(i)||/||b|| 1.727635147167e-04
> >>>>> 36 KSP unpreconditioned resid norm 5.533741607932e-03 true resid
> norm 5.533741607932e-03 ||r(i)||/||b|| 1.378276519869e-04
> >>>>> 37 KSP unpreconditioned resid norm 4.982347757923e-03 true resid
> norm 4.982347757923e-03 ||r(i)||/||b|| 1.240942099414e-04
> >>>>> 38 KSP unpreconditioned resid norm 4.309608348059e-03 true resid
> norm 4.309608348059e-03 ||r(i)||/||b|| 1.073384414524e-04
> >>>>> 39 KSP unpreconditioned resid norm 3.729408303186e-03 true resid
> norm 3.729408303185e-03 ||r(i)||/||b|| 9.288753001974e-05
> >>>>> 40 KSP unpreconditioned resid norm 3.490003351128e-03 true resid
> norm 3.490003351128e-03 ||r(i)||/||b|| 8.692472496776e-05
> >>>>> 41 KSP unpreconditioned resid norm 3.069012426454e-03 true resid
> norm 3.069012426453e-03 ||r(i)||/||b|| 7.643919912166e-05
> >>>>> 42 KSP unpreconditioned resid norm 2.772928845284e-03 true resid
> norm 2.772928845284e-03 ||r(i)||/||b|| 6.906471225983e-05
> >>>>> 43 KSP unpreconditioned resid norm 2.561454192399e-03 true resid
> norm 2.561454192398e-03 ||r(i)||/||b|| 6.379756085902e-05
> >>>>> 44 KSP unpreconditioned resid norm 2.253662762802e-03 true resid
> norm 2.253662762802e-03 ||r(i)||/||b|| 5.613146926159e-05
> >>>>> 45 KSP unpreconditioned resid norm 2.086800523919e-03 true resid
> norm 2.086800523919e-03 ||r(i)||/||b|| 5.197546917701e-05
> >>>>> 46 KSP unpreconditioned resid norm 1.926028182896e-03 true resid
> norm 1.926028182896e-03 ||r(i)||/||b|| 4.797114880257e-05
> >>>>> 47 KSP unpreconditioned resid norm 1.769243808622e-03 true resid
> norm 1.769243808622e-03 ||r(i)||/||b|| 4.406615581492e-05
> >>>>> 48 KSP unpreconditioned resid norm 1.656654905964e-03 true resid
> norm 1.656654905964e-03 ||r(i)||/||b|| 4.126192945371e-05
> >>>>> 49 KSP unpreconditioned resid norm 1.572052627273e-03 true resid
> norm 1.572052627273e-03 ||r(i)||/||b|| 3.915475961260e-05
> >>>>> 50 KSP unpreconditioned resid norm 1.454960682355e-03 true resid
> norm 1.454960682355e-03 ||r(i)||/||b|| 3.623837699518e-05
> >>>>> 51 KSP unpreconditioned resid norm 1.375985053014e-03 true resid
> norm 1.375985053014e-03 ||r(i)||/||b|| 3.427134883820e-05
> >>>>> 52 KSP unpreconditioned resid norm 1.269325501087e-03 true resid
> norm 1.269325501087e-03 ||r(i)||/||b|| 3.161480347603e-05
> >>>>> 53 KSP unpreconditioned resid norm 1.184791772965e-03 true resid
> norm 1.184791772965e-03 ||r(i)||/||b|| 2.950934100844e-05
> >>>>> 54 KSP unpreconditioned resid norm 1.064535156080e-03 true resid
> norm 1.064535156080e-03 ||r(i)||/||b|| 2.651413662135e-05
> >>>>> 55 KSP unpreconditioned resid norm 9.639036688120e-04 true resid
> norm 9.639036688117e-04 ||r(i)||/||b|| 2.400773090370e-05
> >>>>> 56 KSP unpreconditioned resid norm 8.632359780260e-04 true resid
> norm 8.632359780260e-04 ||r(i)||/||b|| 2.150042347322e-05
> >>>>> 57 KSP unpreconditioned resid norm 7.613605783850e-04 true resid
> norm 7.613605783850e-04 ||r(i)||/||b|| 1.896303591113e-05
> >>>>> 58 KSP unpreconditioned resid norm 6.681073248348e-04 true resid
> norm 6.681073248349e-04 ||r(i)||/||b|| 1.664039819373e-05
> >>>>> 59 KSP unpreconditioned resid norm 5.656127908544e-04 true resid
> norm 5.656127908545e-04 ||r(i)||/||b|| 1.408758999254e-05
> >>>>> 60 KSP unpreconditioned resid norm 4.850863370767e-04 true resid
> norm 4.850863370767e-04 ||r(i)||/||b|| 1.208193580169e-05
> >>>>> 61 KSP unpreconditioned resid norm 4.374055762320e-04 true resid
> norm 4.374055762316e-04 ||r(i)||/||b|| 1.089436186387e-05
> >>>>> 62 KSP unpreconditioned resid norm 3.874398257079e-04 true resid
> norm 3.874398257077e-04 ||r(i)||/||b|| 9.649876204364e-06
> >>>>> 63 KSP unpreconditioned resid norm 3.364908694427e-04 true resid
> norm 3.364908694429e-04 ||r(i)||/||b|| 8.380902061609e-06
> >>>>> 64 KSP unpreconditioned resid norm 2.961034697265e-04 true resid
> norm 2.961034697268e-04 ||r(i)||/||b|| 7.374982221632e-06
> >>>>> 65 KSP unpreconditioned resid norm 2.640593092764e-04 true resid
> norm 2.640593092767e-04 ||r(i)||/||b|| 6.576865557059e-06
> >>>>> 66 KSP unpreconditioned resid norm 2.423231125743e-04 true resid
> norm 2.423231125745e-04 ||r(i)||/||b|| 6.035487016671e-06
> >>>>> 67 KSP unpreconditioned resid norm 2.182349471179e-04 true resid
> norm 2.182349471179e-04 ||r(i)||/||b|| 5.435528521898e-06
> >>>>> 68 KSP unpreconditioned resid norm 2.008438265031e-04 true resid
> norm 2.008438265028e-04 ||r(i)||/||b|| 5.002371809927e-06
> >>>>> 69 KSP unpreconditioned resid norm 1.838732863386e-04 true resid
> norm 1.838732863388e-04 ||r(i)||/||b|| 4.579690400226e-06
> >>>>> 70 KSP unpreconditioned resid norm 1.723786027645e-04 true resid
> norm 1.723786027645e-04 ||r(i)||/||b|| 4.293394913444e-06
> >>>>> 71 KSP unpreconditioned resid norm 1.580945192204e-04 true resid
> norm 1.580945192205e-04 ||r(i)||/||b|| 3.937624471826e-06
> >>>>> 72 KSP unpreconditioned resid norm 1.476687469671e-04 true resid
> norm 1.476687469671e-04 ||r(i)||/||b|| 3.677952117812e-06
> >>>>> 73 KSP unpreconditioned resid norm 1.385018526182e-04 true resid
> norm 1.385018526184e-04 ||r(i)||/||b|| 3.449634351350e-06
> >>>>> 74 KSP unpreconditioned resid norm 1.279712893541e-04 true resid
> norm 1.279712893541e-04 ||r(i)||/||b|| 3.187351991305e-06
> >>>>> 75 KSP unpreconditioned resid norm 1.202010411772e-04 true resid
> norm 1.202010411774e-04 ||r(i)||/||b|| 2.993820175504e-06
> >>>>> 76 KSP unpreconditioned resid norm 1.113459414198e-04 true resid
> norm 1.113459414200e-04 ||r(i)||/||b|| 2.773268206485e-06
> >>>>> 77 KSP unpreconditioned resid norm 1.042523036036e-04 true resid
> norm 1.042523036037e-04 ||r(i)||/||b|| 2.596588572066e-06
> >>>>> 78 KSP unpreconditioned resid norm 9.565176453232e-05 true resid
> norm 9.565176453227e-05 ||r(i)||/||b|| 2.382376888539e-06
> >>>>> 79 KSP unpreconditioned resid norm 8.896901670359e-05 true resid
> norm 8.896901670365e-05 ||r(i)||/||b|| 2.215931198209e-06
> >>>>> 80 KSP unpreconditioned resid norm 8.119298425803e-05 true resid
> norm 8.119298425824e-05 ||r(i)||/||b|| 2.022255314935e-06
> >>>>> 81 KSP unpreconditioned resid norm 7.544528309154e-05 true resid
> norm 7.544528309154e-05 ||r(i)||/||b|| 1.879098620558e-06
> >>>>> 82 KSP unpreconditioned resid norm 6.755385041138e-05 true resid
> norm 6.755385041176e-05 ||r(i)||/||b|| 1.682548489719e-06
> >>>>> 83 KSP unpreconditioned resid norm 6.158629300870e-05 true resid
> norm 6.158629300835e-05 ||r(i)||/||b|| 1.533915885727e-06
> >>>>> 84 KSP unpreconditioned resid norm 5.358756885754e-05 true resid
> norm 5.358756885765e-05 ||r(i)||/||b|| 1.334693470462e-06
> >>>>> 85 KSP unpreconditioned resid norm 4.774852370380e-05 true resid
> norm 4.774852370387e-05 ||r(i)||/||b|| 1.189261692037e-06
> >>>>> 86 KSP unpreconditioned resid norm 3.919358737908e-05 true resid
> norm 3.919358737930e-05 ||r(i)||/||b|| 9.761858258229e-07
> >>>>> 87 KSP unpreconditioned resid norm 3.434042319950e-05 true resid
> norm 3.434042319947e-05 ||r(i)||/||b|| 8.553091620745e-07
> >>>>> 88 KSP unpreconditioned resid norm 2.813699436281e-05 true resid
> norm 2.813699436302e-05 ||r(i)||/||b|| 7.008017615898e-07
> >>>>> 89 KSP unpreconditioned resid norm 2.462248069068e-05 true resid
> norm 2.462248069051e-05 ||r(i)||/||b|| 6.132665635851e-07
> >>>>> 90 KSP unpreconditioned resid norm 2.040558789626e-05 true resid
> norm 2.040558789626e-05 ||r(i)||/||b|| 5.082373674841e-07
> >>>>> 91 KSP unpreconditioned resid norm 1.888523204468e-05 true resid
> norm 1.888523204470e-05 ||r(i)||/||b|| 4.703702077842e-07
> >>>>> 92 KSP unpreconditioned resid norm 1.707071292484e-05 true resid
> norm 1.707071292474e-05 ||r(i)||/||b|| 4.251763900191e-07
> >>>>> 93 KSP unpreconditioned resid norm 1.498636454665e-05 true resid
> norm 1.498636454672e-05 ||r(i)||/||b|| 3.732619958859e-07
> >>>>> 94 KSP unpreconditioned resid norm 1.219393542993e-05 true resid
> norm 1.219393543006e-05 ||r(i)||/||b|| 3.037115947725e-07
> >>>>> 95 KSP unpreconditioned resid norm 1.059996963300e-05 true resid
> norm 1.059996963303e-05 ||r(i)||/||b|| 2.640110487917e-07
> >>>>> 96 KSP unpreconditioned resid norm 9.099659872548e-06 true resid
> norm 9.099659873214e-06 ||r(i)||/||b|| 2.266431725699e-07
> >>>>> 97 KSP unpreconditioned resid norm 8.147347587295e-06 true resid
> norm 8.147347587584e-06 ||r(i)||/||b|| 2.029241456283e-07
> >>>>> 98 KSP unpreconditioned resid norm 7.167226146744e-06 true resid
> norm 7.167226146783e-06 ||r(i)||/||b|| 1.785124823418e-07
> >>>>> 99 KSP unpreconditioned resid norm 6.552540209538e-06 true resid
> norm 6.552540209577e-06 ||r(i)||/||b|| 1.632026385802e-07
> >>>>> 100 KSP unpreconditioned resid norm 5.767783600111e-06 true resid
> norm 5.767783600320e-06 ||r(i)||/||b|| 1.436568830140e-07
> >>>>> 101 KSP unpreconditioned resid norm 5.261057430584e-06 true resid
> norm 5.261057431144e-06 ||r(i)||/||b|| 1.310359688033e-07
> >>>>> 102 KSP unpreconditioned resid norm 4.715498525786e-06 true resid
> norm 4.715498525947e-06 ||r(i)||/||b|| 1.174478564100e-07
> >>>>> 103 KSP unpreconditioned resid norm 4.380052669622e-06 true resid
> norm 4.380052669825e-06 ||r(i)||/||b|| 1.090929822591e-07
> >>>>> 104 KSP unpreconditioned resid norm 3.911664470060e-06 true resid
> norm 3.911664470226e-06 ||r(i)||/||b|| 9.742694319496e-08
> >>>>> 105 KSP unpreconditioned resid norm 3.652211458315e-06 true resid
> norm 3.652211458259e-06 ||r(i)||/||b|| 9.096480564430e-08
> >>>>> 106 KSP unpreconditioned resid norm 3.387532128049e-06 true resid
> norm 3.387532128358e-06 ||r(i)||/||b|| 8.437249737363e-08
> >>>>> 107 KSP unpreconditioned resid norm 3.234218880987e-06 true resid
> norm 3.234218880798e-06 ||r(i)||/||b|| 8.055395895481e-08
> >>>>> 108 KSP unpreconditioned resid norm 3.016905196388e-06 true resid
> norm 3.016905196492e-06 ||r(i)||/||b|| 7.514137611763e-08
> >>>>> 109 KSP unpreconditioned resid norm 2.858246441921e-06 true resid
> norm 2.858246441975e-06 ||r(i)||/||b|| 7.118969836476e-08
> >>>>> 110 KSP unpreconditioned resid norm 2.637118810847e-06 true resid
> norm 2.637118810750e-06 ||r(i)||/||b|| 6.568212241336e-08
> >>>>> 111 KSP unpreconditioned resid norm 2.494976088717e-06 true resid
> norm 2.494976088700e-06 ||r(i)||/||b|| 6.214180574966e-08
> >>>>> 112 KSP unpreconditioned resid norm 2.270639574272e-06 true resid
> norm 2.270639574200e-06 ||r(i)||/||b|| 5.655430686750e-08
> >>>>> 113 KSP unpreconditioned resid norm 2.104988663813e-06 true resid
> norm 2.104988664169e-06 ||r(i)||/||b|| 5.242847707696e-08
> >>>>> 114 KSP unpreconditioned resid norm 1.889361127301e-06 true resid
> norm 1.889361127526e-06 ||r(i)||/||b|| 4.705789073868e-08
> >>>>> 115 KSP unpreconditioned resid norm 1.732367008052e-06 true resid
> norm 1.732367007971e-06 ||r(i)||/||b|| 4.314767367271e-08
> >>>>> 116 KSP unpreconditioned resid norm 1.509288268391e-06 true resid
> norm 1.509288268645e-06 ||r(i)||/||b|| 3.759150191264e-08
> >>>>> 117 KSP unpreconditioned resid norm 1.359169217644e-06 true resid
> norm 1.359169217445e-06 ||r(i)||/||b|| 3.385252062089e-08
> >>>>> 118 KSP unpreconditioned resid norm 1.180146337735e-06 true resid
> norm 1.180146337908e-06 ||r(i)||/||b|| 2.939363820703e-08
> >>>>> 119 KSP unpreconditioned resid norm 1.067757039683e-06 true resid
> norm 1.067757039924e-06 ||r(i)||/||b|| 2.659438335433e-08
> >>>>> 120 KSP unpreconditioned resid norm 9.435833073736e-07 true resid
> norm 9.435833073736e-07 ||r(i)||/||b|| 2.350161625235e-08
> >>>>> 121 KSP unpreconditioned resid norm 8.749457237613e-07 true resid
> norm 8.749457236791e-07 ||r(i)||/||b|| 2.179207546261e-08
> >>>>> 122 KSP unpreconditioned resid norm 7.945760150897e-07 true resid
> norm 7.945760150444e-07 ||r(i)||/||b|| 1.979032528762e-08
> >>>>> 123 KSP unpreconditioned resid norm 7.141240839013e-07 true resid
> norm 7.141240838682e-07 ||r(i)||/||b|| 1.778652721438e-08
> >>>>> 124 KSP unpreconditioned resid norm 6.300566936733e-07 true resid
> norm 6.300566936607e-07 ||r(i)||/||b|| 1.569267971988e-08
> >>>>> 125 KSP unpreconditioned resid norm 5.628986997544e-07 true resid
> norm 5.628986995849e-07 ||r(i)||/||b|| 1.401999073448e-08
> >>>>> 126 KSP unpreconditioned resid norm 5.119018951602e-07 true resid
> norm 5.119018951837e-07 ||r(i)||/||b|| 1.274982484900e-08
> >>>>> 127 KSP unpreconditioned resid norm 4.664670343748e-07 true resid
> norm 4.664670344042e-07 ||r(i)||/||b|| 1.161818903670e-08
> >>>>> 128 KSP unpreconditioned resid norm 4.253264691112e-07 true resid
> norm 4.253264691948e-07 ||r(i)||/||b|| 1.059351027394e-08
> >>>>> 129 KSP unpreconditioned resid norm 3.868921150516e-07 true resid
> norm 3.868921150517e-07 ||r(i)||/||b|| 9.636234498800e-09
> >>>>> 130 KSP unpreconditioned resid norm 3.558445658540e-07 true resid
> norm 3.558445660061e-07 ||r(i)||/||b|| 8.862940209315e-09
> >>>>> 131 KSP unpreconditioned resid norm 3.268710273840e-07 true resid
> norm 3.268710272455e-07 ||r(i)||/||b|| 8.141302825416e-09
> >>>>> 132 KSP unpreconditioned resid norm 3.041273897592e-07 true resid
> norm 3.041273896694e-07 ||r(i)||/||b|| 7.574832182794e-09
> >>>>> 133 KSP unpreconditioned resid norm 2.851926677922e-07 true resid
> norm 2.851926674248e-07 ||r(i)||/||b|| 7.103229333782e-09
> >>>>> 134 KSP unpreconditioned resid norm 2.694708315072e-07 true resid
> norm 2.694708309500e-07 ||r(i)||/||b|| 6.711649104748e-09
> >>>>> 135 KSP unpreconditioned resid norm 2.534825559099e-07 true resid
> norm 2.534825557469e-07 ||r(i)||/||b|| 6.313432746507e-09
> >>>>> 136 KSP unpreconditioned resid norm 2.387342352458e-07 true resid
> norm 2.387342351804e-07 ||r(i)||/||b|| 5.946099658254e-09
> >>>>> 137 KSP unpreconditioned resid norm 2.200861667617e-07 true resid
> norm 2.200861665255e-07 ||r(i)||/||b|| 5.481636425438e-09
> >>>>> 138 KSP unpreconditioned resid norm 2.051415370616e-07 true resid
> norm 2.051415370614e-07 ||r(i)||/||b|| 5.109413915824e-09
> >>>>> 139 KSP unpreconditioned resid norm 1.887376429396e-07 true resid
> norm 1.887376426682e-07 ||r(i)||/||b|| 4.700845824315e-09
> >>>>> 140 KSP unpreconditioned resid norm 1.729743133005e-07 true resid
> norm 1.729743128342e-07 ||r(i)||/||b|| 4.308232129561e-09
> >>>>> 141 KSP unpreconditioned resid norm 1.541021130781e-07 true resid
> norm 1.541021128364e-07 ||r(i)||/||b|| 3.838186508023e-09
> >>>>> 142 KSP unpreconditioned resid norm 1.384631628565e-07 true resid
> norm 1.384631627735e-07 ||r(i)||/||b|| 3.448670712125e-09
> >>>>> 143 KSP unpreconditioned resid norm 1.223114405626e-07 true resid
> norm 1.223114403883e-07 ||r(i)||/||b|| 3.046383411846e-09
> >>>>> 144 KSP unpreconditioned resid norm 1.087313066223e-07 true resid
> norm 1.087313065117e-07 ||r(i)||/||b|| 2.708146085550e-09
> >>>>> 145 KSP unpreconditioned resid norm 9.181901998734e-08 true resid
> norm 9.181901984268e-08 ||r(i)||/||b|| 2.286915582489e-09
> >>>>> 146 KSP unpreconditioned resid norm 7.885850510808e-08 true resid
> norm 7.885850531446e-08 ||r(i)||/||b|| 1.964110975313e-09
> >>>>> 147 KSP unpreconditioned resid norm 6.483393946950e-08 true resid
> norm 6.483393931383e-08 ||r(i)||/||b|| 1.614804278515e-09
> >>>>> 148 KSP unpreconditioned resid norm 5.690132597004e-08 true resid
> norm 5.690132577518e-08 ||r(i)||/||b|| 1.417228465328e-09
> >>>>> 149 KSP unpreconditioned resid norm 5.023671521579e-08 true resid
> norm 5.023671502186e-08 ||r(i)||/||b|| 1.251234511035e-09
> >>>>> 150 KSP unpreconditioned resid norm 4.625371062660e-08 true resid
> norm 4.625371062660e-08 ||r(i)||/||b|| 1.152030720445e-09
> >>>>> 151 KSP unpreconditioned resid norm 4.349049084805e-08 true resid
> norm 4.349049089337e-08 ||r(i)||/||b|| 1.083207830846e-09
> >>>>> 152 KSP unpreconditioned resid norm 3.932593324498e-08 true resid
> norm 3.932593376918e-08 ||r(i)||/||b|| 9.794821474546e-10
> >>>>> 153 KSP unpreconditioned resid norm 3.504167649202e-08 true resid
> norm 3.504167638113e-08 ||r(i)||/||b|| 8.727751166356e-10
> >>>>> 154 KSP unpreconditioned resid norm 2.892726347747e-08 true resid
> norm 2.892726348583e-08 ||r(i)||/||b|| 7.204848160858e-10
> >>>>> 155 KSP unpreconditioned resid norm 2.477647033202e-08 true resid
> norm 2.477647041570e-08 ||r(i)||/||b|| 6.171019508795e-10
> >>>>> 156 KSP unpreconditioned resid norm 2.128504065757e-08 true resid
> norm 2.128504067423e-08 ||r(i)||/||b|| 5.301416991298e-10
> >>>>> 157 KSP unpreconditioned resid norm 1.879248809429e-08 true resid
> norm 1.879248818928e-08 ||r(i)||/||b|| 4.680602575310e-10
> >>>>> 158 KSP unpreconditioned resid norm 1.673649140073e-08 true resid
> norm 1.673649134005e-08 ||r(i)||/||b|| 4.168520085200e-10
> >>>>> 159 KSP unpreconditioned resid norm 1.497123388109e-08 true resid
> norm 1.497123365569e-08 ||r(i)||/||b|| 3.728851342016e-10
> >>>>> 160 KSP unpreconditioned resid norm 1.315982130162e-08 true resid
> norm 1.315982149329e-08 ||r(i)||/||b|| 3.277687007261e-10
> >>>>> 161 KSP unpreconditioned resid norm 1.182395864938e-08 true resid
> norm 1.182395868430e-08 ||r(i)||/||b|| 2.944966675550e-10
> >>>>> 162 KSP unpreconditioned resid norm 1.070204481679e-08 true resid
> norm 1.070204466432e-08 ||r(i)||/||b|| 2.665534085342e-10
> >>>>> 163 KSP unpreconditioned resid norm 9.969290307649e-09 true resid
> norm 9.969290432333e-09 ||r(i)||/||b|| 2.483028644297e-10
> >>>>> 164 KSP unpreconditioned resid norm 9.134440883306e-09 true resid
> norm 9.134440980976e-09 ||r(i)||/||b|| 2.275094577628e-10
> >>>>> 165 KSP unpreconditioned resid norm 8.593316427292e-09 true resid
> norm 8.593316413360e-09 ||r(i)||/||b|| 2.140317904139e-10
> >>>>> 166 KSP unpreconditioned resid norm 8.042173048464e-09 true resid
> norm 8.042173332848e-09 ||r(i)||/||b|| 2.003045942277e-10
> >>>>> 167 KSP unpreconditioned resid norm 7.655518522782e-09 true resid
> norm 7.655518879144e-09 ||r(i)||/||b|| 1.906742791064e-10
> >>>>> 168 KSP unpreconditioned resid norm 7.210283391815e-09 true resid
> norm 7.210283220312e-09 ||r(i)||/||b|| 1.795848951442e-10
> >>>>> 169 KSP unpreconditioned resid norm 6.793967416271e-09 true resid
> norm 6.793967448832e-09 ||r(i)||/||b|| 1.692158122825e-10
> >>>>> 170 KSP unpreconditioned resid norm 6.249160304588e-09 true resid
> norm 6.249160382647e-09 ||r(i)||/||b|| 1.556464257736e-10
> >>>>> 171 KSP unpreconditioned resid norm 5.794936438798e-09 true resid
> norm 5.794936332552e-09 ||r(i)||/||b|| 1.443331699811e-10
> >>>>> 172 KSP unpreconditioned resid norm 5.222337397128e-09 true resid
> norm 5.222337443277e-09 ||r(i)||/||b|| 1.300715788135e-10
> >>>>> 173 KSP unpreconditioned resid norm 4.755359110447e-09 true resid
> norm 4.755358888996e-09 ||r(i)||/||b|| 1.184406494668e-10
> >>>>> 174 KSP unpreconditioned resid norm 4.317537007873e-09 true resid
> norm 4.317537267718e-09 ||r(i)||/||b|| 1.075359252630e-10
> >>>>> 175 KSP unpreconditioned resid norm 3.924177535665e-09 true resid
> norm 3.924177629720e-09 ||r(i)||/||b|| 9.773860563138e-11
> >>>>> 176 KSP unpreconditioned resid norm 3.502843065115e-09 true resid
> norm 3.502843126359e-09 ||r(i)||/||b|| 8.724452234855e-11
> >>>>> 177 KSP unpreconditioned resid norm 3.083873232869e-09 true resid
> norm 3.083873352938e-09 ||r(i)||/||b|| 7.680933686007e-11
> >>>>> 178 KSP unpreconditioned resid norm 2.758980676473e-09 true resid
> norm 2.758980618096e-09 ||r(i)||/||b|| 6.871730691658e-11
> >>>>> 179 KSP unpreconditioned resid norm 2.510978240429e-09 true resid
> norm 2.510978327392e-09 ||r(i)||/||b|| 6.254036989334e-11
> >>>>> 180 KSP unpreconditioned resid norm 2.323000193205e-09 true resid
> norm 2.323000193205e-09 ||r(i)||/||b|| 5.785844097519e-11
> >>>>> 181 KSP unpreconditioned resid norm 2.167480159274e-09 true resid
> norm 2.167480113693e-09 ||r(i)||/||b|| 5.398493749153e-11
> >>>>> 182 KSP unpreconditioned resid norm 1.983545827983e-09 true resid
> norm 1.983546404840e-09 ||r(i)||/||b|| 4.940374216139e-11
> >>>>> 183 KSP unpreconditioned resid norm 1.794576286774e-09 true resid
> norm 1.794576224361e-09 ||r(i)||/||b|| 4.469710457036e-11
> >>>>> 184 KSP unpreconditioned resid norm 1.583490590644e-09 true resid
> norm 1.583490380603e-09 ||r(i)||/||b|| 3.943963715064e-11
> >>>>> 185 KSP unpreconditioned resid norm 1.412659866247e-09 true resid
> norm 1.412659832191e-09 ||r(i)||/||b|| 3.518479927722e-11
> >>>>> 186 KSP unpreconditioned resid norm 1.285613344939e-09 true resid
> norm 1.285612984761e-09 ||r(i)||/||b|| 3.202047215205e-11
> >>>>> 187 KSP unpreconditioned resid norm 1.168115133929e-09 true resid
> norm 1.168114766904e-09 ||r(i)||/||b|| 2.909397058634e-11
> >>>>> 188 KSP unpreconditioned resid norm 1.063377926053e-09 true resid
> norm 1.063377647554e-09 ||r(i)||/||b|| 2.648530681802e-11
> >>>>> 189 KSP unpreconditioned resid norm 9.548967728122e-10 true resid
> norm 9.548964523410e-10 ||r(i)||/||b|| 2.378339019807e-11
> >>>>> KSP Object: 16 MPI processes
> >>>>> type: fgmres
> >>>>> restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> >>>>> happy breakdown tolerance 1e-30
> >>>>> maximum iterations=2000, initial guess is zero
> >>>>> tolerances: relative=1e-20, absolute=1e-09, divergence=10000.
> >>>>> right preconditioning
> >>>>> using UNPRECONDITIONED norm type for convergence test
> >>>>> PC Object: 16 MPI processes
> >>>>> type: bjacobi
> >>>>> number of blocks = 4
> >>>>> Local solver information for first block is in the following
> KSP and PC objects on rank 0:
> >>>>> Use -ksp_view ::ascii_info_detail to display information for
> all blocks
> >>>>> KSP Object: (sub_) 4 MPI processes
> >>>>> type: preonly
> >>>>> maximum iterations=10000, initial guess is zero
> >>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >>>>> left preconditioning
> >>>>> using NONE norm type for convergence test
> >>>>> PC Object: (sub_) 4 MPI processes
> >>>>> type: telescope
> >>>>> petsc subcomm: parent comm size reduction factor = 4
> >>>>> petsc subcomm: parent_size = 4 , subcomm_size = 1
> >>>>> petsc subcomm type = contiguous
> >>>>> linear system matrix = precond matrix:
> >>>>> Mat Object: (sub_) 4 MPI processes
> >>>>> type: mpiaij
> >>>>> rows=40200, cols=40200
> >>>>> total: nonzeros=199996, allocated nonzeros=203412
> >>>>> total number of mallocs used during MatSetValues calls=0
> >>>>> not using I-node (on process 0) routines
> >>>>> setup type: default
> >>>>> Parent DM object: NULL
> >>>>> Sub DM object: NULL
> >>>>> KSP Object: (sub_telescope_) 1 MPI processes
> >>>>> type: preonly
> >>>>> maximum iterations=10000, initial guess is zero
> >>>>> tolerances: relative=1e-05, absolute=1e-50,
> divergence=10000.
> >>>>> left preconditioning
> >>>>> using NONE norm type for convergence test
> >>>>> PC Object: (sub_telescope_) 1 MPI processes
> >>>>> type: lu
> >>>>> out-of-place factorization
> >>>>> tolerance for zero pivot 2.22045e-14
> >>>>> matrix ordering: external
> >>>>> factor fill ratio given 0., needed 0.
> >>>>> Factored matrix follows:
> >>>>> Mat Object: 1 MPI processes
> >>>>> type: mumps
> >>>>> rows=40200, cols=40200
> >>>>> package used to perform factorization: mumps
> >>>>> total: nonzeros=1849788, allocated
> nonzeros=1849788
> >>>>> MUMPS run parameters:
> >>>>> SYM (matrix type): 0
> >>>>> PAR (host participation): 1
> >>>>> ICNTL(1) (output for error): 6
> >>>>> ICNTL(2) (output of diagnostic msg): 0
> >>>>> ICNTL(3) (output for global info): 0
> >>>>> ICNTL(4) (level of printing): 0
> >>>>> ICNTL(5) (input mat struct): 0
> >>>>> ICNTL(6) (matrix prescaling): 7
> >>>>> ICNTL(7) (sequential matrix ordering):7
> >>>>> ICNTL(8) (scaling strategy): 77
> >>>>> ICNTL(10) (max num of refinements): 0
> >>>>> ICNTL(11) (error analysis): 0
> >>>>> ICNTL(12) (efficiency control): 1
> >>>>> ICNTL(13) (sequential factorization of the
> root node): 0
> >>>>> ICNTL(14) (percentage of estimated workspace
> increase): 20
> >>>>> ICNTL(18) (input mat struct): 0
> >>>>> ICNTL(19) (Schur complement info): 0
> >>>>> ICNTL(20) (RHS sparse pattern): 0
> >>>>> ICNTL(21) (solution struct): 0
> >>>>> ICNTL(22) (in-core/out-of-core facility):
> 0
> >>>>> ICNTL(23) (max size of memory can be
> allocated locally):0
> >>>>> ICNTL(24) (detection of null pivot rows):
> 0
> >>>>> ICNTL(25) (computation of a null space
> basis): 0
> >>>>> ICNTL(26) (Schur options for RHS or
> solution): 0
> >>>>> ICNTL(27) (blocking size for multiple RHS):
> -32
> >>>>> ICNTL(28) (use parallel or sequential
> ordering): 1
> >>>>> ICNTL(29) (parallel ordering): 0
> >>>>> ICNTL(30) (user-specified set of entries in
> inv(A)): 0
> >>>>> ICNTL(31) (factors is discarded in the solve
> phase): 0
> >>>>> ICNTL(33) (compute determinant): 0
> >>>>> ICNTL(35) (activate BLR based
> factorization): 0
> >>>>> ICNTL(36) (choice of BLR factorization
> variant): 0
> >>>>> ICNTL(38) (estimated compression rate of LU
> factors): 333
> >>>>> CNTL(1) (relative pivoting threshold):
> 0.01
> >>>>> CNTL(2) (stopping criterion of refinement):
> 1.49012e-08
> >>>>> CNTL(3) (absolute pivoting threshold): 0.
> >>>>> CNTL(4) (value of static pivoting):
> -1.
> >>>>> CNTL(5) (fixation for null pivots): 0.
> >>>>> CNTL(7) (dropping parameter for BLR): 0.
> >>>>> RINFO(1) (local estimated flops for the
> elimination after analysis):
> >>>>> [0] 1.45525e+08
> >>>>> RINFO(2) (local estimated flops for the
> assembly after factorization):
> >>>>> [0] 2.89397e+06
> >>>>> RINFO(3) (local estimated flops for the
> elimination after factorization):
> >>>>> [0] 1.45525e+08
> >>>>> INFO(15) (estimated size of (in MB) MUMPS
> internal data for running numerical factorization):
> >>>>> [0] 29
> >>>>> INFO(16) (size of (in MB) MUMPS internal data
> used during numerical factorization):
> >>>>> [0] 29
> >>>>> INFO(23) (num of pivots eliminated on this
> processor after factorization):
> >>>>> [0] 40200
> >>>>> RINFOG(1) (global estimated flops for the
> elimination after analysis): 1.45525e+08
> >>>>> RINFOG(2) (global estimated flops for the
> assembly after factorization): 2.89397e+06
> >>>>> RINFOG(3) (global estimated flops for the
> elimination after factorization): 1.45525e+08
> >>>>> (RINFOG(12) RINFOG(13))*2^INFOG(34)
> (determinant): (0.,0.)*(2^0)
> >>>>> INFOG(3) (estimated real workspace for
> factors on all processors after analysis): 1849788
> >>>>> INFOG(4) (estimated integer workspace for
> factors on all processors after analysis): 879986
> >>>>> INFOG(5) (estimated maximum front size in the
> complete tree): 282
> >>>>> INFOG(6) (number of nodes in the complete
> tree): 23709
> >>>>> INFOG(7) (ordering option effectively used
> after analysis): 5
> >>>>> INFOG(8) (structural symmetry in percent of
> the permuted matrix after analysis): 100
> >>>>> INFOG(9) (total real/complex workspace to
> store the matrix factors after factorization): 1849788
> >>>>> INFOG(10) (total integer space store the
> matrix factors after factorization): 879986
> >>>>> INFOG(11) (order of largest frontal matrix
> after factorization): 282
> >>>>> INFOG(12) (number of off-diagonal pivots): 0
> >>>>> INFOG(13) (number of delayed pivots after
> factorization): 0
> >>>>> INFOG(14) (number of memory compress after
> factorization): 0
> >>>>> INFOG(15) (number of steps of iterative
> refinement after solution): 0
> >>>>> INFOG(16) (estimated size (in MB) of all
> MUMPS internal data for factorization after analysis: value on the most
> memory consuming processor): 29
> >>>>> INFOG(17) (estimated size of all MUMPS
> internal data for factorization after analysis: sum over all processors): 29
> >>>>> INFOG(18) (size of all MUMPS internal data
> allocated during factorization: value on the most memory consuming
> processor): 29
> >>>>> INFOG(19) (size of all MUMPS internal data
> allocated during factorization: sum over all processors): 29
> >>>>> INFOG(20) (estimated number of entries in the
> factors): 1849788
> >>>>> INFOG(21) (size in MB of memory effectively
> used during factorization - value on the most memory consuming processor):
> 26
> >>>>> INFOG(22) (size in MB of memory effectively
> used during factorization - sum over all processors): 26
> >>>>> INFOG(23) (after analysis: value of ICNTL(6)
> effectively used): 0
> >>>>> INFOG(24) (after analysis: value of ICNTL(12)
> effectively used): 1
> >>>>> INFOG(25) (after factorization: number of
> pivots modified by static pivoting): 0
> >>>>> INFOG(28) (after factorization: number of
> null pivots encountered): 0
> >>>>> INFOG(29) (after factorization: effective
> number of entries in the factors (sum over all processors)): 1849788
> >>>>> INFOG(30, 31) (after solution: size in Mbytes
> of memory used during solution phase): 29, 29
> >>>>> INFOG(32) (after analysis: type of analysis
> done): 1
> >>>>> INFOG(33) (value used for ICNTL(8)): 7
> >>>>> INFOG(34) (exponent of the determinant if
> determinant is requested): 0
> >>>>> INFOG(35) (after factorization: number of
> entries taking into account BLR factor compression - sum over all
> processors): 1849788
> >>>>> INFOG(36) (after analysis: estimated size of
> all MUMPS internal data for running BLR in-core - value on the most memory
> consuming processor): 0
> >>>>> INFOG(37) (after analysis: estimated size of
> all MUMPS internal data for running BLR in-core - sum over all processors):
> 0
> >>>>> INFOG(38) (after analysis: estimated size of
> all MUMPS internal data for running BLR out-of-core - value on the most
> memory consuming processor): 0
> >>>>> INFOG(39) (after analysis: estimated size of
> all MUMPS internal data for running BLR out-of-core - sum over all
> processors): 0
> >>>>> linear system matrix = precond matrix:
> >>>>> Mat Object: 1 MPI processes
> >>>>> type: seqaijcusparse
> >>>>> rows=40200, cols=40200
> >>>>> total: nonzeros=199996, allocated nonzeros=199996
> >>>>> total number of mallocs used during MatSetValues calls=0
> >>>>> not using I-node routines
> >>>>> linear system matrix = precond matrix:
> >>>>> Mat Object: 16 MPI processes
> >>>>> type: mpiaijcusparse
> >>>>> rows=160800, cols=160800
> >>>>> total: nonzeros=802396, allocated nonzeros=1608000
> >>>>> total number of mallocs used during MatSetValues calls=0
> >>>>> not using I-node (on process 0) routines
> >>>>> Norm of error 9.11684e-07 iterations 189
> >>>>> Chang
> >>>>> On 10/14/21 10:10 PM, Chang Liu wrote:
> >>>>>> Hi Barry,
> >>>>>>
> >>>>>> No problem. Here is the output. It seems that the resid norm
> calculation is incorrect.
> >>>>>>
> >>>>>> $ mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7 -m 400
> -ksp_view -ksp_monitor_true_residual -pc_type bjacobi -pc_bjacobi_blocks 4
> -ksp_type fgmres -mat_type aijcusparse -sub_pc_type telescope -sub_ksp_type
> preonly -sub_telescope_ksp_type preonly -sub_telescope_pc_type lu
> -sub_telescope_pc_factor_mat_solver_type cusparse
> -sub_pc_telescope_reduction_factor 4 -sub_pc_telescope_subcomm_type
> contiguous -ksp_max_it 2000 -ksp_rtol 1.e-20 -ksp_atol 1.e-9
> >>>>>> 0 KSP unpreconditioned resid norm 4.014971979977e+01 true resid
> norm 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00
> >>>>>> 1 KSP unpreconditioned resid norm 0.000000000000e+00 true resid
> norm 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00
> >>>>>> KSP Object: 16 MPI processes
> >>>>>> type: fgmres
> >>>>>> restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> >>>>>> happy breakdown tolerance 1e-30
> >>>>>> maximum iterations=2000, initial guess is zero
> >>>>>> tolerances: relative=1e-20, absolute=1e-09, divergence=10000.
> >>>>>> right preconditioning
> >>>>>> using UNPRECONDITIONED norm type for convergence test
> >>>>>> PC Object: 16 MPI processes
> >>>>>> type: bjacobi
> >>>>>> number of blocks = 4
> >>>>>> Local solver information for first block is in the following
> KSP and PC objects on rank 0:
> >>>>>> Use -ksp_view ::ascii_info_detail to display information for
> all blocks
> >>>>>> KSP Object: (sub_) 4 MPI processes
> >>>>>> type: preonly
> >>>>>> maximum iterations=10000, initial guess is zero
> >>>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >>>>>> left preconditioning
> >>>>>> using NONE norm type for convergence test
> >>>>>> PC Object: (sub_) 4 MPI processes
> >>>>>> type: telescope
> >>>>>> petsc subcomm: parent comm size reduction factor = 4
> >>>>>> petsc subcomm: parent_size = 4 , subcomm_size = 1
> >>>>>> petsc subcomm type = contiguous
> >>>>>> linear system matrix = precond matrix:
> >>>>>> Mat Object: (sub_) 4 MPI processes
> >>>>>> type: mpiaij
> >>>>>> rows=40200, cols=40200
> >>>>>> total: nonzeros=199996, allocated nonzeros=203412
> >>>>>> total number of mallocs used during MatSetValues calls=0
> >>>>>> not using I-node (on process 0) routines
> >>>>>> setup type: default
> >>>>>> Parent DM object: NULL
> >>>>>> Sub DM object: NULL
> >>>>>> KSP Object: (sub_telescope_) 1 MPI processes
> >>>>>> type: preonly
> >>>>>> maximum iterations=10000, initial guess is zero
> >>>>>> tolerances: relative=1e-05, absolute=1e-50,
> divergence=10000.
> >>>>>> left preconditioning
> >>>>>> using NONE norm type for convergence test
> >>>>>> PC Object: (sub_telescope_) 1 MPI processes
> >>>>>> type: lu
> >>>>>> out-of-place factorization
> >>>>>> tolerance for zero pivot 2.22045e-14
> >>>>>> matrix ordering: nd
> >>>>>> factor fill ratio given 5., needed 8.62558
> >>>>>> Factored matrix follows:
> >>>>>> Mat Object: 1 MPI processes
> >>>>>> type: seqaijcusparse
> >>>>>> rows=40200, cols=40200
> >>>>>> package used to perform factorization: cusparse
> >>>>>> total: nonzeros=1725082, allocated
> nonzeros=1725082
> >>>>>> not using I-node routines
> >>>>>> linear system matrix = precond matrix:
> >>>>>> Mat Object: 1 MPI processes
> >>>>>> type: seqaijcusparse
> >>>>>> rows=40200, cols=40200
> >>>>>> total: nonzeros=199996, allocated nonzeros=199996
> >>>>>> total number of mallocs used during MatSetValues
> calls=0
> >>>>>> not using I-node routines
> >>>>>> linear system matrix = precond matrix:
> >>>>>> Mat Object: 16 MPI processes
> >>>>>> type: mpiaijcusparse
> >>>>>> rows=160800, cols=160800
> >>>>>> total: nonzeros=802396, allocated nonzeros=1608000
> >>>>>> total number of mallocs used during MatSetValues calls=0
> >>>>>> not using I-node (on process 0) routines
> >>>>>> Norm of error 400.999 iterations 1
> >>>>>>
> >>>>>> Chang
> >>>>>>
> >>>>>>
> >>>>>> On 10/14/21 9:47 PM, Barry Smith wrote:
> >>>>>>>
> >>>>>>> Chang,
> >>>>>>>
> >>>>>>> Sorry, I did not notice that one. Please run that with
> -ksp_view -ksp_monitor_true_residual so we can see exactly how the options
> are interpreted and which solver is used. At a glance it looks ok, but
> something must be wrong to get the wrong answer.
> >>>>>>>
> >>>>>>> Barry
> >>>>>>>
> >>>>>>>> On Oct 14, 2021, at 6:02 PM, Chang Liu <cliu at pppl.gov> wrote:
> >>>>>>>>
> >>>>>>>> Hi Barry,
> >>>>>>>>
> >>>>>>>> That is exactly what I was doing in the second example, in which
> the preconditioner works but the GMRES does not.
> >>>>>>>>
> >>>>>>>> Chang
> >>>>>>>>
> >>>>>>>> On 10/14/21 5:15 PM, Barry Smith wrote:
> >>>>>>>>> You need to use the PCTELESCOPE inside the block Jacobi, not
> outside it. So something like -pc_type bjacobi -sub_pc_type telescope
> -sub_telescope_pc_type lu
> >>>>>>>>>> On Oct 14, 2021, at 4:14 PM, Chang Liu <cliu at pppl.gov> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi Pierre,
> >>>>>>>>>>
> >>>>>>>>>> I wonder if the PCTELESCOPE trick only works for the
> preconditioner and not for the solver. I have done some tests, and found
> that for solving a small matrix using -telescope_ksp_type preonly, it does
> work on the GPU with multiple MPI processes. However, for bjacobi and gmres,
> it does not work.
> >>>>>>>>>>
> >>>>>>>>>> The command line options I used for small matrix is like
> >>>>>>>>>>
> >>>>>>>>>> mpiexec -n 4 --oversubscribe ./ex7 -m 100 -ksp_monitor_short
> -pc_type telescope -mat_type aijcusparse -telescope_pc_type lu
> -telescope_pc_factor_mat_solver_type cusparse -telescope_ksp_type preonly
> -pc_telescope_reduction_factor 4
> >>>>>>>>>>
> >>>>>>>>>> which gives the correct output. For iterative solver, I tried
> >>>>>>>>>>
> >>>>>>>>>> mpiexec -n 16 --oversubscribe ./ex7 -m 400 -ksp_monitor_short
> -pc_type bjacobi -pc_bjacobi_blocks 4 -ksp_type fgmres -mat_type
> aijcusparse -sub_pc_type telescope -sub_ksp_type preonly
> -sub_telescope_ksp_type preonly -sub_telescope_pc_type lu
> -sub_telescope_pc_factor_mat_solver_type cusparse
> -sub_pc_telescope_reduction_factor 4 -ksp_max_it 2000 -ksp_rtol 1.e-9
> -ksp_atol 1.e-20
> >>>>>>>>>>
> >>>>>>>>>> for large matrix. The output is like
> >>>>>>>>>>
> >>>>>>>>>> 0 KSP Residual norm 40.1497
> >>>>>>>>>> 1 KSP Residual norm < 1.e-11
> >>>>>>>>>> Norm of error 400.999 iterations 1
> >>>>>>>>>>
> >>>>>>>>>> So it seems to call a direct solver instead of an iterative one.
> >>>>>>>>>>
> >>>>>>>>>> Can you please help check these options?
> >>>>>>>>>>
> >>>>>>>>>> Chang
> >>>>>>>>>>
> >>>>>>>>>> On 10/14/21 10:04 AM, Pierre Jolivet wrote:
> >>>>>>>>>>>> On 14 Oct 2021, at 3:50 PM, Chang Liu <cliu at pppl.gov> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thank you Pierre. I was not aware of PCTELESCOPE before. This
> sounds like exactly what I need. I wonder if PCTELESCOPE can transform an
> mpiaijcusparse matrix to seqaijcusparse? Or do I have to do it manually?
> >>>>>>>>>>> PCTELESCOPE uses MatCreateMPIMatConcatenateSeqMat().
> >>>>>>>>>>> 1) I’m not sure this is implemented for cuSparse matrices, but
> it should be;
> >>>>>>>>>>> 2) at least for the implementations
> MatCreateMPIMatConcatenateSeqMat_MPIBAIJ() and
> MatCreateMPIMatConcatenateSeqMat_MPIAIJ(), the resulting MatType is MATBAIJ
> (resp. MATAIJ). Constructors are usually “smart” enough to detect if the
> MPI communicator on which the Mat lives is of size 1 (your case), and then
> the resulting Mat is of type MatSeqX instead of MatMPIX, so you would not
> need to worry about the transformation you are mentioning.
> >>>>>>>>>>> If you try this out and this does not work, please provide the
> backtrace (probably something like “Operation XYZ not implemented for
> MatType ABC”), and hopefully someone can add the missing plumbing.
> >>>>>>>>>>> I do not claim that this will be efficient, but I think this
> goes in the direction of what you want to achieve.
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Pierre
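
A small, hedged sketch of what that concatenation looks like in isolation (this is not PCTELESCOPE internals, just the routine Pierre mentions): each rank supplies a tiny SeqAIJ block and MatCreateMPIMatConcatenateSeqMat() assembles them on the given communicator. Running it on a communicator of size 1 should show the Seq type behavior described above; I have not tested this with the cuSparse matrix classes.

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            seqA, mpiA;
  MatType        mtype;
  PetscInt       i, m = 4;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* Each rank builds a tiny local diagonal SeqAIJ block */
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, m, m, 1, NULL, &seqA);CHKERRQ(ierr);
  for (i = 0; i < m; i++) { ierr = MatSetValue(seqA, i, i, 1.0, INSERT_VALUES);CHKERRQ(ierr); }
  ierr = MatAssemblyBegin(seqA, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(seqA, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* Concatenate the per-rank blocks into one matrix on PETSC_COMM_WORLD */
  ierr = MatCreateMPIMatConcatenateSeqMat(PETSC_COMM_WORLD, seqA, PETSC_DECIDE,
                                          MAT_INITIAL_MATRIX, &mpiA);CHKERRQ(ierr);
  ierr = MatGetType(mpiA, &mtype);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "concatenated matrix type: %s\n", mtype);CHKERRQ(ierr);

  ierr = MatDestroy(&seqA);CHKERRQ(ierr);
  ierr = MatDestroy(&mpiA);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}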
> >>>>>>>>>>>> Chang
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 10/14/21 1:35 AM, Pierre Jolivet wrote:
> >>>>>>>>>>>>> Maybe I’m missing something, but can’t you use PCTELESCOPE
> as a subdomain solver, with a reduction factor equal to the number of MPI
> processes you have per block?
> >>>>>>>>>>>>> -sub_pc_type telescope -sub_pc_telescope_reduction_factor X
> -sub_telescope_pc_type lu
> >>>>>>>>>>>>> This does not work with MUMPS -mat_mumps_use_omp_threads
> because not only does the Mat need to be redistributed, the secondary
> processes also need to be “converted” to OpenMP threads.
> >>>>>>>>>>>>> Thus the need for specific code in mumps.c.
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Pierre
> >>>>>>>>>>>>>> On 14 Oct 2021, at 6:00 AM, Chang Liu via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi Junchao,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes that is what I want.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Chang
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 10/13/21 11:42 PM, Junchao Zhang wrote:
> >>>>>>>>>>>>>>> On Wed, Oct 13, 2021 at 8:58 PM Barry Smith <
> bsmith at petsc.dev> wrote:
> >>>>>>>>>>>>>>> Junchao,
> >>>>>>>>>>>>>>> If I understand correctly Chang is using the
> block Jacobi
> >>>>>>>>>>>>>>> method with a single block for a number of MPI ranks
> and a direct
> >>>>>>>>>>>>>>> solver for each block so it uses
> PCSetUp_BJacobi_Multiproc() which
> >>>>>>>>>>>>>>> is code Hong Zhang wrote a number of years ago for
> CPUs. For their
> >>>>>>>>>>>>>>> particular problems this preconditioner works well,
> but using an
> >>>>>>>>>>>>>>> iterative solver on the blocks does not work well.
> >>>>>>>>>>>>>>> If we had complete MPI-GPU direct solvers he
> could just use
> >>>>>>>>>>>>>>> the current code with MPIAIJCUSPARSE on each block
> but since we do
> >>>>>>>>>>>>>>> not he would like to use a single GPU for each block,
> this means
> >>>>>>>>>>>>>>> that diagonal blocks of the global parallel MPI
> matrix needs to be
> >>>>>>>>>>>>>>> sent to a subset of the GPUs (one GPU per block,
> which has multiple
> >>>>>>>>>>>>>>> MPI ranks associated with the blocks). Similarly for
> the triangular
> >>>>>>>>>>>>>>> solves the blocks of the right hand side needs to be
> shipped to the
> >>>>>>>>>>>>>>> appropriate GPU and the resulting solution shipped
> back to the
> >>>>>>>>>>>>>>> multiple GPUs. So Chang is absolutely correct, this
> is somewhat like
> >>>>>>>>>>>>>>> your code for MUMPS with OpenMP. OK, I now understand
> the background..
> >>>>>>>>>>>>>>> One could use PCSetUp_BJacobi_Multiproc() and get the
> blocks on the
> >>>>>>>>>>>>>>> MPI ranks and then shrink each block down to a single
> GPU but this
> >>>>>>>>>>>>>>> would be pretty inefficient, ideally one would go
> directly from the
> >>>>>>>>>>>>>>> big MPI matrix on all the GPUs to the sub matrices on
> the subset of
> >>>>>>>>>>>>>>> GPUs. But this may be a large coding project.
> >>>>>>>>>>>>>>> I don't understand these sentences. Why do you say
> "shrink"? In my mind, we just need to move each block (submatrix) living
> over multiple MPI ranks to one of them and solve directly there. In other
> words, we keep blocks' size, no shrinking or expanding.
> >>>>>>>>>>>>>>> As mentioned before, cusparse does not provide LU
> factorization. So the LU factorization would be done on CPU, and the solve
> be done on GPU. I assume Chang wants to gain from the (potential) faster
> solve (instead of factorization) on GPU.
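
For completeness, a short untested sketch of the programmatic equivalent of -pc_type lu -pc_factor_mat_solver_type cusparse on an aijcusparse matrix; per the description above, the factorization itself runs on the host while MatSolve runs on the GPU.

#include <petscksp.h>

/* Sketch: select the cusparse factorization package in code for a KSP whose
   operators (an aijcusparse Mat) are already set. */
static PetscErrorCode UseCusparseLU(KSP ksp)
{
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverType(pc, MATSOLVERCUSPARSE);CHKERRQ(ierr);
  ierr = KSPSetUp(ksp);CHKERRQ(ierr); /* triggers the factorization (on the host) */
  PetscFunctionReturn(0);
}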
> >>>>>>>>>>>>>>> Barry
> >>>>>>>>>>>>>>> Since the matrices being factored and solved directly
> are relatively
> >>>>>>>>>>>>>>> large it is possible that the cusparse code could be
> reasonably
> >>>>>>>>>>>>>>> efficient (they are not the tiny problems one gets at
> the coarse
> >>>>>>>>>>>>>>> level of multigrid). Of course, this is speculation,
> I don't
> >>>>>>>>>>>>>>> actually know how much better the cusparse code would
> be on the
> >>>>>>>>>>>>>>> direct solver than a good CPU direct sparse solver.
> >>>>>>>>>>>>>>> > On Oct 13, 2021, at 9:32 PM, Chang Liu <
> cliu at pppl.gov> wrote:
> >>>>>>>>>>>>>>> >
> >>>>>>>>>>>>>>> > Sorry I am not familiar with the details either.
> Can you please
> >>>>>>>>>>>>>>> check the code in MatMumpsGatherNonzerosOnMaster in
> mumps.c?
> >>>>>>>>>>>>>>> >
> >>>>>>>>>>>>>>> > Chang
> >>>>>>>>>>>>>>> >
> >>>>>>>>>>>>>>> > On 10/13/21 9:24 PM, Junchao Zhang wrote:
> >>>>>>>>>>>>>>> >> Hi Chang,
> >>>>>>>>>>>>>>> >> I did the work in mumps. It is easy for me to
> understand
> >>>>>>>>>>>>>>> gathering matrix rows to one process.
> >>>>>>>>>>>>>>> >> But how to gather blocks (submatrices) to form
> a large block? Can you draw a picture of that?
> >>>>>>>>>>>>>>> >> Thanks
> >>>>>>>>>>>>>>> >> --Junchao Zhang
> >>>>>>>>>>>>>>> >> On Wed, Oct 13, 2021 at 7:47 PM Chang Liu via
> petsc-users
> >>>>>>>>>>>>>>> <petsc-users at mcs.anl.gov> wrote:
> >>>>>>>>>>>>>>> >> Hi Barry,
> >>>>>>>>>>>>>>> >> I think mumps solver in petsc does support
> that. You can
> >>>>>>>>>>>>>>> check the
> >>>>>>>>>>>>>>> >> documentation on "-mat_mumps_use_omp_threads"
> at
> >>>>>>>>>>>>>>> >>
> >>>>>>>>>>>>>>>
> https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html
> >>>>>>>>>>>>>>> >> and the code enclosed by #if
> >>>>>>>>>>>>>>> defined(PETSC_HAVE_OPENMP_SUPPORT) in
> >>>>>>>>>>>>>>> >> functions MatMumpsSetUpDistRHSInfo and
> >>>>>>>>>>>>>>> >> MatMumpsGatherNonzerosOnMaster in
> >>>>>>>>>>>>>>> >> mumps.c
> >>>>>>>>>>>>>>> >> 1. I understand it is ideal to do one MPI rank
> per GPU.
> >>>>>>>>>>>>>>> However, I am
> >>>>>>>>>>>>>>> >> working on an existing code that was developed
> based on MPI
> >>>>>>>>>>>>>>> and the the
> >>>>>>>>>>>>>>> >> # of mpi ranks is typically equal to # of cpu
> cores. We don't
> >>>>>>>>>>>>>>> want to
> >>>>>>>>>>>>>>> >> change the whole structure of the code.
> >>>>>>>>>>>>>>> >> 2. What you have suggested has been coded in
> mumps.c. See
> >>>>>>>>>>>>>>> function
> >>>>>>>>>>>>>>> >> MatMumpsSetUpDistRHSInfo.
> >>>>>>>>>>>>>>> >> Regards,
> >>>>>>>>>>>>>>> >> Chang
> >>>>>>>>>>>>>>> >> On 10/13/21 7:53 PM, Barry Smith wrote:
> >>>>>>>>>>>>>>> >> >
> >>>>>>>>>>>>>>> >> >
> >>>>>>>>>>>>>>> >> >> On Oct 13, 2021, at 3:50 PM, Chang Liu <
> cliu at pppl.gov> wrote:
> >>>>>>>>>>>>>>> >> >>
> >>>>>>>>>>>>>>> >> >> Hi Barry,
> >>>>>>>>>>>>>>> >> >>
> >>>>>>>>>>>>>>> >> >> That is exactly what I want.
> >>>>>>>>>>>>>>> >> >>
> >>>>>>>>>>>>>>> >> >> Back to my original question, I am looking
> for an approach to
> >>>>>>>>>>>>>>> >> transfer
> >>>>>>>>>>>>>>> >> >> matrix
> >>>>>>>>>>>>>>> >> >> data from many MPI processes to "master"
> MPI
> >>>>>>>>>>>>>>> >> >> processes, each of which taking care of
> one GPU, and then
> >>>>>>>>>>>>>>> upload
> >>>>>>>>>>>>>>> >> the data to GPU to
> >>>>>>>>>>>>>>> >> >> solve.
> >>>>>>>>>>>>>>> >> >> One can just grab some codes from mumps.c
> to
> >>>>>>>>>>>>>>> aijcusparse.cu.
>
> mumps.c doesn't actually do that. It never needs to copy the entire
> matrix to a single MPI rank.
>
> It would be possible to write the code you suggest, but it is not clear
> that it makes sense:
>
> 1) For normal PETSc GPU usage there is one GPU per MPI rank, so while
> your one GPU per big domain is solving its systems, the other GPUs
> (with the other MPI ranks that share that domain) are doing nothing.
>
> 2) For each triangular solve you would have to gather the right-hand
> side from the multiple ranks to the single GPU to pass it to the GPU
> solver, and then scatter the resulting solution back to all of its
> subdomain ranks.
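>
> (To make point 2 concrete, here is a rough sketch of the per-solve
> traffic it implies, using the ierr/CHKERRQ style. This is illustrative
> only, not code from mumps.c or aijcusparse.cu; b and x are assumed to
> be the distributed right-hand side and solution for one subdomain, and
> the solve itself is elided.)
>
>   Vec            bseq;    /* sequential copy on the GPU-owning rank */
>   VecScatter     tozero;
>   PetscErrorCode ierr;
>
>   ierr = VecScatterCreateToZero(b, &tozero, &bseq);CHKERRQ(ierr);
>   /* gather the right-hand side onto rank 0 of the subdomain communicator */
>   ierr = VecScatterBegin(tozero, b, bseq, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
>   ierr = VecScatterEnd(tozero, b, bseq, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
>   /* ... triangular solves on the single GPU overwrite bseq with the solution ... */
>   /* scatter the solution back to every rank of the subdomain */
>   ierr = VecScatterBegin(tozero, bseq, x, INSERT_VALUES, SCATTER_REVERSE);CHKERRQ(ierr);
>   ierr = VecScatterEnd(tozero, bseq, x, INSERT_VALUES, SCATTER_REVERSE);CHKERRQ(ierr);
>   ierr = VecScatterDestroy(&tozero);CHKERRQ(ierr);
>   ierr = VecDestroy(&bseq);CHKERRQ(ierr);
>
> Every block solve pays for this gather and scatter, which is the
> overhead being pointed out here.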
>
> What I was suggesting was to assign an entire subdomain to a single MPI
> rank, so that it does everything on one GPU and can use the GPU solver
> directly. If all the major computations of a subdomain can fit and be
> done on a single GPU, then you would be utilizing all the GPUs you are
> using effectively.
>
> Barry
>
> > Chang
>
> On 10/13/21 1:53 PM, Barry Smith wrote:
>
> Chang,
>
> You are correct: there are no MPI + GPU direct solvers that currently
> do the triangular solves with MPI + GPU parallelism that I am aware of.
> You are limited in that the individual triangular solves must be done
> on a single GPU. I can only suggest making each subdomain as big as
> possible, to utilize each GPU as much as possible for the direct
> triangular solves.
>
> Barry
>
> On Oct 13, 2021, at 12:16 PM, Chang Liu via petsc-users
> <petsc-users at mcs.anl.gov> wrote:
>
> Hi Mark,
>
> '-mat_type aijcusparse' works with mpiaijcusparse with other solvers,
> but with -pc_factor_mat_solver_type cusparse it will give an error.
>
> Yes, what I want is to have mumps or superlu do the factorization, and
> then do the rest, including the GMRES solver, on the GPU. Is that
> possible?
>
> I have tried to use aijcusparse with superlu_dist; it runs, but the
> iterative solver is still running on the CPUs. I have contacted the
> superlu group and they confirmed that is the case right now. But if I
> set -pc_factor_mat_solver_type cusparse, it seems that the iterative
> solver is running on the GPU.
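>
> For concreteness, the two kinds of runs being contrasted correspond
> roughly to option sets like the following (paraphrased for
> illustration; the full cusparse option set is quoted further down in
> the thread):
>
>   -mat_type aijcusparse -ksp_type fgmres -pc_type lu \
>       -pc_factor_mat_solver_type superlu_dist
>
>   -mat_type aijcusparse -ksp_type fgmres -pc_type bjacobi \
>       -sub_ksp_type preonly -sub_pc_type lu \
>       -sub_pc_factor_mat_solver_type cusparse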
>
> Chang
>
> On 10/13/21 12:03 PM, Mark Adams wrote:
>
> On Wed, Oct 13, 2021 at 11:10 AM Chang Liu <cliu at pppl.gov> wrote:
>
> > Thank you Junchao for explaining this. I guess in my case the code is
> > just calling a seq solver like superlu to do the factorization on
> > GPUs.
> >
> > My idea is that I want to have a traditional MPI code utilize GPUs
> > with cusparse. Right now cusparse does not support mpiaij matrices,
>
> Sure it does: '-mat_type aijcusparse' will give you an mpiaijcusparse
> matrix with > 1 processes. (-mat_type mpiaijcusparse might also work
> with > 1 proc.)
>
> However, I see in grepping the repo that all the mumps and superlu
> tests use the aij or sell matrix type. MUMPS and SuperLU provide their
> own solves, I assume ... but you might want to do other matrix
> operations on the GPU. Is that the issue?
>
> Did you try -mat_type aijcusparse with MUMPS and/or SuperLU and have a
> problem? (There is no test with it, so it probably does not work.)
>
> Thanks,
> Mark
>
> > so I want the code to have a mpiaij matrix when adding all the matrix
> > terms, and then transform the matrix to seqaij when doing the
> > factorization and solve. This involves sending the data to the master
> > process, and I think the petsc mumps solver has something similar
> > already.
> >
> > Chang
>
> On 10/13/21 10:18 AM, Junchao Zhang wrote:
>
> On Tue, Oct 12, 2021 at 1:07 PM Mark Adams <mfadams at lbl.gov> wrote:
>
> On Tue, Oct 12, 2021 at 1:45 PM Chang Liu <cliu at pppl.gov> wrote:
>
> >> Hi Mark,
> >>
> >> The option I use is like
> >>
> >>   -pc_type bjacobi -pc_bjacobi_blocks 16 -ksp_type fgmres
> >>   -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse
> >>   -sub_ksp_type preonly -sub_pc_type lu
> >>   -ksp_max_it 2000 -ksp_rtol 1.e-300 -ksp_atol 1.e-300
> >>
> > Note, if you use -log_view, the last column (rows are the method,
> > like MatFactorNumeric) has the percent of work done on the GPU.
> >
> > Junchao: *this* implies that we have a cuSparse LU factorization. Is
> > that correct? (I don't think we do.)
> >
> No, we don't have cuSparse LU factorization. If you check
> MatLUFactorSymbolic_SeqAIJCUSPARSE(), you will find that it calls
> MatLUFactorSymbolic_SeqAIJ() instead.
>
> So I don't understand Chang's idea. Do you want to make bigger blocks?
>
> >> I think this one does both the factorization and the solve on the
> >> GPU.
> >>
> >> You can check the runex72_aijcusparse.sh file in the petsc install
> >> directory and try it yourself (this is only an lu factorization,
> >> without an iterative solve).
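> >>
> >> As a sketch, such an LU-only run amounts to option combinations
> >> along these lines (illustrative, and not the exact contents of
> >> runex72_aijcusparse.sh): on a single rank, a preonly KSP with a
> >> cusparse LU factorization of a sequential aijcusparse matrix,
> >>
> >>   -mat_type aijcusparse -ksp_type preonly -pc_type lu \
> >>       -pc_factor_mat_solver_type cusparse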
> >>
> >> Chang
>
> On 10/12/21 1:17 PM, Mark Adams wrote:
>
> On Tue, Oct 12, 2021 at 11:19 AM Chang Liu <cliu at pppl.gov> wrote:
>
> > Hi Junchao,
> >
> > No, I only need it to be transferred within a node. I use the
> > block-Jacobi method and GMRES to solve the sparse matrix, so each
> > direct solver will take care of a sub-block of the whole matrix. In
> > this way, I can use one GPU to solve one sub-block, which is stored
> > within one node.
> >
> > It was stated in the documentation that the cusparse solver is slow.
> > However, in my test using ex72.c, the cusparse solver is faster than
> > mumps or superlu_dist on CPUs.
> >
> Are we talking about the factorization, the solve, or both?
>
> We do not have an interface to cuSparse's LU factorization (I just
> learned that it exists a few weeks ago). Perhaps your fast "cusparse
> solver" is '-pc_type lu -mat_type aijcusparse'? This would be the CPU
> factorization, which is the dominant cost.
>
> > Chang
>
> On 10/12/21 10:24 AM, Junchao Zhang wrote:
>
> Hi, Chang,
>
> For the mumps solver, we usually transfer matrix and vector data within
> a compute node. For the idea you propose, it looks like we need to
> gather data within MPI_COMM_WORLD, right?
>
> Mark, I remember you said the cusparse solve is slow and you would
> rather do it on the CPU. Is that right?
>
> --Junchao Zhang
>
> On Mon, Oct 11, 2021 at 10:25 PM Chang Liu via petsc-users
> <petsc-users at mcs.anl.gov> wrote:
>
> Hi,
>
> Currently, it is possible to use the mumps solver in PETSc with the
> -mat_mumps_use_omp_threads option, so that multiple MPI processes will
> transfer the matrix and rhs data to the master rank, and then the
> master rank will call mumps with OpenMP to solve the matrix.
>
> I wonder if someone can develop a similar option for the cusparse
> solver. Right now, this solver does not work with mpiaijcusparse. I
> think a possible workaround is to transfer all the matrix data to one
> MPI process, and then upload the data to the GPU to solve. In this way,
> one can use the cusparse solver for an MPI program.
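>
> As a very rough sketch of that workaround (illustrative only -- this is
> not existing PETSc code; A stands for the assembled mpiaij matrix, the
> right-hand-side handling is omitted, and cleanup of the index sets is
> skipped), one could gather the parallel matrix onto rank 0 as a
> sequential matrix and convert it to the cusparse format there:
>
>   PetscErrorCode ierr;
>   PetscMPIInt    rank;
>   PetscInt       M, N;
>   IS             rows, cols;
>   Mat           *Aseq;
>
>   ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
>   ierr = MatGetSize(A, &M, &N);CHKERRQ(ierr);
>   /* rank 0 requests every row and column; the other ranks request nothing */
>   ierr = ISCreateStride(PETSC_COMM_SELF, rank ? 0 : M, 0, 1, &rows);CHKERRQ(ierr);
>   ierr = ISCreateStride(PETSC_COMM_SELF, rank ? 0 : N, 0, 1, &cols);CHKERRQ(ierr);
>   ierr = MatCreateSubMatrices(A, rank ? 0 : 1, &rows, &cols,
>                               MAT_INITIAL_MATRIX, &Aseq);CHKERRQ(ierr);
>   if (!rank) {
>     /* Aseq[0] is now a SeqAIJ copy of the whole matrix on rank 0;
>        switch it to the cusparse format for the GPU solver */
>     ierr = MatConvert(Aseq[0], MATSEQAIJCUSPARSE, MAT_INPLACE_MATRIX,
>                       &Aseq[0]);CHKERRQ(ierr);
>     /* factor and solve Aseq[0] here, e.g. through a KSP/PC configured
>        with -pc_factor_mat_solver_type cusparse */
>   }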
>
> Chang
>
> --
> Chang Liu
> Staff Research Physicist
> +1 609 243 3438
> cliu at pppl.gov
> Princeton Plasma Physics Laboratory
> 100 Stellarator Rd, Princeton NJ 08540, USA