[petsc-users] [External] Re: request to add an option similar to use_omp_threads for mumps to cusparse solver
Junchao Zhang
junchao.zhang at gmail.com
Wed Oct 20 13:59:12 CDT 2021
The MR https://gitlab.com/petsc/petsc/-/merge_requests/4471 has not been
merged yet.
--Junchao Zhang
On Wed, Oct 20, 2021 at 1:47 PM Chang Liu via petsc-users <
petsc-users at mcs.anl.gov> wrote:
> Hi Barry,
>
> Are the fixes merged into master? I was using bjacobi as a
> preconditioner. Using the latest version of PETSc, I found that by calling
>
> mpiexec -n 32 --oversubscribe ./ex7 -m 1000 -ksp_view
> -ksp_monitor_true_residual -ksp_type fgmres -pc_type bjacobi
> -pc_bjacobi_blocks 4 -sub_ksp_type preonly -sub_pc_type telescope
> -sub_pc_telescope_reduction_factor 8 -sub_pc_telescope_subcomm_type
> contiguous -sub_telescope_pc_type lu -sub_telescope_ksp_type preonly
> -sub_telescope_pc_factor_mat_solver_type mumps -ksp_max_it 2000
> -ksp_rtol 1.e-30 -ksp_atol 1.e-30
>
> The code is calling PCApply_BJacobi_Multiproc. If I use
>
> mpiexec -n 32 --oversubscribe ./ex7 -m 1000 -ksp_view
> -ksp_monitor_true_residual -telescope_ksp_monitor_true_residual
> -ksp_type preonly -pc_type telescope -pc_telescope_reduction_factor 8
> -pc_telescope_subcomm_type contiguous -telescope_pc_type bjacobi
> -telescope_ksp_type fgmres -telescope_pc_bjacobi_blocks 4
> -telescope_sub_ksp_type preonly -telescope_sub_pc_type lu
> -telescope_sub_pc_factor_mat_solver_type mumps -telescope_ksp_max_it
> 2000 -telescope_ksp_rtol 1.e-30 -telescope_ksp_atol 1.e-30
>
> The code is calling PCApply_BJacobi_Singleblock. You can test it yourself.
>
> Regards,
>
> Chang
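
For anyone who would rather drive this setup from code than from the command line, here is a minimal, untested C sketch (not ex7 itself): it pushes the same bjacobi + telescope options from the first command line above into the options database and lets KSPSetFromOptions() wire up the nested solvers. The 1-D Laplacian is only a stand-in operator; run it with the same 32-rank mpiexec line to match the 4-block / reduction-factor-8 layout.

#include <petscksp.h>

int main(int argc, char **argv)
{
  KSP            ksp;
  Mat            A;
  Vec            b, x;
  PetscInt       i, n = 1000, rstart, rend;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* Small 1-D Laplacian as a stand-in for the operator assembled by ex7 */
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {
    if (i > 0)     { ierr = MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);CHKERRQ(ierr); }
    if (i < n - 1) { ierr = MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);CHKERRQ(ierr); }
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatCreateVecs(A, &x, &b);CHKERRQ(ierr);
  ierr = VecSet(b, 1.0);CHKERRQ(ierr);

  /* Same nesting as the first command line above: bjacobi outside, telescope
     on each block, LU (MUMPS) inside the telescope; the prefixes sub_ and
     sub_telescope_ address the block solver and the inner solver. */
  ierr = PetscOptionsSetValue(NULL, "-ksp_type", "fgmres");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_type", "bjacobi");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_bjacobi_blocks", "4");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-sub_ksp_type", "preonly");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-sub_pc_type", "telescope");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-sub_pc_telescope_reduction_factor", "8");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-sub_pc_telescope_subcomm_type", "contiguous");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-sub_telescope_ksp_type", "preonly");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-sub_telescope_pc_type", "lu");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-sub_telescope_pc_factor_mat_solver_type", "mumps");CHKERRQ(ierr);

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); /* picks up everything set above */
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}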
>
> On 10/20/21 1:14 PM, Barry Smith wrote:
> >
> >
> >> On Oct 20, 2021, at 12:48 PM, Chang Liu <cliu at pppl.gov> wrote:
> >>
> >> Hi Pierre,
> >>
> >> I have another suggestion for telescope. I have achieved my goal by
> putting telescope outside bjacobi. But the code still does not work if I
> use telescope as a PC for the subblocks. I think the reason is that I want
> to use cusparse as the solver, which can only deal with a seqaij matrix and
> not an mpiaij matrix.
> >
> >
> > This is supposed to work with the recent fixes. The telescope should
> produce a seq matrix and, for each solve, automatically map the parallel
> vector (over the subdomain) down to the one rank with the GPU to solve it
> on the GPU. It is not clear to me where the process is going wrong.
> >
> > Barry
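
One way to locate where it goes wrong is to drill down from the outer bjacobi PC to the telescope sub-PC after KSPSetUp() and print the type of the matrix the innermost solver actually receives. A rough, untested sketch, assuming the bjacobi + telescope nesting from the runs above; note that PCTelescopeGetKSP() returns the inner KSP only on ranks that are active in the telescope sub-communicator.

#include <petscksp.h>

/* Rough diagnostic sketch: call after KSPSetUp(ksp); walks bjacobi -> telescope
   and reports the type of the preconditioner matrix the innermost PC sees. */
static PetscErrorCode ReportInnerMatType(KSP ksp)
{
  PC             pc, subpc;
  KSP           *subksp, innerksp;
  Mat            Amat, Pmat;
  MatType        mtype;
  PetscInt       nlocal, first;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCBJacobiGetSubKSP(pc, &nlocal, &first, &subksp);CHKERRQ(ierr);
  ierr = KSPGetPC(subksp[0], &subpc);CHKERRQ(ierr);         /* the telescope PC on this block */
  ierr = PCTelescopeGetKSP(subpc, &innerksp);CHKERRQ(ierr); /* NULL off the sub-communicator */
  if (innerksp) {
    ierr = KSPGetOperators(innerksp, &Amat, &Pmat);CHKERRQ(ierr);
    ierr = MatGetType(Pmat, &mtype);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_SELF, "inner pmat type: %s\n", mtype);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}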
> >
> >
> >
> >> However, the telescope pc can put the matrix onto one MPI rank, thus
> making it a seqaij for the factorization stage, but then after
> factorization it will give the data back to the original communicator. This
> turns the matrix back into an mpiaij, and then cusparse cannot solve it.
> >>
> >> I think a better option is to do the factorization on the CPU with the
> mpiaij matrix, then transform the preconditioner matrix to seqaij and do
> the MatSolve on the GPU. But I am not sure if this can be achieved using telescope.
> >>
> >> Regards,
> >>
> >> Chang
> >>
> >> On 10/15/21 5:29 AM, Pierre Jolivet wrote:
> >>> Hi Chang,
> >>> The output you sent with MUMPS looks alright to me, you can see that
> the MatType is properly set to seqaijcusparse (and not mpiaijcusparse).
> >>> I don’t know what is wrong with
> -sub_telescope_pc_factor_mat_solver_type cusparse, I don’t have a PETSc
> installation for testing this, hopefully Barry or Junchao can confirm this
> wrong behavior and get this fixed.
> >>> As for permuting PCTELESCOPE and PCBJACOBI, in your case, the outer PC
> will be equivalent, yes.
> >>> However, it would be more efficient to do PCBJACOBI and then
> PCTELESCOPE.
> >>> PCBJACOBI prunes the operator by basically removing all coefficients
> outside of the diagonal blocks.
> >>> Then, PCTELESCOPE “groups everything together”.
> >>> If you do it the other way around, PCTELESCOPE will “group everything
> together” and then PCBJACOBI will prune the operator.
> >>> So the PCTELESCOPE SetUp will be costly for nothing since some
> coefficients will be thrown out afterwards in the PCBJACOBI SetUp.
> >>> I hope I’m clear enough, otherwise I can try to draw some pictures.
> >>> Thanks,
> >>> Pierre
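
A compact, untested C sketch of the recommended nesting (PCBJACOBI outside, PCTELESCOPE on the blocks), using the block count and reduction factor that appear earlier in this thread; the reversed nesting would simply swap the pc types and prefixes and, as explained above, would group everything before pruning.

#include <petscksp.h>

/* Sketch: apply the bjacobi-then-telescope ordering to a KSP whose operators
   are already set; one options string, consumed by KSPSetFromOptions(). */
static PetscErrorCode SetRecommendedNesting(KSP ksp)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PetscOptionsInsertString(NULL,
          "-pc_type bjacobi -pc_bjacobi_blocks 4 "
          "-sub_pc_type telescope -sub_pc_telescope_reduction_factor 4 "
          "-sub_telescope_pc_type lu");CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}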
> >>>> On 15 Oct 2021, at 4:39 AM, Chang Liu <cliu at pppl.gov> wrote:
> >>>>
> >>>> Hi Pierre and Barry,
> >>>>
> >>>> I think maybe I should use telescope outside bjacobi, like this:
> >>>>
> >>>> mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7 -m 400
> -ksp_view -ksp_monitor_true_residual -pc_type telescope
> -pc_telescope_reduction_factor 4 -telescope_pc_type bjacobi
> -telescope_ksp_type fgmres -telescope_pc_bjacobi_blocks 4
> -mat_type aijcusparse -telescope_sub_ksp_type preonly
> -telescope_sub_pc_type lu -telescope_sub_pc_factor_mat_solver_type cusparse
> -ksp_max_it 2000 -ksp_rtol 1.e-20 -ksp_atol 1.e-9
> >>>>
> >>>> But then I got an error that
> >>>>
> >>>> [0]PETSC ERROR: MatSolverType cusparse does not support matrix type
> seqaij
> >>>>
> >>>> But the mat type should be aijcusparse. I think telescope changes the
> mat type.
> >>>>
> >>>> Chang
> >>>>
> >>>> On 10/14/21 10:11 PM, Chang Liu wrote:
> >>>>> For comparison, here is the output using mumps instead of cusparse
> >>>>> $ mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7 -m 400
> -ksp_view -ksp_monitor_true_residual -pc_type bjacobi -pc_bjacobi_blocks 4
> -ksp_type fgmres -mat_type aijcusparse -sub_pc_type telescope -sub_ksp_type
> preonly -sub_telescope_ksp_type preonly -sub_telescope_pc_type lu
> -sub_telescope_pc_factor_mat_solver_type mumps
> -sub_pc_telescope_reduction_factor 4 -sub_pc_telescope_subcomm_type
> contiguous -ksp_max_it 2000 -ksp_rtol 1.e-20 -ksp_atol 1.e-9
> >>>>> 0 KSP unpreconditioned resid norm 4.014971979977e+01 true resid
> norm 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00
> >>>>> 1 KSP unpreconditioned resid norm 2.439995191694e+00 true resid
> norm 2.439995191694e+00 ||r(i)||/||b|| 6.077240896978e-02
> >>>>> 2 KSP unpreconditioned resid norm 1.280694102588e+00 true resid
> norm 1.280694102588e+00 ||r(i)||/||b|| 3.189795866509e-02
> >>>>> 3 KSP unpreconditioned resid norm 1.041100266810e+00 true resid
> norm 1.041100266810e+00 ||r(i)||/||b|| 2.593044912896e-02
> >>>>> 4 KSP unpreconditioned resid norm 7.274347137268e-01 true resid
> norm 7.274347137268e-01 ||r(i)||/||b|| 1.811805206499e-02
> >>>>> 5 KSP unpreconditioned resid norm 5.429229329787e-01 true resid
> norm 5.429229329787e-01 ||r(i)||/||b|| 1.352245882876e-02
> >>>>> 6 KSP unpreconditioned resid norm 4.332970410353e-01 true resid
> norm 4.332970410353e-01 ||r(i)||/||b|| 1.079203150598e-02
> >>>>> 7 KSP unpreconditioned resid norm 3.948206050950e-01 true resid
> norm 3.948206050950e-01 ||r(i)||/||b|| 9.833707609019e-03
> >>>>> 8 KSP unpreconditioned resid norm 3.379580577269e-01 true resid
> norm 3.379580577269e-01 ||r(i)||/||b|| 8.417444988714e-03
> >>>>> 9 KSP unpreconditioned resid norm 2.875593971410e-01 true resid
> norm 2.875593971410e-01 ||r(i)||/||b|| 7.162176936105e-03
> >>>>> 10 KSP unpreconditioned resid norm 2.533983363244e-01 true resid
> norm 2.533983363244e-01 ||r(i)||/||b|| 6.311335112378e-03
> >>>>> 11 KSP unpreconditioned resid norm 2.389169921094e-01 true resid
> norm 2.389169921094e-01 ||r(i)||/||b|| 5.950651543793e-03
> >>>>> 12 KSP unpreconditioned resid norm 2.118961639089e-01 true resid
> norm 2.118961639089e-01 ||r(i)||/||b|| 5.277649880637e-03
> >>>>> 13 KSP unpreconditioned resid norm 1.885892030223e-01 true resid
> norm 1.885892030223e-01 ||r(i)||/||b|| 4.697148671593e-03
> >>>>> 14 KSP unpreconditioned resid norm 1.763510666948e-01 true resid
> norm 1.763510666948e-01 ||r(i)||/||b|| 4.392336175055e-03
> >>>>> 15 KSP unpreconditioned resid norm 1.638219366731e-01 true resid
> norm 1.638219366731e-01 ||r(i)||/||b|| 4.080275964317e-03
> >>>>> 16 KSP unpreconditioned resid norm 1.476792766432e-01 true resid
> norm 1.476792766432e-01 ||r(i)||/||b|| 3.678214378076e-03
> >>>>> 17 KSP unpreconditioned resid norm 1.349906937321e-01 true resid
> norm 1.349906937321e-01 ||r(i)||/||b|| 3.362182710248e-03
> >>>>> 18 KSP unpreconditioned resid norm 1.289673236836e-01 true resid
> norm 1.289673236836e-01 ||r(i)||/||b|| 3.212159993314e-03
> >>>>> 19 KSP unpreconditioned resid norm 1.167505658153e-01 true resid
> norm 1.167505658153e-01 ||r(i)||/||b|| 2.907879965230e-03
> >>>>> 20 KSP unpreconditioned resid norm 1.046037988999e-01 true resid
> norm 1.046037988999e-01 ||r(i)||/||b|| 2.605343185995e-03
> >>>>> 21 KSP unpreconditioned resid norm 9.832660514331e-02 true resid
> norm 9.832660514331e-02 ||r(i)||/||b|| 2.448998539309e-03
> >>>>> 22 KSP unpreconditioned resid norm 8.835618950141e-02 true resid
> norm 8.835618950142e-02 ||r(i)||/||b|| 2.200667649539e-03
> >>>>> 23 KSP unpreconditioned resid norm 7.563496650115e-02 true resid
> norm 7.563496650116e-02 ||r(i)||/||b|| 1.883823022386e-03
> >>>>> 24 KSP unpreconditioned resid norm 6.651291376834e-02 true resid
> norm 6.651291376834e-02 ||r(i)||/||b|| 1.656622115921e-03
> >>>>> 25 KSP unpreconditioned resid norm 5.890393227906e-02 true resid
> norm 5.890393227906e-02 ||r(i)||/||b|| 1.467106933070e-03
> >>>>> 26 KSP unpreconditioned resid norm 4.661992782780e-02 true resid
> norm 4.661992782780e-02 ||r(i)||/||b|| 1.161152009536e-03
> >>>>> 27 KSP unpreconditioned resid norm 3.690705358716e-02 true resid
> norm 3.690705358716e-02 ||r(i)||/||b|| 9.192356452602e-04
> >>>>> 28 KSP unpreconditioned resid norm 3.209680460188e-02 true resid
> norm 3.209680460188e-02 ||r(i)||/||b|| 7.994278605666e-04
> >>>>> 29 KSP unpreconditioned resid norm 2.354337626000e-02 true resid
> norm 2.354337626001e-02 ||r(i)||/||b|| 5.863895533373e-04
> >>>>> 30 KSP unpreconditioned resid norm 1.701296561785e-02 true resid
> norm 1.701296561785e-02 ||r(i)||/||b|| 4.237380908932e-04
> >>>>> 31 KSP unpreconditioned resid norm 1.509942937258e-02 true resid
> norm 1.509942937258e-02 ||r(i)||/||b|| 3.760780759588e-04
> >>>>> 32 KSP unpreconditioned resid norm 1.258274688515e-02 true resid
> norm 1.258274688515e-02 ||r(i)||/||b|| 3.133956338402e-04
> >>>>> 33 KSP unpreconditioned resid norm 9.805748771638e-03 true resid
> norm 9.805748771638e-03 ||r(i)||/||b|| 2.442295692359e-04
> >>>>> 34 KSP unpreconditioned resid norm 8.596552678160e-03 true resid
> norm 8.596552678160e-03 ||r(i)||/||b|| 2.141123953301e-04
> >>>>> 35 KSP unpreconditioned resid norm 6.936406707500e-03 true resid
> norm 6.936406707500e-03 ||r(i)||/||b|| 1.727635147167e-04
> >>>>> 36 KSP unpreconditioned resid norm 5.533741607932e-03 true resid
> norm 5.533741607932e-03 ||r(i)||/||b|| 1.378276519869e-04
> >>>>> 37 KSP unpreconditioned resid norm 4.982347757923e-03 true resid
> norm 4.982347757923e-03 ||r(i)||/||b|| 1.240942099414e-04
> >>>>> 38 KSP unpreconditioned resid norm 4.309608348059e-03 true resid
> norm 4.309608348059e-03 ||r(i)||/||b|| 1.073384414524e-04
> >>>>> 39 KSP unpreconditioned resid norm 3.729408303186e-03 true resid
> norm 3.729408303185e-03 ||r(i)||/||b|| 9.288753001974e-05
> >>>>> 40 KSP unpreconditioned resid norm 3.490003351128e-03 true resid
> norm 3.490003351128e-03 ||r(i)||/||b|| 8.692472496776e-05
> >>>>> 41 KSP unpreconditioned resid norm 3.069012426454e-03 true resid
> norm 3.069012426453e-03 ||r(i)||/||b|| 7.643919912166e-05
> >>>>> 42 KSP unpreconditioned resid norm 2.772928845284e-03 true resid
> norm 2.772928845284e-03 ||r(i)||/||b|| 6.906471225983e-05
> >>>>> 43 KSP unpreconditioned resid norm 2.561454192399e-03 true resid
> norm 2.561454192398e-03 ||r(i)||/||b|| 6.379756085902e-05
> >>>>> 44 KSP unpreconditioned resid norm 2.253662762802e-03 true resid
> norm 2.253662762802e-03 ||r(i)||/||b|| 5.613146926159e-05
> >>>>> 45 KSP unpreconditioned resid norm 2.086800523919e-03 true resid
> norm 2.086800523919e-03 ||r(i)||/||b|| 5.197546917701e-05
> >>>>> 46 KSP unpreconditioned resid norm 1.926028182896e-03 true resid
> norm 1.926028182896e-03 ||r(i)||/||b|| 4.797114880257e-05
> >>>>> 47 KSP unpreconditioned resid norm 1.769243808622e-03 true resid
> norm 1.769243808622e-03 ||r(i)||/||b|| 4.406615581492e-05
> >>>>> 48 KSP unpreconditioned resid norm 1.656654905964e-03 true resid
> norm 1.656654905964e-03 ||r(i)||/||b|| 4.126192945371e-05
> >>>>> 49 KSP unpreconditioned resid norm 1.572052627273e-03 true resid
> norm 1.572052627273e-03 ||r(i)||/||b|| 3.915475961260e-05
> >>>>> 50 KSP unpreconditioned resid norm 1.454960682355e-03 true resid
> norm 1.454960682355e-03 ||r(i)||/||b|| 3.623837699518e-05
> >>>>> 51 KSP unpreconditioned resid norm 1.375985053014e-03 true resid
> norm 1.375985053014e-03 ||r(i)||/||b|| 3.427134883820e-05
> >>>>> 52 KSP unpreconditioned resid norm 1.269325501087e-03 true resid
> norm 1.269325501087e-03 ||r(i)||/||b|| 3.161480347603e-05
> >>>>> 53 KSP unpreconditioned resid norm 1.184791772965e-03 true resid
> norm 1.184791772965e-03 ||r(i)||/||b|| 2.950934100844e-05
> >>>>> 54 KSP unpreconditioned resid norm 1.064535156080e-03 true resid
> norm 1.064535156080e-03 ||r(i)||/||b|| 2.651413662135e-05
> >>>>> 55 KSP unpreconditioned resid norm 9.639036688120e-04 true resid
> norm 9.639036688117e-04 ||r(i)||/||b|| 2.400773090370e-05
> >>>>> 56 KSP unpreconditioned resid norm 8.632359780260e-04 true resid
> norm 8.632359780260e-04 ||r(i)||/||b|| 2.150042347322e-05
> >>>>> 57 KSP unpreconditioned resid norm 7.613605783850e-04 true resid
> norm 7.613605783850e-04 ||r(i)||/||b|| 1.896303591113e-05
> >>>>> 58 KSP unpreconditioned resid norm 6.681073248348e-04 true resid
> norm 6.681073248349e-04 ||r(i)||/||b|| 1.664039819373e-05
> >>>>> 59 KSP unpreconditioned resid norm 5.656127908544e-04 true resid
> norm 5.656127908545e-04 ||r(i)||/||b|| 1.408758999254e-05
> >>>>> 60 KSP unpreconditioned resid norm 4.850863370767e-04 true resid
> norm 4.850863370767e-04 ||r(i)||/||b|| 1.208193580169e-05
> >>>>> 61 KSP unpreconditioned resid norm 4.374055762320e-04 true resid
> norm 4.374055762316e-04 ||r(i)||/||b|| 1.089436186387e-05
> >>>>> 62 KSP unpreconditioned resid norm 3.874398257079e-04 true resid
> norm 3.874398257077e-04 ||r(i)||/||b|| 9.649876204364e-06
> >>>>> 63 KSP unpreconditioned resid norm 3.364908694427e-04 true resid
> norm 3.364908694429e-04 ||r(i)||/||b|| 8.380902061609e-06
> >>>>> 64 KSP unpreconditioned resid norm 2.961034697265e-04 true resid
> norm 2.961034697268e-04 ||r(i)||/||b|| 7.374982221632e-06
> >>>>> 65 KSP unpreconditioned resid norm 2.640593092764e-04 true resid
> norm 2.640593092767e-04 ||r(i)||/||b|| 6.576865557059e-06
> >>>>> 66 KSP unpreconditioned resid norm 2.423231125743e-04 true resid
> norm 2.423231125745e-04 ||r(i)||/||b|| 6.035487016671e-06
> >>>>> 67 KSP unpreconditioned resid norm 2.182349471179e-04 true resid
> norm 2.182349471179e-04 ||r(i)||/||b|| 5.435528521898e-06
> >>>>> 68 KSP unpreconditioned resid norm 2.008438265031e-04 true resid
> norm 2.008438265028e-04 ||r(i)||/||b|| 5.002371809927e-06
> >>>>> 69 KSP unpreconditioned resid norm 1.838732863386e-04 true resid
> norm 1.838732863388e-04 ||r(i)||/||b|| 4.579690400226e-06
> >>>>> 70 KSP unpreconditioned resid norm 1.723786027645e-04 true resid
> norm 1.723786027645e-04 ||r(i)||/||b|| 4.293394913444e-06
> >>>>> 71 KSP unpreconditioned resid norm 1.580945192204e-04 true resid
> norm 1.580945192205e-04 ||r(i)||/||b|| 3.937624471826e-06
> >>>>> 72 KSP unpreconditioned resid norm 1.476687469671e-04 true resid
> norm 1.476687469671e-04 ||r(i)||/||b|| 3.677952117812e-06
> >>>>> 73 KSP unpreconditioned resid norm 1.385018526182e-04 true resid
> norm 1.385018526184e-04 ||r(i)||/||b|| 3.449634351350e-06
> >>>>> 74 KSP unpreconditioned resid norm 1.279712893541e-04 true resid
> norm 1.279712893541e-04 ||r(i)||/||b|| 3.187351991305e-06
> >>>>> 75 KSP unpreconditioned resid norm 1.202010411772e-04 true resid
> norm 1.202010411774e-04 ||r(i)||/||b|| 2.993820175504e-06
> >>>>> 76 KSP unpreconditioned resid norm 1.113459414198e-04 true resid
> norm 1.113459414200e-04 ||r(i)||/||b|| 2.773268206485e-06
> >>>>> 77 KSP unpreconditioned resid norm 1.042523036036e-04 true resid
> norm 1.042523036037e-04 ||r(i)||/||b|| 2.596588572066e-06
> >>>>> 78 KSP unpreconditioned resid norm 9.565176453232e-05 true resid
> norm 9.565176453227e-05 ||r(i)||/||b|| 2.382376888539e-06
> >>>>> 79 KSP unpreconditioned resid norm 8.896901670359e-05 true resid
> norm 8.896901670365e-05 ||r(i)||/||b|| 2.215931198209e-06
> >>>>> 80 KSP unpreconditioned resid norm 8.119298425803e-05 true resid
> norm 8.119298425824e-05 ||r(i)||/||b|| 2.022255314935e-06
> >>>>> 81 KSP unpreconditioned resid norm 7.544528309154e-05 true resid
> norm 7.544528309154e-05 ||r(i)||/||b|| 1.879098620558e-06
> >>>>> 82 KSP unpreconditioned resid norm 6.755385041138e-05 true resid
> norm 6.755385041176e-05 ||r(i)||/||b|| 1.682548489719e-06
> >>>>> 83 KSP unpreconditioned resid norm 6.158629300870e-05 true resid
> norm 6.158629300835e-05 ||r(i)||/||b|| 1.533915885727e-06
> >>>>> 84 KSP unpreconditioned resid norm 5.358756885754e-05 true resid
> norm 5.358756885765e-05 ||r(i)||/||b|| 1.334693470462e-06
> >>>>> 85 KSP unpreconditioned resid norm 4.774852370380e-05 true resid
> norm 4.774852370387e-05 ||r(i)||/||b|| 1.189261692037e-06
> >>>>> 86 KSP unpreconditioned resid norm 3.919358737908e-05 true resid
> norm 3.919358737930e-05 ||r(i)||/||b|| 9.761858258229e-07
> >>>>> 87 KSP unpreconditioned resid norm 3.434042319950e-05 true resid
> norm 3.434042319947e-05 ||r(i)||/||b|| 8.553091620745e-07
> >>>>> 88 KSP unpreconditioned resid norm 2.813699436281e-05 true resid
> norm 2.813699436302e-05 ||r(i)||/||b|| 7.008017615898e-07
> >>>>> 89 KSP unpreconditioned resid norm 2.462248069068e-05 true resid
> norm 2.462248069051e-05 ||r(i)||/||b|| 6.132665635851e-07
> >>>>> 90 KSP unpreconditioned resid norm 2.040558789626e-05 true resid
> norm 2.040558789626e-05 ||r(i)||/||b|| 5.082373674841e-07
> >>>>> 91 KSP unpreconditioned resid norm 1.888523204468e-05 true resid
> norm 1.888523204470e-05 ||r(i)||/||b|| 4.703702077842e-07
> >>>>> 92 KSP unpreconditioned resid norm 1.707071292484e-05 true resid
> norm 1.707071292474e-05 ||r(i)||/||b|| 4.251763900191e-07
> >>>>> 93 KSP unpreconditioned resid norm 1.498636454665e-05 true resid
> norm 1.498636454672e-05 ||r(i)||/||b|| 3.732619958859e-07
> >>>>> 94 KSP unpreconditioned resid norm 1.219393542993e-05 true resid
> norm 1.219393543006e-05 ||r(i)||/||b|| 3.037115947725e-07
> >>>>> 95 KSP unpreconditioned resid norm 1.059996963300e-05 true resid
> norm 1.059996963303e-05 ||r(i)||/||b|| 2.640110487917e-07
> >>>>> 96 KSP unpreconditioned resid norm 9.099659872548e-06 true resid
> norm 9.099659873214e-06 ||r(i)||/||b|| 2.266431725699e-07
> >>>>> 97 KSP unpreconditioned resid norm 8.147347587295e-06 true resid
> norm 8.147347587584e-06 ||r(i)||/||b|| 2.029241456283e-07
> >>>>> 98 KSP unpreconditioned resid norm 7.167226146744e-06 true resid
> norm 7.167226146783e-06 ||r(i)||/||b|| 1.785124823418e-07
> >>>>> 99 KSP unpreconditioned resid norm 6.552540209538e-06 true resid
> norm 6.552540209577e-06 ||r(i)||/||b|| 1.632026385802e-07
> >>>>> 100 KSP unpreconditioned resid norm 5.767783600111e-06 true resid
> norm 5.767783600320e-06 ||r(i)||/||b|| 1.436568830140e-07
> >>>>> 101 KSP unpreconditioned resid norm 5.261057430584e-06 true resid
> norm 5.261057431144e-06 ||r(i)||/||b|| 1.310359688033e-07
> >>>>> 102 KSP unpreconditioned resid norm 4.715498525786e-06 true resid
> norm 4.715498525947e-06 ||r(i)||/||b|| 1.174478564100e-07
> >>>>> 103 KSP unpreconditioned resid norm 4.380052669622e-06 true resid
> norm 4.380052669825e-06 ||r(i)||/||b|| 1.090929822591e-07
> >>>>> 104 KSP unpreconditioned resid norm 3.911664470060e-06 true resid
> norm 3.911664470226e-06 ||r(i)||/||b|| 9.742694319496e-08
> >>>>> 105 KSP unpreconditioned resid norm 3.652211458315e-06 true resid
> norm 3.652211458259e-06 ||r(i)||/||b|| 9.096480564430e-08
> >>>>> 106 KSP unpreconditioned resid norm 3.387532128049e-06 true resid
> norm 3.387532128358e-06 ||r(i)||/||b|| 8.437249737363e-08
> >>>>> 107 KSP unpreconditioned resid norm 3.234218880987e-06 true resid
> norm 3.234218880798e-06 ||r(i)||/||b|| 8.055395895481e-08
> >>>>> 108 KSP unpreconditioned resid norm 3.016905196388e-06 true resid
> norm 3.016905196492e-06 ||r(i)||/||b|| 7.514137611763e-08
> >>>>> 109 KSP unpreconditioned resid norm 2.858246441921e-06 true resid
> norm 2.858246441975e-06 ||r(i)||/||b|| 7.118969836476e-08
> >>>>> 110 KSP unpreconditioned resid norm 2.637118810847e-06 true resid
> norm 2.637118810750e-06 ||r(i)||/||b|| 6.568212241336e-08
> >>>>> 111 KSP unpreconditioned resid norm 2.494976088717e-06 true resid
> norm 2.494976088700e-06 ||r(i)||/||b|| 6.214180574966e-08
> >>>>> 112 KSP unpreconditioned resid norm 2.270639574272e-06 true resid
> norm 2.270639574200e-06 ||r(i)||/||b|| 5.655430686750e-08
> >>>>> 113 KSP unpreconditioned resid norm 2.104988663813e-06 true resid
> norm 2.104988664169e-06 ||r(i)||/||b|| 5.242847707696e-08
> >>>>> 114 KSP unpreconditioned resid norm 1.889361127301e-06 true resid
> norm 1.889361127526e-06 ||r(i)||/||b|| 4.705789073868e-08
> >>>>> 115 KSP unpreconditioned resid norm 1.732367008052e-06 true resid
> norm 1.732367007971e-06 ||r(i)||/||b|| 4.314767367271e-08
> >>>>> 116 KSP unpreconditioned resid norm 1.509288268391e-06 true resid
> norm 1.509288268645e-06 ||r(i)||/||b|| 3.759150191264e-08
> >>>>> 117 KSP unpreconditioned resid norm 1.359169217644e-06 true resid
> norm 1.359169217445e-06 ||r(i)||/||b|| 3.385252062089e-08
> >>>>> 118 KSP unpreconditioned resid norm 1.180146337735e-06 true resid
> norm 1.180146337908e-06 ||r(i)||/||b|| 2.939363820703e-08
> >>>>> 119 KSP unpreconditioned resid norm 1.067757039683e-06 true resid
> norm 1.067757039924e-06 ||r(i)||/||b|| 2.659438335433e-08
> >>>>> 120 KSP unpreconditioned resid norm 9.435833073736e-07 true resid
> norm 9.435833073736e-07 ||r(i)||/||b|| 2.350161625235e-08
> >>>>> 121 KSP unpreconditioned resid norm 8.749457237613e-07 true resid
> norm 8.749457236791e-07 ||r(i)||/||b|| 2.179207546261e-08
> >>>>> 122 KSP unpreconditioned resid norm 7.945760150897e-07 true resid
> norm 7.945760150444e-07 ||r(i)||/||b|| 1.979032528762e-08
> >>>>> 123 KSP unpreconditioned resid norm 7.141240839013e-07 true resid
> norm 7.141240838682e-07 ||r(i)||/||b|| 1.778652721438e-08
> >>>>> 124 KSP unpreconditioned resid norm 6.300566936733e-07 true resid
> norm 6.300566936607e-07 ||r(i)||/||b|| 1.569267971988e-08
> >>>>> 125 KSP unpreconditioned resid norm 5.628986997544e-07 true resid
> norm 5.628986995849e-07 ||r(i)||/||b|| 1.401999073448e-08
> >>>>> 126 KSP unpreconditioned resid norm 5.119018951602e-07 true resid
> norm 5.119018951837e-07 ||r(i)||/||b|| 1.274982484900e-08
> >>>>> 127 KSP unpreconditioned resid norm 4.664670343748e-07 true resid
> norm 4.664670344042e-07 ||r(i)||/||b|| 1.161818903670e-08
> >>>>> 128 KSP unpreconditioned resid norm 4.253264691112e-07 true resid
> norm 4.253264691948e-07 ||r(i)||/||b|| 1.059351027394e-08
> >>>>> 129 KSP unpreconditioned resid norm 3.868921150516e-07 true resid
> norm 3.868921150517e-07 ||r(i)||/||b|| 9.636234498800e-09
> >>>>> 130 KSP unpreconditioned resid norm 3.558445658540e-07 true resid
> norm 3.558445660061e-07 ||r(i)||/||b|| 8.862940209315e-09
> >>>>> 131 KSP unpreconditioned resid norm 3.268710273840e-07 true resid
> norm 3.268710272455e-07 ||r(i)||/||b|| 8.141302825416e-09
> >>>>> 132 KSP unpreconditioned resid norm 3.041273897592e-07 true resid
> norm 3.041273896694e-07 ||r(i)||/||b|| 7.574832182794e-09
> >>>>> 133 KSP unpreconditioned resid norm 2.851926677922e-07 true resid
> norm 2.851926674248e-07 ||r(i)||/||b|| 7.103229333782e-09
> >>>>> 134 KSP unpreconditioned resid norm 2.694708315072e-07 true resid
> norm 2.694708309500e-07 ||r(i)||/||b|| 6.711649104748e-09
> >>>>> 135 KSP unpreconditioned resid norm 2.534825559099e-07 true resid
> norm 2.534825557469e-07 ||r(i)||/||b|| 6.313432746507e-09
> >>>>> 136 KSP unpreconditioned resid norm 2.387342352458e-07 true resid
> norm 2.387342351804e-07 ||r(i)||/||b|| 5.946099658254e-09
> >>>>> 137 KSP unpreconditioned resid norm 2.200861667617e-07 true resid
> norm 2.200861665255e-07 ||r(i)||/||b|| 5.481636425438e-09
> >>>>> 138 KSP unpreconditioned resid norm 2.051415370616e-07 true resid
> norm 2.051415370614e-07 ||r(i)||/||b|| 5.109413915824e-09
> >>>>> 139 KSP unpreconditioned resid norm 1.887376429396e-07 true resid
> norm 1.887376426682e-07 ||r(i)||/||b|| 4.700845824315e-09
> >>>>> 140 KSP unpreconditioned resid norm 1.729743133005e-07 true resid
> norm 1.729743128342e-07 ||r(i)||/||b|| 4.308232129561e-09
> >>>>> 141 KSP unpreconditioned resid norm 1.541021130781e-07 true resid
> norm 1.541021128364e-07 ||r(i)||/||b|| 3.838186508023e-09
> >>>>> 142 KSP unpreconditioned resid norm 1.384631628565e-07 true resid
> norm 1.384631627735e-07 ||r(i)||/||b|| 3.448670712125e-09
> >>>>> 143 KSP unpreconditioned resid norm 1.223114405626e-07 true resid
> norm 1.223114403883e-07 ||r(i)||/||b|| 3.046383411846e-09
> >>>>> 144 KSP unpreconditioned resid norm 1.087313066223e-07 true resid
> norm 1.087313065117e-07 ||r(i)||/||b|| 2.708146085550e-09
> >>>>> 145 KSP unpreconditioned resid norm 9.181901998734e-08 true resid
> norm 9.181901984268e-08 ||r(i)||/||b|| 2.286915582489e-09
> >>>>> 146 KSP unpreconditioned resid norm 7.885850510808e-08 true resid
> norm 7.885850531446e-08 ||r(i)||/||b|| 1.964110975313e-09
> >>>>> 147 KSP unpreconditioned resid norm 6.483393946950e-08 true resid
> norm 6.483393931383e-08 ||r(i)||/||b|| 1.614804278515e-09
> >>>>> 148 KSP unpreconditioned resid norm 5.690132597004e-08 true resid
> norm 5.690132577518e-08 ||r(i)||/||b|| 1.417228465328e-09
> >>>>> 149 KSP unpreconditioned resid norm 5.023671521579e-08 true resid
> norm 5.023671502186e-08 ||r(i)||/||b|| 1.251234511035e-09
> >>>>> 150 KSP unpreconditioned resid norm 4.625371062660e-08 true resid
> norm 4.625371062660e-08 ||r(i)||/||b|| 1.152030720445e-09
> >>>>> 151 KSP unpreconditioned resid norm 4.349049084805e-08 true resid
> norm 4.349049089337e-08 ||r(i)||/||b|| 1.083207830846e-09
> >>>>> 152 KSP unpreconditioned resid norm 3.932593324498e-08 true resid
> norm 3.932593376918e-08 ||r(i)||/||b|| 9.794821474546e-10
> >>>>> 153 KSP unpreconditioned resid norm 3.504167649202e-08 true resid
> norm 3.504167638113e-08 ||r(i)||/||b|| 8.727751166356e-10
> >>>>> 154 KSP unpreconditioned resid norm 2.892726347747e-08 true resid
> norm 2.892726348583e-08 ||r(i)||/||b|| 7.204848160858e-10
> >>>>> 155 KSP unpreconditioned resid norm 2.477647033202e-08 true resid
> norm 2.477647041570e-08 ||r(i)||/||b|| 6.171019508795e-10
> >>>>> 156 KSP unpreconditioned resid norm 2.128504065757e-08 true resid
> norm 2.128504067423e-08 ||r(i)||/||b|| 5.301416991298e-10
> >>>>> 157 KSP unpreconditioned resid norm 1.879248809429e-08 true resid
> norm 1.879248818928e-08 ||r(i)||/||b|| 4.680602575310e-10
> >>>>> 158 KSP unpreconditioned resid norm 1.673649140073e-08 true resid
> norm 1.673649134005e-08 ||r(i)||/||b|| 4.168520085200e-10
> >>>>> 159 KSP unpreconditioned resid norm 1.497123388109e-08 true resid
> norm 1.497123365569e-08 ||r(i)||/||b|| 3.728851342016e-10
> >>>>> 160 KSP unpreconditioned resid norm 1.315982130162e-08 true resid
> norm 1.315982149329e-08 ||r(i)||/||b|| 3.277687007261e-10
> >>>>> 161 KSP unpreconditioned resid norm 1.182395864938e-08 true resid
> norm 1.182395868430e-08 ||r(i)||/||b|| 2.944966675550e-10
> >>>>> 162 KSP unpreconditioned resid norm 1.070204481679e-08 true resid
> norm 1.070204466432e-08 ||r(i)||/||b|| 2.665534085342e-10
> >>>>> 163 KSP unpreconditioned resid norm 9.969290307649e-09 true resid
> norm 9.969290432333e-09 ||r(i)||/||b|| 2.483028644297e-10
> >>>>> 164 KSP unpreconditioned resid norm 9.134440883306e-09 true resid
> norm 9.134440980976e-09 ||r(i)||/||b|| 2.275094577628e-10
> >>>>> 165 KSP unpreconditioned resid norm 8.593316427292e-09 true resid
> norm 8.593316413360e-09 ||r(i)||/||b|| 2.140317904139e-10
> >>>>> 166 KSP unpreconditioned resid norm 8.042173048464e-09 true resid
> norm 8.042173332848e-09 ||r(i)||/||b|| 2.003045942277e-10
> >>>>> 167 KSP unpreconditioned resid norm 7.655518522782e-09 true resid
> norm 7.655518879144e-09 ||r(i)||/||b|| 1.906742791064e-10
> >>>>> 168 KSP unpreconditioned resid norm 7.210283391815e-09 true resid
> norm 7.210283220312e-09 ||r(i)||/||b|| 1.795848951442e-10
> >>>>> 169 KSP unpreconditioned resid norm 6.793967416271e-09 true resid
> norm 6.793967448832e-09 ||r(i)||/||b|| 1.692158122825e-10
> >>>>> 170 KSP unpreconditioned resid norm 6.249160304588e-09 true resid
> norm 6.249160382647e-09 ||r(i)||/||b|| 1.556464257736e-10
> >>>>> 171 KSP unpreconditioned resid norm 5.794936438798e-09 true resid
> norm 5.794936332552e-09 ||r(i)||/||b|| 1.443331699811e-10
> >>>>> 172 KSP unpreconditioned resid norm 5.222337397128e-09 true resid
> norm 5.222337443277e-09 ||r(i)||/||b|| 1.300715788135e-10
> >>>>> 173 KSP unpreconditioned resid norm 4.755359110447e-09 true resid
> norm 4.755358888996e-09 ||r(i)||/||b|| 1.184406494668e-10
> >>>>> 174 KSP unpreconditioned resid norm 4.317537007873e-09 true resid
> norm 4.317537267718e-09 ||r(i)||/||b|| 1.075359252630e-10
> >>>>> 175 KSP unpreconditioned resid norm 3.924177535665e-09 true resid
> norm 3.924177629720e-09 ||r(i)||/||b|| 9.773860563138e-11
> >>>>> 176 KSP unpreconditioned resid norm 3.502843065115e-09 true resid
> norm 3.502843126359e-09 ||r(i)||/||b|| 8.724452234855e-11
> >>>>> 177 KSP unpreconditioned resid norm 3.083873232869e-09 true resid
> norm 3.083873352938e-09 ||r(i)||/||b|| 7.680933686007e-11
> >>>>> 178 KSP unpreconditioned resid norm 2.758980676473e-09 true resid
> norm 2.758980618096e-09 ||r(i)||/||b|| 6.871730691658e-11
> >>>>> 179 KSP unpreconditioned resid norm 2.510978240429e-09 true resid
> norm 2.510978327392e-09 ||r(i)||/||b|| 6.254036989334e-11
> >>>>> 180 KSP unpreconditioned resid norm 2.323000193205e-09 true resid
> norm 2.323000193205e-09 ||r(i)||/||b|| 5.785844097519e-11
> >>>>> 181 KSP unpreconditioned resid norm 2.167480159274e-09 true resid
> norm 2.167480113693e-09 ||r(i)||/||b|| 5.398493749153e-11
> >>>>> 182 KSP unpreconditioned resid norm 1.983545827983e-09 true resid
> norm 1.983546404840e-09 ||r(i)||/||b|| 4.940374216139e-11
> >>>>> 183 KSP unpreconditioned resid norm 1.794576286774e-09 true resid
> norm 1.794576224361e-09 ||r(i)||/||b|| 4.469710457036e-11
> >>>>> 184 KSP unpreconditioned resid norm 1.583490590644e-09 true resid
> norm 1.583490380603e-09 ||r(i)||/||b|| 3.943963715064e-11
> >>>>> 185 KSP unpreconditioned resid norm 1.412659866247e-09 true resid
> norm 1.412659832191e-09 ||r(i)||/||b|| 3.518479927722e-11
> >>>>> 186 KSP unpreconditioned resid norm 1.285613344939e-09 true resid
> norm 1.285612984761e-09 ||r(i)||/||b|| 3.202047215205e-11
> >>>>> 187 KSP unpreconditioned resid norm 1.168115133929e-09 true resid
> norm 1.168114766904e-09 ||r(i)||/||b|| 2.909397058634e-11
> >>>>> 188 KSP unpreconditioned resid norm 1.063377926053e-09 true resid
> norm 1.063377647554e-09 ||r(i)||/||b|| 2.648530681802e-11
> >>>>> 189 KSP unpreconditioned resid norm 9.548967728122e-10 true resid
> norm 9.548964523410e-10 ||r(i)||/||b|| 2.378339019807e-11
> >>>>> KSP Object: 16 MPI processes
> >>>>> type: fgmres
> >>>>> restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> >>>>> happy breakdown tolerance 1e-30
> >>>>> maximum iterations=2000, initial guess is zero
> >>>>> tolerances: relative=1e-20, absolute=1e-09, divergence=10000.
> >>>>> right preconditioning
> >>>>> using UNPRECONDITIONED norm type for convergence test
> >>>>> PC Object: 16 MPI processes
> >>>>> type: bjacobi
> >>>>> number of blocks = 4
> >>>>> Local solver information for first block is in the following
> KSP and PC objects on rank 0:
> >>>>> Use -ksp_view ::ascii_info_detail to display information for
> all blocks
> >>>>> KSP Object: (sub_) 4 MPI processes
> >>>>> type: preonly
> >>>>> maximum iterations=10000, initial guess is zero
> >>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >>>>> left preconditioning
> >>>>> using NONE norm type for convergence test
> >>>>> PC Object: (sub_) 4 MPI processes
> >>>>> type: telescope
> >>>>> petsc subcomm: parent comm size reduction factor = 4
> >>>>> petsc subcomm: parent_size = 4 , subcomm_size = 1
> >>>>> petsc subcomm type = contiguous
> >>>>> linear system matrix = precond matrix:
> >>>>> Mat Object: (sub_) 4 MPI processes
> >>>>> type: mpiaij
> >>>>> rows=40200, cols=40200
> >>>>> total: nonzeros=199996, allocated nonzeros=203412
> >>>>> total number of mallocs used during MatSetValues calls=0
> >>>>> not using I-node (on process 0) routines
> >>>>> setup type: default
> >>>>> Parent DM object: NULL
> >>>>> Sub DM object: NULL
> >>>>> KSP Object: (sub_telescope_) 1 MPI processes
> >>>>> type: preonly
> >>>>> maximum iterations=10000, initial guess is zero
> >>>>> tolerances: relative=1e-05, absolute=1e-50,
> divergence=10000.
> >>>>> left preconditioning
> >>>>> using NONE norm type for convergence test
> >>>>> PC Object: (sub_telescope_) 1 MPI processes
> >>>>> type: lu
> >>>>> out-of-place factorization
> >>>>> tolerance for zero pivot 2.22045e-14
> >>>>> matrix ordering: external
> >>>>> factor fill ratio given 0., needed 0.
> >>>>> Factored matrix follows:
> >>>>> Mat Object: 1 MPI processes
> >>>>> type: mumps
> >>>>> rows=40200, cols=40200
> >>>>> package used to perform factorization: mumps
> >>>>> total: nonzeros=1849788, allocated
> nonzeros=1849788
> >>>>> MUMPS run parameters:
> >>>>> SYM (matrix type): 0
> >>>>> PAR (host participation): 1
> >>>>> ICNTL(1) (output for error): 6
> >>>>> ICNTL(2) (output of diagnostic msg): 0
> >>>>> ICNTL(3) (output for global info): 0
> >>>>> ICNTL(4) (level of printing): 0
> >>>>> ICNTL(5) (input mat struct): 0
> >>>>> ICNTL(6) (matrix prescaling): 7
> >>>>> ICNTL(7) (sequential matrix ordering):7
> >>>>> ICNTL(8) (scaling strategy): 77
> >>>>> ICNTL(10) (max num of refinements): 0
> >>>>> ICNTL(11) (error analysis): 0
> >>>>> ICNTL(12) (efficiency control): 1
> >>>>> ICNTL(13) (sequential factorization of the
> root node): 0
> >>>>> ICNTL(14) (percentage of estimated workspace
> increase): 20
> >>>>> ICNTL(18) (input mat struct): 0
> >>>>> ICNTL(19) (Schur complement info): 0
> >>>>> ICNTL(20) (RHS sparse pattern): 0
> >>>>> ICNTL(21) (solution struct): 0
> >>>>> ICNTL(22) (in-core/out-of-core facility):
> 0
> >>>>> ICNTL(23) (max size of memory can be
> allocated locally):0
> >>>>> ICNTL(24) (detection of null pivot rows):
> 0
> >>>>> ICNTL(25) (computation of a null space
> basis): 0
> >>>>> ICNTL(26) (Schur options for RHS or
> solution): 0
> >>>>> ICNTL(27) (blocking size for multiple RHS):
> -32
> >>>>> ICNTL(28) (use parallel or sequential
> ordering): 1
> >>>>> ICNTL(29) (parallel ordering): 0
> >>>>> ICNTL(30) (user-specified set of entries in
> inv(A)): 0
> >>>>> ICNTL(31) (factors is discarded in the solve
> phase): 0
> >>>>> ICNTL(33) (compute determinant): 0
> >>>>> ICNTL(35) (activate BLR based
> factorization): 0
> >>>>> ICNTL(36) (choice of BLR factorization
> variant): 0
> >>>>> ICNTL(38) (estimated compression rate of LU
> factors): 333
> >>>>> CNTL(1) (relative pivoting threshold):
> 0.01
> >>>>> CNTL(2) (stopping criterion of refinement):
> 1.49012e-08
> >>>>> CNTL(3) (absolute pivoting threshold): 0.
> >>>>> CNTL(4) (value of static pivoting):
> -1.
> >>>>> CNTL(5) (fixation for null pivots): 0.
> >>>>> CNTL(7) (dropping parameter for BLR): 0.
> >>>>> RINFO(1) (local estimated flops for the
> elimination after analysis):
> >>>>> [0] 1.45525e+08
> >>>>> RINFO(2) (local estimated flops for the
> assembly after factorization):
> >>>>> [0] 2.89397e+06
> >>>>> RINFO(3) (local estimated flops for the
> elimination after factorization):
> >>>>> [0] 1.45525e+08
> >>>>> INFO(15) (estimated size of (in MB) MUMPS
> internal data for running numerical factorization):
> >>>>> [0] 29
> >>>>> INFO(16) (size of (in MB) MUMPS internal data
> used during numerical factorization):
> >>>>> [0] 29
> >>>>> INFO(23) (num of pivots eliminated on this
> processor after factorization):
> >>>>> [0] 40200
> >>>>> RINFOG(1) (global estimated flops for the
> elimination after analysis): 1.45525e+08
> >>>>> RINFOG(2) (global estimated flops for the
> assembly after factorization): 2.89397e+06
> >>>>> RINFOG(3) (global estimated flops for the
> elimination after factorization): 1.45525e+08
> >>>>> (RINFOG(12) RINFOG(13))*2^INFOG(34)
> (determinant): (0.,0.)*(2^0)
> >>>>> INFOG(3) (estimated real workspace for
> factors on all processors after analysis): 1849788
> >>>>> INFOG(4) (estimated integer workspace for
> factors on all processors after analysis): 879986
> >>>>> INFOG(5) (estimated maximum front size in the
> complete tree): 282
> >>>>> INFOG(6) (number of nodes in the complete
> tree): 23709
> >>>>> INFOG(7) (ordering option effectively used
> after analysis): 5
> >>>>> INFOG(8) (structural symmetry in percent of
> the permuted matrix after analysis): 100
> >>>>> INFOG(9) (total real/complex workspace to
> store the matrix factors after factorization): 1849788
> >>>>> INFOG(10) (total integer space store the
> matrix factors after factorization): 879986
> >>>>> INFOG(11) (order of largest frontal matrix
> after factorization): 282
> >>>>> INFOG(12) (number of off-diagonal pivots): 0
> >>>>> INFOG(13) (number of delayed pivots after
> factorization): 0
> >>>>> INFOG(14) (number of memory compress after
> factorization): 0
> >>>>> INFOG(15) (number of steps of iterative
> refinement after solution): 0
> >>>>> INFOG(16) (estimated size (in MB) of all
> MUMPS internal data for factorization after analysis: value on the most
> memory consuming processor): 29
> >>>>> INFOG(17) (estimated size of all MUMPS
> internal data for factorization after analysis: sum over all processors): 29
> >>>>> INFOG(18) (size of all MUMPS internal data
> allocated during factorization: value on the most memory consuming
> processor): 29
> >>>>> INFOG(19) (size of all MUMPS internal data
> allocated during factorization: sum over all processors): 29
> >>>>> INFOG(20) (estimated number of entries in the
> factors): 1849788
> >>>>> INFOG(21) (size in MB of memory effectively
> used during factorization - value on the most memory consuming processor):
> 26
> >>>>> INFOG(22) (size in MB of memory effectively
> used during factorization - sum over all processors): 26
> >>>>> INFOG(23) (after analysis: value of ICNTL(6)
> effectively used): 0
> >>>>> INFOG(24) (after analysis: value of ICNTL(12)
> effectively used): 1
> >>>>> INFOG(25) (after factorization: number of
> pivots modified by static pivoting): 0
> >>>>> INFOG(28) (after factorization: number of
> null pivots encountered): 0
> >>>>> INFOG(29) (after factorization: effective
> number of entries in the factors (sum over all processors)): 1849788
> >>>>> INFOG(30, 31) (after solution: size in Mbytes
> of memory used during solution phase): 29, 29
> >>>>> INFOG(32) (after analysis: type of analysis
> done): 1
> >>>>> INFOG(33) (value used for ICNTL(8)): 7
> >>>>> INFOG(34) (exponent of the determinant if
> determinant is requested): 0
> >>>>> INFOG(35) (after factorization: number of
> entries taking into account BLR factor compression - sum over all
> processors): 1849788
> >>>>> INFOG(36) (after analysis: estimated size of
> all MUMPS internal data for running BLR in-core - value on the most memory
> consuming processor): 0
> >>>>> INFOG(37) (after analysis: estimated size of
> all MUMPS internal data for running BLR in-core - sum over all processors):
> 0
> >>>>> INFOG(38) (after analysis: estimated size of
> all MUMPS internal data for running BLR out-of-core - value on the most
> memory consuming processor): 0
> >>>>> INFOG(39) (after analysis: estimated size of
> all MUMPS internal data for running BLR out-of-core - sum over all
> processors): 0
> >>>>> linear system matrix = precond matrix:
> >>>>> Mat Object: 1 MPI processes
> >>>>> type: seqaijcusparse
> >>>>> rows=40200, cols=40200
> >>>>> total: nonzeros=199996, allocated nonzeros=199996
> >>>>> total number of mallocs used during MatSetValues calls=0
> >>>>> not using I-node routines
> >>>>> linear system matrix = precond matrix:
> >>>>> Mat Object: 16 MPI processes
> >>>>> type: mpiaijcusparse
> >>>>> rows=160800, cols=160800
> >>>>> total: nonzeros=802396, allocated nonzeros=1608000
> >>>>> total number of mallocs used during MatSetValues calls=0
> >>>>> not using I-node (on process 0) routines
> >>>>> Norm of error 9.11684e-07 iterations 189
> >>>>> Chang
> >>>>> On 10/14/21 10:10 PM, Chang Liu wrote:
> >>>>>> Hi Barry,
> >>>>>>
> >>>>>> No problem. Here is the output. It seems that the resid norm
> calculation is incorrect.
> >>>>>>
> >>>>>> $ mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7 -m 400
> -ksp_view -ksp_monitor_true_residual -pc_type bjacobi -pc_bjacobi_blocks 4
> -ksp_type fgmres -mat_type aijcusparse -sub_pc_type telescope -sub_ksp_type
> preonly -sub_telescope_ksp_type preonly -sub_telescope_pc_type lu
> -sub_telescope_pc_factor_mat_solver_type cusparse
> -sub_pc_telescope_reduction_factor 4 -sub_pc_telescope_subcomm_type
> contiguous -ksp_max_it 2000 -ksp_rtol 1.e-20 -ksp_atol 1.e-9
> >>>>>> 0 KSP unpreconditioned resid norm 4.014971979977e+01 true resid
> norm 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00
> >>>>>> 1 KSP unpreconditioned resid norm 0.000000000000e+00 true resid
> norm 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00
> >>>>>> KSP Object: 16 MPI processes
> >>>>>> type: fgmres
> >>>>>> restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> >>>>>> happy breakdown tolerance 1e-30
> >>>>>> maximum iterations=2000, initial guess is zero
> >>>>>> tolerances: relative=1e-20, absolute=1e-09, divergence=10000.
> >>>>>> right preconditioning
> >>>>>> using UNPRECONDITIONED norm type for convergence test
> >>>>>> PC Object: 16 MPI processes
> >>>>>> type: bjacobi
> >>>>>> number of blocks = 4
> >>>>>> Local solver information for first block is in the following
> KSP and PC objects on rank 0:
> >>>>>> Use -ksp_view ::ascii_info_detail to display information for
> all blocks
> >>>>>> KSP Object: (sub_) 4 MPI processes
> >>>>>> type: preonly
> >>>>>> maximum iterations=10000, initial guess is zero
> >>>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >>>>>> left preconditioning
> >>>>>> using NONE norm type for convergence test
> >>>>>> PC Object: (sub_) 4 MPI processes
> >>>>>> type: telescope
> >>>>>> petsc subcomm: parent comm size reduction factor = 4
> >>>>>> petsc subcomm: parent_size = 4 , subcomm_size = 1
> >>>>>> petsc subcomm type = contiguous
> >>>>>> linear system matrix = precond matrix:
> >>>>>> Mat Object: (sub_) 4 MPI processes
> >>>>>> type: mpiaij
> >>>>>> rows=40200, cols=40200
> >>>>>> total: nonzeros=199996, allocated nonzeros=203412
> >>>>>> total number of mallocs used during MatSetValues calls=0
> >>>>>> not using I-node (on process 0) routines
> >>>>>> setup type: default
> >>>>>> Parent DM object: NULL
> >>>>>> Sub DM object: NULL
> >>>>>> KSP Object: (sub_telescope_) 1 MPI processes
> >>>>>> type: preonly
> >>>>>> maximum iterations=10000, initial guess is zero
> >>>>>> tolerances: relative=1e-05, absolute=1e-50,
> divergence=10000.
> >>>>>> left preconditioning
> >>>>>> using NONE norm type for convergence test
> >>>>>> PC Object: (sub_telescope_) 1 MPI processes
> >>>>>> type: lu
> >>>>>> out-of-place factorization
> >>>>>> tolerance for zero pivot 2.22045e-14
> >>>>>> matrix ordering: nd
> >>>>>> factor fill ratio given 5., needed 8.62558
> >>>>>> Factored matrix follows:
> >>>>>> Mat Object: 1 MPI processes
> >>>>>> type: seqaijcusparse
> >>>>>> rows=40200, cols=40200
> >>>>>> package used to perform factorization: cusparse
> >>>>>> total: nonzeros=1725082, allocated
> nonzeros=1725082
> >>>>>> not using I-node routines
> >>>>>> linear system matrix = precond matrix:
> >>>>>> Mat Object: 1 MPI processes
> >>>>>> type: seqaijcusparse
> >>>>>> rows=40200, cols=40200
> >>>>>> total: nonzeros=199996, allocated nonzeros=199996
> >>>>>> total number of mallocs used during MatSetValues
> calls=0
> >>>>>> not using I-node routines
> >>>>>> linear system matrix = precond matrix:
> >>>>>> Mat Object: 16 MPI processes
> >>>>>> type: mpiaijcusparse
> >>>>>> rows=160800, cols=160800
> >>>>>> total: nonzeros=802396, allocated nonzeros=1608000
> >>>>>> total number of mallocs used during MatSetValues calls=0
> >>>>>> not using I-node (on process 0) routines
> >>>>>> Norm of error 400.999 iterations 1
> >>>>>>
> >>>>>> Chang
> >>>>>>
> >>>>>>
> >>>>>> On 10/14/21 9:47 PM, Barry Smith wrote:
> >>>>>>>
> >>>>>>> Chang,
> >>>>>>>
> >>>>>>> Sorry, I did not notice that one. Please run that with
> -ksp_view -ksp_monitor_true_residual so we can see exactly how the options
> are interpreted and which solver is used. At a glance it looks ok, but
> something must be wrong to get the wrong answer.
> >>>>>>>
> >>>>>>> Barry
> >>>>>>>
> >>>>>>>> On Oct 14, 2021, at 6:02 PM, Chang Liu <cliu at pppl.gov> wrote:
> >>>>>>>>
> >>>>>>>> Hi Barry,
> >>>>>>>>
> >>>>>>>> That is exactly what I was doing in the second example, in which
> the preconditioner works but the GMRES does not.
> >>>>>>>>
> >>>>>>>> Chang
> >>>>>>>>
> >>>>>>>> On 10/14/21 5:15 PM, Barry Smith wrote:
> >>>>>>>>> You need to use the PCTELESCOPE inside the block Jacobi, not
> outside it. So something like -pc_type bjacobi -sub_pc_type telescope
> -sub_telescope_pc_type lu
> >>>>>>>>>> On Oct 14, 2021, at 4:14 PM, Chang Liu <cliu at pppl.gov> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi Pierre,
> >>>>>>>>>>
> >>>>>>>>>> I wonder if the PCTELESCOPE trick only works for the
> preconditioner and not for the solver. I have done some tests, and found
> that for solving a small matrix using -telescope_ksp_type preonly, it does
> work on the GPU with multiple MPI processes. However, for bjacobi and gmres,
> it does not work.
> >>>>>>>>>>
> >>>>>>>>>> The command line options I used for small matrix is like
> >>>>>>>>>>
> >>>>>>>>>> mpiexec -n 4 --oversubscribe ./ex7 -m 100 -ksp_monitor_short
> -pc_type telescope -mat_type aijcusparse -telescope_pc_type lu
> -telescope_pc_factor_mat_solver_type cusparse -telescope_ksp_type preonly
> -pc_telescope_reduction_factor 4
> >>>>>>>>>>
> >>>>>>>>>> which gives the correct output. For iterative solver, I tried
> >>>>>>>>>>
> >>>>>>>>>> mpiexec -n 16 --oversubscribe ./ex7 -m 400 -ksp_monitor_short
> -pc_type bjacobi -pc_bjacobi_blocks 4 -ksp_type fgmres -mat_type
> aijcusparse -sub_pc_type telescope -sub_ksp_type preonly
> -sub_telescope_ksp_type preonly -sub_telescope_pc_type lu
> -sub_telescope_pc_factor_mat_solver_type cusparse
> -sub_pc_telescope_reduction_factor 4 -ksp_max_it 2000 -ksp_rtol 1.e-9
> -ksp_atol 1.e-20
> >>>>>>>>>>
> >>>>>>>>>> for large matrix. The output is like
> >>>>>>>>>>
> >>>>>>>>>> 0 KSP Residual norm 40.1497
> >>>>>>>>>> 1 KSP Residual norm < 1.e-11
> >>>>>>>>>> Norm of error 400.999 iterations 1
> >>>>>>>>>>
> >>>>>>>>>> So it seems to call a direct solver instead of an iterative one.
> >>>>>>>>>>
> >>>>>>>>>> Can you please help check these options?
> >>>>>>>>>>
> >>>>>>>>>> Chang
> >>>>>>>>>>
> >>>>>>>>>> On 10/14/21 10:04 AM, Pierre Jolivet wrote:
> >>>>>>>>>>>> On 14 Oct 2021, at 3:50 PM, Chang Liu <cliu at pppl.gov> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thank you Pierre. I was not aware of PCTELESCOPE before. This
> sounds like exactly what I need. I wonder if PCTELESCOPE can transform an
> mpiaijcusparse matrix to seqaijcusparse? Or do I have to do it manually?
> >>>>>>>>>>> PCTELESCOPE uses MatCreateMPIMatConcatenateSeqMat().
> >>>>>>>>>>> 1) I’m not sure this is implemented for cuSparse matrices, but
> it should be;
> >>>>>>>>>>> 2) at least for the implementations
> MatCreateMPIMatConcatenateSeqMat_MPIBAIJ() and
> MatCreateMPIMatConcatenateSeqMat_MPIAIJ(), the resulting MatType is MATBAIJ
> (resp. MATAIJ). Constructors are usually “smart” enough to detect if the
> MPI communicator on which the Mat lives is of size 1 (your case), and then
> the resulting Mat is of type MatSeqX instead of MatMPIX, so you would not
> need to worry about the transformation you are mentioning.
> >>>>>>>>>>> If you try this out and this does not work, please provide the
> backtrace (probably something like “Operation XYZ not implemented for
> MatType ABC”), and hopefully someone can add the missing plumbing.
> >>>>>>>>>>> I do not claim that this will be efficient, but I think this
> goes in the direction of what you want to achieve.
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Pierre
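
A small, hedged sketch of what that concatenation looks like in isolation (this is not PCTELESCOPE internals, just the routine Pierre mentions): each rank supplies a tiny SeqAIJ block and MatCreateMPIMatConcatenateSeqMat() assembles them on the given communicator. Running it on a communicator of size 1 should show the Seq type behavior described above; I have not tested this with the cuSparse matrix classes.

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            seqA, mpiA;
  MatType        mtype;
  PetscInt       i, m = 4;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* Each rank builds a tiny local diagonal SeqAIJ block */
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, m, m, 1, NULL, &seqA);CHKERRQ(ierr);
  for (i = 0; i < m; i++) { ierr = MatSetValue(seqA, i, i, 1.0, INSERT_VALUES);CHKERRQ(ierr); }
  ierr = MatAssemblyBegin(seqA, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(seqA, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* Concatenate the per-rank blocks into one matrix on PETSC_COMM_WORLD */
  ierr = MatCreateMPIMatConcatenateSeqMat(PETSC_COMM_WORLD, seqA, PETSC_DECIDE,
                                          MAT_INITIAL_MATRIX, &mpiA);CHKERRQ(ierr);
  ierr = MatGetType(mpiA, &mtype);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "concatenated matrix type: %s\n", mtype);CHKERRQ(ierr);

  ierr = MatDestroy(&seqA);CHKERRQ(ierr);
  ierr = MatDestroy(&mpiA);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}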
> >>>>>>>>>>>> Chang
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 10/14/21 1:35 AM, Pierre Jolivet wrote:
> >>>>>>>>>>>>> Maybe I’m missing something, but can’t you use PCTELESCOPE
> as a subdomain solver, with a reduction factor equal to the number of MPI
> processes you have per block?
> >>>>>>>>>>>>> -sub_pc_type telescope -sub_pc_telescope_reduction_factor X
> -sub_telescope_pc_type lu
> >>>>>>>>>>>>> This does not work with MUMPS -mat_mumps_use_omp_threads
> because not only does the Mat need to be redistributed, the secondary
> processes also need to be “converted” to OpenMP threads.
> >>>>>>>>>>>>> Thus the need for specific code in mumps.c.
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Pierre
> >>>>>>>>>>>>>> On 14 Oct 2021, at 6:00 AM, Chang Liu via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi Junchao,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes that is what I want.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Chang
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 10/13/21 11:42 PM, Junchao Zhang wrote:
> >>>>>>>>>>>>>>> On Wed, Oct 13, 2021 at 8:58 PM Barry Smith <
> bsmith at petsc.dev> wrote:
> >>>>>>>>>>>>>>> Junchao,
> >>>>>>>>>>>>>>> If I understand correctly Chang is using the
> block Jacobi
> >>>>>>>>>>>>>>> method with a single block for a number of MPI ranks
> and a direct
> >>>>>>>>>>>>>>> solver for each block so it uses
> PCSetUp_BJacobi_Multiproc() which
> >>>>>>>>>>>>>>> is code Hong Zhang wrote a number of years ago for
> CPUs. For their
> >>>>>>>>>>>>>>> particular problems this preconditioner works well,
> but using an
> >>>>>>>>>>>>>>> iterative solver on the blocks does not work well.
> >>>>>>>>>>>>>>> If we had complete MPI-GPU direct solvers he
> could just use
> >>>>>>>>>>>>>>> the current code with MPIAIJCUSPARSE on each block
> but since we do
> >>>>>>>>>>>>>>> not he would like to use a single GPU for each block,
> this means
> >>>>>>>>>>>>>>> that diagonal blocks of the global parallel MPI
> matrix needs to be
> >>>>>>>>>>>>>>> sent to a subset of the GPUs (one GPU per block,
> which has multiple
> >>>>>>>>>>>>>>> MPI ranks associated with the blocks). Similarly for
> the triangular
> >>>>>>>>>>>>>>> solves the blocks of the right hand side needs to be
> shipped to the
> >>>>>>>>>>>>>>> appropriate GPU and the resulting solution shipped
> back to the
> >>>>>>>>>>>>>>> multiple GPUs. So Chang is absolutely correct, this
> is somewhat like
> >>>>>>>>>>>>>>> your code for MUMPS with OpenMP. OK, I now understand
> the background..
> >>>>>>>>>>>>>>> One could use PCSetUp_BJacobi_Multiproc() and get the
> blocks on the
> >>>>>>>>>>>>>>> MPI ranks and then shrink each block down to a single
> GPU but this
> >>>>>>>>>>>>>>> would be pretty inefficient, ideally one would go
> directly from the
> >>>>>>>>>>>>>>> big MPI matrix on all the GPUs to the sub matrices on
> the subset of
> >>>>>>>>>>>>>>> GPUs. But this may be a large coding project.
> >>>>>>>>>>>>>>> I don't understand these sentences. Why do you say
> "shrink"? In my mind, we just need to move each block (submatrix) living
> over multiple MPI ranks to one of them and solve directly there. In other
> words, we keep blocks' size, no shrinking or expanding.
> >>>>>>>>>>>>>>> As mentioned before, cusparse does not provide LU
> factorization. So the LU factorization would be done on CPU, and the solve
> be done on GPU. I assume Chang wants to gain from the (potential) faster
> solve (instead of factorization) on GPU.
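
For completeness, a short untested sketch of the programmatic equivalent of -pc_type lu -pc_factor_mat_solver_type cusparse on an aijcusparse matrix; per the description above, the factorization itself runs on the host while MatSolve runs on the GPU.

#include <petscksp.h>

/* Sketch: select the cusparse factorization package in code for a KSP whose
   operators (an aijcusparse Mat) are already set. */
static PetscErrorCode UseCusparseLU(KSP ksp)
{
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverType(pc, MATSOLVERCUSPARSE);CHKERRQ(ierr);
  ierr = KSPSetUp(ksp);CHKERRQ(ierr); /* triggers the factorization (on the host) */
  PetscFunctionReturn(0);
}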
> >>>>>>>>>>>>>>> Barry
> >>>>>>>>>>>>>>> Since the matrices being factored and solved directly
> are relatively
> >>>>>>>>>>>>>>> large it is possible that the cusparse code could be
> reasonably
> >>>>>>>>>>>>>>> efficient (they are not the tiny problems one gets at
> the coarse
> >>>>>>>>>>>>>>> level of multigrid). Of course, this is speculation,
> I don't
> >>>>>>>>>>>>>>> actually know how much better the cusparse code would
> be on the
> >>>>>>>>>>>>>>> direct solver than a good CPU direct sparse solver.
> >>>>>>>>>>>>>>> > On Oct 13, 2021, at 9:32 PM, Chang Liu <
> cliu at pppl.gov> wrote:
> >>>>>>>>>>>>>>> >
> >>>>>>>>>>>>>>> > Sorry I am not familiar with the details either.
> Can you please
> >>>>>>>>>>>>>>> check the code in MatMumpsGatherNonzerosOnMaster in
> mumps.c?
> >>>>>>>>>>>>>>> >
> >>>>>>>>>>>>>>> > Chang
> >>>>>>>>>>>>>>> >
> >>>>>>>>>>>>>>> > On 10/13/21 9:24 PM, Junchao Zhang wrote:
> >>>>>>>>>>>>>>> >> Hi Chang,
> >>>>>>>>>>>>>>> >> I did the work in mumps. It is easy for me to
> understand
> >>>>>>>>>>>>>>> gathering matrix rows to one process.
> >>>>>>>>>>>>>>> >> But how to gather blocks (submatrices) to form
> a large block? Can you draw a picture of that?
> >>>>>>>>>>>>>>> >> Thanks
> >>>>>>>>>>>>>>> >> --Junchao Zhang
> >>>>>>>>>>>>>>> >> On Wed, Oct 13, 2021 at 7:47 PM Chang Liu via
> petsc-users
> >>>>>>>>>>>>>>> <petsc-users at mcs.anl.gov> wrote:
> >>>>>>>>>>>>>>> >> Hi Barry,
> >>>>>>>>>>>>>>> >> I think mumps solver in petsc does support
> that. You can
> >>>>>>>>>>>>>>> check the
> >>>>>>>>>>>>>>> >> documentation on "-mat_mumps_use_omp_threads"
> at
> >>>>>>>>>>>>>>> >>
> >>>>>>>>>>>>>>>
> https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html
> >>>>>>>>>>>>>>> >> and the code enclosed by #if
> >>>>>>>>>>>>>>> defined(PETSC_HAVE_OPENMP_SUPPORT) in
> >>>>>>>>>>>>>>> >> functions MatMumpsSetUpDistRHSInfo and
> >>>>>>>>>>>>>>> >> MatMumpsGatherNonzerosOnMaster in
> >>>>>>>>>>>>>>> >> mumps.c
> >>>>>>>>>>>>>>> >> 1. I understand it is ideal to do one MPI rank
> per GPU.
> >>>>>>>>>>>>>>> However, I am
> >>>>>>>>>>>>>>> >> working on an existing code that was developed
> based on MPI
> >>>>>>>>>>>>>>> and the the
> >>>>>>>>>>>>>>> >> # of mpi ranks is typically equal to # of cpu
> cores. We don't
> >>>>>>>>>>>>>>> want to
> >>>>>>>>>>>>>>> >> change the whole structure of the code.
> >>>>>>>>>>>>>>> >> 2. What you have suggested has been coded in
> mumps.c. See
> >>>>>>>>>>>>>>> function
> >>>>>>>>>>>>>>> >> MatMumpsSetUpDistRHSInfo.
> >>>>>>>>>>>>>>> >> Regards,
> >>>>>>>>>>>>>>> >> Chang
> >>>>>>>>>>>>>>> >> On 10/13/21 7:53 PM, Barry Smith wrote:
> >>>>>>>>>>>>>>> >> >
> >>>>>>>>>>>>>>> >> >
> >>>>>>>>>>>>>>> >> >> On Oct 13, 2021, at 3:50 PM, Chang Liu <
> cliu at pppl.gov> wrote:
> >>>>>>>>>>>>>>> >> >>
> >>>>>>>>>>>>>>> >> >> Hi Barry,
> >>>>>>>>>>>>>>> >> >>
> >>>>>>>>>>>>>>> >> >> That is exactly what I want.
> >>>>>>>>>>>>>>> >> >>
> >>>>>>>>>>>>>>> >> >> Back to my original question, I am looking
> for an approach to
> >>>>>>>>>>>>>>> >> transfer
> >>>>>>>>>>>>>>> >> >> matrix
> >>>>>>>>>>>>>>> >> >> data from many MPI processes to "master"
> MPI
> >>>>>>>>>>>>>>> >> >> processes, each of which taking care of
> one GPU, and then
> >>>>>>>>>>>>>>> upload
> >>>>>>>>>>>>>>> >> the data to GPU to
> >>>>>>>>>>>>>>> >> >> solve.
> >>>>>>>>>>>>>>> >> >> One can just grab some codes from mumps.c
> to
> >>>>>>>>>>>>>>> aijcusparse.cu.
>
> mumps.c doesn't actually do that. It never needs to copy the entire
> matrix to a single MPI rank.
>
> It would be possible to write the code you suggest, but it is not clear
> that it makes sense:
>
> 1) For normal PETSc GPU usage there is one GPU per MPI rank, so while
> your one GPU per big domain is solving its systems, the other GPUs
> (with the other MPI ranks that share that domain) are doing nothing.
>
> 2) For each triangular solve you would have to gather the right-hand
> side from the multiple ranks to the single GPU to pass it to the GPU
> solver, and then scatter the resulting solution back to all of its
> subdomain ranks.
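>
> (To make point 2 concrete, here is a rough sketch of the per-solve
> traffic it implies, using the ierr/CHKERRQ style. This is illustrative
> only, not code from mumps.c or aijcusparse.cu; b and x are assumed to
> be the distributed right-hand side and solution for one subdomain, and
> the solve itself is elided.)
>
>   Vec            bseq;    /* sequential copy on the GPU-owning rank */
>   VecScatter     tozero;
>   PetscErrorCode ierr;
>
>   ierr = VecScatterCreateToZero(b, &tozero, &bseq);CHKERRQ(ierr);
>   /* gather the right-hand side onto rank 0 of the subdomain communicator */
>   ierr = VecScatterBegin(tozero, b, bseq, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
>   ierr = VecScatterEnd(tozero, b, bseq, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
>   /* ... triangular solves on the single GPU overwrite bseq with the solution ... */
>   /* scatter the solution back to every rank of the subdomain */
>   ierr = VecScatterBegin(tozero, bseq, x, INSERT_VALUES, SCATTER_REVERSE);CHKERRQ(ierr);
>   ierr = VecScatterEnd(tozero, bseq, x, INSERT_VALUES, SCATTER_REVERSE);CHKERRQ(ierr);
>   ierr = VecScatterDestroy(&tozero);CHKERRQ(ierr);
>   ierr = VecDestroy(&bseq);CHKERRQ(ierr);
>
> Every block solve pays for this gather and scatter, which is the
> overhead being pointed out here.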
>
> What I was suggesting was to assign an entire subdomain to a single MPI
> rank, so that it does everything on one GPU and can use the GPU solver
> directly. If all the major computations of a subdomain can fit and be
> done on a single GPU, then you would be utilizing all the GPUs you are
> using effectively.
>
> Barry
>
> > Chang
>
> On 10/13/21 1:53 PM, Barry Smith wrote:
>
> Chang,
>
> You are correct: there are no MPI + GPU direct solvers that currently
> do the triangular solves with MPI + GPU parallelism that I am aware of.
> You are limited in that the individual triangular solves must be done
> on a single GPU. I can only suggest making each subdomain as big as
> possible, to utilize each GPU as much as possible for the direct
> triangular solves.
>
> Barry
>
> On Oct 13, 2021, at 12:16 PM, Chang Liu via petsc-users
> <petsc-users at mcs.anl.gov> wrote:
>
> Hi Mark,
>
> '-mat_type aijcusparse' works with mpiaijcusparse with other solvers,
> but with -pc_factor_mat_solver_type cusparse it will give an error.
>
> Yes, what I want is to have mumps or superlu do the factorization, and
> then do the rest, including the GMRES solver, on the GPU. Is that
> possible?
>
> I have tried to use aijcusparse with superlu_dist; it runs, but the
> iterative solver is still running on the CPUs. I have contacted the
> superlu group and they confirmed that is the case right now. But if I
> set -pc_factor_mat_solver_type cusparse, it seems that the iterative
> solver is running on the GPU.
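>
> For concreteness, the two kinds of runs being contrasted correspond
> roughly to option sets like the following (paraphrased for
> illustration; the full cusparse option set is quoted further down in
> the thread):
>
>   -mat_type aijcusparse -ksp_type fgmres -pc_type lu \
>       -pc_factor_mat_solver_type superlu_dist
>
>   -mat_type aijcusparse -ksp_type fgmres -pc_type bjacobi \
>       -sub_ksp_type preonly -sub_pc_type lu \
>       -sub_pc_factor_mat_solver_type cusparse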
>
> Chang
>
> On 10/13/21 12:03 PM, Mark Adams wrote:
>
> On Wed, Oct 13, 2021 at 11:10 AM Chang Liu <cliu at pppl.gov> wrote:
>
> > Thank you Junchao for explaining this. I guess in my case the code is
> > just calling a seq solver like superlu to do the factorization on
> > GPUs.
> >
> > My idea is that I want to have a traditional MPI code utilize GPUs
> > with cusparse. Right now cusparse does not support mpiaij matrices,
>
> Sure it does: '-mat_type aijcusparse' will give you an mpiaijcusparse
> matrix with > 1 processes. (-mat_type mpiaijcusparse might also work
> with > 1 proc.)
>
> However, I see in grepping the repo that all the mumps and superlu
> tests use the aij or sell matrix type. MUMPS and SuperLU provide their
> own solves, I assume ... but you might want to do other matrix
> operations on the GPU. Is that the issue?
>
> Did you try -mat_type aijcusparse with MUMPS and/or SuperLU and have a
> problem? (There is no test with it, so it probably does not work.)
>
> Thanks,
> Mark
>
> > so I want the code to have a mpiaij matrix when adding all the matrix
> > terms, and then transform the matrix to seqaij when doing the
> > factorization and solve. This involves sending the data to the master
> > process, and I think the petsc mumps solver has something similar
> > already.
> >
> > Chang
>
> On 10/13/21 10:18 AM, Junchao Zhang wrote:
>
> On Tue, Oct 12, 2021 at 1:07 PM Mark Adams <mfadams at lbl.gov> wrote:
>
> On Tue, Oct 12, 2021 at 1:45 PM Chang Liu <cliu at pppl.gov> wrote:
>
> >> Hi Mark,
> >>
> >> The option I use is like
> >>
> >>   -pc_type bjacobi -pc_bjacobi_blocks 16 -ksp_type fgmres
> >>   -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse
> >>   -sub_ksp_type preonly -sub_pc_type lu
> >>   -ksp_max_it 2000 -ksp_rtol 1.e-300 -ksp_atol 1.e-300
> >>
> > Note, if you use -log_view, the last column (rows are the method,
> > like MatFactorNumeric) has the percent of work done on the GPU.
> >
> > Junchao: *this* implies that we have a cuSparse LU factorization. Is
> > that correct? (I don't think we do.)
> >
> No, we don't have cuSparse LU factorization. If you check
> MatLUFactorSymbolic_SeqAIJCUSPARSE(), you will find that it calls
> MatLUFactorSymbolic_SeqAIJ() instead.
>
> So I don't understand Chang's idea. Do you want to make bigger blocks?
>
> >> I think this one does both the factorization and the solve on the
> >> GPU.
> >>
> >> You can check the runex72_aijcusparse.sh file in the petsc install
> >> directory and try it yourself (this is only an lu factorization,
> >> without an iterative solve).
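> >>
> >> As a sketch, such an LU-only run amounts to option combinations
> >> along these lines (illustrative, and not the exact contents of
> >> runex72_aijcusparse.sh): on a single rank, a preonly KSP with a
> >> cusparse LU factorization of a sequential aijcusparse matrix,
> >>
> >>   -mat_type aijcusparse -ksp_type preonly -pc_type lu \
> >>       -pc_factor_mat_solver_type cusparse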
> >>
> >> Chang
>
> On 10/12/21 1:17 PM, Mark Adams wrote:
>
> On Tue, Oct 12, 2021 at 11:19 AM Chang Liu <cliu at pppl.gov> wrote:
>
> > Hi Junchao,
> >
> > No, I only need it to be transferred within a node. I use the
> > block-Jacobi method and GMRES to solve the sparse matrix, so each
> > direct solver will take care of a sub-block of the whole matrix. In
> > this way, I can use one GPU to solve one sub-block, which is stored
> > within one node.
> >
> > It was stated in the documentation that the cusparse solver is slow.
> > However, in my test using ex72.c, the cusparse solver is faster than
> > mumps or superlu_dist on CPUs.
> >
> Are we talking about the factorization, the solve, or both?
>
> We do not have an interface to cuSparse's LU factorization (I just
> learned that it exists a few weeks ago). Perhaps your fast "cusparse
> solver" is '-pc_type lu -mat_type aijcusparse'? This would be the CPU
> factorization, which is the dominant cost.
>
> > Chang
>
> On 10/12/21 10:24 AM, Junchao Zhang wrote:
>
> Hi, Chang,
>
> For the mumps solver, we usually transfer matrix and vector data within
> a compute node. For the idea you propose, it looks like we need to
> gather data within MPI_COMM_WORLD, right?
>
> Mark, I remember you said the cusparse solve is slow and you would
> rather do it on the CPU. Is that right?
>
> --Junchao Zhang
>
> On Mon, Oct 11, 2021 at 10:25 PM Chang Liu via petsc-users
> <petsc-users at mcs.anl.gov> wrote:
>
> Hi,
>
> Currently, it is possible to use the mumps solver in PETSc with the
> -mat_mumps_use_omp_threads option, so that multiple MPI processes will
> transfer the matrix and rhs data to the master rank, and then the
> master rank will call mumps with OpenMP to solve the matrix.
>
> I wonder if someone can develop a similar option for the cusparse
> solver. Right now, this solver does not work with mpiaijcusparse. I
> think a possible workaround is to transfer all the matrix data to one
> MPI process, and then upload the data to the GPU to solve. In this way,
> one can use the cusparse solver for an MPI program.
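>
> As a very rough sketch of that workaround (illustrative only -- this is
> not existing PETSc code; A stands for the assembled mpiaij matrix, the
> right-hand-side handling is omitted, and cleanup of the index sets is
> skipped), one could gather the parallel matrix onto rank 0 as a
> sequential matrix and convert it to the cusparse format there:
>
>   PetscErrorCode ierr;
>   PetscMPIInt    rank;
>   PetscInt       M, N;
>   IS             rows, cols;
>   Mat           *Aseq;
>
>   ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
>   ierr = MatGetSize(A, &M, &N);CHKERRQ(ierr);
>   /* rank 0 requests every row and column; the other ranks request nothing */
>   ierr = ISCreateStride(PETSC_COMM_SELF, rank ? 0 : M, 0, 1, &rows);CHKERRQ(ierr);
>   ierr = ISCreateStride(PETSC_COMM_SELF, rank ? 0 : N, 0, 1, &cols);CHKERRQ(ierr);
>   ierr = MatCreateSubMatrices(A, rank ? 0 : 1, &rows, &cols,
>                               MAT_INITIAL_MATRIX, &Aseq);CHKERRQ(ierr);
>   if (!rank) {
>     /* Aseq[0] is now a SeqAIJ copy of the whole matrix on rank 0;
>        switch it to the cusparse format for the GPU solver */
>     ierr = MatConvert(Aseq[0], MATSEQAIJCUSPARSE, MAT_INPLACE_MATRIX,
>                       &Aseq[0]);CHKERRQ(ierr);
>     /* factor and solve Aseq[0] here, e.g. through a KSP/PC configured
>        with -pc_factor_mat_solver_type cusparse */
>   }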
>
> Chang
>
> --
> Chang Liu
> Staff Research Physicist
> +1 609 243 3438
> cliu at pppl.gov
> Princeton Plasma Physics Laboratory
> 100 Stellarator Rd, Princeton NJ 08540, USA