[petsc-users] [External] Re: request to add an option similar to use_omp_threads for mumps to cusparse solver

Chang Liu cliu at pppl.gov
Thu Oct 14 21:39:34 CDT 2021


Hi Pierre and Barry,

I think maybe I should use telescope outside bjacobi, like this:

mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7 -m 400 -ksp_view 
-ksp_monitor_true_residual -pc_type telescope 
-pc_telescope_reduction_factor 4 -telescope_pc_type bjacobi 
-telescope_ksp_type fgmres -telescope_pc_bjacobi_blocks 4 
-mat_type aijcusparse -telescope_sub_ksp_type preonly 
-telescope_sub_pc_type lu -telescope_sub_pc_factor_mat_solver_type cusparse 
-ksp_max_it 2000 -ksp_rtol 1.e-20 -ksp_atol 1.e-9

But then I got this error:

[0]PETSC ERROR: MatSolverType cusparse does not support matrix type seqaij

But the mat type should be aijcusparse. I think telescope changes the mat 
type.
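
In case it is easier to test outside my code, below is a minimal sketch of 
the kind of driver I have in mind (this is not ex7 itself; it just 
assembles a 1D Laplacian for illustration, error checking is omitted, and 
it assumes a PETSc build configured with CUDA so that the aijcusparse and 
cusparse options apply). The matrix type and the whole telescope/bjacobi 
hierarchy are picked up from the command line through MatSetFromOptions 
and KSPSetFromOptions, so the same options as above can be tried with it.

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat      A;
  Vec      x, b;
  KSP      ksp;
  PetscInt i, n = 100, Istart, Iend;

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* Assemble a 1D Laplacian as a stand-in for the real operator */
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
  MatSetFromOptions(A);                 /* honors -mat_type aijcusparse */
  MatSetUp(A);
  MatGetOwnershipRange(A, &Istart, &Iend);
  for (i = Istart; i < Iend; i++) {
    if (i > 0) MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
    MatSetValue(A, i, i, 2.0, INSERT_VALUES);
    if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  MatCreateVecs(A, &x, &b);
  VecSet(b, 1.0);

  /* All solver choices (telescope, bjacobi, cusparse, ...) come from the
     options database, so -pc_type telescope etc. take effect here */
  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A);
  KSPSetFromOptions(ksp);
  KSPSolve(ksp, b, x);

  KSPDestroy(&ksp);
  VecDestroy(&x);
  VecDestroy(&b);
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}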

Chang

On 10/14/21 10:11 PM, Chang Liu wrote:
> For comparison, here is the output using mumps instead of cusparse
> 
> $ mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7 -m 400 
> -ksp_view -ksp_monitor_true_residual -pc_type bjacobi -pc_bjacobi_blocks 
> 4 -ksp_type fgmres -mat_type aijcusparse -sub_pc_type telescope 
> -sub_ksp_type preonly -sub_telescope_ksp_type preonly 
> -sub_telescope_pc_type lu -sub_telescope_pc_factor_mat_solver_type mumps 
> -sub_pc_telescope_reduction_factor 4 -sub_pc_telescope_subcomm_type 
> contiguous -ksp_max_it 2000 -ksp_rtol 1.e-20 -ksp_atol 1.e-9
>    0 KSP unpreconditioned resid norm 4.014971979977e+01 true resid norm 
> 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00
>    1 KSP unpreconditioned resid norm 2.439995191694e+00 true resid norm 
> 2.439995191694e+00 ||r(i)||/||b|| 6.077240896978e-02
>    2 KSP unpreconditioned resid norm 1.280694102588e+00 true resid norm 
> 1.280694102588e+00 ||r(i)||/||b|| 3.189795866509e-02
>    3 KSP unpreconditioned resid norm 1.041100266810e+00 true resid norm 
> 1.041100266810e+00 ||r(i)||/||b|| 2.593044912896e-02
>    4 KSP unpreconditioned resid norm 7.274347137268e-01 true resid norm 
> 7.274347137268e-01 ||r(i)||/||b|| 1.811805206499e-02
>    5 KSP unpreconditioned resid norm 5.429229329787e-01 true resid norm 
> 5.429229329787e-01 ||r(i)||/||b|| 1.352245882876e-02
>    6 KSP unpreconditioned resid norm 4.332970410353e-01 true resid norm 
> 4.332970410353e-01 ||r(i)||/||b|| 1.079203150598e-02
>    7 KSP unpreconditioned resid norm 3.948206050950e-01 true resid norm 
> 3.948206050950e-01 ||r(i)||/||b|| 9.833707609019e-03
>    8 KSP unpreconditioned resid norm 3.379580577269e-01 true resid norm 
> 3.379580577269e-01 ||r(i)||/||b|| 8.417444988714e-03
>    9 KSP unpreconditioned resid norm 2.875593971410e-01 true resid norm 
> 2.875593971410e-01 ||r(i)||/||b|| 7.162176936105e-03
>   10 KSP unpreconditioned resid norm 2.533983363244e-01 true resid norm 
> 2.533983363244e-01 ||r(i)||/||b|| 6.311335112378e-03
>   11 KSP unpreconditioned resid norm 2.389169921094e-01 true resid norm 
> 2.389169921094e-01 ||r(i)||/||b|| 5.950651543793e-03
>   12 KSP unpreconditioned resid norm 2.118961639089e-01 true resid norm 
> 2.118961639089e-01 ||r(i)||/||b|| 5.277649880637e-03
>   13 KSP unpreconditioned resid norm 1.885892030223e-01 true resid norm 
> 1.885892030223e-01 ||r(i)||/||b|| 4.697148671593e-03
>   14 KSP unpreconditioned resid norm 1.763510666948e-01 true resid norm 
> 1.763510666948e-01 ||r(i)||/||b|| 4.392336175055e-03
>   15 KSP unpreconditioned resid norm 1.638219366731e-01 true resid norm 
> 1.638219366731e-01 ||r(i)||/||b|| 4.080275964317e-03
>   16 KSP unpreconditioned resid norm 1.476792766432e-01 true resid norm 
> 1.476792766432e-01 ||r(i)||/||b|| 3.678214378076e-03
>   17 KSP unpreconditioned resid norm 1.349906937321e-01 true resid norm 
> 1.349906937321e-01 ||r(i)||/||b|| 3.362182710248e-03
>   18 KSP unpreconditioned resid norm 1.289673236836e-01 true resid norm 
> 1.289673236836e-01 ||r(i)||/||b|| 3.212159993314e-03
>   19 KSP unpreconditioned resid norm 1.167505658153e-01 true resid norm 
> 1.167505658153e-01 ||r(i)||/||b|| 2.907879965230e-03
>   20 KSP unpreconditioned resid norm 1.046037988999e-01 true resid norm 
> 1.046037988999e-01 ||r(i)||/||b|| 2.605343185995e-03
>   21 KSP unpreconditioned resid norm 9.832660514331e-02 true resid norm 
> 9.832660514331e-02 ||r(i)||/||b|| 2.448998539309e-03
>   22 KSP unpreconditioned resid norm 8.835618950141e-02 true resid norm 
> 8.835618950142e-02 ||r(i)||/||b|| 2.200667649539e-03
>   23 KSP unpreconditioned resid norm 7.563496650115e-02 true resid norm 
> 7.563496650116e-02 ||r(i)||/||b|| 1.883823022386e-03
>   24 KSP unpreconditioned resid norm 6.651291376834e-02 true resid norm 
> 6.651291376834e-02 ||r(i)||/||b|| 1.656622115921e-03
>   25 KSP unpreconditioned resid norm 5.890393227906e-02 true resid norm 
> 5.890393227906e-02 ||r(i)||/||b|| 1.467106933070e-03
>   26 KSP unpreconditioned resid norm 4.661992782780e-02 true resid norm 
> 4.661992782780e-02 ||r(i)||/||b|| 1.161152009536e-03
>   27 KSP unpreconditioned resid norm 3.690705358716e-02 true resid norm 
> 3.690705358716e-02 ||r(i)||/||b|| 9.192356452602e-04
>   28 KSP unpreconditioned resid norm 3.209680460188e-02 true resid norm 
> 3.209680460188e-02 ||r(i)||/||b|| 7.994278605666e-04
>   29 KSP unpreconditioned resid norm 2.354337626000e-02 true resid norm 
> 2.354337626001e-02 ||r(i)||/||b|| 5.863895533373e-04
>   30 KSP unpreconditioned resid norm 1.701296561785e-02 true resid norm 
> 1.701296561785e-02 ||r(i)||/||b|| 4.237380908932e-04
>   31 KSP unpreconditioned resid norm 1.509942937258e-02 true resid norm 
> 1.509942937258e-02 ||r(i)||/||b|| 3.760780759588e-04
>   32 KSP unpreconditioned resid norm 1.258274688515e-02 true resid norm 
> 1.258274688515e-02 ||r(i)||/||b|| 3.133956338402e-04
>   33 KSP unpreconditioned resid norm 9.805748771638e-03 true resid norm 
> 9.805748771638e-03 ||r(i)||/||b|| 2.442295692359e-04
>   34 KSP unpreconditioned resid norm 8.596552678160e-03 true resid norm 
> 8.596552678160e-03 ||r(i)||/||b|| 2.141123953301e-04
>   35 KSP unpreconditioned resid norm 6.936406707500e-03 true resid norm 
> 6.936406707500e-03 ||r(i)||/||b|| 1.727635147167e-04
>   36 KSP unpreconditioned resid norm 5.533741607932e-03 true resid norm 
> 5.533741607932e-03 ||r(i)||/||b|| 1.378276519869e-04
>   37 KSP unpreconditioned resid norm 4.982347757923e-03 true resid norm 
> 4.982347757923e-03 ||r(i)||/||b|| 1.240942099414e-04
>   38 KSP unpreconditioned resid norm 4.309608348059e-03 true resid norm 
> 4.309608348059e-03 ||r(i)||/||b|| 1.073384414524e-04
>   39 KSP unpreconditioned resid norm 3.729408303186e-03 true resid norm 
> 3.729408303185e-03 ||r(i)||/||b|| 9.288753001974e-05
>   40 KSP unpreconditioned resid norm 3.490003351128e-03 true resid norm 
> 3.490003351128e-03 ||r(i)||/||b|| 8.692472496776e-05
>   41 KSP unpreconditioned resid norm 3.069012426454e-03 true resid norm 
> 3.069012426453e-03 ||r(i)||/||b|| 7.643919912166e-05
>   42 KSP unpreconditioned resid norm 2.772928845284e-03 true resid norm 
> 2.772928845284e-03 ||r(i)||/||b|| 6.906471225983e-05
>   43 KSP unpreconditioned resid norm 2.561454192399e-03 true resid norm 
> 2.561454192398e-03 ||r(i)||/||b|| 6.379756085902e-05
>   44 KSP unpreconditioned resid norm 2.253662762802e-03 true resid norm 
> 2.253662762802e-03 ||r(i)||/||b|| 5.613146926159e-05
>   45 KSP unpreconditioned resid norm 2.086800523919e-03 true resid norm 
> 2.086800523919e-03 ||r(i)||/||b|| 5.197546917701e-05
>   46 KSP unpreconditioned resid norm 1.926028182896e-03 true resid norm 
> 1.926028182896e-03 ||r(i)||/||b|| 4.797114880257e-05
>   47 KSP unpreconditioned resid norm 1.769243808622e-03 true resid norm 
> 1.769243808622e-03 ||r(i)||/||b|| 4.406615581492e-05
>   48 KSP unpreconditioned resid norm 1.656654905964e-03 true resid norm 
> 1.656654905964e-03 ||r(i)||/||b|| 4.126192945371e-05
>   49 KSP unpreconditioned resid norm 1.572052627273e-03 true resid norm 
> 1.572052627273e-03 ||r(i)||/||b|| 3.915475961260e-05
>   50 KSP unpreconditioned resid norm 1.454960682355e-03 true resid norm 
> 1.454960682355e-03 ||r(i)||/||b|| 3.623837699518e-05
>   51 KSP unpreconditioned resid norm 1.375985053014e-03 true resid norm 
> 1.375985053014e-03 ||r(i)||/||b|| 3.427134883820e-05
>   52 KSP unpreconditioned resid norm 1.269325501087e-03 true resid norm 
> 1.269325501087e-03 ||r(i)||/||b|| 3.161480347603e-05
>   53 KSP unpreconditioned resid norm 1.184791772965e-03 true resid norm 
> 1.184791772965e-03 ||r(i)||/||b|| 2.950934100844e-05
>   54 KSP unpreconditioned resid norm 1.064535156080e-03 true resid norm 
> 1.064535156080e-03 ||r(i)||/||b|| 2.651413662135e-05
>   55 KSP unpreconditioned resid norm 9.639036688120e-04 true resid norm 
> 9.639036688117e-04 ||r(i)||/||b|| 2.400773090370e-05
>   56 KSP unpreconditioned resid norm 8.632359780260e-04 true resid norm 
> 8.632359780260e-04 ||r(i)||/||b|| 2.150042347322e-05
>   57 KSP unpreconditioned resid norm 7.613605783850e-04 true resid norm 
> 7.613605783850e-04 ||r(i)||/||b|| 1.896303591113e-05
>   58 KSP unpreconditioned resid norm 6.681073248348e-04 true resid norm 
> 6.681073248349e-04 ||r(i)||/||b|| 1.664039819373e-05
>   59 KSP unpreconditioned resid norm 5.656127908544e-04 true resid norm 
> 5.656127908545e-04 ||r(i)||/||b|| 1.408758999254e-05
>   60 KSP unpreconditioned resid norm 4.850863370767e-04 true resid norm 
> 4.850863370767e-04 ||r(i)||/||b|| 1.208193580169e-05
>   61 KSP unpreconditioned resid norm 4.374055762320e-04 true resid norm 
> 4.374055762316e-04 ||r(i)||/||b|| 1.089436186387e-05
>   62 KSP unpreconditioned resid norm 3.874398257079e-04 true resid norm 
> 3.874398257077e-04 ||r(i)||/||b|| 9.649876204364e-06
>   63 KSP unpreconditioned resid norm 3.364908694427e-04 true resid norm 
> 3.364908694429e-04 ||r(i)||/||b|| 8.380902061609e-06
>   64 KSP unpreconditioned resid norm 2.961034697265e-04 true resid norm 
> 2.961034697268e-04 ||r(i)||/||b|| 7.374982221632e-06
>   65 KSP unpreconditioned resid norm 2.640593092764e-04 true resid norm 
> 2.640593092767e-04 ||r(i)||/||b|| 6.576865557059e-06
>   66 KSP unpreconditioned resid norm 2.423231125743e-04 true resid norm 
> 2.423231125745e-04 ||r(i)||/||b|| 6.035487016671e-06
>   67 KSP unpreconditioned resid norm 2.182349471179e-04 true resid norm 
> 2.182349471179e-04 ||r(i)||/||b|| 5.435528521898e-06
>   68 KSP unpreconditioned resid norm 2.008438265031e-04 true resid norm 
> 2.008438265028e-04 ||r(i)||/||b|| 5.002371809927e-06
>   69 KSP unpreconditioned resid norm 1.838732863386e-04 true resid norm 
> 1.838732863388e-04 ||r(i)||/||b|| 4.579690400226e-06
>   70 KSP unpreconditioned resid norm 1.723786027645e-04 true resid norm 
> 1.723786027645e-04 ||r(i)||/||b|| 4.293394913444e-06
>   71 KSP unpreconditioned resid norm 1.580945192204e-04 true resid norm 
> 1.580945192205e-04 ||r(i)||/||b|| 3.937624471826e-06
>   72 KSP unpreconditioned resid norm 1.476687469671e-04 true resid norm 
> 1.476687469671e-04 ||r(i)||/||b|| 3.677952117812e-06
>   73 KSP unpreconditioned resid norm 1.385018526182e-04 true resid norm 
> 1.385018526184e-04 ||r(i)||/||b|| 3.449634351350e-06
>   74 KSP unpreconditioned resid norm 1.279712893541e-04 true resid norm 
> 1.279712893541e-04 ||r(i)||/||b|| 3.187351991305e-06
>   75 KSP unpreconditioned resid norm 1.202010411772e-04 true resid norm 
> 1.202010411774e-04 ||r(i)||/||b|| 2.993820175504e-06
>   76 KSP unpreconditioned resid norm 1.113459414198e-04 true resid norm 
> 1.113459414200e-04 ||r(i)||/||b|| 2.773268206485e-06
>   77 KSP unpreconditioned resid norm 1.042523036036e-04 true resid norm 
> 1.042523036037e-04 ||r(i)||/||b|| 2.596588572066e-06
>   78 KSP unpreconditioned resid norm 9.565176453232e-05 true resid norm 
> 9.565176453227e-05 ||r(i)||/||b|| 2.382376888539e-06
>   79 KSP unpreconditioned resid norm 8.896901670359e-05 true resid norm 
> 8.896901670365e-05 ||r(i)||/||b|| 2.215931198209e-06
>   80 KSP unpreconditioned resid norm 8.119298425803e-05 true resid norm 
> 8.119298425824e-05 ||r(i)||/||b|| 2.022255314935e-06
>   81 KSP unpreconditioned resid norm 7.544528309154e-05 true resid norm 
> 7.544528309154e-05 ||r(i)||/||b|| 1.879098620558e-06
>   82 KSP unpreconditioned resid norm 6.755385041138e-05 true resid norm 
> 6.755385041176e-05 ||r(i)||/||b|| 1.682548489719e-06
>   83 KSP unpreconditioned resid norm 6.158629300870e-05 true resid norm 
> 6.158629300835e-05 ||r(i)||/||b|| 1.533915885727e-06
>   84 KSP unpreconditioned resid norm 5.358756885754e-05 true resid norm 
> 5.358756885765e-05 ||r(i)||/||b|| 1.334693470462e-06
>   85 KSP unpreconditioned resid norm 4.774852370380e-05 true resid norm 
> 4.774852370387e-05 ||r(i)||/||b|| 1.189261692037e-06
>   86 KSP unpreconditioned resid norm 3.919358737908e-05 true resid norm 
> 3.919358737930e-05 ||r(i)||/||b|| 9.761858258229e-07
>   87 KSP unpreconditioned resid norm 3.434042319950e-05 true resid norm 
> 3.434042319947e-05 ||r(i)||/||b|| 8.553091620745e-07
>   88 KSP unpreconditioned resid norm 2.813699436281e-05 true resid norm 
> 2.813699436302e-05 ||r(i)||/||b|| 7.008017615898e-07
>   89 KSP unpreconditioned resid norm 2.462248069068e-05 true resid norm 
> 2.462248069051e-05 ||r(i)||/||b|| 6.132665635851e-07
>   90 KSP unpreconditioned resid norm 2.040558789626e-05 true resid norm 
> 2.040558789626e-05 ||r(i)||/||b|| 5.082373674841e-07
>   91 KSP unpreconditioned resid norm 1.888523204468e-05 true resid norm 
> 1.888523204470e-05 ||r(i)||/||b|| 4.703702077842e-07
>   92 KSP unpreconditioned resid norm 1.707071292484e-05 true resid norm 
> 1.707071292474e-05 ||r(i)||/||b|| 4.251763900191e-07
>   93 KSP unpreconditioned resid norm 1.498636454665e-05 true resid norm 
> 1.498636454672e-05 ||r(i)||/||b|| 3.732619958859e-07
>   94 KSP unpreconditioned resid norm 1.219393542993e-05 true resid norm 
> 1.219393543006e-05 ||r(i)||/||b|| 3.037115947725e-07
>   95 KSP unpreconditioned resid norm 1.059996963300e-05 true resid norm 
> 1.059996963303e-05 ||r(i)||/||b|| 2.640110487917e-07
>   96 KSP unpreconditioned resid norm 9.099659872548e-06 true resid norm 
> 9.099659873214e-06 ||r(i)||/||b|| 2.266431725699e-07
>   97 KSP unpreconditioned resid norm 8.147347587295e-06 true resid norm 
> 8.147347587584e-06 ||r(i)||/||b|| 2.029241456283e-07
>   98 KSP unpreconditioned resid norm 7.167226146744e-06 true resid norm 
> 7.167226146783e-06 ||r(i)||/||b|| 1.785124823418e-07
>   99 KSP unpreconditioned resid norm 6.552540209538e-06 true resid norm 
> 6.552540209577e-06 ||r(i)||/||b|| 1.632026385802e-07
> 100 KSP unpreconditioned resid norm 5.767783600111e-06 true resid norm 
> 5.767783600320e-06 ||r(i)||/||b|| 1.436568830140e-07
> 101 KSP unpreconditioned resid norm 5.261057430584e-06 true resid norm 
> 5.261057431144e-06 ||r(i)||/||b|| 1.310359688033e-07
> 102 KSP unpreconditioned resid norm 4.715498525786e-06 true resid norm 
> 4.715498525947e-06 ||r(i)||/||b|| 1.174478564100e-07
> 103 KSP unpreconditioned resid norm 4.380052669622e-06 true resid norm 
> 4.380052669825e-06 ||r(i)||/||b|| 1.090929822591e-07
> 104 KSP unpreconditioned resid norm 3.911664470060e-06 true resid norm 
> 3.911664470226e-06 ||r(i)||/||b|| 9.742694319496e-08
> 105 KSP unpreconditioned resid norm 3.652211458315e-06 true resid norm 
> 3.652211458259e-06 ||r(i)||/||b|| 9.096480564430e-08
> 106 KSP unpreconditioned resid norm 3.387532128049e-06 true resid norm 
> 3.387532128358e-06 ||r(i)||/||b|| 8.437249737363e-08
> 107 KSP unpreconditioned resid norm 3.234218880987e-06 true resid norm 
> 3.234218880798e-06 ||r(i)||/||b|| 8.055395895481e-08
> 108 KSP unpreconditioned resid norm 3.016905196388e-06 true resid norm 
> 3.016905196492e-06 ||r(i)||/||b|| 7.514137611763e-08
> 109 KSP unpreconditioned resid norm 2.858246441921e-06 true resid norm 
> 2.858246441975e-06 ||r(i)||/||b|| 7.118969836476e-08
> 110 KSP unpreconditioned resid norm 2.637118810847e-06 true resid norm 
> 2.637118810750e-06 ||r(i)||/||b|| 6.568212241336e-08
> 111 KSP unpreconditioned resid norm 2.494976088717e-06 true resid norm 
> 2.494976088700e-06 ||r(i)||/||b|| 6.214180574966e-08
> 112 KSP unpreconditioned resid norm 2.270639574272e-06 true resid norm 
> 2.270639574200e-06 ||r(i)||/||b|| 5.655430686750e-08
> 113 KSP unpreconditioned resid norm 2.104988663813e-06 true resid norm 
> 2.104988664169e-06 ||r(i)||/||b|| 5.242847707696e-08
> 114 KSP unpreconditioned resid norm 1.889361127301e-06 true resid norm 
> 1.889361127526e-06 ||r(i)||/||b|| 4.705789073868e-08
> 115 KSP unpreconditioned resid norm 1.732367008052e-06 true resid norm 
> 1.732367007971e-06 ||r(i)||/||b|| 4.314767367271e-08
> 116 KSP unpreconditioned resid norm 1.509288268391e-06 true resid norm 
> 1.509288268645e-06 ||r(i)||/||b|| 3.759150191264e-08
> 117 KSP unpreconditioned resid norm 1.359169217644e-06 true resid norm 
> 1.359169217445e-06 ||r(i)||/||b|| 3.385252062089e-08
> 118 KSP unpreconditioned resid norm 1.180146337735e-06 true resid norm 
> 1.180146337908e-06 ||r(i)||/||b|| 2.939363820703e-08
> 119 KSP unpreconditioned resid norm 1.067757039683e-06 true resid norm 
> 1.067757039924e-06 ||r(i)||/||b|| 2.659438335433e-08
> 120 KSP unpreconditioned resid norm 9.435833073736e-07 true resid norm 
> 9.435833073736e-07 ||r(i)||/||b|| 2.350161625235e-08
> 121 KSP unpreconditioned resid norm 8.749457237613e-07 true resid norm 
> 8.749457236791e-07 ||r(i)||/||b|| 2.179207546261e-08
> 122 KSP unpreconditioned resid norm 7.945760150897e-07 true resid norm 
> 7.945760150444e-07 ||r(i)||/||b|| 1.979032528762e-08
> 123 KSP unpreconditioned resid norm 7.141240839013e-07 true resid norm 
> 7.141240838682e-07 ||r(i)||/||b|| 1.778652721438e-08
> 124 KSP unpreconditioned resid norm 6.300566936733e-07 true resid norm 
> 6.300566936607e-07 ||r(i)||/||b|| 1.569267971988e-08
> 125 KSP unpreconditioned resid norm 5.628986997544e-07 true resid norm 
> 5.628986995849e-07 ||r(i)||/||b|| 1.401999073448e-08
> 126 KSP unpreconditioned resid norm 5.119018951602e-07 true resid norm 
> 5.119018951837e-07 ||r(i)||/||b|| 1.274982484900e-08
> 127 KSP unpreconditioned resid norm 4.664670343748e-07 true resid norm 
> 4.664670344042e-07 ||r(i)||/||b|| 1.161818903670e-08
> 128 KSP unpreconditioned resid norm 4.253264691112e-07 true resid norm 
> 4.253264691948e-07 ||r(i)||/||b|| 1.059351027394e-08
> 129 KSP unpreconditioned resid norm 3.868921150516e-07 true resid norm 
> 3.868921150517e-07 ||r(i)||/||b|| 9.636234498800e-09
> 130 KSP unpreconditioned resid norm 3.558445658540e-07 true resid norm 
> 3.558445660061e-07 ||r(i)||/||b|| 8.862940209315e-09
> 131 KSP unpreconditioned resid norm 3.268710273840e-07 true resid norm 
> 3.268710272455e-07 ||r(i)||/||b|| 8.141302825416e-09
> 132 KSP unpreconditioned resid norm 3.041273897592e-07 true resid norm 
> 3.041273896694e-07 ||r(i)||/||b|| 7.574832182794e-09
> 133 KSP unpreconditioned resid norm 2.851926677922e-07 true resid norm 
> 2.851926674248e-07 ||r(i)||/||b|| 7.103229333782e-09
> 134 KSP unpreconditioned resid norm 2.694708315072e-07 true resid norm 
> 2.694708309500e-07 ||r(i)||/||b|| 6.711649104748e-09
> 135 KSP unpreconditioned resid norm 2.534825559099e-07 true resid norm 
> 2.534825557469e-07 ||r(i)||/||b|| 6.313432746507e-09
> 136 KSP unpreconditioned resid norm 2.387342352458e-07 true resid norm 
> 2.387342351804e-07 ||r(i)||/||b|| 5.946099658254e-09
> 137 KSP unpreconditioned resid norm 2.200861667617e-07 true resid norm 
> 2.200861665255e-07 ||r(i)||/||b|| 5.481636425438e-09
> 138 KSP unpreconditioned resid norm 2.051415370616e-07 true resid norm 
> 2.051415370614e-07 ||r(i)||/||b|| 5.109413915824e-09
> 139 KSP unpreconditioned resid norm 1.887376429396e-07 true resid norm 
> 1.887376426682e-07 ||r(i)||/||b|| 4.700845824315e-09
> 140 KSP unpreconditioned resid norm 1.729743133005e-07 true resid norm 
> 1.729743128342e-07 ||r(i)||/||b|| 4.308232129561e-09
> 141 KSP unpreconditioned resid norm 1.541021130781e-07 true resid norm 
> 1.541021128364e-07 ||r(i)||/||b|| 3.838186508023e-09
> 142 KSP unpreconditioned resid norm 1.384631628565e-07 true resid norm 
> 1.384631627735e-07 ||r(i)||/||b|| 3.448670712125e-09
> 143 KSP unpreconditioned resid norm 1.223114405626e-07 true resid norm 
> 1.223114403883e-07 ||r(i)||/||b|| 3.046383411846e-09
> 144 KSP unpreconditioned resid norm 1.087313066223e-07 true resid norm 
> 1.087313065117e-07 ||r(i)||/||b|| 2.708146085550e-09
> 145 KSP unpreconditioned resid norm 9.181901998734e-08 true resid norm 
> 9.181901984268e-08 ||r(i)||/||b|| 2.286915582489e-09
> 146 KSP unpreconditioned resid norm 7.885850510808e-08 true resid norm 
> 7.885850531446e-08 ||r(i)||/||b|| 1.964110975313e-09
> 147 KSP unpreconditioned resid norm 6.483393946950e-08 true resid norm 
> 6.483393931383e-08 ||r(i)||/||b|| 1.614804278515e-09
> 148 KSP unpreconditioned resid norm 5.690132597004e-08 true resid norm 
> 5.690132577518e-08 ||r(i)||/||b|| 1.417228465328e-09
> 149 KSP unpreconditioned resid norm 5.023671521579e-08 true resid norm 
> 5.023671502186e-08 ||r(i)||/||b|| 1.251234511035e-09
> 150 KSP unpreconditioned resid norm 4.625371062660e-08 true resid norm 
> 4.625371062660e-08 ||r(i)||/||b|| 1.152030720445e-09
> 151 KSP unpreconditioned resid norm 4.349049084805e-08 true resid norm 
> 4.349049089337e-08 ||r(i)||/||b|| 1.083207830846e-09
> 152 KSP unpreconditioned resid norm 3.932593324498e-08 true resid norm 
> 3.932593376918e-08 ||r(i)||/||b|| 9.794821474546e-10
> 153 KSP unpreconditioned resid norm 3.504167649202e-08 true resid norm 
> 3.504167638113e-08 ||r(i)||/||b|| 8.727751166356e-10
> 154 KSP unpreconditioned resid norm 2.892726347747e-08 true resid norm 
> 2.892726348583e-08 ||r(i)||/||b|| 7.204848160858e-10
> 155 KSP unpreconditioned resid norm 2.477647033202e-08 true resid norm 
> 2.477647041570e-08 ||r(i)||/||b|| 6.171019508795e-10
> 156 KSP unpreconditioned resid norm 2.128504065757e-08 true resid norm 
> 2.128504067423e-08 ||r(i)||/||b|| 5.301416991298e-10
> 157 KSP unpreconditioned resid norm 1.879248809429e-08 true resid norm 
> 1.879248818928e-08 ||r(i)||/||b|| 4.680602575310e-10
> 158 KSP unpreconditioned resid norm 1.673649140073e-08 true resid norm 
> 1.673649134005e-08 ||r(i)||/||b|| 4.168520085200e-10
> 159 KSP unpreconditioned resid norm 1.497123388109e-08 true resid norm 
> 1.497123365569e-08 ||r(i)||/||b|| 3.728851342016e-10
> 160 KSP unpreconditioned resid norm 1.315982130162e-08 true resid norm 
> 1.315982149329e-08 ||r(i)||/||b|| 3.277687007261e-10
> 161 KSP unpreconditioned resid norm 1.182395864938e-08 true resid norm 
> 1.182395868430e-08 ||r(i)||/||b|| 2.944966675550e-10
> 162 KSP unpreconditioned resid norm 1.070204481679e-08 true resid norm 
> 1.070204466432e-08 ||r(i)||/||b|| 2.665534085342e-10
> 163 KSP unpreconditioned resid norm 9.969290307649e-09 true resid norm 
> 9.969290432333e-09 ||r(i)||/||b|| 2.483028644297e-10
> 164 KSP unpreconditioned resid norm 9.134440883306e-09 true resid norm 
> 9.134440980976e-09 ||r(i)||/||b|| 2.275094577628e-10
> 165 KSP unpreconditioned resid norm 8.593316427292e-09 true resid norm 
> 8.593316413360e-09 ||r(i)||/||b|| 2.140317904139e-10
> 166 KSP unpreconditioned resid norm 8.042173048464e-09 true resid norm 
> 8.042173332848e-09 ||r(i)||/||b|| 2.003045942277e-10
> 167 KSP unpreconditioned resid norm 7.655518522782e-09 true resid norm 
> 7.655518879144e-09 ||r(i)||/||b|| 1.906742791064e-10
> 168 KSP unpreconditioned resid norm 7.210283391815e-09 true resid norm 
> 7.210283220312e-09 ||r(i)||/||b|| 1.795848951442e-10
> 169 KSP unpreconditioned resid norm 6.793967416271e-09 true resid norm 
> 6.793967448832e-09 ||r(i)||/||b|| 1.692158122825e-10
> 170 KSP unpreconditioned resid norm 6.249160304588e-09 true resid norm 
> 6.249160382647e-09 ||r(i)||/||b|| 1.556464257736e-10
> 171 KSP unpreconditioned resid norm 5.794936438798e-09 true resid norm 
> 5.794936332552e-09 ||r(i)||/||b|| 1.443331699811e-10
> 172 KSP unpreconditioned resid norm 5.222337397128e-09 true resid norm 
> 5.222337443277e-09 ||r(i)||/||b|| 1.300715788135e-10
> 173 KSP unpreconditioned resid norm 4.755359110447e-09 true resid norm 
> 4.755358888996e-09 ||r(i)||/||b|| 1.184406494668e-10
> 174 KSP unpreconditioned resid norm 4.317537007873e-09 true resid norm 
> 4.317537267718e-09 ||r(i)||/||b|| 1.075359252630e-10
> 175 KSP unpreconditioned resid norm 3.924177535665e-09 true resid norm 
> 3.924177629720e-09 ||r(i)||/||b|| 9.773860563138e-11
> 176 KSP unpreconditioned resid norm 3.502843065115e-09 true resid norm 
> 3.502843126359e-09 ||r(i)||/||b|| 8.724452234855e-11
> 177 KSP unpreconditioned resid norm 3.083873232869e-09 true resid norm 
> 3.083873352938e-09 ||r(i)||/||b|| 7.680933686007e-11
> 178 KSP unpreconditioned resid norm 2.758980676473e-09 true resid norm 
> 2.758980618096e-09 ||r(i)||/||b|| 6.871730691658e-11
> 179 KSP unpreconditioned resid norm 2.510978240429e-09 true resid norm 
> 2.510978327392e-09 ||r(i)||/||b|| 6.254036989334e-11
> 180 KSP unpreconditioned resid norm 2.323000193205e-09 true resid norm 
> 2.323000193205e-09 ||r(i)||/||b|| 5.785844097519e-11
> 181 KSP unpreconditioned resid norm 2.167480159274e-09 true resid norm 
> 2.167480113693e-09 ||r(i)||/||b|| 5.398493749153e-11
> 182 KSP unpreconditioned resid norm 1.983545827983e-09 true resid norm 
> 1.983546404840e-09 ||r(i)||/||b|| 4.940374216139e-11
> 183 KSP unpreconditioned resid norm 1.794576286774e-09 true resid norm 
> 1.794576224361e-09 ||r(i)||/||b|| 4.469710457036e-11
> 184 KSP unpreconditioned resid norm 1.583490590644e-09 true resid norm 
> 1.583490380603e-09 ||r(i)||/||b|| 3.943963715064e-11
> 185 KSP unpreconditioned resid norm 1.412659866247e-09 true resid norm 
> 1.412659832191e-09 ||r(i)||/||b|| 3.518479927722e-11
> 186 KSP unpreconditioned resid norm 1.285613344939e-09 true resid norm 
> 1.285612984761e-09 ||r(i)||/||b|| 3.202047215205e-11
> 187 KSP unpreconditioned resid norm 1.168115133929e-09 true resid norm 
> 1.168114766904e-09 ||r(i)||/||b|| 2.909397058634e-11
> 188 KSP unpreconditioned resid norm 1.063377926053e-09 true resid norm 
> 1.063377647554e-09 ||r(i)||/||b|| 2.648530681802e-11
> 189 KSP unpreconditioned resid norm 9.548967728122e-10 true resid norm 
> 9.548964523410e-10 ||r(i)||/||b|| 2.378339019807e-11
> KSP Object: 16 MPI processes
>    type: fgmres
>      restart=30, using Classical (unmodified) Gram-Schmidt 
> Orthogonalization with no iterative refinement
>      happy breakdown tolerance 1e-30
>    maximum iterations=2000, initial guess is zero
>    tolerances:  relative=1e-20, absolute=1e-09, divergence=10000.
>    right preconditioning
>    using UNPRECONDITIONED norm type for convergence test
> PC Object: 16 MPI processes
>    type: bjacobi
>      number of blocks = 4
>      Local solver information for first block is in the following KSP 
> and PC objects on rank 0:
>      Use -ksp_view ::ascii_info_detail to display information for all 
> blocks
>    KSP Object: (sub_) 4 MPI processes
>      type: preonly
>      maximum iterations=10000, initial guess is zero
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>      left preconditioning
>      using NONE norm type for convergence test
>    PC Object: (sub_) 4 MPI processes
>      type: telescope
>        petsc subcomm: parent comm size reduction factor = 4
>        petsc subcomm: parent_size = 4 , subcomm_size = 1
>        petsc subcomm type = contiguous
>      linear system matrix = precond matrix:
>      Mat Object: (sub_) 4 MPI processes
>        type: mpiaij
>        rows=40200, cols=40200
>        total: nonzeros=199996, allocated nonzeros=203412
>        total number of mallocs used during MatSetValues calls=0
>          not using I-node (on process 0) routines
>          setup type: default
>          Parent DM object: NULL
>          Sub DM object: NULL
>          KSP Object:   (sub_telescope_)   1 MPI processes
>            type: preonly
>            maximum iterations=10000, initial guess is zero
>            tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>            left preconditioning
>            using NONE norm type for convergence test
>          PC Object:   (sub_telescope_)   1 MPI processes
>            type: lu
>              out-of-place factorization
>              tolerance for zero pivot 2.22045e-14
>              matrix ordering: external
>              factor fill ratio given 0., needed 0.
>                Factored matrix follows:
>                  Mat Object:   1 MPI processes
>                    type: mumps
>                    rows=40200, cols=40200
>                    package used to perform factorization: mumps
>                    total: nonzeros=1849788, allocated nonzeros=1849788
>                      MUMPS run parameters:
>                        SYM (matrix type):                   0
>                        PAR (host participation):            1
>                        ICNTL(1) (output for error):         6
>                        ICNTL(2) (output of diagnostic msg): 0
>                        ICNTL(3) (output for global info):   0
>                        ICNTL(4) (level of printing):        0
>                        ICNTL(5) (input mat struct):         0
>                        ICNTL(6) (matrix prescaling):        7
>                        ICNTL(7) (sequential matrix ordering):7
>                        ICNTL(8) (scaling strategy):        77
>                        ICNTL(10) (max num of refinements):  0
>                        ICNTL(11) (error analysis):          0
>                        ICNTL(12) (efficiency control):        1
>                        ICNTL(13) (sequential factorization of the root 
> node):  0
>                        ICNTL(14) (percentage of estimated workspace 
> increase): 20
>                        ICNTL(18) (input mat struct):        0
>                        ICNTL(19) (Schur complement info):        0
>                        ICNTL(20) (RHS sparse pattern):        0
>                        ICNTL(21) (solution struct):        0
>                        ICNTL(22) (in-core/out-of-core facility):        0
>                        ICNTL(23) (max size of memory can be allocated 
> locally):0
>                        ICNTL(24) (detection of null pivot rows):        0
>                        ICNTL(25) (computation of a null space basis): 
>         0
>                        ICNTL(26) (Schur options for RHS or solution): 
>         0
>                        ICNTL(27) (blocking size for multiple RHS): 
>         -32
>                        ICNTL(28) (use parallel or sequential ordering): 
>         1
>                        ICNTL(29) (parallel ordering):        0
>                        ICNTL(30) (user-specified set of entries in 
> inv(A)):    0
>                        ICNTL(31) (factors is discarded in the solve 
> phase):    0
>                        ICNTL(33) (compute determinant):        0
>                        ICNTL(35) (activate BLR based factorization): 
>         0
>                        ICNTL(36) (choice of BLR factorization variant): 
>         0
>                        ICNTL(38) (estimated compression rate of LU 
> factors):   333
>                        CNTL(1) (relative pivoting threshold):      0.01
>                        CNTL(2) (stopping criterion of refinement): 
> 1.49012e-08
>                        CNTL(3) (absolute pivoting threshold):      0.
>                        CNTL(4) (value of static pivoting):         -1.
>                        CNTL(5) (fixation for null pivots):         0.
>                        CNTL(7) (dropping parameter for BLR):       0.
>                        RINFO(1) (local estimated flops for the 
> elimination after analysis):
>                          [0] 1.45525e+08
>                        RINFO(2) (local estimated flops for the assembly 
> after factorization):
>                          [0]  2.89397e+06
>                        RINFO(3) (local estimated flops for the 
> elimination after factorization):
>                          [0]  1.45525e+08
>                        INFO(15) (estimated size of (in MB) MUMPS 
> internal data for running numerical factorization):
>                        [0] 29
>                        INFO(16) (size of (in MB) MUMPS internal data 
> used during numerical factorization):
>                          [0] 29
>                        INFO(23) (num of pivots eliminated on this 
> processor after factorization):
>                          [0] 40200
>                        RINFOG(1) (global estimated flops for the 
> elimination after analysis): 1.45525e+08
>                        RINFOG(2) (global estimated flops for the 
> assembly after factorization): 2.89397e+06
>                        RINFOG(3) (global estimated flops for the 
> elimination after factorization): 1.45525e+08
>                        (RINFOG(12) RINFOG(13))*2^INFOG(34) 
> (determinant): (0.,0.)*(2^0)
>                        INFOG(3) (estimated real workspace for factors on 
> all processors after analysis): 1849788
>                        INFOG(4) (estimated integer workspace for factors 
> on all processors after analysis): 879986
>                        INFOG(5) (estimated maximum front size in the 
> complete tree): 282
>                        INFOG(6) (number of nodes in the complete tree): 
> 23709
>                        INFOG(7) (ordering option effectively used after 
> analysis): 5
>                        INFOG(8) (structural symmetry in percent of the 
> permuted matrix after analysis): 100
>                        INFOG(9) (total real/complex workspace to store 
> the matrix factors after factorization): 1849788
>                        INFOG(10) (total integer space store the matrix 
> factors after factorization): 879986
>                        INFOG(11) (order of largest frontal matrix after 
> factorization): 282
>                        INFOG(12) (number of off-diagonal pivots): 0
>                        INFOG(13) (number of delayed pivots after 
> factorization): 0
>                        INFOG(14) (number of memory compress after 
> factorization): 0
>                        INFOG(15) (number of steps of iterative 
> refinement after solution): 0
>                        INFOG(16) (estimated size (in MB) of all MUMPS 
> internal data for factorization after analysis: value on the most memory 
> consuming processor): 29
>                        INFOG(17) (estimated size of all MUMPS internal 
> data for factorization after analysis: sum over all processors): 29
>                        INFOG(18) (size of all MUMPS internal data 
> allocated during factorization: value on the most memory consuming 
> processor): 29
>                        INFOG(19) (size of all MUMPS internal data 
> allocated during factorization: sum over all processors): 29
>                        INFOG(20) (estimated number of entries in the 
> factors): 1849788
>                        INFOG(21) (size in MB of memory effectively used 
> during factorization - value on the most memory consuming processor): 26
>                        INFOG(22) (size in MB of memory effectively used 
> during factorization - sum over all processors): 26
>                        INFOG(23) (after analysis: value of ICNTL(6) 
> effectively used): 0
>                        INFOG(24) (after analysis: value of ICNTL(12) 
> effectively used): 1
>                        INFOG(25) (after factorization: number of pivots 
> modified by static pivoting): 0
>                        INFOG(28) (after factorization: number of null 
> pivots encountered): 0
>                        INFOG(29) (after factorization: effective number 
> of entries in the factors (sum over all processors)): 1849788
>                        INFOG(30, 31) (after solution: size in Mbytes of 
> memory used during solution phase): 29, 29
>                        INFOG(32) (after analysis: type of analysis done): 1
>                        INFOG(33) (value used for ICNTL(8)): 7
>                        INFOG(34) (exponent of the determinant if 
> determinant is requested): 0
>                        INFOG(35) (after factorization: number of entries 
> taking into account BLR factor compression - sum over all processors): 
> 1849788
>                        INFOG(36) (after analysis: estimated size of all 
> MUMPS internal data for running BLR in-core - value on the most memory 
> consuming processor): 0
>                        INFOG(37) (after analysis: estimated size of all 
> MUMPS internal data for running BLR in-core - sum over all processors): 0
>                        INFOG(38) (after analysis: estimated size of all 
> MUMPS internal data for running BLR out-of-core - value on the most 
> memory consuming processor): 0
>                        INFOG(39) (after analysis: estimated size of all 
> MUMPS internal data for running BLR out-of-core - sum over all 
> processors): 0
>            linear system matrix = precond matrix:
>            Mat Object:   1 MPI processes
>              type: seqaijcusparse
>              rows=40200, cols=40200
>              total: nonzeros=199996, allocated nonzeros=199996
>              total number of mallocs used during MatSetValues calls=0
>                not using I-node routines
>    linear system matrix = precond matrix:
>    Mat Object: 16 MPI processes
>      type: mpiaijcusparse
>      rows=160800, cols=160800
>      total: nonzeros=802396, allocated nonzeros=1608000
>      total number of mallocs used during MatSetValues calls=0
>        not using I-node (on process 0) routines
> Norm of error 9.11684e-07 iterations 189
> 
> Chang
> 
> 
> 
> On 10/14/21 10:10 PM, Chang Liu wrote:
>> Hi Barry,
>>
>> No problem. Here is the output. It seems that the resid norm 
>> calculation is incorrect.
>>
>> $ mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7 -m 400 
>> -ksp_view -ksp_monitor_true_residual -pc_type bjacobi 
>> -pc_bjacobi_blocks 4 -ksp_type fgmres -mat_type aijcusparse 
>> -sub_pc_type telescope -sub_ksp_type preonly -sub_telescope_ksp_type 
>> preonly -sub_telescope_pc_type lu 
>> -sub_telescope_pc_factor_mat_solver_type cusparse 
>> -sub_pc_telescope_reduction_factor 4 -sub_pc_telescope_subcomm_type 
>> contiguous -ksp_max_it 2000 -ksp_rtol 1.e-20 -ksp_atol 1.e-9
>>    0 KSP unpreconditioned resid norm 4.014971979977e+01 true resid 
>> norm 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00
>>    1 KSP unpreconditioned resid norm 0.000000000000e+00 true resid 
>> norm 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00
>> KSP Object: 16 MPI processes
>>    type: fgmres
>>      restart=30, using Classical (unmodified) Gram-Schmidt 
>> Orthogonalization with no iterative refinement
>>      happy breakdown tolerance 1e-30
>>    maximum iterations=2000, initial guess is zero
>>    tolerances:  relative=1e-20, absolute=1e-09, divergence=10000.
>>    right preconditioning
>>    using UNPRECONDITIONED norm type for convergence test
>> PC Object: 16 MPI processes
>>    type: bjacobi
>>      number of blocks = 4
>>      Local solver information for first block is in the following KSP 
>> and PC objects on rank 0:
>>      Use -ksp_view ::ascii_info_detail to display information for all 
>> blocks
>>    KSP Object: (sub_) 4 MPI processes
>>      type: preonly
>>      maximum iterations=10000, initial guess is zero
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (sub_) 4 MPI processes
>>      type: telescope
>>        petsc subcomm: parent comm size reduction factor = 4
>>        petsc subcomm: parent_size = 4 , subcomm_size = 1
>>        petsc subcomm type = contiguous
>>      linear system matrix = precond matrix:
>>      Mat Object: (sub_) 4 MPI processes
>>        type: mpiaij
>>        rows=40200, cols=40200
>>        total: nonzeros=199996, allocated nonzeros=203412
>>        total number of mallocs used during MatSetValues calls=0
>>          not using I-node (on process 0) routines
>>          setup type: default
>>          Parent DM object: NULL
>>          Sub DM object: NULL
>>          KSP Object:   (sub_telescope_)   1 MPI processes
>>            type: preonly
>>            maximum iterations=10000, initial guess is zero
>>            tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>            left preconditioning
>>            using NONE norm type for convergence test
>>          PC Object:   (sub_telescope_)   1 MPI processes
>>            type: lu
>>              out-of-place factorization
>>              tolerance for zero pivot 2.22045e-14
>>              matrix ordering: nd
>>              factor fill ratio given 5., needed 8.62558
>>                Factored matrix follows:
>>                  Mat Object:   1 MPI processes
>>                    type: seqaijcusparse
>>                    rows=40200, cols=40200
>>                    package used to perform factorization: cusparse
>>                    total: nonzeros=1725082, allocated nonzeros=1725082
>>                      not using I-node routines
>>            linear system matrix = precond matrix:
>>            Mat Object:   1 MPI processes
>>              type: seqaijcusparse
>>              rows=40200, cols=40200
>>              total: nonzeros=199996, allocated nonzeros=199996
>>              total number of mallocs used during MatSetValues calls=0
>>                not using I-node routines
>>    linear system matrix = precond matrix:
>>    Mat Object: 16 MPI processes
>>      type: mpiaijcusparse
>>      rows=160800, cols=160800
>>      total: nonzeros=802396, allocated nonzeros=1608000
>>      total number of mallocs used during MatSetValues calls=0
>>        not using I-node (on process 0) routines
>> Norm of error 400.999 iterations 1
>>
>> Chang
>>
>>
>> On 10/14/21 9:47 PM, Barry Smith wrote:
>>>
>>>    Chang,
>>>
>>>     Sorry I did not notice that one. Please run that with -ksp_view 
>>> -ksp_monitor_true_residual so we can see exactly how the options are 
>>> interpreted and which solver is used. At a glance it looks OK, but 
>>> something must be wrong to get the wrong answer.
>>>
>>>    Barry
>>>
>>>> On Oct 14, 2021, at 6:02 PM, Chang Liu <cliu at pppl.gov> wrote:
>>>>
>>>> Hi Barry,
>>>>
>>>> That is exactly what I was doing in the second example, in which the 
>>>> preconditioner works but the GMRES does not.
>>>>
>>>> Chang
>>>>
>>>> On 10/14/21 5:15 PM, Barry Smith wrote:
>>>>>    You need to use the PCTELESCOPE inside the block Jacobi, not 
>>>>> outside it. So something like -pc_type bjacobi -sub_pc_type 
>>>>> telescope -sub_telescope_pc_type lu
>>>>>> On Oct 14, 2021, at 4:14 PM, Chang Liu <cliu at pppl.gov> wrote:
>>>>>>
>>>>>> Hi Pierre,
>>>>>>
>>>>>> I wonder if the PCTELESCOPE trick only works for the preconditioner 
>>>>>> and not for the solver. I have done some tests and found that for 
>>>>>> solving a small matrix with -telescope_ksp_type preonly, it does 
>>>>>> work on GPU with multiple MPI processes. However, for bjacobi and 
>>>>>> gmres, it does not work.
>>>>>>
>>>>>> The command-line options I used for the small matrix are:
>>>>>>
>>>>>> mpiexec -n 4 --oversubscribe ./ex7 -m 100 -ksp_monitor_short 
>>>>>> -pc_type telescope -mat_type aijcusparse -telescope_pc_type lu 
>>>>>> -telescope_pc_factor_mat_solver_type cusparse -telescope_ksp_type 
>>>>>> preonly -pc_telescope_reduction_factor 4
>>>>>>
>>>>>> which gives the correct output. For the iterative solver, I tried
>>>>>>
>>>>>> mpiexec -n 16 --oversubscribe ./ex7 -m 400 -ksp_monitor_short 
>>>>>> -pc_type bjacobi -pc_bjacobi_blocks 4 -ksp_type fgmres -mat_type 
>>>>>> aijcusparse -sub_pc_type telescope -sub_ksp_type preonly 
>>>>>> -sub_telescope_ksp_type preonly -sub_telescope_pc_type lu 
>>>>>> -sub_telescope_pc_factor_mat_solver_type cusparse 
>>>>>> -sub_pc_telescope_reduction_factor 4 -ksp_max_it 2000 -ksp_rtol 
>>>>>> 1.e-9 -ksp_atol 1.e-20
>>>>>>
>>>>>> for the large matrix. The output is:
>>>>>>
>>>>>>   0 KSP Residual norm 40.1497
>>>>>>   1 KSP Residual norm < 1.e-11
>>>>>> Norm of error 400.999 iterations 1
>>>>>>
>>>>>> So it seems to call a direct solver instead of an iterative one.
>>>>>>
>>>>>> Can you please help check these options?
>>>>>>
>>>>>> Chang
>>>>>>
>>>>>> On 10/14/21 10:04 AM, Pierre Jolivet wrote:
>>>>>>>> On 14 Oct 2021, at 3:50 PM, Chang Liu <cliu at pppl.gov> wrote:
>>>>>>>>
>>>>>>>> Thank you Pierre. I was not aware of PCTELESCOPE before. This 
>>>>>>>> sounds like exactly what I need. I wonder if PCTELESCOPE can 
>>>>>>>> transform an mpiaijcusparse matrix to seqaijcusparse? Or do I have 
>>>>>>>> to do it manually?
>>>>>>> PCTELESCOPE uses MatCreateMPIMatConcatenateSeqMat().
>>>>>>> 1) I’m not sure this is implemented for cuSparse matrices, but it 
>>>>>>> should be;
>>>>>>> 2) at least for the implementations 
>>>>>>> MatCreateMPIMatConcatenateSeqMat_MPIBAIJ() and 
>>>>>>> MatCreateMPIMatConcatenateSeqMat_MPIAIJ(), the resulting MatType 
>>>>>>> is MATBAIJ (resp. MATAIJ). Constructors are usually “smart” 
>>>>>>> enough to detect if the MPI communicator on which the Mat lives 
>>>>>>> is of size 1 (your case), and then the resulting Mat is of type 
>>>>>>> MatSeqX instead of MatMPIX, so you would not need to worry about 
>>>>>>> the transformation you are mentioning.
>>>>>>> If you try this out and this does not work, please provide the 
>>>>>>> backtrace (probably something like “Operation XYZ not implemented 
>>>>>>> for MatType ABC”), and hopefully someone can add the missing 
>>>>>>> plumbing.
>>>>>>> I do not claim that this will be efficient, but I think this goes 
>>>>>>> in the direction of what you want to achieve.
>>>>>>> Thanks,
>>>>>>> Pierre
>>>>>>>> Chang
>>>>>>>>
>>>>>>>> On 10/14/21 1:35 AM, Pierre Jolivet wrote:
>>>>>>>>> Maybe I’m missing something, but can’t you use PCTELESCOPE as a 
>>>>>>>>> subdomain solver, with a reduction factor equal to the number 
>>>>>>>>> of MPI processes you have per block?
>>>>>>>>> -sub_pc_type telescope -sub_pc_telescope_reduction_factor X 
>>>>>>>>> -sub_telescope_pc_type lu
>>>>>>>>> This does not work with MUMPS -mat_mumps_use_omp_threads 
>>>>>>>>> because not only does the Mat need to be redistributed, the 
>>>>>>>>> secondary processes also need to be “converted” to OpenMP threads.
>>>>>>>>> Thus the need for specific code in mumps.c.
>>>>>>>>> Thanks,
>>>>>>>>> Pierre
>>>>>>>>>> On 14 Oct 2021, at 6:00 AM, Chang Liu via petsc-users 
>>>>>>>>>> <petsc-users at mcs.anl.gov> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Junchao,
>>>>>>>>>>
>>>>>>>>>> Yes that is what I want.
>>>>>>>>>>
>>>>>>>>>> Chang
>>>>>>>>>>
>>>>>>>>>> On 10/13/21 11:42 PM, Junchao Zhang wrote:
>>>>>>>>>>> On Wed, Oct 13, 2021 at 8:58 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>>>>>>>>>        Junchao,
>>>>>>>>>>>           If I understand correctly Chang is using the block 
>>>>>>>>>>> Jacobi
>>>>>>>>>>>     method with a single block for a number of MPI ranks and 
>>>>>>>>>>> a direct
>>>>>>>>>>>     solver for each block so it uses 
>>>>>>>>>>> PCSetUp_BJacobi_Multiproc() which
>>>>>>>>>>>     is code Hong Zhang wrote a number of years ago for CPUs. 
>>>>>>>>>>> For their
>>>>>>>>>>>     particular problems this preconditioner works well, but 
>>>>>>>>>>> using an
>>>>>>>>>>>     iterative solver on the blocks does not work well.
>>>>>>>>>>>           If we had complete MPI-GPU direct solvers he could 
>>>>>>>>>>> just use
>>>>>>>>>>>     the current code with MPIAIJCUSPARSE on each block but 
>>>>>>>>>>> since we do
>>>>>>>>>>>     not he would like to use a single GPU for each block, 
>>>>>>>>>>> this means
>>>>>>>>>>>     that diagonal blocks of  the global parallel MPI matrix 
>>>>>>>>>>> needs to be
>>>>>>>>>>>     sent to a subset of the GPUs (one GPU per block, which 
>>>>>>>>>>> has multiple
>>>>>>>>>>>     MPI ranks associated with the blocks). Similarly for the 
>>>>>>>>>>> triangular
>>>>>>>>>>>     solves the blocks of the right hand side needs to be 
>>>>>>>>>>> shipped to the
>>>>>>>>>>>     appropriate GPU and the resulting solution shipped back 
>>>>>>>>>>> to the
>>>>>>>>>>>     multiple GPUs. So Chang is absolutely correct, this is 
>>>>>>>>>>> somewhat like
>>>>>>>>>>>     your code for MUMPS with OpenMP. OK, I now understand the 
>>>>>>>>>>> background..
>>>>>>>>>>>     One could use PCSetUp_BJacobi_Multiproc() and get the 
>>>>>>>>>>> blocks on the
>>>>>>>>>>>     MPI ranks and then shrink each block down to a single GPU 
>>>>>>>>>>> but this
>>>>>>>>>>>     would be pretty inefficient, ideally one would go 
>>>>>>>>>>> directly from the
>>>>>>>>>>>     big MPI matrix on all the GPUs to the sub matrices on the 
>>>>>>>>>>> subset of
>>>>>>>>>>>     GPUs. But this may be a large coding project.
>>>>>>>>>>> I don't understand these sentences. Why do you say "shrink"? 
>>>>>>>>>>> In my mind, we just need to move each block (submatrix) 
>>>>>>>>>>> living over multiple MPI ranks to one of them and solve 
>>>>>>>>>>> directly there.  In other words, we keep blocks' size, no 
>>>>>>>>>>> shrinking or expanding.
>>>>>>>>>>> As mentioned before, cusparse does not provide LU 
>>>>>>>>>>> factorization. So the LU factorization would be done on the 
>>>>>>>>>>> CPU and the solve done on the GPU. I assume Chang wants to gain 
>>>>>>>>>>> from the (potentially) faster solve (rather than the 
>>>>>>>>>>> factorization) on the GPU.
>>>>>>>>>>>        Barry
>>>>>>>>>>>     Since the matrices being factored and solved directly are 
>>>>>>>>>>> relatively
>>>>>>>>>>>     large it is possible that the cusparse code could be 
>>>>>>>>>>> reasonably
>>>>>>>>>>>     efficient (they are not the tiny problems one gets at the 
>>>>>>>>>>> coarse
>>>>>>>>>>>     level of multigrid). Of course, this is speculation, I don't
>>>>>>>>>>>     actually know how much better the cusparse code would be 
>>>>>>>>>>> on the
>>>>>>>>>>>     direct solver than a good CPU direct sparse solver.
>>>>>>>>>>>      > On Oct 13, 2021, at 9:32 PM, Chang Liu <cliu at pppl.gov> wrote:
>>>>>>>>>>>      >
>>>>>>>>>>>      > Sorry I am not familiar with the details either. Can 
>>>>>>>>>>> you please
>>>>>>>>>>>     check the code in MatMumpsGatherNonzerosOnMaster in mumps.c?
>>>>>>>>>>>      >
>>>>>>>>>>>      > Chang
>>>>>>>>>>>      >
>>>>>>>>>>>      > On 10/13/21 9:24 PM, Junchao Zhang wrote:
>>>>>>>>>>>      >> Hi Chang,
>>>>>>>>>>>      >>   I did the work in mumps. It is easy for me to 
>>>>>>>>>>> understand
>>>>>>>>>>>     gathering matrix rows to one process.
>>>>>>>>>>>      >>   But how to gather blocks (submatrices) to form a 
>>>>>>>>>>> large block?     Can you draw a picture of that?
>>>>>>>>>>>      >>   Thanks
>>>>>>>>>>>      >> --Junchao Zhang
>>>>>>>>>>>      >> On Wed, Oct 13, 2021 at 7:47 PM Chang Liu via 
>>>>>>>>>>>     petsc-users <petsc-users at mcs.anl.gov> wrote:
>>>>>>>>>>>      >>    Hi Barry,
>>>>>>>>>>>      >>    I think mumps solver in petsc does support that. 
>>>>>>>>>>> You can
>>>>>>>>>>>     check the
>>>>>>>>>>>      >>    documentation on "-mat_mumps_use_omp_threads" at
>>>>>>>>>>>      >>
>>>>>>>>>>> https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html
>>>>>>>>>>>      >>    and the code enclosed by #if
>>>>>>>>>>>     defined(PETSC_HAVE_OPENMP_SUPPORT) in
>>>>>>>>>>>      >>    functions MatMumpsSetUpDistRHSInfo and
>>>>>>>>>>>      >>    MatMumpsGatherNonzerosOnMaster in
>>>>>>>>>>>      >>    mumps.c
>>>>>>>>>>>      >>    1. I understand it is ideal to do one MPI rank per 
>>>>>>>>>>> GPU.
>>>>>>>>>>>     However, I am
>>>>>>>>>>>      >>    working on an existing code that was developed 
>>>>>>>>>>> based on MPI
>>>>>>>>>>>     and the
>>>>>>>>>>>      >>    # of mpi ranks is typically equal to # of cpu 
>>>>>>>>>>> cores. We don't
>>>>>>>>>>>     want to
>>>>>>>>>>>      >>    change the whole structure of the code.
>>>>>>>>>>>      >>    2. What you have suggested has been coded in 
>>>>>>>>>>> mumps.c. See
>>>>>>>>>>>     function
>>>>>>>>>>>      >>    MatMumpsSetUpDistRHSInfo.
>>>>>>>>>>>      >>    Regards,
>>>>>>>>>>>      >>    Chang
>>>>>>>>>>>      >>    On 10/13/21 7:53 PM, Barry Smith wrote:
>>>>>>>>>>>      >>     >
>>>>>>>>>>>      >>     >
>>>>>>>>>>>      >>     >> On Oct 13, 2021, at 3:50 PM, Chang Liu 
>>>>>>>>>>> <cliu at pppl.gov> wrote:
>>>>>>>>>>>      >>     >>
>>>>>>>>>>>      >>     >> Hi Barry,
>>>>>>>>>>>      >>     >>
>>>>>>>>>>>      >>     >> That is exactly what I want.
>>>>>>>>>>>      >>     >>
>>>>>>>>>>>      >>     >> Back to my original question, I am looking for 
>>>>>>>>>>> an approach to
>>>>>>>>>>>      >>    transfer
>>>>>>>>>>>      >>     >> matrix
>>>>>>>>>>>      >>     >> data from many MPI processes to "master" MPI
>>>>>>>>>>>      >>     >> processes, each of which taking care of one 
>>>>>>>>>>> GPU, and then
>>>>>>>>>>>     upload
>>>>>>>>>>>      >>    the data to GPU to
>>>>>>>>>>>      >>     >> solve.
>>>>>>>>>>>      >>     >> One can just grab some codes from mumps.c to 
>>>>>>>>>>>     aijcusparse.cu.
>>>>>>>>>>>      >>     >
>>>>>>>>>>>      >>     >    mumps.c doesn't actually do that. It never 
>>>>>>>>>>> needs to
>>>>>>>>>>>     copy the
>>>>>>>>>>>      >>    entire matrix to a single MPI rank.
>>>>>>>>>>>      >>     >
>>>>>>>>>>>      >>     >    It would be possible to write such a code 
>>>>>>>>>>> that you
>>>>>>>>>>>     suggest but
>>>>>>>>>>>      >>    it is not clear that it makes sense
>>>>>>>>>>>      >>     >
>>>>>>>>>>>      >>     > 1)  For normal PETSc GPU usage there is one GPU 
>>>>>>>>>>> per MPI
>>>>>>>>>>>     rank, so
>>>>>>>>>>>      >>    while your one GPU per big domain is solving its 
>>>>>>>>>>> systems the
>>>>>>>>>>>     other
>>>>>>>>>>>      >>    GPUs (with the other MPI ranks that share that 
>>>>>>>>>>> domain) are doing
>>>>>>>>>>>      >>    nothing.
>>>>>>>>>>>      >>     >
>>>>>>>>>>>      >>     > 2) For each triangular solve you would have to 
>>>>>>>>>>> gather the
>>>>>>>>>>>     right
>>>>>>>>>>>      >>    hand side from the multiple ranks to the single 
>>>>>>>>>>> GPU to pass it to
>>>>>>>>>>>      >>    the GPU solver and then scatter the resulting 
>>>>>>>>>>> solution back
>>>>>>>>>>>     to all
>>>>>>>>>>>      >>    of its subdomain ranks.
>>>>>>>>>>>      >>     >
>>>>>>>>>>>      >>     >    What I was suggesting was assign an entire 
>>>>>>>>>>> subdomain to a
>>>>>>>>>>>      >>    single MPI rank, thus it does everything on one 
>>>>>>>>>>> GPU and can
>>>>>>>>>>>     use the
>>>>>>>>>>>      >>    GPU solver directly. If all the major computations 
>>>>>>>>>>> of a subdomain
>>>>>>>>>>>      >>    can fit and be done on a single GPU then you would be
>>>>>>>>>>>     utilizing all
>>>>>>>>>>>      >>    the GPUs you are using effectively.
>>>>>>>>>>>      >>     >
>>>>>>>>>>>      >>     >    Barry
>>>>>>>>>>>      >>     >
>>>>>>>>>>>      >>     >
>>>>>>>>>>>      >>     >
>>>>>>>>>>>      >>     >>
>>>>>>>>>>>      >>     >> Chang
>>>>>>>>>>>      >>     >>
>>>>>>>>>>>      >>     >> On 10/13/21 1:53 PM, Barry Smith wrote:
>>>>>>>>>>>      >>     >>>    Chang,
>>>>>>>>>>>      >>     >>>      You are correct there is no MPI + GPU 
>>>>>>>>>>> direct
>>>>>>>>>>>     solvers that
>>>>>>>>>>>      >>    currently do the triangular solves with MPI + GPU 
>>>>>>>>>>> parallelism
>>>>>>>>>>>     that I
>>>>>>>>>>>      >>    am aware of. You are limited in that individual 
>>>>>>>>>>> triangular solves must be
>>>>>>>>>>>      >>    done on a single GPU. I can only suggest making 
>>>>>>>>>>> each subdomain as
>>>>>>>>>>>      >>    big as possible to utilize each GPU as much as 
>>>>>>>>>>> possible for the
>>>>>>>>>>>      >>    direct triangular solves.
>>>>>>>>>>>      >>     >>>     Barry
>>>>>>>>>>>      >>     >>>> On Oct 13, 2021, at 12:16 PM, Chang Liu via 
>>>>>>>>>>>      >>    petsc-users <petsc-users at mcs.anl.gov> wrote:
>>>>>>>>>>>      >>     >>>>
>>>>>>>>>>>      >>     >>>> Hi Mark,
>>>>>>>>>>>      >>     >>>>
>>>>>>>>>>>      >>     >>>> '-mat_type aijcusparse' works with mpiaijcusparse with other solvers, but with -pc_factor_mat_solver_type cusparse it will give an error.
>>>>>>>>>>>      >>     >>>>
>>>>>>>>>>>      >>     >>>> Yes, what I want is to have mumps or superlu do the factorization, and then do the rest, including the GMRES solver, on the GPU. Is that possible?
>>>>>>>>>>>      >>     >>>>
>>>>>>>>>>>      >>     >>>> I have tried to use aijcusparse with superlu_dist; it runs, but the iterative solver is still running on CPUs. I have contacted the superlu group and they confirmed that this is the case right now. But if I set -pc_factor_mat_solver_type cusparse, it seems that the iterative solver is running on the GPU.
>>>>>>>>>>>      >>     >>>>
>>>>>>>>>>>      >>     >>>> Chang
>>>>>>>>>>>      >>     >>>>
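A sketch of the combination being asked about here, keeping the matrix in aijcusparse form while letting MUMPS do the factorization (untested; as Mark notes below, there is no test coverage for this, so it may not work):

  mpiexec -n 4 ./ex7 -m 400 -ksp_type fgmres -mat_type aijcusparse \
    -pc_type bjacobi -pc_bjacobi_blocks 4 -sub_ksp_type preonly \
    -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps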
>>>>>>>>>>>      >>     >>>> On 10/13/21 12:03 PM, Mark Adams wrote:
>>>>>>>>>>>      >>     >>>>> On Wed, Oct 13, 2021 at 11:10 AM Chang Liu <cliu at pppl.gov> wrote:
>>>>>>>>>>>      >>     >>>>>     Thank you Junchao for explaining this. I guess in my case the code is just calling a seq solver like superlu to do factorization on GPUs.
>>>>>>>>>>>      >>     >>>>>     My idea is that I want to have a traditional MPI code utilize GPUs with cusparse. Right now cusparse does not support the mpiaij matrix,
>>>>>>>>>>>      >>     >>>>> Sure it does: '-mat_type aijcusparse' will give you an mpiaijcusparse matrix with > 1 processes.
>>>>>>>>>>>      >>     >>>>> (-mat_type mpiaijcusparse might also work with > 1 proc.)
>>>>>>>>>>>      >>     >>>>> However, I see in grepping the repo that all the mumps and superlu tests use the aij or sell matrix type.
>>>>>>>>>>>      >>     >>>>> MUMPS and SuperLU provide their own solves, I assume ... but you might want to do other matrix operations on the GPU. Is that the issue?
>>>>>>>>>>>      >>     >>>>> Did you try -mat_type aijcusparse with MUMPS and/or SuperLU and have a problem? (There is no test with it, so it probably does not work.)
>>>>>>>>>>>      >>     >>>>> Thanks,
>>>>>>>>>>>      >>     >>>>> Mark
>>>>>>>>>>>      >>     >>>>>     so I want the code to have an mpiaij matrix when adding all the matrix terms, and then transform the matrix to seqaij when doing the factorization and solve. This involves sending the data to the master process, and I think the petsc mumps solver has something similar already.
>>>>>>>>>>>      >>     >>>>>     Chang
>>>>>>>>>>>      >>     >>>>>     On 10/13/21 10:18 AM, Junchao Zhang wrote:
>>>>>>>>>>>      >>     >>>>>      >
>>>>>>>>>>>      >>     >>>>>      > On Tue, Oct 12, 2021 at 1:07 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>>>>>>      >>     >>>>>      >
>>>>>>>>>>>      >>     >>>>>      >     On Tue, Oct 12, 2021 at 1:45 PM Chang Liu <cliu at pppl.gov> wrote:
>>>>>>>>>>>      >>     >>>>>      >
>>>>>>>>>>>      >>     >>>>>      >         Hi Mark,
>>>>>>>>>>>      >>     >>>>>      >
>>>>>>>>>>>      >>     >>>>>      >         The option I use is like
>>>>>>>>>>>      >>     >>>>>      >
>>>>>>>>>>>      >>     >>>>>      >         -pc_type bjacobi -pc_bjacobi_blocks 16 -ksp_type fgmres -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -sub_ksp_type preonly -sub_pc_type lu -ksp_max_it 2000 -ksp_rtol 1.e-300 -ksp_atol 1.e-300
>>>>>>>>>>>      >>     >>>>>      >
>>>>>>>>>>>      >>     >>>>>      >
>>>>>>>>>>>      >>     >>>>>      >     Note: if you use -log_view, the last column (rows are the method, like MatFactorNumeric) has the percent of work in the GPU.
>>>>>>>>>>>      >>     >>>>>      >
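For example, rerunning with -log_view appended to the options Chang listed (the ex7 test case and process count here are only illustrative) and reading the GPU column of the factorization and solve rows shows how much of that work actually ran on the GPU:

  mpiexec -n 16 ./ex7 -m 400 -pc_type bjacobi -pc_bjacobi_blocks 16 -ksp_type fgmres \
    -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse \
    -sub_ksp_type preonly -sub_pc_type lu -ksp_max_it 2000 \
    -ksp_rtol 1.e-300 -ksp_atol 1.e-300 -log_view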
>>>>>>>>>>>      >>     >>>>>      >     Junchao: *This* implies that we have a cuSparse LU factorization. Is that correct? (I don't think we do.)
>>>>>>>>>>>      >>     >>>>>      >
>>>>>>>>>>>      >>     >>>>>      > No, we don't have a cuSparse LU factorization. If you check MatLUFactorSymbolic_SeqAIJCUSPARSE(), you will find it calls MatLUFactorSymbolic_SeqAIJ() instead.
>>>>>>>>>>>      >>     >>>>>      > So I don't understand Chang's idea. Do you want to make bigger blocks?
>>>>>>>>>>>      >>     >>>>>      >
>>>>>>>>>>>      >>     >>>>>      >
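This is easy to confirm against a PETSc source tree; for example (the path below matches recent PETSc layouts and may differ by version):

  grep -n "MatLUFactorSymbolic" src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu

The symbolic (and numeric) factorization there falls back to the SeqAIJ CPU routines, as Junchao and Mark describe; it is the triangular solves that run through cuSPARSE.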
>>>>>>>>>>>      >>     >>>>>      >         I think this one does both factorization and solve on the GPU.
>>>>>>>>>>>      >>     >>>>>      >
>>>>>>>>>>>      >>     >>>>>      >         You can check the runex72_aijcusparse.sh file in the petsc install directory and try it yourself (this is only LU factorization, without an iterative solve).
>>>>>>>>>>>      >>     >>>>>      >
>>>>>>>>>>>      >>     >>>>>      >         Chang
>>>>>>>>>>>      >>     >>>>>      >
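The runex72_aijcusparse.sh test drives ex72 with options along these lines; a rough sketch, where the matrix-file argument is a placeholder and the authoritative arguments are in the script itself:

  ./ex72 -f0 $DATAFILESPATH/matrices/medium -ksp_type preonly -pc_type lu \
    -mat_type seqaijcusparse -pc_factor_mat_solver_type cusparse   # data file is a placeholder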
>>>>>>>>>>>      >>     >>>>>      >         On 10/12/21 1:17 PM, Mark Adams wrote:
>>>>>>>>>>>      >>     >>>>>      >          >
>>>>>>>>>>>      >>     >>>>>      >          > On Tue, Oct 12, 2021 at 11:19 AM Chang Liu <cliu at pppl.gov> wrote:
>>>>>>>>>>>      >>     >>>>>      >          >
>>>>>>>>>>>      >>     >>>>>      >          >     Hi Junchao,
>>>>>>>>>>>      >>     >>>>>      >          >
>>>>>>>>>>>      >>     >>>>>      >          >     No, I only need it to be transferred within a node. I use the block-Jacobi method and GMRES to solve the sparse matrix, so each direct solver will take care of a sub-block of the whole matrix. In this way, I can use one GPU to solve one sub-block, which is stored within one node.
>>>>>>>>>>>      >>     >>>>>      >          >
>>>>>>>>>>>      >>     >>>>>      >          >     It was stated in the documentation that the cusparse solver is slow. However, in my test using ex72.c, the cusparse solver is faster than mumps or superlu_dist on CPUs.
>>>>>>>>>>>      >>     >>>>>      >          >
>>>>>>>>>>>      >>     >>>>>      >          >
>>>>>>>>>>>      >>     >>>>>      >          > Are we talking about the factorization, the solve, or both?
>>>>>>>>>>>      >>     >>>>>      >          >
>>>>>>>>>>>      >>     >>>>>      >          > We do not have an interface to cuSparse's LU factorization (I just learned that it exists a few weeks ago).
>>>>>>>>>>>      >>     >>>>>      >          > Perhaps your fast "cusparse solver" is '-pc_type lu -mat_type aijcusparse'? This would be the CPU factorization, which is the dominant cost.
>>>>>>>>>>>      >>     >>>>>      >          >
>>>>>>>>>>>      >>     >>>>>      >          >     Chang
>>>>>>>>>>>      >>     >>>>>      >          >
>>>>>>>>>>>      >>     >>>>>      >          >     On 10/12/21 10:24 AM, Junchao Zhang wrote:
>>>>>>>>>>>      >>     >>>>>      >          >      > Hi, Chang,
>>>>>>>>>>>      >>     >>>>>      >          >      >     For the mumps solver, we usually transfer matrix and vector data within a compute node. For the idea you propose, it looks like we need to gather data within MPI_COMM_WORLD, right?
>>>>>>>>>>>      >>     >>>>>      >          >      >
>>>>>>>>>>>      >>     >>>>>      >          >      >     Mark, I remember you said the cusparse solve is slow and you would rather do it on the CPU. Is that right?
>>>>>>>>>>>      >>     >>>>>      >          >      >
>>>>>>>>>>>      >>     >>>>>      >          >      > --Junchao Zhang
>>>>>>>>>>>      >>     >>>>>      >          >      >
>>>>>>>>>>>      >>     >>>>>      >          >      > On Mon, Oct 11, 2021 at 10:25 PM Chang Liu via petsc-users <petsc-users at mcs.anl.gov> wrote:
>>>>>>>>>>>      >>     >>>>>      >          >      >
>>>>>>>>>>>      >>     >>>>>      >          >      >     Hi,
>>>>>>>>>>>      >>     >>>>>      >          >      >
>>>>>>>>>>>      >>     >>>>>      >          >      >     Currently, it is possible to use the mumps solver in PETSc with the -mat_mumps_use_omp_threads option, so that multiple MPI processes will transfer the matrix and rhs data to the master rank, and then the master rank will call mumps with OpenMP to solve the matrix.
>>>>>>>>>>>      >>     >>>>>      >          >      >
>>>>>>>>>>>      >>     >>>>>      >          >      >     I wonder if someone can develop a similar option for the cusparse solver. Right now, this solver does not work with mpiaijcusparse. I think a possible workaround is to transfer all the matrix data to one MPI process, and then upload the data to the GPU to solve. In this way, one can use the cusparse solver for an MPI program.
>>>>>>>>>>>      >>     >>>>>      >          >      >
>>>>>>>>>>>      >>     >>>>>      >          >      >     Chang
>>>>>>>>>>>      >>     >>>>>      >          >      >     --
>>>>>>>>>>>      >>     >>>>>      >          >      >     Chang Liu
>>>>>>>>>>>      >>     >>>>>      >          >      >     Staff Research Physicist
>>>>>>>>>>>      >>     >>>>>      >          >      >     +1 609 243 3438
>>>>>>>>>>>      >>     >>>>>      >          >      >     cliu at pppl.gov
>>>>>>>>>>>      >>     >>>>>      >          >      >     Princeton Plasma Physics Laboratory
>>>>>>>>>>>      >>     >>>>>      >          >      >     100 Stellarator Rd, Princeton NJ 08540, USA
-- 
Chang Liu
Staff Research Physicist
+1 609 243 3438
cliu at pppl.gov
Princeton Plasma Physics Laboratory
100 Stellarator Rd, Princeton NJ 08540, USA


More information about the petsc-users mailing list