[petsc-users] [External] Re: request to add an option similar to use_omp_threads for mumps to cusparse solver
Chang Liu
cliu at pppl.gov
Thu Oct 14 21:11:57 CDT 2021
For comparison, here is the output using mumps instead of cusparse:
$ mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7 -m 400
-ksp_view -ksp_monitor_true_residual -pc_type bjacobi -pc_bjacobi_blocks
4 -ksp_type fgmres -mat_type aijcusparse -sub_pc_type telescope
-sub_ksp_type preonly -sub_telescope_ksp_type preonly
-sub_telescope_pc_type lu -sub_telescope_pc_factor_mat_solver_type mumps
-sub_pc_telescope_reduction_factor 4 -sub_pc_telescope_subcomm_type
contiguous -ksp_max_it 2000 -ksp_rtol 1.e-20 -ksp_atol 1.e-9
0 KSP unpreconditioned resid norm 4.014971979977e+01 true resid norm
4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00
1 KSP unpreconditioned resid norm 2.439995191694e+00 true resid norm
2.439995191694e+00 ||r(i)||/||b|| 6.077240896978e-02
2 KSP unpreconditioned resid norm 1.280694102588e+00 true resid norm
1.280694102588e+00 ||r(i)||/||b|| 3.189795866509e-02
3 KSP unpreconditioned resid norm 1.041100266810e+00 true resid norm
1.041100266810e+00 ||r(i)||/||b|| 2.593044912896e-02
4 KSP unpreconditioned resid norm 7.274347137268e-01 true resid norm
7.274347137268e-01 ||r(i)||/||b|| 1.811805206499e-02
5 KSP unpreconditioned resid norm 5.429229329787e-01 true resid norm
5.429229329787e-01 ||r(i)||/||b|| 1.352245882876e-02
6 KSP unpreconditioned resid norm 4.332970410353e-01 true resid norm
4.332970410353e-01 ||r(i)||/||b|| 1.079203150598e-02
7 KSP unpreconditioned resid norm 3.948206050950e-01 true resid norm
3.948206050950e-01 ||r(i)||/||b|| 9.833707609019e-03
8 KSP unpreconditioned resid norm 3.379580577269e-01 true resid norm
3.379580577269e-01 ||r(i)||/||b|| 8.417444988714e-03
9 KSP unpreconditioned resid norm 2.875593971410e-01 true resid norm
2.875593971410e-01 ||r(i)||/||b|| 7.162176936105e-03
10 KSP unpreconditioned resid norm 2.533983363244e-01 true resid norm
2.533983363244e-01 ||r(i)||/||b|| 6.311335112378e-03
11 KSP unpreconditioned resid norm 2.389169921094e-01 true resid norm
2.389169921094e-01 ||r(i)||/||b|| 5.950651543793e-03
12 KSP unpreconditioned resid norm 2.118961639089e-01 true resid norm
2.118961639089e-01 ||r(i)||/||b|| 5.277649880637e-03
13 KSP unpreconditioned resid norm 1.885892030223e-01 true resid norm
1.885892030223e-01 ||r(i)||/||b|| 4.697148671593e-03
14 KSP unpreconditioned resid norm 1.763510666948e-01 true resid norm
1.763510666948e-01 ||r(i)||/||b|| 4.392336175055e-03
15 KSP unpreconditioned resid norm 1.638219366731e-01 true resid norm
1.638219366731e-01 ||r(i)||/||b|| 4.080275964317e-03
16 KSP unpreconditioned resid norm 1.476792766432e-01 true resid norm
1.476792766432e-01 ||r(i)||/||b|| 3.678214378076e-03
17 KSP unpreconditioned resid norm 1.349906937321e-01 true resid norm
1.349906937321e-01 ||r(i)||/||b|| 3.362182710248e-03
18 KSP unpreconditioned resid norm 1.289673236836e-01 true resid norm
1.289673236836e-01 ||r(i)||/||b|| 3.212159993314e-03
19 KSP unpreconditioned resid norm 1.167505658153e-01 true resid norm
1.167505658153e-01 ||r(i)||/||b|| 2.907879965230e-03
20 KSP unpreconditioned resid norm 1.046037988999e-01 true resid norm
1.046037988999e-01 ||r(i)||/||b|| 2.605343185995e-03
21 KSP unpreconditioned resid norm 9.832660514331e-02 true resid norm
9.832660514331e-02 ||r(i)||/||b|| 2.448998539309e-03
22 KSP unpreconditioned resid norm 8.835618950141e-02 true resid norm
8.835618950142e-02 ||r(i)||/||b|| 2.200667649539e-03
23 KSP unpreconditioned resid norm 7.563496650115e-02 true resid norm
7.563496650116e-02 ||r(i)||/||b|| 1.883823022386e-03
24 KSP unpreconditioned resid norm 6.651291376834e-02 true resid norm
6.651291376834e-02 ||r(i)||/||b|| 1.656622115921e-03
25 KSP unpreconditioned resid norm 5.890393227906e-02 true resid norm
5.890393227906e-02 ||r(i)||/||b|| 1.467106933070e-03
26 KSP unpreconditioned resid norm 4.661992782780e-02 true resid norm
4.661992782780e-02 ||r(i)||/||b|| 1.161152009536e-03
27 KSP unpreconditioned resid norm 3.690705358716e-02 true resid norm
3.690705358716e-02 ||r(i)||/||b|| 9.192356452602e-04
28 KSP unpreconditioned resid norm 3.209680460188e-02 true resid norm
3.209680460188e-02 ||r(i)||/||b|| 7.994278605666e-04
29 KSP unpreconditioned resid norm 2.354337626000e-02 true resid norm
2.354337626001e-02 ||r(i)||/||b|| 5.863895533373e-04
30 KSP unpreconditioned resid norm 1.701296561785e-02 true resid norm
1.701296561785e-02 ||r(i)||/||b|| 4.237380908932e-04
31 KSP unpreconditioned resid norm 1.509942937258e-02 true resid norm
1.509942937258e-02 ||r(i)||/||b|| 3.760780759588e-04
32 KSP unpreconditioned resid norm 1.258274688515e-02 true resid norm
1.258274688515e-02 ||r(i)||/||b|| 3.133956338402e-04
33 KSP unpreconditioned resid norm 9.805748771638e-03 true resid norm
9.805748771638e-03 ||r(i)||/||b|| 2.442295692359e-04
34 KSP unpreconditioned resid norm 8.596552678160e-03 true resid norm
8.596552678160e-03 ||r(i)||/||b|| 2.141123953301e-04
35 KSP unpreconditioned resid norm 6.936406707500e-03 true resid norm
6.936406707500e-03 ||r(i)||/||b|| 1.727635147167e-04
36 KSP unpreconditioned resid norm 5.533741607932e-03 true resid norm
5.533741607932e-03 ||r(i)||/||b|| 1.378276519869e-04
37 KSP unpreconditioned resid norm 4.982347757923e-03 true resid norm
4.982347757923e-03 ||r(i)||/||b|| 1.240942099414e-04
38 KSP unpreconditioned resid norm 4.309608348059e-03 true resid norm
4.309608348059e-03 ||r(i)||/||b|| 1.073384414524e-04
39 KSP unpreconditioned resid norm 3.729408303186e-03 true resid norm
3.729408303185e-03 ||r(i)||/||b|| 9.288753001974e-05
40 KSP unpreconditioned resid norm 3.490003351128e-03 true resid norm
3.490003351128e-03 ||r(i)||/||b|| 8.692472496776e-05
41 KSP unpreconditioned resid norm 3.069012426454e-03 true resid norm
3.069012426453e-03 ||r(i)||/||b|| 7.643919912166e-05
42 KSP unpreconditioned resid norm 2.772928845284e-03 true resid norm
2.772928845284e-03 ||r(i)||/||b|| 6.906471225983e-05
43 KSP unpreconditioned resid norm 2.561454192399e-03 true resid norm
2.561454192398e-03 ||r(i)||/||b|| 6.379756085902e-05
44 KSP unpreconditioned resid norm 2.253662762802e-03 true resid norm
2.253662762802e-03 ||r(i)||/||b|| 5.613146926159e-05
45 KSP unpreconditioned resid norm 2.086800523919e-03 true resid norm
2.086800523919e-03 ||r(i)||/||b|| 5.197546917701e-05
46 KSP unpreconditioned resid norm 1.926028182896e-03 true resid norm
1.926028182896e-03 ||r(i)||/||b|| 4.797114880257e-05
47 KSP unpreconditioned resid norm 1.769243808622e-03 true resid norm
1.769243808622e-03 ||r(i)||/||b|| 4.406615581492e-05
48 KSP unpreconditioned resid norm 1.656654905964e-03 true resid norm
1.656654905964e-03 ||r(i)||/||b|| 4.126192945371e-05
49 KSP unpreconditioned resid norm 1.572052627273e-03 true resid norm
1.572052627273e-03 ||r(i)||/||b|| 3.915475961260e-05
50 KSP unpreconditioned resid norm 1.454960682355e-03 true resid norm
1.454960682355e-03 ||r(i)||/||b|| 3.623837699518e-05
51 KSP unpreconditioned resid norm 1.375985053014e-03 true resid norm
1.375985053014e-03 ||r(i)||/||b|| 3.427134883820e-05
52 KSP unpreconditioned resid norm 1.269325501087e-03 true resid norm
1.269325501087e-03 ||r(i)||/||b|| 3.161480347603e-05
53 KSP unpreconditioned resid norm 1.184791772965e-03 true resid norm
1.184791772965e-03 ||r(i)||/||b|| 2.950934100844e-05
54 KSP unpreconditioned resid norm 1.064535156080e-03 true resid norm
1.064535156080e-03 ||r(i)||/||b|| 2.651413662135e-05
55 KSP unpreconditioned resid norm 9.639036688120e-04 true resid norm
9.639036688117e-04 ||r(i)||/||b|| 2.400773090370e-05
56 KSP unpreconditioned resid norm 8.632359780260e-04 true resid norm
8.632359780260e-04 ||r(i)||/||b|| 2.150042347322e-05
57 KSP unpreconditioned resid norm 7.613605783850e-04 true resid norm
7.613605783850e-04 ||r(i)||/||b|| 1.896303591113e-05
58 KSP unpreconditioned resid norm 6.681073248348e-04 true resid norm
6.681073248349e-04 ||r(i)||/||b|| 1.664039819373e-05
59 KSP unpreconditioned resid norm 5.656127908544e-04 true resid norm
5.656127908545e-04 ||r(i)||/||b|| 1.408758999254e-05
60 KSP unpreconditioned resid norm 4.850863370767e-04 true resid norm
4.850863370767e-04 ||r(i)||/||b|| 1.208193580169e-05
61 KSP unpreconditioned resid norm 4.374055762320e-04 true resid norm
4.374055762316e-04 ||r(i)||/||b|| 1.089436186387e-05
62 KSP unpreconditioned resid norm 3.874398257079e-04 true resid norm
3.874398257077e-04 ||r(i)||/||b|| 9.649876204364e-06
63 KSP unpreconditioned resid norm 3.364908694427e-04 true resid norm
3.364908694429e-04 ||r(i)||/||b|| 8.380902061609e-06
64 KSP unpreconditioned resid norm 2.961034697265e-04 true resid norm
2.961034697268e-04 ||r(i)||/||b|| 7.374982221632e-06
65 KSP unpreconditioned resid norm 2.640593092764e-04 true resid norm
2.640593092767e-04 ||r(i)||/||b|| 6.576865557059e-06
66 KSP unpreconditioned resid norm 2.423231125743e-04 true resid norm
2.423231125745e-04 ||r(i)||/||b|| 6.035487016671e-06
67 KSP unpreconditioned resid norm 2.182349471179e-04 true resid norm
2.182349471179e-04 ||r(i)||/||b|| 5.435528521898e-06
68 KSP unpreconditioned resid norm 2.008438265031e-04 true resid norm
2.008438265028e-04 ||r(i)||/||b|| 5.002371809927e-06
69 KSP unpreconditioned resid norm 1.838732863386e-04 true resid norm
1.838732863388e-04 ||r(i)||/||b|| 4.579690400226e-06
70 KSP unpreconditioned resid norm 1.723786027645e-04 true resid norm
1.723786027645e-04 ||r(i)||/||b|| 4.293394913444e-06
71 KSP unpreconditioned resid norm 1.580945192204e-04 true resid norm
1.580945192205e-04 ||r(i)||/||b|| 3.937624471826e-06
72 KSP unpreconditioned resid norm 1.476687469671e-04 true resid norm
1.476687469671e-04 ||r(i)||/||b|| 3.677952117812e-06
73 KSP unpreconditioned resid norm 1.385018526182e-04 true resid norm
1.385018526184e-04 ||r(i)||/||b|| 3.449634351350e-06
74 KSP unpreconditioned resid norm 1.279712893541e-04 true resid norm
1.279712893541e-04 ||r(i)||/||b|| 3.187351991305e-06
75 KSP unpreconditioned resid norm 1.202010411772e-04 true resid norm
1.202010411774e-04 ||r(i)||/||b|| 2.993820175504e-06
76 KSP unpreconditioned resid norm 1.113459414198e-04 true resid norm
1.113459414200e-04 ||r(i)||/||b|| 2.773268206485e-06
77 KSP unpreconditioned resid norm 1.042523036036e-04 true resid norm
1.042523036037e-04 ||r(i)||/||b|| 2.596588572066e-06
78 KSP unpreconditioned resid norm 9.565176453232e-05 true resid norm
9.565176453227e-05 ||r(i)||/||b|| 2.382376888539e-06
79 KSP unpreconditioned resid norm 8.896901670359e-05 true resid norm
8.896901670365e-05 ||r(i)||/||b|| 2.215931198209e-06
80 KSP unpreconditioned resid norm 8.119298425803e-05 true resid norm
8.119298425824e-05 ||r(i)||/||b|| 2.022255314935e-06
81 KSP unpreconditioned resid norm 7.544528309154e-05 true resid norm
7.544528309154e-05 ||r(i)||/||b|| 1.879098620558e-06
82 KSP unpreconditioned resid norm 6.755385041138e-05 true resid norm
6.755385041176e-05 ||r(i)||/||b|| 1.682548489719e-06
83 KSP unpreconditioned resid norm 6.158629300870e-05 true resid norm
6.158629300835e-05 ||r(i)||/||b|| 1.533915885727e-06
84 KSP unpreconditioned resid norm 5.358756885754e-05 true resid norm
5.358756885765e-05 ||r(i)||/||b|| 1.334693470462e-06
85 KSP unpreconditioned resid norm 4.774852370380e-05 true resid norm
4.774852370387e-05 ||r(i)||/||b|| 1.189261692037e-06
86 KSP unpreconditioned resid norm 3.919358737908e-05 true resid norm
3.919358737930e-05 ||r(i)||/||b|| 9.761858258229e-07
87 KSP unpreconditioned resid norm 3.434042319950e-05 true resid norm
3.434042319947e-05 ||r(i)||/||b|| 8.553091620745e-07
88 KSP unpreconditioned resid norm 2.813699436281e-05 true resid norm
2.813699436302e-05 ||r(i)||/||b|| 7.008017615898e-07
89 KSP unpreconditioned resid norm 2.462248069068e-05 true resid norm
2.462248069051e-05 ||r(i)||/||b|| 6.132665635851e-07
90 KSP unpreconditioned resid norm 2.040558789626e-05 true resid norm
2.040558789626e-05 ||r(i)||/||b|| 5.082373674841e-07
91 KSP unpreconditioned resid norm 1.888523204468e-05 true resid norm
1.888523204470e-05 ||r(i)||/||b|| 4.703702077842e-07
92 KSP unpreconditioned resid norm 1.707071292484e-05 true resid norm
1.707071292474e-05 ||r(i)||/||b|| 4.251763900191e-07
93 KSP unpreconditioned resid norm 1.498636454665e-05 true resid norm
1.498636454672e-05 ||r(i)||/||b|| 3.732619958859e-07
94 KSP unpreconditioned resid norm 1.219393542993e-05 true resid norm
1.219393543006e-05 ||r(i)||/||b|| 3.037115947725e-07
95 KSP unpreconditioned resid norm 1.059996963300e-05 true resid norm
1.059996963303e-05 ||r(i)||/||b|| 2.640110487917e-07
96 KSP unpreconditioned resid norm 9.099659872548e-06 true resid norm
9.099659873214e-06 ||r(i)||/||b|| 2.266431725699e-07
97 KSP unpreconditioned resid norm 8.147347587295e-06 true resid norm
8.147347587584e-06 ||r(i)||/||b|| 2.029241456283e-07
98 KSP unpreconditioned resid norm 7.167226146744e-06 true resid norm
7.167226146783e-06 ||r(i)||/||b|| 1.785124823418e-07
99 KSP unpreconditioned resid norm 6.552540209538e-06 true resid norm
6.552540209577e-06 ||r(i)||/||b|| 1.632026385802e-07
100 KSP unpreconditioned resid norm 5.767783600111e-06 true resid norm
5.767783600320e-06 ||r(i)||/||b|| 1.436568830140e-07
101 KSP unpreconditioned resid norm 5.261057430584e-06 true resid norm
5.261057431144e-06 ||r(i)||/||b|| 1.310359688033e-07
102 KSP unpreconditioned resid norm 4.715498525786e-06 true resid norm
4.715498525947e-06 ||r(i)||/||b|| 1.174478564100e-07
103 KSP unpreconditioned resid norm 4.380052669622e-06 true resid norm
4.380052669825e-06 ||r(i)||/||b|| 1.090929822591e-07
104 KSP unpreconditioned resid norm 3.911664470060e-06 true resid norm
3.911664470226e-06 ||r(i)||/||b|| 9.742694319496e-08
105 KSP unpreconditioned resid norm 3.652211458315e-06 true resid norm
3.652211458259e-06 ||r(i)||/||b|| 9.096480564430e-08
106 KSP unpreconditioned resid norm 3.387532128049e-06 true resid norm
3.387532128358e-06 ||r(i)||/||b|| 8.437249737363e-08
107 KSP unpreconditioned resid norm 3.234218880987e-06 true resid norm
3.234218880798e-06 ||r(i)||/||b|| 8.055395895481e-08
108 KSP unpreconditioned resid norm 3.016905196388e-06 true resid norm
3.016905196492e-06 ||r(i)||/||b|| 7.514137611763e-08
109 KSP unpreconditioned resid norm 2.858246441921e-06 true resid norm
2.858246441975e-06 ||r(i)||/||b|| 7.118969836476e-08
110 KSP unpreconditioned resid norm 2.637118810847e-06 true resid norm
2.637118810750e-06 ||r(i)||/||b|| 6.568212241336e-08
111 KSP unpreconditioned resid norm 2.494976088717e-06 true resid norm
2.494976088700e-06 ||r(i)||/||b|| 6.214180574966e-08
112 KSP unpreconditioned resid norm 2.270639574272e-06 true resid norm
2.270639574200e-06 ||r(i)||/||b|| 5.655430686750e-08
113 KSP unpreconditioned resid norm 2.104988663813e-06 true resid norm
2.104988664169e-06 ||r(i)||/||b|| 5.242847707696e-08
114 KSP unpreconditioned resid norm 1.889361127301e-06 true resid norm
1.889361127526e-06 ||r(i)||/||b|| 4.705789073868e-08
115 KSP unpreconditioned resid norm 1.732367008052e-06 true resid norm
1.732367007971e-06 ||r(i)||/||b|| 4.314767367271e-08
116 KSP unpreconditioned resid norm 1.509288268391e-06 true resid norm
1.509288268645e-06 ||r(i)||/||b|| 3.759150191264e-08
117 KSP unpreconditioned resid norm 1.359169217644e-06 true resid norm
1.359169217445e-06 ||r(i)||/||b|| 3.385252062089e-08
118 KSP unpreconditioned resid norm 1.180146337735e-06 true resid norm
1.180146337908e-06 ||r(i)||/||b|| 2.939363820703e-08
119 KSP unpreconditioned resid norm 1.067757039683e-06 true resid norm
1.067757039924e-06 ||r(i)||/||b|| 2.659438335433e-08
120 KSP unpreconditioned resid norm 9.435833073736e-07 true resid norm
9.435833073736e-07 ||r(i)||/||b|| 2.350161625235e-08
121 KSP unpreconditioned resid norm 8.749457237613e-07 true resid norm
8.749457236791e-07 ||r(i)||/||b|| 2.179207546261e-08
122 KSP unpreconditioned resid norm 7.945760150897e-07 true resid norm
7.945760150444e-07 ||r(i)||/||b|| 1.979032528762e-08
123 KSP unpreconditioned resid norm 7.141240839013e-07 true resid norm
7.141240838682e-07 ||r(i)||/||b|| 1.778652721438e-08
124 KSP unpreconditioned resid norm 6.300566936733e-07 true resid norm
6.300566936607e-07 ||r(i)||/||b|| 1.569267971988e-08
125 KSP unpreconditioned resid norm 5.628986997544e-07 true resid norm
5.628986995849e-07 ||r(i)||/||b|| 1.401999073448e-08
126 KSP unpreconditioned resid norm 5.119018951602e-07 true resid norm
5.119018951837e-07 ||r(i)||/||b|| 1.274982484900e-08
127 KSP unpreconditioned resid norm 4.664670343748e-07 true resid norm
4.664670344042e-07 ||r(i)||/||b|| 1.161818903670e-08
128 KSP unpreconditioned resid norm 4.253264691112e-07 true resid norm
4.253264691948e-07 ||r(i)||/||b|| 1.059351027394e-08
129 KSP unpreconditioned resid norm 3.868921150516e-07 true resid norm
3.868921150517e-07 ||r(i)||/||b|| 9.636234498800e-09
130 KSP unpreconditioned resid norm 3.558445658540e-07 true resid norm
3.558445660061e-07 ||r(i)||/||b|| 8.862940209315e-09
131 KSP unpreconditioned resid norm 3.268710273840e-07 true resid norm
3.268710272455e-07 ||r(i)||/||b|| 8.141302825416e-09
132 KSP unpreconditioned resid norm 3.041273897592e-07 true resid norm
3.041273896694e-07 ||r(i)||/||b|| 7.574832182794e-09
133 KSP unpreconditioned resid norm 2.851926677922e-07 true resid norm
2.851926674248e-07 ||r(i)||/||b|| 7.103229333782e-09
134 KSP unpreconditioned resid norm 2.694708315072e-07 true resid norm
2.694708309500e-07 ||r(i)||/||b|| 6.711649104748e-09
135 KSP unpreconditioned resid norm 2.534825559099e-07 true resid norm
2.534825557469e-07 ||r(i)||/||b|| 6.313432746507e-09
136 KSP unpreconditioned resid norm 2.387342352458e-07 true resid norm
2.387342351804e-07 ||r(i)||/||b|| 5.946099658254e-09
137 KSP unpreconditioned resid norm 2.200861667617e-07 true resid norm
2.200861665255e-07 ||r(i)||/||b|| 5.481636425438e-09
138 KSP unpreconditioned resid norm 2.051415370616e-07 true resid norm
2.051415370614e-07 ||r(i)||/||b|| 5.109413915824e-09
139 KSP unpreconditioned resid norm 1.887376429396e-07 true resid norm
1.887376426682e-07 ||r(i)||/||b|| 4.700845824315e-09
140 KSP unpreconditioned resid norm 1.729743133005e-07 true resid norm
1.729743128342e-07 ||r(i)||/||b|| 4.308232129561e-09
141 KSP unpreconditioned resid norm 1.541021130781e-07 true resid norm
1.541021128364e-07 ||r(i)||/||b|| 3.838186508023e-09
142 KSP unpreconditioned resid norm 1.384631628565e-07 true resid norm
1.384631627735e-07 ||r(i)||/||b|| 3.448670712125e-09
143 KSP unpreconditioned resid norm 1.223114405626e-07 true resid norm
1.223114403883e-07 ||r(i)||/||b|| 3.046383411846e-09
144 KSP unpreconditioned resid norm 1.087313066223e-07 true resid norm
1.087313065117e-07 ||r(i)||/||b|| 2.708146085550e-09
145 KSP unpreconditioned resid norm 9.181901998734e-08 true resid norm
9.181901984268e-08 ||r(i)||/||b|| 2.286915582489e-09
146 KSP unpreconditioned resid norm 7.885850510808e-08 true resid norm
7.885850531446e-08 ||r(i)||/||b|| 1.964110975313e-09
147 KSP unpreconditioned resid norm 6.483393946950e-08 true resid norm
6.483393931383e-08 ||r(i)||/||b|| 1.614804278515e-09
148 KSP unpreconditioned resid norm 5.690132597004e-08 true resid norm
5.690132577518e-08 ||r(i)||/||b|| 1.417228465328e-09
149 KSP unpreconditioned resid norm 5.023671521579e-08 true resid norm
5.023671502186e-08 ||r(i)||/||b|| 1.251234511035e-09
150 KSP unpreconditioned resid norm 4.625371062660e-08 true resid norm
4.625371062660e-08 ||r(i)||/||b|| 1.152030720445e-09
151 KSP unpreconditioned resid norm 4.349049084805e-08 true resid norm
4.349049089337e-08 ||r(i)||/||b|| 1.083207830846e-09
152 KSP unpreconditioned resid norm 3.932593324498e-08 true resid norm
3.932593376918e-08 ||r(i)||/||b|| 9.794821474546e-10
153 KSP unpreconditioned resid norm 3.504167649202e-08 true resid norm
3.504167638113e-08 ||r(i)||/||b|| 8.727751166356e-10
154 KSP unpreconditioned resid norm 2.892726347747e-08 true resid norm
2.892726348583e-08 ||r(i)||/||b|| 7.204848160858e-10
155 KSP unpreconditioned resid norm 2.477647033202e-08 true resid norm
2.477647041570e-08 ||r(i)||/||b|| 6.171019508795e-10
156 KSP unpreconditioned resid norm 2.128504065757e-08 true resid norm
2.128504067423e-08 ||r(i)||/||b|| 5.301416991298e-10
157 KSP unpreconditioned resid norm 1.879248809429e-08 true resid norm
1.879248818928e-08 ||r(i)||/||b|| 4.680602575310e-10
158 KSP unpreconditioned resid norm 1.673649140073e-08 true resid norm
1.673649134005e-08 ||r(i)||/||b|| 4.168520085200e-10
159 KSP unpreconditioned resid norm 1.497123388109e-08 true resid norm
1.497123365569e-08 ||r(i)||/||b|| 3.728851342016e-10
160 KSP unpreconditioned resid norm 1.315982130162e-08 true resid norm
1.315982149329e-08 ||r(i)||/||b|| 3.277687007261e-10
161 KSP unpreconditioned resid norm 1.182395864938e-08 true resid norm
1.182395868430e-08 ||r(i)||/||b|| 2.944966675550e-10
162 KSP unpreconditioned resid norm 1.070204481679e-08 true resid norm
1.070204466432e-08 ||r(i)||/||b|| 2.665534085342e-10
163 KSP unpreconditioned resid norm 9.969290307649e-09 true resid norm
9.969290432333e-09 ||r(i)||/||b|| 2.483028644297e-10
164 KSP unpreconditioned resid norm 9.134440883306e-09 true resid norm
9.134440980976e-09 ||r(i)||/||b|| 2.275094577628e-10
165 KSP unpreconditioned resid norm 8.593316427292e-09 true resid norm
8.593316413360e-09 ||r(i)||/||b|| 2.140317904139e-10
166 KSP unpreconditioned resid norm 8.042173048464e-09 true resid norm
8.042173332848e-09 ||r(i)||/||b|| 2.003045942277e-10
167 KSP unpreconditioned resid norm 7.655518522782e-09 true resid norm
7.655518879144e-09 ||r(i)||/||b|| 1.906742791064e-10
168 KSP unpreconditioned resid norm 7.210283391815e-09 true resid norm
7.210283220312e-09 ||r(i)||/||b|| 1.795848951442e-10
169 KSP unpreconditioned resid norm 6.793967416271e-09 true resid norm
6.793967448832e-09 ||r(i)||/||b|| 1.692158122825e-10
170 KSP unpreconditioned resid norm 6.249160304588e-09 true resid norm
6.249160382647e-09 ||r(i)||/||b|| 1.556464257736e-10
171 KSP unpreconditioned resid norm 5.794936438798e-09 true resid norm
5.794936332552e-09 ||r(i)||/||b|| 1.443331699811e-10
172 KSP unpreconditioned resid norm 5.222337397128e-09 true resid norm
5.222337443277e-09 ||r(i)||/||b|| 1.300715788135e-10
173 KSP unpreconditioned resid norm 4.755359110447e-09 true resid norm
4.755358888996e-09 ||r(i)||/||b|| 1.184406494668e-10
174 KSP unpreconditioned resid norm 4.317537007873e-09 true resid norm
4.317537267718e-09 ||r(i)||/||b|| 1.075359252630e-10
175 KSP unpreconditioned resid norm 3.924177535665e-09 true resid norm
3.924177629720e-09 ||r(i)||/||b|| 9.773860563138e-11
176 KSP unpreconditioned resid norm 3.502843065115e-09 true resid norm
3.502843126359e-09 ||r(i)||/||b|| 8.724452234855e-11
177 KSP unpreconditioned resid norm 3.083873232869e-09 true resid norm
3.083873352938e-09 ||r(i)||/||b|| 7.680933686007e-11
178 KSP unpreconditioned resid norm 2.758980676473e-09 true resid norm
2.758980618096e-09 ||r(i)||/||b|| 6.871730691658e-11
179 KSP unpreconditioned resid norm 2.510978240429e-09 true resid norm
2.510978327392e-09 ||r(i)||/||b|| 6.254036989334e-11
180 KSP unpreconditioned resid norm 2.323000193205e-09 true resid norm
2.323000193205e-09 ||r(i)||/||b|| 5.785844097519e-11
181 KSP unpreconditioned resid norm 2.167480159274e-09 true resid norm
2.167480113693e-09 ||r(i)||/||b|| 5.398493749153e-11
182 KSP unpreconditioned resid norm 1.983545827983e-09 true resid norm
1.983546404840e-09 ||r(i)||/||b|| 4.940374216139e-11
183 KSP unpreconditioned resid norm 1.794576286774e-09 true resid norm
1.794576224361e-09 ||r(i)||/||b|| 4.469710457036e-11
184 KSP unpreconditioned resid norm 1.583490590644e-09 true resid norm
1.583490380603e-09 ||r(i)||/||b|| 3.943963715064e-11
185 KSP unpreconditioned resid norm 1.412659866247e-09 true resid norm
1.412659832191e-09 ||r(i)||/||b|| 3.518479927722e-11
186 KSP unpreconditioned resid norm 1.285613344939e-09 true resid norm
1.285612984761e-09 ||r(i)||/||b|| 3.202047215205e-11
187 KSP unpreconditioned resid norm 1.168115133929e-09 true resid norm
1.168114766904e-09 ||r(i)||/||b|| 2.909397058634e-11
188 KSP unpreconditioned resid norm 1.063377926053e-09 true resid norm
1.063377647554e-09 ||r(i)||/||b|| 2.648530681802e-11
189 KSP unpreconditioned resid norm 9.548967728122e-10 true resid norm
9.548964523410e-10 ||r(i)||/||b|| 2.378339019807e-11
KSP Object: 16 MPI processes
type: fgmres
restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=2000, initial guess is zero
tolerances: relative=1e-20, absolute=1e-09, divergence=10000.
right preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 16 MPI processes
type: bjacobi
number of blocks = 4
Local solver information for first block is in the following KSP
and PC objects on rank 0:
Use -ksp_view ::ascii_info_detail to display information for all blocks
KSP Object: (sub_) 4 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (sub_) 4 MPI processes
type: telescope
petsc subcomm: parent comm size reduction factor = 4
petsc subcomm: parent_size = 4 , subcomm_size = 1
petsc subcomm type = contiguous
linear system matrix = precond matrix:
Mat Object: (sub_) 4 MPI processes
type: mpiaij
rows=40200, cols=40200
total: nonzeros=199996, allocated nonzeros=203412
total number of mallocs used during MatSetValues calls=0
not using I-node (on process 0) routines
setup type: default
Parent DM object: NULL
Sub DM object: NULL
KSP Object: (sub_telescope_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (sub_telescope_) 1 MPI processes
type: lu
out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: external
factor fill ratio given 0., needed 0.
Factored matrix follows:
Mat Object: 1 MPI processes
type: mumps
rows=40200, cols=40200
package used to perform factorization: mumps
total: nonzeros=1849788, allocated nonzeros=1849788
MUMPS run parameters:
SYM (matrix type): 0
PAR (host participation): 1
ICNTL(1) (output for error): 6
ICNTL(2) (output of diagnostic msg): 0
ICNTL(3) (output for global info): 0
ICNTL(4) (level of printing): 0
ICNTL(5) (input mat struct): 0
ICNTL(6) (matrix prescaling): 7
ICNTL(7) (sequential matrix ordering):7
ICNTL(8) (scaling strategy): 77
ICNTL(10) (max num of refinements): 0
ICNTL(11) (error analysis): 0
ICNTL(12) (efficiency control):
1
ICNTL(13) (sequential factorization of the root
node): 0
ICNTL(14) (percentage of estimated workspace
increase): 20
ICNTL(18) (input mat struct):
0
ICNTL(19) (Schur complement info):
0
ICNTL(20) (RHS sparse pattern):
0
ICNTL(21) (solution struct):
0
ICNTL(22) (in-core/out-of-core facility):
0
ICNTL(23) (max size of memory can be allocated
locally):0
ICNTL(24) (detection of null pivot rows):
0
ICNTL(25) (computation of a null space basis):
0
ICNTL(26) (Schur options for RHS or solution):
0
ICNTL(27) (blocking size for multiple RHS):
-32
ICNTL(28) (use parallel or sequential ordering):
1
ICNTL(29) (parallel ordering):
0
ICNTL(30) (user-specified set of entries in
inv(A)): 0
ICNTL(31) (factors is discarded in the solve
phase): 0
ICNTL(33) (compute determinant):
0
ICNTL(35) (activate BLR based factorization):
0
ICNTL(36) (choice of BLR factorization variant):
0
ICNTL(38) (estimated compression rate of LU
factors): 333
CNTL(1) (relative pivoting threshold): 0.01
CNTL(2) (stopping criterion of refinement):
1.49012e-08
CNTL(3) (absolute pivoting threshold): 0.
CNTL(4) (value of static pivoting): -1.
CNTL(5) (fixation for null pivots): 0.
CNTL(7) (dropping parameter for BLR): 0.
RINFO(1) (local estimated flops for the
elimination after analysis):
[0] 1.45525e+08
RINFO(2) (local estimated flops for the assembly
after factorization):
[0] 2.89397e+06
RINFO(3) (local estimated flops for the
elimination after factorization):
[0] 1.45525e+08
INFO(15) (estimated size of (in MB) MUMPS
internal data for running numerical factorization):
[0] 29
INFO(16) (size of (in MB) MUMPS internal data
used during numerical factorization):
[0] 29
INFO(23) (num of pivots eliminated on this
processor after factorization):
[0] 40200
RINFOG(1) (global estimated flops for the
elimination after analysis): 1.45525e+08
RINFOG(2) (global estimated flops for the
assembly after factorization): 2.89397e+06
RINFOG(3) (global estimated flops for the
elimination after factorization): 1.45525e+08
(RINFOG(12) RINFOG(13))*2^INFOG(34)
(determinant): (0.,0.)*(2^0)
INFOG(3) (estimated real workspace for factors on
all processors after analysis): 1849788
INFOG(4) (estimated integer workspace for factors
on all processors after analysis): 879986
INFOG(5) (estimated maximum front size in the
complete tree): 282
INFOG(6) (number of nodes in the complete tree):
23709
INFOG(7) (ordering option effectively used after
analysis): 5
INFOG(8) (structural symmetry in percent of the
permuted matrix after analysis): 100
INFOG(9) (total real/complex workspace to store
the matrix factors after factorization): 1849788
INFOG(10) (total integer space store the matrix
factors after factorization): 879986
INFOG(11) (order of largest frontal matrix after
factorization): 282
INFOG(12) (number of off-diagonal pivots): 0
INFOG(13) (number of delayed pivots after
factorization): 0
INFOG(14) (number of memory compress after
factorization): 0
INFOG(15) (number of steps of iterative
refinement after solution): 0
INFOG(16) (estimated size (in MB) of all MUMPS
internal data for factorization after analysis: value on the most memory
consuming processor): 29
INFOG(17) (estimated size of all MUMPS internal
data for factorization after analysis: sum over all processors): 29
INFOG(18) (size of all MUMPS internal data
allocated during factorization: value on the most memory consuming
processor): 29
INFOG(19) (size of all MUMPS internal data
allocated during factorization: sum over all processors): 29
INFOG(20) (estimated number of entries in the
factors): 1849788
INFOG(21) (size in MB of memory effectively used
during factorization - value on the most memory consuming processor): 26
INFOG(22) (size in MB of memory effectively used
during factorization - sum over all processors): 26
INFOG(23) (after analysis: value of ICNTL(6)
effectively used): 0
INFOG(24) (after analysis: value of ICNTL(12)
effectively used): 1
INFOG(25) (after factorization: number of pivots
modified by static pivoting): 0
INFOG(28) (after factorization: number of null
pivots encountered): 0
INFOG(29) (after factorization: effective number
of entries in the factors (sum over all processors)): 1849788
INFOG(30, 31) (after solution: size in Mbytes of
memory used during solution phase): 29, 29
INFOG(32) (after analysis: type of analysis done): 1
INFOG(33) (value used for ICNTL(8)): 7
INFOG(34) (exponent of the determinant if
determinant is requested): 0
INFOG(35) (after factorization: number of entries
taking into account BLR factor compression - sum over all processors):
1849788
INFOG(36) (after analysis: estimated size of all
MUMPS internal data for running BLR in-core - value on the most memory
consuming processor): 0
INFOG(37) (after analysis: estimated size of all
MUMPS internal data for running BLR in-core - sum over all processors): 0
INFOG(38) (after analysis: estimated size of all
MUMPS internal data for running BLR out-of-core - value on the most
memory consuming processor): 0
INFOG(39) (after analysis: estimated size of all
MUMPS internal data for running BLR out-of-core - sum over all
processors): 0
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaijcusparse
rows=40200, cols=40200
total: nonzeros=199996, allocated nonzeros=199996
total number of mallocs used during MatSetValues calls=0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 16 MPI processes
type: mpiaijcusparse
rows=160800, cols=160800
total: nonzeros=802396, allocated nonzeros=1608000
total number of mallocs used during MatSetValues calls=0
not using I-node (on process 0) routines
Norm of error 9.11684e-07 iterations 189
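
By the way, the same nested solver stack can also be set up from inside the
code through the options database; below is a rough, untested sketch. It
assumes A, b and x are the already assembled aijcusparse matrix and
conforming vectors (names are placeholders), and it simply mirrors the
command-line options above.

#include <petscksp.h>

/* Untested sketch: reproduce the command-line solver stack above from code.
   A, b, x are assumed to be the assembled (mpi)aijcusparse matrix and
   conforming vectors; everything else mirrors the options above. */
static PetscErrorCode SolveWithTelescopedLU(Mat A, Vec b, Vec x)
{
  KSP            ksp;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* outer flexible GMRES with 4 block-Jacobi blocks */
  ierr = PetscOptionsSetValue(NULL, "-ksp_type", "fgmres");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_type", "bjacobi");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_bjacobi_blocks", "4");CHKERRQ(ierr);
  /* inside each block: telescope the 4-rank block onto a single rank ... */
  ierr = PetscOptionsSetValue(NULL, "-sub_ksp_type", "preonly");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-sub_pc_type", "telescope");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-sub_pc_telescope_reduction_factor", "4");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-sub_pc_telescope_subcomm_type", "contiguous");CHKERRQ(ierr);
  /* ... and solve there with a direct LU (mumps here, as in the run above) */
  ierr = PetscOptionsSetValue(NULL, "-sub_telescope_ksp_type", "preonly");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-sub_telescope_pc_type", "lu");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-sub_telescope_pc_factor_mat_solver_type", "mumps");CHKERRQ(ierr);

  ierr = KSPCreate(PetscObjectComm((PetscObject)A), &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetTolerances(ksp, 1.e-20, 1.e-9, PETSC_DEFAULT, 2000);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

With that in place, KSPSetFromOptions() picks everything up just as the
command line does.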
Chang
On 10/14/21 10:10 PM, Chang Liu wrote:
> Hi Barry,
>
> No problem. Here is the output. It seems that the resid norm calculation
> is incorrect.
>
> $ mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7 -m 400
> -ksp_view -ksp_monitor_true_residual -pc_type bjacobi -pc_bjacobi_blocks
> 4 -ksp_type fgmres -mat_type aijcusparse -sub_pc_type telescope
> -sub_ksp_type preonly -sub_telescope_ksp_type preonly
> -sub_telescope_pc_type lu -sub_telescope_pc_factor_mat_solver_type
> cusparse -sub_pc_telescope_reduction_factor 4
> -sub_pc_telescope_subcomm_type contiguous -ksp_max_it 2000 -ksp_rtol
> 1.e-20 -ksp_atol 1.e-9
> 0 KSP unpreconditioned resid norm 4.014971979977e+01 true resid norm
> 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00
> 1 KSP unpreconditioned resid norm 0.000000000000e+00 true resid norm
> 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00
> KSP Object: 16 MPI processes
> type: fgmres
> restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> happy breakdown tolerance 1e-30
> maximum iterations=2000, initial guess is zero
> tolerances: relative=1e-20, absolute=1e-09, divergence=10000.
> right preconditioning
> using UNPRECONDITIONED norm type for convergence test
> PC Object: 16 MPI processes
> type: bjacobi
> number of blocks = 4
> Local solver information for first block is in the following KSP
> and PC objects on rank 0:
> Use -ksp_view ::ascii_info_detail to display information for all
> blocks
> KSP Object: (sub_) 4 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (sub_) 4 MPI processes
> type: telescope
> petsc subcomm: parent comm size reduction factor = 4
> petsc subcomm: parent_size = 4 , subcomm_size = 1
> petsc subcomm type = contiguous
> linear system matrix = precond matrix:
> Mat Object: (sub_) 4 MPI processes
> type: mpiaij
> rows=40200, cols=40200
> total: nonzeros=199996, allocated nonzeros=203412
> total number of mallocs used during MatSetValues calls=0
> not using I-node (on process 0) routines
> setup type: default
> Parent DM object: NULL
> Sub DM object: NULL
> KSP Object: (sub_telescope_) 1 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (sub_telescope_) 1 MPI processes
> type: lu
> out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> matrix ordering: nd
> factor fill ratio given 5., needed 8.62558
> Factored matrix follows:
> Mat Object: 1 MPI processes
> type: seqaijcusparse
> rows=40200, cols=40200
> package used to perform factorization: cusparse
> total: nonzeros=1725082, allocated nonzeros=1725082
> not using I-node routines
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaijcusparse
> rows=40200, cols=40200
> total: nonzeros=199996, allocated nonzeros=199996
> total number of mallocs used during MatSetValues calls=0
> not using I-node routines
> linear system matrix = precond matrix:
> Mat Object: 16 MPI processes
> type: mpiaijcusparse
> rows=160800, cols=160800
> total: nonzeros=802396, allocated nonzeros=1608000
> total number of mallocs used during MatSetValues calls=0
> not using I-node (on process 0) routines
> Norm of error 400.999 iterations 1
>
> Chang
>
>
> On 10/14/21 9:47 PM, Barry Smith wrote:
>>
>> Chang,
>>
>> Sorry I did not notice that one. Please run that with -ksp_view
>> -ksp_monitor_true_residual so we can see exactly how the options are
>> interpreted and which solver is used. At a glance it looks OK, but
>> something must be wrong to get the wrong answer.
>>
>> Barry
>>
>>> On Oct 14, 2021, at 6:02 PM, Chang Liu <cliu at pppl.gov> wrote:
>>>
>>> Hi Barry,
>>>
>>> That is exactly what I was doing in the second example, in which the
>>> preconditioner works but the GMRES does not.
>>>
>>> Chang
>>>
>>> On 10/14/21 5:15 PM, Barry Smith wrote:
>>>> You need to use the PCTELESCOPE inside the block Jacobi, not
>>>> outside it. So something like -pc_type bjacobi -sub_pc_type
>>>> telescope -sub_telescope_pc_type lu
>>>>> On Oct 14, 2021, at 4:14 PM, Chang Liu <cliu at pppl.gov> wrote:
>>>>>
>>>>> Hi Pierre,
>>>>>
>>>>> I wonder if the trick of PCTELESCOPE only works for the preconditioner
>>>>> and not for the solver. I have done some tests, and found that for
>>>>> solving a small matrix using -telescope_ksp_type preonly, it does
>>>>> work for GPU with multiple MPI processes. However, for bjacobi and
>>>>> gmres, it does not work.
>>>>>
>>>>> The command line options I used for small matrix is like
>>>>>
>>>>> mpiexec -n 4 --oversubscribe ./ex7 -m 100 -ksp_monitor_short
>>>>> -pc_type telescope -mat_type aijcusparse -telescope_pc_type lu
>>>>> -telescope_pc_factor_mat_solver_type cusparse -telescope_ksp_type
>>>>> preonly -pc_telescope_reduction_factor 4
>>>>>
>>>>> which gives the correct output. For iterative solver, I tried
>>>>>
>>>>> mpiexec -n 16 --oversubscribe ./ex7 -m 400 -ksp_monitor_short
>>>>> -pc_type bjacobi -pc_bjacobi_blocks 4 -ksp_type fgmres -mat_type
>>>>> aijcusparse -sub_pc_type telescope -sub_ksp_type preonly
>>>>> -sub_telescope_ksp_type preonly -sub_telescope_pc_type lu
>>>>> -sub_telescope_pc_factor_mat_solver_type cusparse
>>>>> -sub_pc_telescope_reduction_factor 4 -ksp_max_it 2000 -ksp_rtol
>>>>> 1.e-9 -ksp_atol 1.e-20
>>>>>
>>>>> for large matrix. The output is like
>>>>>
>>>>> 0 KSP Residual norm 40.1497
>>>>> 1 KSP Residual norm < 1.e-11
>>>>> Norm of error 400.999 iterations 1
>>>>>
>>>>> So it seems to call a direct solver instead of an iterative one.
>>>>>
>>>>> Can you please help check these options?
>>>>>
>>>>> Chang
>>>>>
>>>>> On 10/14/21 10:04 AM, Pierre Jolivet wrote:
>>>>>>> On 14 Oct 2021, at 3:50 PM, Chang Liu <cliu at pppl.gov> wrote:
>>>>>>>
>>>>>>> Thank you Pierre. I was not aware of PCTELESCOPE before. This sounds
>>>>>>> exactly like what I need. I wonder if PCTELESCOPE can transform an
>>>>>>> mpiaijcusparse to seqaijcusparse? Or do I have to do it manually?
>>>>>> PCTELESCOPE uses MatCreateMPIMatConcatenateSeqMat().
>>>>>> 1) I’m not sure this is implemented for cuSparse matrices, but it
>>>>>> should be;
>>>>>> 2) at least for the implementations
>>>>>> MatCreateMPIMatConcatenateSeqMat_MPIBAIJ() and
>>>>>> MatCreateMPIMatConcatenateSeqMat_MPIAIJ(), the resulting MatType
>>>>>> is MATBAIJ (resp. MATAIJ). Constructors are usually “smart” enough
>>>>>> to detect if the MPI communicator on which the Mat lives is of
>>>>>> size 1 (your case), and then the resulting Mat is of type MatSeqX
>>>>>> instead of MatMPIX, so you would not need to worry about the
>>>>>> transformation you are mentioning.
>>>>>> If you try this out and this does not work, please provide the
>>>>>> backtrace (probably something like “Operation XYZ not implemented
>>>>>> for MatType ABC”), and hopefully someone can add the missing
>>>>>> plumbing.
>>>>>> I do not claim that this will be efficient, but I think this goes
>>>>>> in the direction of what you want to achieve.
>>>>>> Thanks,
>>>>>> Pierre
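
As a rough illustration of the MatCreateMPIMatConcatenateSeqMat() behavior
Pierre describes, here is a small, untested sketch. The sizes and values are
made up for the example; each rank builds a tiny sequential block and the
call stacks them into one parallel matrix on the given communicator.

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            seq, mpi;
  PetscInt       i, n = 4;
  PetscMPIInt    rank;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);

  /* each rank builds a small n x n sequential diagonal block (toy values) */
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, n, n, 1, NULL, &seq);CHKERRQ(ierr);
  for (i = 0; i < n; i++) {
    PetscScalar v = (PetscScalar)(rank + 1);
    ierr = MatSetValue(seq, i, i, v, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(seq, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(seq, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* stack the per-rank sequential blocks into one parallel matrix; per
     Pierre's note above, on a communicator of size 1 the result should
     come back as a Seq matrix, so no manual conversion is needed */
  ierr = MatCreateMPIMatConcatenateSeqMat(PETSC_COMM_WORLD, seq, PETSC_DECIDE,
                                          MAT_INITIAL_MATRIX, &mpi);CHKERRQ(ierr);
  ierr = MatView(mpi, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);

  ierr = MatDestroy(&seq);CHKERRQ(ierr);
  ierr = MatDestroy(&mpi);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}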
>>>>>>> Chang
>>>>>>>
>>>>>>> On 10/14/21 1:35 AM, Pierre Jolivet wrote:
>>>>>>>> Maybe I’m missing something, but can’t you use PCTELESCOPE as a
>>>>>>>> subdomain solver, with a reduction factor equal to the number of
>>>>>>>> MPI processes you have per block?
>>>>>>>> -sub_pc_type telescope -sub_pc_telescope_reduction_factor X
>>>>>>>> -sub_telescope_pc_type lu
>>>>>>>> This does not work with MUMPS -mat_mumps_use_omp_threads because
>>>>>>>> not only does the Mat need to be redistributed, the secondary
>>>>>>>> processes also need to be “converted” to OpenMP threads.
>>>>>>>> Thus the need for specific code in mumps.c.
>>>>>>>> Thanks,
>>>>>>>> Pierre
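
As a small, untested sketch of Pierre's point that the reduction factor is
just the number of MPI processes per block, the factor could also be computed
at run time instead of hard-coded. The helper below is hypothetical and
assumes the number of blocks passed to -pc_bjacobi_blocks is known in the
code.

#include <petscsys.h>

/* Untested sketch: set the telescope reduction factor so each block-Jacobi
   block collapses onto one rank (communicator size / number of blocks). */
static PetscErrorCode SetTelescopeReductionFactor(MPI_Comm comm, PetscInt nblocks)
{
  PetscMPIInt    size;
  char           value[16];
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MPI_Comm_size(comm, &size);CHKERRQ(ierr);
  /* number of MPI ranks per block becomes the reduction factor */
  ierr = PetscSNPrintf(value, sizeof(value), "%d", (int)(size/nblocks));CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-sub_pc_telescope_reduction_factor", value);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}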
>>>>>>>>> On 14 Oct 2021, at 6:00 AM, Chang Liu via petsc-users
>>>>>>>>> <petsc-users at mcs.anl.gov> wrote:
>>>>>>>>>
>>>>>>>>> Hi Junchao,
>>>>>>>>>
>>>>>>>>> Yes that is what I want.
>>>>>>>>>
>>>>>>>>> Chang
>>>>>>>>>
>>>>>>>>> On 10/13/21 11:42 PM, Junchao Zhang wrote:
>>>>>>>>>> On Wed, Oct 13, 2021 at 8:58 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>>>>>>>>     Junchao,
>>>>>>>>>>     If I understand correctly Chang is using the block Jacobi method
>>>>>>>>>> with a single block for a number of MPI ranks and a direct solver for
>>>>>>>>>> each block so it uses PCSetUp_BJacobi_Multiproc() which is code Hong
>>>>>>>>>> Zhang wrote a number of years ago for CPUs. For their particular
>>>>>>>>>> problems this preconditioner works well, but using an iterative
>>>>>>>>>> solver on the blocks does not work well.
>>>>>>>>>>     If we had complete MPI-GPU direct solvers he could just use the
>>>>>>>>>> current code with MPIAIJCUSPARSE on each block but since we do not he
>>>>>>>>>> would like to use a single GPU for each block, this means that the
>>>>>>>>>> diagonal blocks of the global parallel MPI matrix need to be sent to
>>>>>>>>>> a subset of the GPUs (one GPU per block, which has multiple MPI ranks
>>>>>>>>>> associated with the blocks). Similarly for the triangular solves the
>>>>>>>>>> blocks of the right hand side need to be shipped to the appropriate
>>>>>>>>>> GPU and the resulting solution shipped back to the multiple GPUs. So
>>>>>>>>>> Chang is absolutely correct, this is somewhat like your code for
>>>>>>>>>> MUMPS with OpenMP.
>>>>>>>>>> OK, I now understand the background.
>>>>>>>>>>     One could use PCSetUp_BJacobi_Multiproc() and get the blocks on
>>>>>>>>>> the MPI ranks and then shrink each block down to a single GPU but
>>>>>>>>>> this would be pretty inefficient, ideally one would go directly from
>>>>>>>>>> the big MPI matrix on all the GPUs to the sub matrices on the subset
>>>>>>>>>> of GPUs. But this may be a large coding project.
>>>>>>>>>> I don't understand these sentences. Why do you say "shrink"? In my
>>>>>>>>>> mind, we just need to move each block (submatrix) living over
>>>>>>>>>> multiple MPI ranks to one of them and solve directly there. In other
>>>>>>>>>> words, we keep blocks' size, no shrinking or expanding.
>>>>>>>>>> As mentioned before, cusparse does not provide LU factorization. So
>>>>>>>>>> the LU factorization would be done on CPU, and the solve be done on
>>>>>>>>>> GPU. I assume Chang wants to gain from the (potential) faster solve
>>>>>>>>>> (instead of factorization) on GPU.
>>>>>>>>>>     Barry
>>>>>>>>>>     Since the matrices being factored and solved directly are
>>>>>>>>>> relatively large it is possible that the cusparse code could be
>>>>>>>>>> reasonably efficient (they are not the tiny problems one gets at the
>>>>>>>>>> coarse level of multigrid). Of course, this is speculation, I don't
>>>>>>>>>> actually know how much better the cusparse code would be on the
>>>>>>>>>> direct solver than a good CPU direct sparse solver.
>>>>>>>>>> > On Oct 13, 2021, at 9:32 PM, Chang Liu <cliu at pppl.gov> wrote:
>>>>>>>>>> >
>>>>>>>>>> > Sorry I am not familiar with the details either. Can you please
>>>>>>>>>> > check the code in MatMumpsGatherNonzerosOnMaster in mumps.c?
>>>>>>>>>> >
>>>>>>>>>> > Chang
>>>>>>>>>> >
>>>>>>>>>> > On 10/13/21 9:24 PM, Junchao Zhang wrote:
>>>>>>>>>> >> Hi Chang,
>>>>>>>>>> >> I did the work in mumps. It is easy for me to understand
>>>>>>>>>> >> gathering matrix rows to one process.
>>>>>>>>>> >> But how to gather blocks (submatrices) to form a large block?
>>>>>>>>>> >> Can you draw a picture of that?
>>>>>>>>>> >> Thanks
>>>>>>>>>> >> --Junchao Zhang
>>>>>>>>>> >> On Wed, Oct 13, 2021 at 7:47 PM Chang Liu via petsc-users
>>>>>>>>>> >> <petsc-users at mcs.anl.gov> wrote:
>>>>>>>>>> >> Hi Barry,
>>>>>>>>>> >> I think the mumps solver in petsc does support that. You can
>>>>>>>>>> >> check the documentation on "-mat_mumps_use_omp_threads" at
>>>>>>>>>> >> https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html
>>>>>>>>>> >> and the code enclosed by #if defined(PETSC_HAVE_OPENMP_SUPPORT)
>>>>>>>>>> >> in functions MatMumpsSetUpDistRHSInfo and
>>>>>>>>>> >> MatMumpsGatherNonzerosOnMaster in mumps.c
>>>>>>>>>> >> 1. I understand it is ideal to do one MPI rank per GPU. However,
>>>>>>>>>> >> I am working on an existing code that was developed based on MPI
>>>>>>>>>> >> and the # of mpi ranks is typically equal to # of cpu cores. We
>>>>>>>>>> >> don't want to change the whole structure of the code.
>>>>>>>>>> >> 2. What you have suggested has been coded in mumps.c. See
>>>>>>>>>> >> function MatMumpsSetUpDistRHSInfo.
>>>>>>>>>> >> Regards,
>>>>>>>>>> >> Chang
>>>>>>>>>> >> On 10/13/21 7:53 PM, Barry Smith wrote:
>>>>>>>>>> >> >
>>>>>>>>>> >> >> On Oct 13, 2021, at 3:50 PM, Chang Liu <cliu at pppl.gov> wrote:
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> Hi Barry,
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> That is exactly what I want.
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> Back to my original question, I am looking for an approach to
>>>>>>>>>> >> >> transfer matrix data from many MPI processes to "master" MPI
>>>>>>>>>> >> >> processes, each of which takes care of one GPU, and then upload
>>>>>>>>>> >> >> the data to the GPU to solve. One can just grab some codes from
>>>>>>>>>> >> >> mumps.c to aijcusparse.cu.
>>>>>>>>>> >> >
>>>>>>>>>> >> > mumps.c doesn't actually do that. It never needs to copy the
>>>>>>>>>> >> > entire matrix to a single MPI rank.
>>>>>>>>>> >> >
>>>>>>>>>> >> > It would be possible to write such a code as you suggest, but it
>>>>>>>>>> >> > is not clear that it makes sense.
>>>>>>>>>> >> >
>>>>>>>>>> >> > 1) For normal PETSc GPU usage there is one GPU per MPI rank, so
>>>>>>>>>> >> > while your one GPU per big domain is solving its systems the
>>>>>>>>>> >> > other GPUs (with the other MPI ranks that share that domain) are
>>>>>>>>>> >> > doing nothing.
>>>>>>>>>> >> >
>>>>>>>>>> >> > 2) For each triangular solve you would have to gather the right
>>>>>>>>>> >> > hand side from the multiple ranks to the single GPU to pass it
>>>>>>>>>> >> > to the GPU solver and then scatter the resulting solution back
>>>>>>>>>> >> > to all of its subdomain ranks.
>>>>>>>>>> >> >
>>>>>>>>>> >> > What I was suggesting was to assign an entire subdomain to a
>>>>>>>>>> >> > single MPI rank, thus it does everything on one GPU and can use
>>>>>>>>>> >> > the GPU solver directly. If all the major computations of a
>>>>>>>>>> >> > subdomain can fit and be done on a single GPU then you would be
>>>>>>>>>> >> > utilizing all the GPUs you are using effectively.
>>>>>>>>>> >> >
>>>>>>>>>> >> > Barry
>>>>>>>>>> >> >
>>>>>>>>>> >> >> Chang
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> On 10/13/21 1:53 PM, Barry Smith wrote:
>>>>>>>>>> >> >>> Chang,
>>>>>>>>>> >> >>> You are correct, there are no MPI + GPU direct solvers that
>>>>>>>>>> >> >>> currently do the triangular solves with MPI + GPU parallelism
>>>>>>>>>> >> >>> that I am aware of. You are limited in that individual
>>>>>>>>>> >> >>> triangular solves must be done on a single GPU. I can only
>>>>>>>>>> >> >>> suggest making each subdomain as big as possible to utilize
>>>>>>>>>> >> >>> each GPU as much as possible for the direct triangular solves.
>>>>>>>>>> >> >>> Barry
>>>>>>>>>> >> >>>> On Oct 13, 2021, at 12:16 PM, Chang Liu via petsc-users
>>>>>>>>>> >> >>>> <petsc-users at mcs.anl.gov> wrote:
>>>>>>>>>> >> >>>>
>>>>>>>>>> >> >>>> Hi Mark,
>>>>>>>>>> >> >>>>
>>>>>>>>>> >> >>>> '-mat_type aijcusparse' works with mpiaijcusparse with other
>>>>>>>>>> >> >>>> solvers, but with -pc_factor_mat_solver_type cusparse, it
>>>>>>>>>> >> >>>> will give an error.
>>>>>>>>>> >> >>>>
>>>>>>>>>> >> >>>> Yes what I want is to have mumps or superlu to do the
>>>>>>>>>> >> >>>> factorization, and then do the rest, including GMRES solver,
>>>>>>>>>> >> >>>> on gpu. Is that possible?
>>>>>>>>>> >> >>>>
>>>>>>>>>> >> >>>> I have tried to use aijcusparse with superlu_dist, it runs
>>>>>>>>>> >> >>>> but the iterative solver is still running on CPUs. I have
>>>>>>>>>> >> >>>> contacted the superlu group and they confirmed that is the
>>>>>>>>>> >> >>>> case right now. But if I set -pc_factor_mat_solver_type
>>>>>>>>>> >> >>>> cusparse, it seems that the iterative solver is running on
>>>>>>>>>> >> >>>> GPU.
>>>>>>>>>> >> >>>>
>>>>>>>>>> >> >>>> Chang
>>>>>>>>>> >> >>>>
>>>>>>>>>> >> >>>> On 10/13/21 12:03 PM, Mark Adams wrote:
>>>>>>>>>> >> >>>>> On Wed, Oct 13, 2021 at 11:10 AM Chang Liu
>>>>>>>>>> >> >>>>> <cliu at pppl.gov> wrote:
>>>>>>>>>> >> >>>>> Thank you Junchao for explaining this. I guess in my case
>>>>>>>>>> >> >>>>> the code is just calling a seq solver like superlu to do
>>>>>>>>>> >> >>>>> factorization on GPUs.
>>>>>>>>>> >> >>>>> My idea is that I want to have a traditional MPI code to
>>>>>>>>>> >> >>>>> utilize GPUs with cusparse. Right now cusparse does not
>>>>>>>>>> >> >>>>> support mpiaij matrix,
>>>>>>>>>> >> >>>>> Sure it does: '-mat_type aijcusparse' will give you an
>>>>>>>>>> >> >>>>> mpiaijcusparse matrix with > 1 processes.
>>>>>>>>>> >> >>>>> (-mat_type mpiaijcusparse might also work with >1 proc).
>>>>>>>>>> >> >>>>> However, I see in grepping the repo that all the mumps and
>>>>>>>>>> >> >>>>> superlu tests use aij or sell matrix type.
>>>>>>>>>> >> >>>>> MUMPS and SuperLU provide their own solves, I assume ....
>>>>>>>>>> >> >>>>> but you might want to do other matrix operations on the GPU.
>>>>>>>>>> >> >>>>> Is that the issue?
>>>>>>>>>> >> >>>>> Did you try -mat_type aijcusparse with MUMPS and/or SuperLU
>>>>>>>>>> >> >>>>> have a problem? (no test with it so it probably does not
>>>>>>>>>> >> >>>>> work)
>>>>>>>>>> >> >>>>> Thanks,
>>>>>>>>>> >> >>>>> Mark
>>>>>>>>>> >> >>>>> so I want the code to have a mpiaij matrix when adding all
>>>>>>>>>> >> >>>>> the matrix terms, and then transform the matrix to seqaij
>>>>>>>>>> >> >>>>> when doing the factorization and solve. This involves
>>>>>>>>>> >> >>>>> sending the data to the master process, and I think the
>>>>>>>>>> >> >>>>> petsc mumps solver have something similar already.
>>>>>>>>>> >> >>>>> Chang
>>>>>>>>>> >> >>>>> On 10/13/21 10:18 AM, Junchao Zhang wrote:
>>>>>>>>>> >> >>>>> >
>>>>>>>>>> >> >>>>> > On Tue, Oct 12, 2021 at 1:07 PM Mark Adams
>>>>>>>>>> >> >>>>> > <mfadams at lbl.gov> wrote:
>>>>>>>>>> >> >>>>> >
>>>>>>>>>> >> >>>>> > On Tue, Oct 12, 2021 at 1:45 PM Chang Liu
>>>>>>>>>> >> >>>>> > <cliu at pppl.gov> wrote:
>>>>>>>>>> >> >>>>> >
>>>>>>>>>> >> >>>>> > Hi Mark,
>>>>>>>>>> >> >>>>> >
>>>>>>>>>> >> >>>>> > The option I use is like
>>>>>>>>>> >> >>>>> >
>>>>>>>>>> >> >>>>> > -pc_type bjacobi -pc_bjacobi_blocks 16 -ksp_type fgmres
>>>>>>>>>> >> >>>>> > -mat_type aijcusparse *-sub_pc_factor_mat_solver_type
>>>>>>>>>> >> >>>>> > cusparse *-sub_ksp_type preonly *-sub_pc_type lu*
>>>>>>>>>> >> >>>>> > -ksp_max_it 2000 -ksp_rtol 1.e-300 -ksp_atol 1.e-300
>>>>>>>>>> >> >>>>> >
>>>>>>>>>> >> >>>>> > Note, if you use -log_view the last column (rows are the
>>>>>>>>>> >> >>>>> > method like MatFactorNumeric) has the percent of work in
>>>>>>>>>> >> >>>>> > the GPU.
>>>>>>>>>> >> >>>>> >
>>>>>>>>>> >> >>>>> > Junchao: *This* implies that we have a cuSparse LU
>>>>>>>>>> >> >>>>> > factorization. Is that correct? (I don't think we do)
>>>>>>>>>> >> >>>>> >
>>>>>>>>>> >> >>>>> > No, we don't have cuSparse LU factorization. If you check
>>>>>>>>>> >> >>>>> > MatLUFactorSymbolic_SeqAIJCUSPARSE(), you will find it
>>>>>>>>>> >> >>>>> > calls MatLUFactorSymbolic_SeqAIJ() instead.
>>>>>>>>>> >> >>>>> > So I don't understand Chang's idea. Do you want to make
>>>>>>>>>> >> >>>>> > bigger blocks?
>>>>>>>>>> >> >>>>> >
>>>>>>>>>> >> >>>>> > I think this one does both factorization and solve on gpu.
>>>>>>>>>> >> >>>>> >
>>>>>>>>>> >> >>>>> > You can check the runex72_aijcusparse.sh file in the petsc
>>>>>>>>>> >> >>>>> > install directory, and try it yourself (this is only lu
>>>>>>>>>> >> >>>>> > factorization without iterative solve).
>>>>>>>>>> >> >>>>> >
>>>>>>>>>> >> >>>>> > Chang
>>>>>>>>>> >> >>>>> >
>>>>>>>>>> >> >>>>> > On 10/12/21 1:17 PM, Mark Adams wrote:
>>>>>>>>>> >> >>>>> > >
>>>>>>>>>> >> >>>>> > >
>>>>>>>>>> >> >>>>> > > On Tue, Oct 12, 2021 at
>>>>>>>>>> 11:19 AM
>>>>>>>>>> Chang Liu
>>>>>>>>>> >> >>>>> <cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>>>>>>>> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>>>>>>>> >> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>>>>>>>> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>
>>>>>>>>>> >> >>>>> > <mailto:cliu at pppl.gov
>>>>>>>>>> <mailto:cliu at pppl.gov> <mailto:cliu at pppl.gov
>>>>>>>>>> <mailto:cliu at pppl.gov>>
>>>>>>>>>> >> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>>>>>>>> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>
>>>>>>>>>> >> >>>>> > > <mailto:cliu at pppl.gov
>>>>>>>>>> <mailto:cliu at pppl.gov>
>>>>>>>>>> >> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>>>>>>>> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>>>>>>>> <mailto:cliu at pppl.gov
>>>>>>>>>> <mailto:cliu at pppl.gov>>>
>>>>>>>>>> >> >>>>> <mailto:cliu at pppl.gov
>>>>>>>>>> <mailto:cliu at pppl.gov>
>>>>>>>>>> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>
>>>>>>>>>> >> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>>>>>>>> <mailto:cliu at pppl.gov <mailto:cliu at pppl.gov>>>>>> wrote:
>>>>>>>>>> >> >>>>> > >
>> Hi Junchao,
>>
>> No, I only need it to be transferred within a node. I use the
>> block-Jacobi method and GMRES to solve the sparse matrix, so each
>> direct solver will take care of a sub-block of the whole matrix. In
>> this way, I can use one GPU to solve one sub-block, which is stored
>> within one node.
>>
>> It was stated in the documentation that the cusparse solver is slow.
>> However, in my test using ex72.c, the cusparse solver is faster than
>> mumps or superlu_dist on CPUs.
>>
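Schematically, then, the option set quoted earlier in this message ties
one direct sub-solve to one block, with the block count chosen to match
the hardware (the placeholder below is mine, not a value from the
thread):

-pc_type bjacobi -pc_bjacobi_blocks <number of GPUs> -ksp_type fgmres
-mat_type aijcusparse -sub_ksp_type preonly -sub_pc_type lu
-sub_pc_factor_mat_solver_type cusparse

so each sub-block of the matrix gets its own direct sub-solver.
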
>> Are we talking about the factorization, the solve, or both?
>>
>> We do not have an interface to cuSparse's LU factorization (I just
>> learned that it exists a few weeks ago).
>> Perhaps your fast "cusparse solver" is '-pc_type lu -mat_type
>> aijcusparse'? This would be the CPU factorization, which is the
>> dominant cost.
>>
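To spell out the two spellings being contrasted here:

-pc_type lu -mat_type aijcusparse
    (no solver type requested, so the factorization is PETSc's own CPU
    code, which is Mark's point)

-pc_type lu -pc_factor_mat_solver_type cusparse -mat_type aijcusparse
    (explicitly requests the cusparse solver type, the analogue of the
    -sub_pc_factor_mat_solver_type cusparse option quoted at the top of
    this message)
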
>> Chang
>>
>> On 10/12/21 10:24 AM, Junchao Zhang wrote:
>>> Hi, Chang,
>>> For the mumps solver, we usually transfer matrix and vector data
>>> within a compute node. For the idea you propose, it looks like we
>>> need to gather data within MPI_COMM_WORLD, right?
>>>
>>> Mark, I remember you said cusparse solve is slow and you would
>>> rather do it on CPU. Is that right?
>>>
>>> --Junchao Zhang
>>>
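As a concrete illustration of the two kinds of gathers being
distinguished here (a standalone sketch; the program and its output line
are mine, not part of PETSc or mumps):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  MPI_Comm nodecomm;
  int      wrank, wsize, nrank, nsize;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
  MPI_Comm_size(MPI_COMM_WORLD, &wsize);

  /* Ranks that share a node: gathering onto rank 0 of this communicator
     is the "within a compute node" transfer described above. */
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                      MPI_INFO_NULL, &nodecomm);
  MPI_Comm_rank(nodecomm, &nrank);
  MPI_Comm_size(nodecomm, &nsize);

  /* Gathering onto rank 0 of MPI_COMM_WORLD instead is the
     whole-communicator gather Junchao is asking about. */
  printf("world rank %d/%d is node rank %d/%d\n", wrank, wsize, nrank, nsize);

  MPI_Comm_free(&nodecomm);
  MPI_Finalize();
  return 0;
}
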
>>> On Mon, Oct 11, 2021 at 10:25 PM Chang Liu via petsc-users
>>> <petsc-users at mcs.anl.gov> wrote:
>>>
>>> Hi,
>>>
>>> Currently, it is possible to use the mumps solver in PETSc with the
>>> -mat_mumps_use_omp_threads option, so that multiple MPI processes
>>> will transfer the matrix and rhs data to the master rank, and then
>>> the master rank will call mumps with OpenMP to solve the matrix.
>>>
>>> I wonder if someone can develop a similar option for the cusparse
>>> solver. Right now, this solver does not work with mpiaijcusparse. I
>>> think a possible workaround is to transfer all the matrix data to
>>> one MPI process, and then upload the data to the GPU to solve. In
>>> this way, one can use the cusparse solver for an MPI program.
>>>
>>> Chang
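The vector half of that workaround already has a standard PETSc idiom,
sketched below (a toy, self-contained example; error checking omitted).
The matrix half, i.e. gathering an mpiaijcusparse matrix onto one rank
and handing it to the sequential cusparse factorization, is what the
requested option would have to provide:

#include <petscvec.h>

int main(int argc, char **argv)
{
  Vec        b, bseq;      /* distributed rhs and its copy on rank 0 */
  VecScatter tozero;
  PetscInt   n = 100;      /* illustrative size */

  PetscInitialize(&argc, &argv, NULL, NULL);

  VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, n, &b);
  VecSet(b, 1.0);

  /* Gather the whole distributed vector onto rank 0; bseq has length n
     on rank 0 and length 0 elsewhere.  This is the "transfer all the
     data to one MPI process" step for the right-hand side. */
  VecScatterCreateToZero(b, &tozero, &bseq);
  VecScatterBegin(tozero, b, bseq, INSERT_VALUES, SCATTER_FORWARD);
  VecScatterEnd(tozero, b, bseq, INSERT_VALUES, SCATTER_FORWARD);

  VecScatterDestroy(&tozero);
  VecDestroy(&bseq);
  VecDestroy(&b);
  PetscFinalize();
  return 0;
}
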
--
Chang Liu
Staff Research Physicist
+1 609 243 3438
cliu at pppl.gov
Princeton Plasma Physics Laboratory
100 Stellarator Rd, Princeton NJ 08540, USA