[petsc-users] [External] Re: request to add an option similar to use_omp_threads for mumps to cusparse solver

Junchao Zhang junchao.zhang at gmail.com
Mon Oct 18 20:24:59 CDT 2021


Hi, Chang,
 I revised your patch and made an MR
https://gitlab.com/petsc/petsc/-/merge_requests/4471 with branch
jczhang/fix-PCTelescope-GPU
 Could you check if it works on your end?
 Thanks.
 --Junchao Zhang


On Sat, Oct 16, 2021 at 8:59 PM Junchao Zhang <junchao.zhang at gmail.com>
wrote:

> Hi, Chang,
>   Thanks a lot for the fix.  I will create an MR for it.
> --Junchao Zhang
>
>
> On Sat, Oct 16, 2021 at 8:12 PM Chang Liu <cliu at pppl.gov> wrote:
>
>> Hi Barry, Pierre and Junchao,
>>
>> I spent some time finding the reason for the error. I think it is caused
>> by a compatibility issue between telescope and cusparse.
>>
>> 1. In PCTelescopeMatCreate_default in telescope.c, the code calls
>> MatCreateMPIMatConcatenateSeqMat to concatenate the seq matrix into an mpi
>> matrix, but this function lives in mpiaij.c and sets the mat type to
>> mpiaij, even if the original matrix is mpiaijcusparse.
>>
>> 2. A similar issue exists in PCTelescopeSetUp_default, where the vector
>> is set to type mpi rather than mpicuda.
>>
>> I have fixed the issue using the following patch. After applying it,
>> telescope and cusparse work as expected.
>>
>> diff --git a/src/ksp/pc/impls/telescope/telescope.c b/src/ksp/pc/impls/telescope/telescope.c
>> index 893febb055..d3f687eff9 100644
>> --- a/src/ksp/pc/impls/telescope/telescope.c
>> +++ b/src/ksp/pc/impls/telescope/telescope.c
>> @@ -159,6 +159,7 @@ PetscErrorCode PCTelescopeSetUp_default(PC pc,PC_Telescope sred)
>>       ierr = VecCreate(subcomm,&xred);CHKERRQ(ierr);
>>       ierr = VecSetSizes(xred,PETSC_DECIDE,M);CHKERRQ(ierr);
>>       ierr = VecSetBlockSize(xred,bs);CHKERRQ(ierr);
>> +    ierr = VecSetType(xred,((PetscObject)x)->type_name);CHKERRQ(ierr);
>>       ierr = VecSetFromOptions(xred);CHKERRQ(ierr);
>>       ierr = VecGetLocalSize(xred,&m);CHKERRQ(ierr);
>>     }
>> diff --git a/src/mat/impls/aij/mpi/mpiaij.c b/src/mat/impls/aij/mpi/mpiaij.c
>> index 36077002db..ac374e07eb 100644
>> --- a/src/mat/impls/aij/mpi/mpiaij.c
>> +++ b/src/mat/impls/aij/mpi/mpiaij.c
>> @@ -4486,6 +4486,7 @@ PetscErrorCode MatCreateMPIMatConcatenateSeqMat_MPIAIJ(MPI_Comm comm,Mat inmat,P
>>     PetscInt       m,N,i,rstart,nnz,Ii;
>>     PetscInt       *indx;
>>     PetscScalar    *values;
>> +  PetscBool      isseqaijcusparse;
>>
>>     PetscFunctionBegin;
>>     ierr = MatGetSize(inmat,&m,&N);CHKERRQ(ierr);
>> @@ -4513,7 +4514,12 @@ PetscErrorCode MatCreateMPIMatConcatenateSeqMat_MPIAIJ(MPI_Comm comm,Mat inmat,P
>>       ierr = MatSetSizes(*outmat,m,n,PETSC_DETERMINE,PETSC_DETERMINE);CHKERRQ(ierr);
>>       ierr = MatGetBlockSizes(inmat,&bs,&cbs);CHKERRQ(ierr);
>>       ierr = MatSetBlockSizes(*outmat,bs,cbs);CHKERRQ(ierr);
>> -    ierr = MatSetType(*outmat,MATAIJ);CHKERRQ(ierr);
>> +    ierr = PetscObjectBaseTypeCompare((PetscObject)inmat,MATSEQAIJCUSPARSE,&isseqaijcusparse);CHKERRQ(ierr);
>> +    if (isseqaijcusparse) {
>> +      ierr = MatSetType(*outmat,MATAIJCUSPARSE);CHKERRQ(ierr);
>> +    } else {
>> +      ierr = MatSetType(*outmat,MATAIJ);CHKERRQ(ierr);
>> +    }
>>       ierr = MatSeqAIJSetPreallocation(*outmat,0,dnz);CHKERRQ(ierr);
>>       ierr = MatMPIAIJSetPreallocation(*outmat,0,dnz,0,onz);CHKERRQ(ierr);
>>       ierr = MatPreallocateFinalize(dnz,onz);CHKERRQ(ierr);
>>
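>> The same idea in isolation, as a minimal sketch (the helper name below is
>> made up for illustration; inside PETSc the patch reads type_name directly,
>> while user code can go through VecGetType): create the reduced vector on
>> the sub-communicator with the parent vector's type, so that a cuda vector
>> stays cuda instead of falling back to the plain mpi type.
>>
>> #include <petscvec.h>
>>
>> /* Sketch: create a Vec of global size M on subcomm that inherits the
>>    type (e.g. mpicuda) of the parent Vec x, mirroring the telescope.c
>>    change above. */
>> PetscErrorCode CreateReducedVecLikeParent(MPI_Comm subcomm,Vec x,PetscInt M,Vec *xred)
>> {
>>   VecType        type;
>>   PetscErrorCode ierr;
>>
>>   PetscFunctionBegin;
>>   ierr = VecGetType(x,&type);CHKERRQ(ierr);
>>   ierr = VecCreate(subcomm,xred);CHKERRQ(ierr);
>>   ierr = VecSetSizes(*xred,PETSC_DECIDE,M);CHKERRQ(ierr);
>>   ierr = VecSetType(*xred,type);CHKERRQ(ierr);
>>   ierr = VecSetFromOptions(*xred);CHKERRQ(ierr);
>>   PetscFunctionReturn(0);
>> }
>>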
>> Please help view it and merge to master if possible.
>>
>> Regards,
>>
>> Chang
>>
>> On 10/15/21 1:27 PM, Barry Smith wrote:
>> >
>> >    So the only difference is between
>> > -sub_telescope_pc_factor_mat_solver_type cusparse  and
>> > -sub_telescope_pc_factor_mat_solver_type mumps    ?
>> >
>> >     Try without -sub_telescope_pc_factor_mat_solver_type cusparse;
>> > PETSc will then just use the CPU solvers. I want to see if that works,
>> > and it should. If it works, then there is perhaps something specific
>> > about PCTELESCOPE and the cusparse solver; for example, the right-hand
>> > side array values may never get to the GPU.
>> >
>> >    Barry
>> >
>> >> On Oct 14, 2021, at 10:11 PM, Chang Liu <cliu at pppl.gov> wrote:
>> >>
>> >> For comparison, here is the output using mumps instead of cusparse
>> >>
>> >> $ mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7 -m 400
>> >> -ksp_view -ksp_monitor_true_residual -pc_type bjacobi
>> >> -pc_bjacobi_blocks 4 -ksp_type fgmres -mat_type aijcusparse
>> >> -sub_pc_type telescope -sub_ksp_type preonly -sub_telescope_ksp_type
>> >> preonly -sub_telescope_pc_type lu
>> >> -sub_telescope_pc_factor_mat_solver_type mumps
>> >> -sub_pc_telescope_reduction_factor 4 -sub_pc_telescope_subcomm_type
>> >> contiguous -ksp_max_it 2000 -ksp_rtol 1.e-20 -ksp_atol 1.e-9
>> >
>> > $ mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7 -m 400
>> > -ksp_view -ksp_monitor_true_residual -pc_type bjacobi
>> -pc_bjacobi_blocks
>> > 4 -ksp_type fgmres -mat_type aijcusparse -sub_pc_type telescope
>> > -sub_ksp_type preonly -sub_telescope_ksp_type preonly
>> > -sub_telescope_pc_type lu -sub_telescope_pc_factor_mat_solver_type
>> > cusparse -sub_pc_telescope_reduction_factor 4
>> > -sub_pc_telescope_subcomm_type contiguous -ksp_max_it 2000 -ksp_rtol
>> > 1.e-20 -ksp_atol 1.e-9
>> >
>> >
>> >>  0 KSP unpreconditioned resid norm 4.014971979977e+01 true resid norm
>> >> 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00
>> >>  1 KSP unpreconditioned resid norm 2.439995191694e+00 true resid norm
>> >> 2.439995191694e+00 ||r(i)||/||b|| 6.077240896978e-02
>> >>  2 KSP unpreconditioned resid norm 1.280694102588e+00 true resid norm
>> >> 1.280694102588e+00 ||r(i)||/||b|| 3.189795866509e-02
>> >>  3 KSP unpreconditioned resid norm 1.041100266810e+00 true resid norm
>> >> 1.041100266810e+00 ||r(i)||/||b|| 2.593044912896e-02
>> >>  4 KSP unpreconditioned resid norm 7.274347137268e-01 true resid norm
>> >> 7.274347137268e-01 ||r(i)||/||b|| 1.811805206499e-02
>> >>  5 KSP unpreconditioned resid norm 5.429229329787e-01 true resid norm
>> >> 5.429229329787e-01 ||r(i)||/||b|| 1.352245882876e-02
>> >>  6 KSP unpreconditioned resid norm 4.332970410353e-01 true resid norm
>> >> 4.332970410353e-01 ||r(i)||/||b|| 1.079203150598e-02
>> >>  7 KSP unpreconditioned resid norm 3.948206050950e-01 true resid norm
>> >> 3.948206050950e-01 ||r(i)||/||b|| 9.833707609019e-03
>> >>  8 KSP unpreconditioned resid norm 3.379580577269e-01 true resid norm
>> >> 3.379580577269e-01 ||r(i)||/||b|| 8.417444988714e-03
>> >>  9 KSP unpreconditioned resid norm 2.875593971410e-01 true resid norm
>> >> 2.875593971410e-01 ||r(i)||/||b|| 7.162176936105e-03
>> >> 10 KSP unpreconditioned resid norm 2.533983363244e-01 true resid norm
>> >> 2.533983363244e-01 ||r(i)||/||b|| 6.311335112378e-03
>> >> 11 KSP unpreconditioned resid norm 2.389169921094e-01 true resid norm
>> >> 2.389169921094e-01 ||r(i)||/||b|| 5.950651543793e-03
>> >> 12 KSP unpreconditioned resid norm 2.118961639089e-01 true resid norm
>> >> 2.118961639089e-01 ||r(i)||/||b|| 5.277649880637e-03
>> >> 13 KSP unpreconditioned resid norm 1.885892030223e-01 true resid norm
>> >> 1.885892030223e-01 ||r(i)||/||b|| 4.697148671593e-03
>> >> 14 KSP unpreconditioned resid norm 1.763510666948e-01 true resid norm
>> >> 1.763510666948e-01 ||r(i)||/||b|| 4.392336175055e-03
>> >> 15 KSP unpreconditioned resid norm 1.638219366731e-01 true resid norm
>> >> 1.638219366731e-01 ||r(i)||/||b|| 4.080275964317e-03
>> >> 16 KSP unpreconditioned resid norm 1.476792766432e-01 true resid norm
>> >> 1.476792766432e-01 ||r(i)||/||b|| 3.678214378076e-03
>> >> 17 KSP unpreconditioned resid norm 1.349906937321e-01 true resid norm
>> >> 1.349906937321e-01 ||r(i)||/||b|| 3.362182710248e-03
>> >> 18 KSP unpreconditioned resid norm 1.289673236836e-01 true resid norm
>> >> 1.289673236836e-01 ||r(i)||/||b|| 3.212159993314e-03
>> >> 19 KSP unpreconditioned resid norm 1.167505658153e-01 true resid norm
>> >> 1.167505658153e-01 ||r(i)||/||b|| 2.907879965230e-03
>> >> 20 KSP unpreconditioned resid norm 1.046037988999e-01 true resid norm
>> >> 1.046037988999e-01 ||r(i)||/||b|| 2.605343185995e-03
>> >> 21 KSP unpreconditioned resid norm 9.832660514331e-02 true resid norm
>> >> 9.832660514331e-02 ||r(i)||/||b|| 2.448998539309e-03
>> >> 22 KSP unpreconditioned resid norm 8.835618950141e-02 true resid norm
>> >> 8.835618950142e-02 ||r(i)||/||b|| 2.200667649539e-03
>> >> 23 KSP unpreconditioned resid norm 7.563496650115e-02 true resid norm
>> >> 7.563496650116e-02 ||r(i)||/||b|| 1.883823022386e-03
>> >> 24 KSP unpreconditioned resid norm 6.651291376834e-02 true resid norm
>> >> 6.651291376834e-02 ||r(i)||/||b|| 1.656622115921e-03
>> >> 25 KSP unpreconditioned resid norm 5.890393227906e-02 true resid norm
>> >> 5.890393227906e-02 ||r(i)||/||b|| 1.467106933070e-03
>> >> 26 KSP unpreconditioned resid norm 4.661992782780e-02 true resid norm
>> >> 4.661992782780e-02 ||r(i)||/||b|| 1.161152009536e-03
>> >> 27 KSP unpreconditioned resid norm 3.690705358716e-02 true resid norm
>> >> 3.690705358716e-02 ||r(i)||/||b|| 9.192356452602e-04
>> >> 28 KSP unpreconditioned resid norm 3.209680460188e-02 true resid norm
>> >> 3.209680460188e-02 ||r(i)||/||b|| 7.994278605666e-04
>> >> 29 KSP unpreconditioned resid norm 2.354337626000e-02 true resid norm
>> >> 2.354337626001e-02 ||r(i)||/||b|| 5.863895533373e-04
>> >> 30 KSP unpreconditioned resid norm 1.701296561785e-02 true resid norm
>> >> 1.701296561785e-02 ||r(i)||/||b|| 4.237380908932e-04
>> >> 31 KSP unpreconditioned resid norm 1.509942937258e-02 true resid norm
>> >> 1.509942937258e-02 ||r(i)||/||b|| 3.760780759588e-04
>> >> 32 KSP unpreconditioned resid norm 1.258274688515e-02 true resid norm
>> >> 1.258274688515e-02 ||r(i)||/||b|| 3.133956338402e-04
>> >> 33 KSP unpreconditioned resid norm 9.805748771638e-03 true resid norm
>> >> 9.805748771638e-03 ||r(i)||/||b|| 2.442295692359e-04
>> >> 34 KSP unpreconditioned resid norm 8.596552678160e-03 true resid norm
>> >> 8.596552678160e-03 ||r(i)||/||b|| 2.141123953301e-04
>> >> 35 KSP unpreconditioned resid norm 6.936406707500e-03 true resid norm
>> >> 6.936406707500e-03 ||r(i)||/||b|| 1.727635147167e-04
>> >> 36 KSP unpreconditioned resid norm 5.533741607932e-03 true resid norm
>> >> 5.533741607932e-03 ||r(i)||/||b|| 1.378276519869e-04
>> >> 37 KSP unpreconditioned resid norm 4.982347757923e-03 true resid norm
>> >> 4.982347757923e-03 ||r(i)||/||b|| 1.240942099414e-04
>> >> 38 KSP unpreconditioned resid norm 4.309608348059e-03 true resid norm
>> >> 4.309608348059e-03 ||r(i)||/||b|| 1.073384414524e-04
>> >> 39 KSP unpreconditioned resid norm 3.729408303186e-03 true resid norm
>> >> 3.729408303185e-03 ||r(i)||/||b|| 9.288753001974e-05
>> >> 40 KSP unpreconditioned resid norm 3.490003351128e-03 true resid norm
>> >> 3.490003351128e-03 ||r(i)||/||b|| 8.692472496776e-05
>> >> 41 KSP unpreconditioned resid norm 3.069012426454e-03 true resid norm
>> >> 3.069012426453e-03 ||r(i)||/||b|| 7.643919912166e-05
>> >> 42 KSP unpreconditioned resid norm 2.772928845284e-03 true resid norm
>> >> 2.772928845284e-03 ||r(i)||/||b|| 6.906471225983e-05
>> >> 43 KSP unpreconditioned resid norm 2.561454192399e-03 true resid norm
>> >> 2.561454192398e-03 ||r(i)||/||b|| 6.379756085902e-05
>> >> 44 KSP unpreconditioned resid norm 2.253662762802e-03 true resid norm
>> >> 2.253662762802e-03 ||r(i)||/||b|| 5.613146926159e-05
>> >> 45 KSP unpreconditioned resid norm 2.086800523919e-03 true resid norm
>> >> 2.086800523919e-03 ||r(i)||/||b|| 5.197546917701e-05
>> >> 46 KSP unpreconditioned resid norm 1.926028182896e-03 true resid norm
>> >> 1.926028182896e-03 ||r(i)||/||b|| 4.797114880257e-05
>> >> 47 KSP unpreconditioned resid norm 1.769243808622e-03 true resid norm
>> >> 1.769243808622e-03 ||r(i)||/||b|| 4.406615581492e-05
>> >> 48 KSP unpreconditioned resid norm 1.656654905964e-03 true resid norm
>> >> 1.656654905964e-03 ||r(i)||/||b|| 4.126192945371e-05
>> >> 49 KSP unpreconditioned resid norm 1.572052627273e-03 true resid norm
>> >> 1.572052627273e-03 ||r(i)||/||b|| 3.915475961260e-05
>> >> 50 KSP unpreconditioned resid norm 1.454960682355e-03 true resid norm
>> >> 1.454960682355e-03 ||r(i)||/||b|| 3.623837699518e-05
>> >> 51 KSP unpreconditioned resid norm 1.375985053014e-03 true resid norm
>> >> 1.375985053014e-03 ||r(i)||/||b|| 3.427134883820e-05
>> >> 52 KSP unpreconditioned resid norm 1.269325501087e-03 true resid norm
>> >> 1.269325501087e-03 ||r(i)||/||b|| 3.161480347603e-05
>> >> 53 KSP unpreconditioned resid norm 1.184791772965e-03 true resid norm
>> >> 1.184791772965e-03 ||r(i)||/||b|| 2.950934100844e-05
>> >> 54 KSP unpreconditioned resid norm 1.064535156080e-03 true resid norm
>> >> 1.064535156080e-03 ||r(i)||/||b|| 2.651413662135e-05
>> >> 55 KSP unpreconditioned resid norm 9.639036688120e-04 true resid norm
>> >> 9.639036688117e-04 ||r(i)||/||b|| 2.400773090370e-05
>> >> 56 KSP unpreconditioned resid norm 8.632359780260e-04 true resid norm
>> >> 8.632359780260e-04 ||r(i)||/||b|| 2.150042347322e-05
>> >> 57 KSP unpreconditioned resid norm 7.613605783850e-04 true resid norm
>> >> 7.613605783850e-04 ||r(i)||/||b|| 1.896303591113e-05
>> >> 58 KSP unpreconditioned resid norm 6.681073248348e-04 true resid norm
>> >> 6.681073248349e-04 ||r(i)||/||b|| 1.664039819373e-05
>> >> 59 KSP unpreconditioned resid norm 5.656127908544e-04 true resid norm
>> >> 5.656127908545e-04 ||r(i)||/||b|| 1.408758999254e-05
>> >> 60 KSP unpreconditioned resid norm 4.850863370767e-04 true resid norm
>> >> 4.850863370767e-04 ||r(i)||/||b|| 1.208193580169e-05
>> >> 61 KSP unpreconditioned resid norm 4.374055762320e-04 true resid norm
>> >> 4.374055762316e-04 ||r(i)||/||b|| 1.089436186387e-05
>> >> 62 KSP unpreconditioned resid norm 3.874398257079e-04 true resid norm
>> >> 3.874398257077e-04 ||r(i)||/||b|| 9.649876204364e-06
>> >> 63 KSP unpreconditioned resid norm 3.364908694427e-04 true resid norm
>> >> 3.364908694429e-04 ||r(i)||/||b|| 8.380902061609e-06
>> >> 64 KSP unpreconditioned resid norm 2.961034697265e-04 true resid norm
>> >> 2.961034697268e-04 ||r(i)||/||b|| 7.374982221632e-06
>> >> 65 KSP unpreconditioned resid norm 2.640593092764e-04 true resid norm
>> >> 2.640593092767e-04 ||r(i)||/||b|| 6.576865557059e-06
>> >> 66 KSP unpreconditioned resid norm 2.423231125743e-04 true resid norm
>> >> 2.423231125745e-04 ||r(i)||/||b|| 6.035487016671e-06
>> >> 67 KSP unpreconditioned resid norm 2.182349471179e-04 true resid norm
>> >> 2.182349471179e-04 ||r(i)||/||b|| 5.435528521898e-06
>> >> 68 KSP unpreconditioned resid norm 2.008438265031e-04 true resid norm
>> >> 2.008438265028e-04 ||r(i)||/||b|| 5.002371809927e-06
>> >> 69 KSP unpreconditioned resid norm 1.838732863386e-04 true resid norm
>> >> 1.838732863388e-04 ||r(i)||/||b|| 4.579690400226e-06
>> >> 70 KSP unpreconditioned resid norm 1.723786027645e-04 true resid norm
>> >> 1.723786027645e-04 ||r(i)||/||b|| 4.293394913444e-06
>> >> 71 KSP unpreconditioned resid norm 1.580945192204e-04 true resid norm
>> >> 1.580945192205e-04 ||r(i)||/||b|| 3.937624471826e-06
>> >> 72 KSP unpreconditioned resid norm 1.476687469671e-04 true resid norm
>> >> 1.476687469671e-04 ||r(i)||/||b|| 3.677952117812e-06
>> >> 73 KSP unpreconditioned resid norm 1.385018526182e-04 true resid norm
>> >> 1.385018526184e-04 ||r(i)||/||b|| 3.449634351350e-06
>> >> 74 KSP unpreconditioned resid norm 1.279712893541e-04 true resid norm
>> >> 1.279712893541e-04 ||r(i)||/||b|| 3.187351991305e-06
>> >> 75 KSP unpreconditioned resid norm 1.202010411772e-04 true resid norm
>> >> 1.202010411774e-04 ||r(i)||/||b|| 2.993820175504e-06
>> >> 76 KSP unpreconditioned resid norm 1.113459414198e-04 true resid norm
>> >> 1.113459414200e-04 ||r(i)||/||b|| 2.773268206485e-06
>> >> 77 KSP unpreconditioned resid norm 1.042523036036e-04 true resid norm
>> >> 1.042523036037e-04 ||r(i)||/||b|| 2.596588572066e-06
>> >> 78 KSP unpreconditioned resid norm 9.565176453232e-05 true resid norm
>> >> 9.565176453227e-05 ||r(i)||/||b|| 2.382376888539e-06
>> >> 79 KSP unpreconditioned resid norm 8.896901670359e-05 true resid norm
>> >> 8.896901670365e-05 ||r(i)||/||b|| 2.215931198209e-06
>> >> 80 KSP unpreconditioned resid norm 8.119298425803e-05 true resid norm
>> >> 8.119298425824e-05 ||r(i)||/||b|| 2.022255314935e-06
>> >> 81 KSP unpreconditioned resid norm 7.544528309154e-05 true resid norm
>> >> 7.544528309154e-05 ||r(i)||/||b|| 1.879098620558e-06
>> >> 82 KSP unpreconditioned resid norm 6.755385041138e-05 true resid norm
>> >> 6.755385041176e-05 ||r(i)||/||b|| 1.682548489719e-06
>> >> 83 KSP unpreconditioned resid norm 6.158629300870e-05 true resid norm
>> >> 6.158629300835e-05 ||r(i)||/||b|| 1.533915885727e-06
>> >> 84 KSP unpreconditioned resid norm 5.358756885754e-05 true resid norm
>> >> 5.358756885765e-05 ||r(i)||/||b|| 1.334693470462e-06
>> >> 85 KSP unpreconditioned resid norm 4.774852370380e-05 true resid norm
>> >> 4.774852370387e-05 ||r(i)||/||b|| 1.189261692037e-06
>> >> 86 KSP unpreconditioned resid norm 3.919358737908e-05 true resid norm
>> >> 3.919358737930e-05 ||r(i)||/||b|| 9.761858258229e-07
>> >> 87 KSP unpreconditioned resid norm 3.434042319950e-05 true resid norm
>> >> 3.434042319947e-05 ||r(i)||/||b|| 8.553091620745e-07
>> >> 88 KSP unpreconditioned resid norm 2.813699436281e-05 true resid norm
>> >> 2.813699436302e-05 ||r(i)||/||b|| 7.008017615898e-07
>> >> 89 KSP unpreconditioned resid norm 2.462248069068e-05 true resid norm
>> >> 2.462248069051e-05 ||r(i)||/||b|| 6.132665635851e-07
>> >> 90 KSP unpreconditioned resid norm 2.040558789626e-05 true resid norm
>> >> 2.040558789626e-05 ||r(i)||/||b|| 5.082373674841e-07
>> >> 91 KSP unpreconditioned resid norm 1.888523204468e-05 true resid norm
>> >> 1.888523204470e-05 ||r(i)||/||b|| 4.703702077842e-07
>> >> 92 KSP unpreconditioned resid norm 1.707071292484e-05 true resid norm
>> >> 1.707071292474e-05 ||r(i)||/||b|| 4.251763900191e-07
>> >> 93 KSP unpreconditioned resid norm 1.498636454665e-05 true resid norm
>> >> 1.498636454672e-05 ||r(i)||/||b|| 3.732619958859e-07
>> >> 94 KSP unpreconditioned resid norm 1.219393542993e-05 true resid norm
>> >> 1.219393543006e-05 ||r(i)||/||b|| 3.037115947725e-07
>> >> 95 KSP unpreconditioned resid norm 1.059996963300e-05 true resid norm
>> >> 1.059996963303e-05 ||r(i)||/||b|| 2.640110487917e-07
>> >> 96 KSP unpreconditioned resid norm 9.099659872548e-06 true resid norm
>> >> 9.099659873214e-06 ||r(i)||/||b|| 2.266431725699e-07
>> >> 97 KSP unpreconditioned resid norm 8.147347587295e-06 true resid norm
>> >> 8.147347587584e-06 ||r(i)||/||b|| 2.029241456283e-07
>> >> 98 KSP unpreconditioned resid norm 7.167226146744e-06 true resid norm
>> >> 7.167226146783e-06 ||r(i)||/||b|| 1.785124823418e-07
>> >> 99 KSP unpreconditioned resid norm 6.552540209538e-06 true resid norm
>> >> 6.552540209577e-06 ||r(i)||/||b|| 1.632026385802e-07
>> >> 100 KSP unpreconditioned resid norm 5.767783600111e-06 true resid norm
>> >> 5.767783600320e-06 ||r(i)||/||b|| 1.436568830140e-07
>> >> 101 KSP unpreconditioned resid norm 5.261057430584e-06 true resid norm
>> >> 5.261057431144e-06 ||r(i)||/||b|| 1.310359688033e-07
>> >> 102 KSP unpreconditioned resid norm 4.715498525786e-06 true resid norm
>> >> 4.715498525947e-06 ||r(i)||/||b|| 1.174478564100e-07
>> >> 103 KSP unpreconditioned resid norm 4.380052669622e-06 true resid norm
>> >> 4.380052669825e-06 ||r(i)||/||b|| 1.090929822591e-07
>> >> 104 KSP unpreconditioned resid norm 3.911664470060e-06 true resid norm
>> >> 3.911664470226e-06 ||r(i)||/||b|| 9.742694319496e-08
>> >> 105 KSP unpreconditioned resid norm 3.652211458315e-06 true resid norm
>> >> 3.652211458259e-06 ||r(i)||/||b|| 9.096480564430e-08
>> >> 106 KSP unpreconditioned resid norm 3.387532128049e-06 true resid norm
>> >> 3.387532128358e-06 ||r(i)||/||b|| 8.437249737363e-08
>> >> 107 KSP unpreconditioned resid norm 3.234218880987e-06 true resid norm
>> >> 3.234218880798e-06 ||r(i)||/||b|| 8.055395895481e-08
>> >> 108 KSP unpreconditioned resid norm 3.016905196388e-06 true resid norm
>> >> 3.016905196492e-06 ||r(i)||/||b|| 7.514137611763e-08
>> >> 109 KSP unpreconditioned resid norm 2.858246441921e-06 true resid norm
>> >> 2.858246441975e-06 ||r(i)||/||b|| 7.118969836476e-08
>> >> 110 KSP unpreconditioned resid norm 2.637118810847e-06 true resid norm
>> >> 2.637118810750e-06 ||r(i)||/||b|| 6.568212241336e-08
>> >> 111 KSP unpreconditioned resid norm 2.494976088717e-06 true resid norm
>> >> 2.494976088700e-06 ||r(i)||/||b|| 6.214180574966e-08
>> >> 112 KSP unpreconditioned resid norm 2.270639574272e-06 true resid norm
>> >> 2.270639574200e-06 ||r(i)||/||b|| 5.655430686750e-08
>> >> 113 KSP unpreconditioned resid norm 2.104988663813e-06 true resid norm
>> >> 2.104988664169e-06 ||r(i)||/||b|| 5.242847707696e-08
>> >> 114 KSP unpreconditioned resid norm 1.889361127301e-06 true resid norm
>> >> 1.889361127526e-06 ||r(i)||/||b|| 4.705789073868e-08
>> >> 115 KSP unpreconditioned resid norm 1.732367008052e-06 true resid norm
>> >> 1.732367007971e-06 ||r(i)||/||b|| 4.314767367271e-08
>> >> 116 KSP unpreconditioned resid norm 1.509288268391e-06 true resid norm
>> >> 1.509288268645e-06 ||r(i)||/||b|| 3.759150191264e-08
>> >> 117 KSP unpreconditioned resid norm 1.359169217644e-06 true resid norm
>> >> 1.359169217445e-06 ||r(i)||/||b|| 3.385252062089e-08
>> >> 118 KSP unpreconditioned resid norm 1.180146337735e-06 true resid norm
>> >> 1.180146337908e-06 ||r(i)||/||b|| 2.939363820703e-08
>> >> 119 KSP unpreconditioned resid norm 1.067757039683e-06 true resid norm
>> >> 1.067757039924e-06 ||r(i)||/||b|| 2.659438335433e-08
>> >> 120 KSP unpreconditioned resid norm 9.435833073736e-07 true resid norm
>> >> 9.435833073736e-07 ||r(i)||/||b|| 2.350161625235e-08
>> >> 121 KSP unpreconditioned resid norm 8.749457237613e-07 true resid norm
>> >> 8.749457236791e-07 ||r(i)||/||b|| 2.179207546261e-08
>> >> 122 KSP unpreconditioned resid norm 7.945760150897e-07 true resid norm
>> >> 7.945760150444e-07 ||r(i)||/||b|| 1.979032528762e-08
>> >> 123 KSP unpreconditioned resid norm 7.141240839013e-07 true resid norm
>> >> 7.141240838682e-07 ||r(i)||/||b|| 1.778652721438e-08
>> >> 124 KSP unpreconditioned resid norm 6.300566936733e-07 true resid norm
>> >> 6.300566936607e-07 ||r(i)||/||b|| 1.569267971988e-08
>> >> 125 KSP unpreconditioned resid norm 5.628986997544e-07 true resid norm
>> >> 5.628986995849e-07 ||r(i)||/||b|| 1.401999073448e-08
>> >> 126 KSP unpreconditioned resid norm 5.119018951602e-07 true resid norm
>> >> 5.119018951837e-07 ||r(i)||/||b|| 1.274982484900e-08
>> >> 127 KSP unpreconditioned resid norm 4.664670343748e-07 true resid norm
>> >> 4.664670344042e-07 ||r(i)||/||b|| 1.161818903670e-08
>> >> 128 KSP unpreconditioned resid norm 4.253264691112e-07 true resid norm
>> >> 4.253264691948e-07 ||r(i)||/||b|| 1.059351027394e-08
>> >> 129 KSP unpreconditioned resid norm 3.868921150516e-07 true resid norm
>> >> 3.868921150517e-07 ||r(i)||/||b|| 9.636234498800e-09
>> >> 130 KSP unpreconditioned resid norm 3.558445658540e-07 true resid norm
>> >> 3.558445660061e-07 ||r(i)||/||b|| 8.862940209315e-09
>> >> 131 KSP unpreconditioned resid norm 3.268710273840e-07 true resid norm
>> >> 3.268710272455e-07 ||r(i)||/||b|| 8.141302825416e-09
>> >> 132 KSP unpreconditioned resid norm 3.041273897592e-07 true resid norm
>> >> 3.041273896694e-07 ||r(i)||/||b|| 7.574832182794e-09
>> >> 133 KSP unpreconditioned resid norm 2.851926677922e-07 true resid norm
>> >> 2.851926674248e-07 ||r(i)||/||b|| 7.103229333782e-09
>> >> 134 KSP unpreconditioned resid norm 2.694708315072e-07 true resid norm
>> >> 2.694708309500e-07 ||r(i)||/||b|| 6.711649104748e-09
>> >> 135 KSP unpreconditioned resid norm 2.534825559099e-07 true resid norm
>> >> 2.534825557469e-07 ||r(i)||/||b|| 6.313432746507e-09
>> >> 136 KSP unpreconditioned resid norm 2.387342352458e-07 true resid norm
>> >> 2.387342351804e-07 ||r(i)||/||b|| 5.946099658254e-09
>> >> 137 KSP unpreconditioned resid norm 2.200861667617e-07 true resid norm
>> >> 2.200861665255e-07 ||r(i)||/||b|| 5.481636425438e-09
>> >> 138 KSP unpreconditioned resid norm 2.051415370616e-07 true resid norm
>> >> 2.051415370614e-07 ||r(i)||/||b|| 5.109413915824e-09
>> >> 139 KSP unpreconditioned resid norm 1.887376429396e-07 true resid norm
>> >> 1.887376426682e-07 ||r(i)||/||b|| 4.700845824315e-09
>> >> 140 KSP unpreconditioned resid norm 1.729743133005e-07 true resid norm
>> >> 1.729743128342e-07 ||r(i)||/||b|| 4.308232129561e-09
>> >> 141 KSP unpreconditioned resid norm 1.541021130781e-07 true resid norm
>> >> 1.541021128364e-07 ||r(i)||/||b|| 3.838186508023e-09
>> >> 142 KSP unpreconditioned resid norm 1.384631628565e-07 true resid norm
>> >> 1.384631627735e-07 ||r(i)||/||b|| 3.448670712125e-09
>> >> 143 KSP unpreconditioned resid norm 1.223114405626e-07 true resid norm
>> >> 1.223114403883e-07 ||r(i)||/||b|| 3.046383411846e-09
>> >> 144 KSP unpreconditioned resid norm 1.087313066223e-07 true resid norm
>> >> 1.087313065117e-07 ||r(i)||/||b|| 2.708146085550e-09
>> >> 145 KSP unpreconditioned resid norm 9.181901998734e-08 true resid norm
>> >> 9.181901984268e-08 ||r(i)||/||b|| 2.286915582489e-09
>> >> 146 KSP unpreconditioned resid norm 7.885850510808e-08 true resid norm
>> >> 7.885850531446e-08 ||r(i)||/||b|| 1.964110975313e-09
>> >> 147 KSP unpreconditioned resid norm 6.483393946950e-08 true resid norm
>> >> 6.483393931383e-08 ||r(i)||/||b|| 1.614804278515e-09
>> >> 148 KSP unpreconditioned resid norm 5.690132597004e-08 true resid norm
>> >> 5.690132577518e-08 ||r(i)||/||b|| 1.417228465328e-09
>> >> 149 KSP unpreconditioned resid norm 5.023671521579e-08 true resid norm
>> >> 5.023671502186e-08 ||r(i)||/||b|| 1.251234511035e-09
>> >> 150 KSP unpreconditioned resid norm 4.625371062660e-08 true resid norm
>> >> 4.625371062660e-08 ||r(i)||/||b|| 1.152030720445e-09
>> >> 151 KSP unpreconditioned resid norm 4.349049084805e-08 true resid norm
>> >> 4.349049089337e-08 ||r(i)||/||b|| 1.083207830846e-09
>> >> 152 KSP unpreconditioned resid norm 3.932593324498e-08 true resid norm
>> >> 3.932593376918e-08 ||r(i)||/||b|| 9.794821474546e-10
>> >> 153 KSP unpreconditioned resid norm 3.504167649202e-08 true resid norm
>> >> 3.504167638113e-08 ||r(i)||/||b|| 8.727751166356e-10
>> >> 154 KSP unpreconditioned resid norm 2.892726347747e-08 true resid norm
>> >> 2.892726348583e-08 ||r(i)||/||b|| 7.204848160858e-10
>> >> 155 KSP unpreconditioned resid norm 2.477647033202e-08 true resid norm
>> >> 2.477647041570e-08 ||r(i)||/||b|| 6.171019508795e-10
>> >> 156 KSP unpreconditioned resid norm 2.128504065757e-08 true resid norm
>> >> 2.128504067423e-08 ||r(i)||/||b|| 5.301416991298e-10
>> >> 157 KSP unpreconditioned resid norm 1.879248809429e-08 true resid norm
>> >> 1.879248818928e-08 ||r(i)||/||b|| 4.680602575310e-10
>> >> 158 KSP unpreconditioned resid norm 1.673649140073e-08 true resid norm
>> >> 1.673649134005e-08 ||r(i)||/||b|| 4.168520085200e-10
>> >> 159 KSP unpreconditioned resid norm 1.497123388109e-08 true resid norm
>> >> 1.497123365569e-08 ||r(i)||/||b|| 3.728851342016e-10
>> >> 160 KSP unpreconditioned resid norm 1.315982130162e-08 true resid norm
>> >> 1.315982149329e-08 ||r(i)||/||b|| 3.277687007261e-10
>> >> 161 KSP unpreconditioned resid norm 1.182395864938e-08 true resid norm
>> >> 1.182395868430e-08 ||r(i)||/||b|| 2.944966675550e-10
>> >> 162 KSP unpreconditioned resid norm 1.070204481679e-08 true resid norm
>> >> 1.070204466432e-08 ||r(i)||/||b|| 2.665534085342e-10
>> >> 163 KSP unpreconditioned resid norm 9.969290307649e-09 true resid norm
>> >> 9.969290432333e-09 ||r(i)||/||b|| 2.483028644297e-10
>> >> 164 KSP unpreconditioned resid norm 9.134440883306e-09 true resid norm
>> >> 9.134440980976e-09 ||r(i)||/||b|| 2.275094577628e-10
>> >> 165 KSP unpreconditioned resid norm 8.593316427292e-09 true resid norm
>> >> 8.593316413360e-09 ||r(i)||/||b|| 2.140317904139e-10
>> >> 166 KSP unpreconditioned resid norm 8.042173048464e-09 true resid norm
>> >> 8.042173332848e-09 ||r(i)||/||b|| 2.003045942277e-10
>> >> 167 KSP unpreconditioned resid norm 7.655518522782e-09 true resid norm
>> >> 7.655518879144e-09 ||r(i)||/||b|| 1.906742791064e-10
>> >> 168 KSP unpreconditioned resid norm 7.210283391815e-09 true resid norm
>> >> 7.210283220312e-09 ||r(i)||/||b|| 1.795848951442e-10
>> >> 169 KSP unpreconditioned resid norm 6.793967416271e-09 true resid norm
>> >> 6.793967448832e-09 ||r(i)||/||b|| 1.692158122825e-10
>> >> 170 KSP unpreconditioned resid norm 6.249160304588e-09 true resid norm
>> >> 6.249160382647e-09 ||r(i)||/||b|| 1.556464257736e-10
>> >> 171 KSP unpreconditioned resid norm 5.794936438798e-09 true resid norm
>> >> 5.794936332552e-09 ||r(i)||/||b|| 1.443331699811e-10
>> >> 172 KSP unpreconditioned resid norm 5.222337397128e-09 true resid norm
>> >> 5.222337443277e-09 ||r(i)||/||b|| 1.300715788135e-10
>> >> 173 KSP unpreconditioned resid norm 4.755359110447e-09 true resid norm
>> >> 4.755358888996e-09 ||r(i)||/||b|| 1.184406494668e-10
>> >> 174 KSP unpreconditioned resid norm 4.317537007873e-09 true resid norm
>> >> 4.317537267718e-09 ||r(i)||/||b|| 1.075359252630e-10
>> >> 175 KSP unpreconditioned resid norm 3.924177535665e-09 true resid norm
>> >> 3.924177629720e-09 ||r(i)||/||b|| 9.773860563138e-11
>> >> 176 KSP unpreconditioned resid norm 3.502843065115e-09 true resid norm
>> >> 3.502843126359e-09 ||r(i)||/||b|| 8.724452234855e-11
>> >> 177 KSP unpreconditioned resid norm 3.083873232869e-09 true resid norm
>> >> 3.083873352938e-09 ||r(i)||/||b|| 7.680933686007e-11
>> >> 178 KSP unpreconditioned resid norm 2.758980676473e-09 true resid norm
>> >> 2.758980618096e-09 ||r(i)||/||b|| 6.871730691658e-11
>> >> 179 KSP unpreconditioned resid norm 2.510978240429e-09 true resid norm
>> >> 2.510978327392e-09 ||r(i)||/||b|| 6.254036989334e-11
>> >> 180 KSP unpreconditioned resid norm 2.323000193205e-09 true resid norm
>> >> 2.323000193205e-09 ||r(i)||/||b|| 5.785844097519e-11
>> >> 181 KSP unpreconditioned resid norm 2.167480159274e-09 true resid norm
>> >> 2.167480113693e-09 ||r(i)||/||b|| 5.398493749153e-11
>> >> 182 KSP unpreconditioned resid norm 1.983545827983e-09 true resid norm
>> >> 1.983546404840e-09 ||r(i)||/||b|| 4.940374216139e-11
>> >> 183 KSP unpreconditioned resid norm 1.794576286774e-09 true resid norm
>> >> 1.794576224361e-09 ||r(i)||/||b|| 4.469710457036e-11
>> >> 184 KSP unpreconditioned resid norm 1.583490590644e-09 true resid norm
>> >> 1.583490380603e-09 ||r(i)||/||b|| 3.943963715064e-11
>> >> 185 KSP unpreconditioned resid norm 1.412659866247e-09 true resid norm
>> >> 1.412659832191e-09 ||r(i)||/||b|| 3.518479927722e-11
>> >> 186 KSP unpreconditioned resid norm 1.285613344939e-09 true resid norm
>> >> 1.285612984761e-09 ||r(i)||/||b|| 3.202047215205e-11
>> >> 187 KSP unpreconditioned resid norm 1.168115133929e-09 true resid norm
>> >> 1.168114766904e-09 ||r(i)||/||b|| 2.909397058634e-11
>> >> 188 KSP unpreconditioned resid norm 1.063377926053e-09 true resid norm
>> >> 1.063377647554e-09 ||r(i)||/||b|| 2.648530681802e-11
>> >> 189 KSP unpreconditioned resid norm 9.548967728122e-10 true resid norm
>> >> 9.548964523410e-10 ||r(i)||/||b|| 2.378339019807e-11
>> >> KSP Object: 16 MPI processes
>> >>  type: fgmres
>> >>    restart=30, using Classical (unmodified) Gram-Schmidt
>> >> Orthogonalization with no iterative refinement
>> >>    happy breakdown tolerance 1e-30
>> >>  maximum iterations=2000, initial guess is zero
>> >>  tolerances:  relative=1e-20, absolute=1e-09, divergence=10000.
>> >>  right preconditioning
>> >>  using UNPRECONDITIONED norm type for convergence test
>> >> PC Object: 16 MPI processes
>> >>  type: bjacobi
>> >>    number of blocks = 4
>> >>    Local solver information for first block is in the following KSP
>> >> and PC objects on rank 0:
>> >>    Use -ksp_view ::ascii_info_detail to display information for all
>> blocks
>> >>  KSP Object: (sub_) 4 MPI processes
>> >>    type: preonly
>> >>    maximum iterations=10000, initial guess is zero
>> >>    tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>> >>    left preconditioning
>> >>    using NONE norm type for convergence test
>> >>  PC Object: (sub_) 4 MPI processes
>> >>    type: telescope
>> >>      petsc subcomm: parent comm size reduction factor = 4
>> >>      petsc subcomm: parent_size = 4 , subcomm_size = 1
>> >>      petsc subcomm type = contiguous
>> >>    linear system matrix = precond matrix:
>> >>    Mat Object: (sub_) 4 MPI processes
>> >>      type: mpiaij
>> >>      rows=40200, cols=40200
>> >>      total: nonzeros=199996, allocated nonzeros=203412
>> >>      total number of mallocs used during MatSetValues calls=0
>> >>        not using I-node (on process 0) routines
>> >>        setup type: default
>> >>        Parent DM object: NULL
>> >>        Sub DM object: NULL
>> >>        KSP Object:   (sub_telescope_)   1 MPI processes
>> >>          type: preonly
>> >>          maximum iterations=10000, initial guess is zero
>> >>          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>> >>          left preconditioning
>> >>          using NONE norm type for convergence test
>> >>        PC Object:   (sub_telescope_)   1 MPI processes
>> >>          type: lu
>> >>            out-of-place factorization
>> >>            tolerance for zero pivot 2.22045e-14
>> >>            matrix ordering: external
>> >>            factor fill ratio given 0., needed 0.
>> >>              Factored matrix follows:
>> >>                Mat Object:   1 MPI processes
>> >>                  type: mumps
>> >>                  rows=40200, cols=40200
>> >>                  package used to perform factorization: mumps
>> >>                  total: nonzeros=1849788, allocated nonzeros=1849788
>> >>                    MUMPS run parameters:
>> >>                      SYM (matrix type):                   0
>> >>                      PAR (host participation):            1
>> >>                      ICNTL(1) (output for error):         6
>> >>                      ICNTL(2) (output of diagnostic msg): 0
>> >>                      ICNTL(3) (output for global info):   0
>> >>                      ICNTL(4) (level of printing):        0
>> >>                      ICNTL(5) (input mat struct):         0
>> >>                      ICNTL(6) (matrix prescaling):        7
>> >>                      ICNTL(7) (sequential matrix ordering):7
>> >>                      ICNTL(8) (scaling strategy):        77
>> >>                      ICNTL(10) (max num of refinements):  0
>> >>                      ICNTL(11) (error analysis):          0
>> >>                      ICNTL(12) (efficiency control):        1
>> >>                      ICNTL(13) (sequential factorization of the root
>> >> node):  0
>> >>                      ICNTL(14) (percentage of estimated workspace
>> >> increase): 20
>> >>                      ICNTL(18) (input mat struct):        0
>> >>                      ICNTL(19) (Schur complement info):        0
>> >>                      ICNTL(20) (RHS sparse pattern):        0
>> >>                      ICNTL(21) (solution struct):        0
>> >>                      ICNTL(22) (in-core/out-of-core facility):        0
>> >>                      ICNTL(23) (max size of memory can be allocated
>> >> locally):0
>> >>                      ICNTL(24) (detection of null pivot rows):        0
>> >>                      ICNTL(25) (computation of a null space basis):
>> >>        0
>> >>                      ICNTL(26) (Schur options for RHS or solution):
>> >>        0
>> >>                      ICNTL(27) (blocking size for multiple RHS):
>> >>        -32
>> >>                      ICNTL(28) (use parallel or sequential ordering):
>> >>        1
>> >>                      ICNTL(29) (parallel ordering):        0
>> >>                      ICNTL(30) (user-specified set of entries in
>> >> inv(A)):    0
>> >>                      ICNTL(31) (factors is discarded in the solve
>> >> phase):    0
>> >>                      ICNTL(33) (compute determinant):        0
>> >>                      ICNTL(35) (activate BLR based factorization):
>> >>        0
>> >>                      ICNTL(36) (choice of BLR factorization variant):
>> >>        0
>> >>                      ICNTL(38) (estimated compression rate of LU
>> >> factors):   333
>> >>                      CNTL(1) (relative pivoting threshold):      0.01
>> >>                      CNTL(2) (stopping criterion of refinement):
>> >> 1.49012e-08
>> >>                      CNTL(3) (absolute pivoting threshold):      0.
>> >>                      CNTL(4) (value of static pivoting):         -1.
>> >>                      CNTL(5) (fixation for null pivots):         0.
>> >>                      CNTL(7) (dropping parameter for BLR):       0.
>> >>                      RINFO(1) (local estimated flops for the
>> >> elimination after analysis):
>> >>                        [0] 1.45525e+08
>> >>                      RINFO(2) (local estimated flops for the assembly
>> >> after factorization):
>> >>                        [0]  2.89397e+06
>> >>                      RINFO(3) (local estimated flops for the
>> >> elimination after factorization):
>> >>                        [0]  1.45525e+08
>> >>                      INFO(15) (estimated size of (in MB) MUMPS
>> >> internal data for running numerical factorization):
>> >>                      [0] 29
>> >>                      INFO(16) (size of (in MB) MUMPS internal data
>> >> used during numerical factorization):
>> >>                        [0] 29
>> >>                      INFO(23) (num of pivots eliminated on this
>> >> processor after factorization):
>> >>                        [0] 40200
>> >>                      RINFOG(1) (global estimated flops for the
>> >> elimination after analysis): 1.45525e+08
>> >>                      RINFOG(2) (global estimated flops for the
>> >> assembly after factorization): 2.89397e+06
>> >>                      RINFOG(3) (global estimated flops for the
>> >> elimination after factorization): 1.45525e+08
>> >>                      (RINFOG(12) RINFOG(13))*2^INFOG(34)
>> >> (determinant): (0.,0.)*(2^0)
>> >>                      INFOG(3) (estimated real workspace for factors on
>> >> all processors after analysis): 1849788
>> >>                      INFOG(4) (estimated integer workspace for factors
>> >> on all processors after analysis): 879986
>> >>                      INFOG(5) (estimated maximum front size in the
>> >> complete tree): 282
>> >>                      INFOG(6) (number of nodes in the complete tree):
>> >> 23709
>> >>                      INFOG(7) (ordering option effectively used after
>> >> analysis): 5
>> >>                      INFOG(8) (structural symmetry in percent of the
>> >> permuted matrix after analysis): 100
>> >>                      INFOG(9) (total real/complex workspace to store
>> >> the matrix factors after factorization): 1849788
>> >>                      INFOG(10) (total integer space store the matrix
>> >> factors after factorization): 879986
>> >>                      INFOG(11) (order of largest frontal matrix after
>> >> factorization): 282
>> >>                      INFOG(12) (number of off-diagonal pivots): 0
>> >>                      INFOG(13) (number of delayed pivots after
>> >> factorization): 0
>> >>                      INFOG(14) (number of memory compress after
>> >> factorization): 0
>> >>                      INFOG(15) (number of steps of iterative
>> >> refinement after solution): 0
>> >>                      INFOG(16) (estimated size (in MB) of all MUMPS
>> >> internal data for factorization after analysis: value on the most
>> >> memory consuming processor): 29
>> >>                      INFOG(17) (estimated size of all MUMPS internal
>> >> data for factorization after analysis: sum over all processors): 29
>> >>                      INFOG(18) (size of all MUMPS internal data
>> >> allocated during factorization: value on the most memory consuming
>> >> processor): 29
>> >>                      INFOG(19) (size of all MUMPS internal data
>> >> allocated during factorization: sum over all processors): 29
>> >>                      INFOG(20) (estimated number of entries in the
>> >> factors): 1849788
>> >>                      INFOG(21) (size in MB of memory effectively used
>> >> during factorization - value on the most memory consuming processor):
>> 26
>> >>                      INFOG(22) (size in MB of memory effectively used
>> >> during factorization - sum over all processors): 26
>> >>                      INFOG(23) (after analysis: value of ICNTL(6)
>> >> effectively used): 0
>> >>                      INFOG(24) (after analysis: value of ICNTL(12)
>> >> effectively used): 1
>> >>                      INFOG(25) (after factorization: number of pivots
>> >> modified by static pivoting): 0
>> >>                      INFOG(28) (after factorization: number of null
>> >> pivots encountered): 0
>> >>                      INFOG(29) (after factorization: effective number
>> >> of entries in the factors (sum over all processors)): 1849788
>> >>                      INFOG(30, 31) (after solution: size in Mbytes of
>> >> memory used during solution phase): 29, 29
>> >>                      INFOG(32) (after analysis: type of analysis
>> done): 1
>> >>                      INFOG(33) (value used for ICNTL(8)): 7
>> >>                      INFOG(34) (exponent of the determinant if
>> >> determinant is requested): 0
>> >>                      INFOG(35) (after factorization: number of entries
>> >> taking into account BLR factor compression - sum over all processors):
>> >> 1849788
>> >>                      INFOG(36) (after analysis: estimated size of all
>> >> MUMPS internal data for running BLR in-core - value on the most memory
>> >> consuming processor): 0
>> >>                      INFOG(37) (after analysis: estimated size of all
>> >> MUMPS internal data for running BLR in-core - sum over all
>> processors): 0
>> >>                      INFOG(38) (after analysis: estimated size of all
>> >> MUMPS internal data for running BLR out-of-core - value on the most
>> >> memory consuming processor): 0
>> >>                      INFOG(39) (after analysis: estimated size of all
>> >> MUMPS internal data for running BLR out-of-core - sum over all
>> >> processors): 0
>> >>          linear system matrix = precond matrix:
>> >>          Mat Object:   1 MPI processes
>> >>            type: seqaijcusparse
>> >>            rows=40200, cols=40200
>> >>            total: nonzeros=199996, allocated nonzeros=199996
>> >>            total number of mallocs used during MatSetValues calls=0
>> >>              not using I-node routines
>> >>  linear system matrix = precond matrix:
>> >>  Mat Object: 16 MPI processes
>> >>    type: mpiaijcusparse
>> >>    rows=160800, cols=160800
>> >>    total: nonzeros=802396, allocated nonzeros=1608000
>> >>    total number of mallocs used during MatSetValues calls=0
>> >>      not using I-node (on process 0) routines
>> >> Norm of error 9.11684e-07 iterations 189
>> >>
>> >> Chang
>> >>
>> >>
>> >>
>> >> On 10/14/21 10:10 PM, Chang Liu wrote:
>> >>> Hi Barry,
>> >>> No problem. Here is the output. It seems that the resid norm
>> >>> calculation is incorrect.
>> >>> $ mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7 -m 400
>> >>> -ksp_view -ksp_monitor_true_residual -pc_type bjacobi
>> >>> -pc_bjacobi_blocks 4 -ksp_type fgmres -mat_type aijcusparse
>> >>> -sub_pc_type telescope -sub_ksp_type preonly -sub_telescope_ksp_type
>> >>> preonly -sub_telescope_pc_type lu
>> >>> -sub_telescope_pc_factor_mat_solver_type cusparse
>> >>> -sub_pc_telescope_reduction_factor 4 -sub_pc_telescope_subcomm_type
>> >>> contiguous -ksp_max_it 2000 -ksp_rtol 1.e-20 -ksp_atol 1.e-9
>> >>>   0 KSP unpreconditioned resid norm 4.014971979977e+01 true resid
>> >>> norm 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00
>> >>>   1 KSP unpreconditioned resid norm 0.000000000000e+00 true resid
>> >>> norm 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00
>> >>> KSP Object: 16 MPI processes
>> >>>   type: fgmres
>> >>>     restart=30, using Classical (unmodified) Gram-Schmidt
>> >>> Orthogonalization with no iterative refinement
>> >>>     happy breakdown tolerance 1e-30
>> >>>   maximum iterations=2000, initial guess is zero
>> >>>   tolerances:  relative=1e-20, absolute=1e-09, divergence=10000.
>> >>>   right preconditioning
>> >>>   using UNPRECONDITIONED norm type for convergence test
>> >>> PC Object: 16 MPI processes
>> >>>   type: bjacobi
>> >>>     number of blocks = 4
>> >>>     Local solver information for first block is in the following KSP
>> >>> and PC objects on rank 0:
>> >>>     Use -ksp_view ::ascii_info_detail to display information for all
>> >>> blocks
>> >>>   KSP Object: (sub_) 4 MPI processes
>> >>>     type: preonly
>> >>>     maximum iterations=10000, initial guess is zero
>> >>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>> >>>     left preconditioning
>> >>>     using NONE norm type for convergence test
>> >>>   PC Object: (sub_) 4 MPI processes
>> >>>     type: telescope
>> >>>       petsc subcomm: parent comm size reduction factor = 4
>> >>>       petsc subcomm: parent_size = 4 , subcomm_size = 1
>> >>>       petsc subcomm type = contiguous
>> >>>     linear system matrix = precond matrix:
>> >>>     Mat Object: (sub_) 4 MPI processes
>> >>>       type: mpiaij
>> >>>       rows=40200, cols=40200
>> >>>       total: nonzeros=199996, allocated nonzeros=203412
>> >>>       total number of mallocs used during MatSetValues calls=0
>> >>>         not using I-node (on process 0) routines
>> >>>         setup type: default
>> >>>         Parent DM object: NULL
>> >>>         Sub DM object: NULL
>> >>>         KSP Object:   (sub_telescope_)   1 MPI processes
>> >>>           type: preonly
>> >>>           maximum iterations=10000, initial guess is zero
>> >>>           tolerances:  relative=1e-05, absolute=1e-50,
>> divergence=10000.
>> >>>           left preconditioning
>> >>>           using NONE norm type for convergence test
>> >>>         PC Object:   (sub_telescope_)   1 MPI processes
>> >>>           type: lu
>> >>>             out-of-place factorization
>> >>>             tolerance for zero pivot 2.22045e-14
>> >>>             matrix ordering: nd
>> >>>             factor fill ratio given 5., needed 8.62558
>> >>>               Factored matrix follows:
>> >>>                 Mat Object:   1 MPI processes
>> >>>                   type: seqaijcusparse
>> >>>                   rows=40200, cols=40200
>> >>>                   package used to perform factorization: cusparse
>> >>>                   total: nonzeros=1725082, allocated nonzeros=1725082
>> >>>                     not using I-node routines
>> >>>           linear system matrix = precond matrix:
>> >>>           Mat Object:   1 MPI processes
>> >>>             type: seqaijcusparse
>> >>>             rows=40200, cols=40200
>> >>>             total: nonzeros=199996, allocated nonzeros=199996
>> >>>             total number of mallocs used during MatSetValues calls=0
>> >>>               not using I-node routines
>> >>>   linear system matrix = precond matrix:
>> >>>   Mat Object: 16 MPI processes
>> >>>     type: mpiaijcusparse
>> >>>     rows=160800, cols=160800
>> >>>     total: nonzeros=802396, allocated nonzeros=1608000
>> >>>     total number of mallocs used during MatSetValues calls=0
>> >>>       not using I-node (on process 0) routines
>> >>> Norm of error 400.999 iterations 1
>> >>> Chang
>> >>> On 10/14/21 9:47 PM, Barry Smith wrote:
>> >>>>
>> >>>>    Chang,
>> >>>>
>> >>>>    Sorry, I did not notice that one. Please run it with -ksp_view
>> >>>> -ksp_monitor_true_residual so we can see exactly how the options are
>> >>>> interpreted and which solver is used. At a glance it looks OK, but
>> >>>> something must be wrong to get the wrong answer.
>> >>>>
>> >>>>    Barry
>> >>>>
>> >>>>> On Oct 14, 2021, at 6:02 PM, Chang Liu <cliu at pppl.gov> wrote:
>> >>>>>
>> >>>>> Hi Barry,
>> >>>>>
>> >>>>> That is exactly what I was doing in the second example, in which
>> >>>>> the preconditioner works but the GMRES does not.
>> >>>>>
>> >>>>> Chang
>> >>>>>
>> >>>>> On 10/14/21 5:15 PM, Barry Smith wrote:
>> >>>>>>    You need to use the PCTELESCOPE inside the block Jacobi, not
>> >>>>>> outside it. So something like -pc_type bjacobi -sub_pc_type
>> >>>>>> telescope -sub_telescope_pc_type lu
>> >>>>>>> On Oct 14, 2021, at 4:14 PM, Chang Liu <cliu at pppl.gov> wrote:
>> >>>>>>>
>> >>>>>>> Hi Pierre,
>> >>>>>>>
>> >>>>>>> I wonder if the PCTELESCOPE trick only works for the
>> >>>>>>> preconditioner and not for the solver. I have done some tests
>> >>>>>>> and found that for solving a small matrix with
>> >>>>>>> -telescope_ksp_type preonly, it does work on GPU with multiple
>> >>>>>>> MPI processes. However, for bjacobi and gmres, it does not work.
>> >>>>>>>
>> >>>>>>> The command-line options I used for the small matrix are
>> >>>>>>>
>> >>>>>>> mpiexec -n 4 --oversubscribe ./ex7 -m 100 -ksp_monitor_short
>> >>>>>>> -pc_type telescope -mat_type aijcusparse -telescope_pc_type lu
>> >>>>>>> -telescope_pc_factor_mat_solver_type cusparse -telescope_ksp_type
>> >>>>>>> preonly -pc_telescope_reduction_factor 4
>> >>>>>>>
>> >>>>>>> which gives the correct output. For the iterative solver, I tried
>> >>>>>>>
>> >>>>>>> mpiexec -n 16 --oversubscribe ./ex7 -m 400 -ksp_monitor_short
>> >>>>>>> -pc_type bjacobi -pc_bjacobi_blocks 4 -ksp_type fgmres -mat_type
>> >>>>>>> aijcusparse -sub_pc_type telescope -sub_ksp_type preonly
>> >>>>>>> -sub_telescope_ksp_type preonly -sub_telescope_pc_type lu
>> >>>>>>> -sub_telescope_pc_factor_mat_solver_type cusparse
>> >>>>>>> -sub_pc_telescope_reduction_factor 4 -ksp_max_it 2000 -ksp_rtol
>> >>>>>>> 1.e-9 -ksp_atol 1.e-20
>> >>>>>>>
>> >>>>>>> for the large matrix. The output is
>> >>>>>>>
>> >>>>>>>   0 KSP Residual norm 40.1497
>> >>>>>>>   1 KSP Residual norm < 1.e-11
>> >>>>>>> Norm of error 400.999 iterations 1
>> >>>>>>>
>> >>>>>>> So it seems to call a direct solver instead of an iterative one.
>> >>>>>>>
>> >>>>>>> Can you please help check these options?
>> >>>>>>>
>> >>>>>>> Chang
>> >>>>>>>
>> >>>>>>> On 10/14/21 10:04 AM, Pierre Jolivet wrote:
>> >>>>>>>>> On 14 Oct 2021, at 3:50 PM, Chang Liu <cliu at pppl.gov> wrote:
>> >>>>>>>>>
>> >>>>>>>>> Thank you Pierre. I was not aware of PCTELESCOPE before. This
>> >>>>>>>>> sounds like exactly what I need. I wonder if PCTELESCOPE can
>> >>>>>>>>> transform an mpiaijcusparse matrix to seqaijcusparse, or do I
>> >>>>>>>>> have to do it manually?
>> >>>>>>>> PCTELESCOPE uses MatCreateMPIMatConcatenateSeqMat().
>> >>>>>>>> 1) I’m not sure this is implemented for cuSparse matrices, but
>> >>>>>>>> it should be;
>> >>>>>>>> 2) at least for the implementations
>> >>>>>>>> MatCreateMPIMatConcatenateSeqMat_MPIBAIJ() and
>> >>>>>>>> MatCreateMPIMatConcatenateSeqMat_MPIAIJ(), the resulting MatType
>> >>>>>>>> is MATBAIJ (resp. MATAIJ). Constructors are usually “smart”
>> >>>>>>>> enough to detect if the MPI communicator on which the Mat lives
>> >>>>>>>> is of size 1 (your case), and then the resulting Mat is of type
>> >>>>>>>> MatSeqX instead of MatMPIX, so you would not need to worry about
>> >>>>>>>> the transformation you are mentioning.
>> >>>>>>>> If you try this out and this does not work, please provide the
>> >>>>>>>> backtrace (probably something like “Operation XYZ not
>> >>>>>>>> implemented for MatType ABC”), and hopefully someone can add the
>> >>>>>>>> missing plumbing.
>> >>>>>>>> I do not claim that this will be efficient, but I think this
>> >>>>>>>> goes in the direction of what you want to achieve.
>> >>>>>>>> Thanks,
>> >>>>>>>> Pierre
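>> >>>>>>>>
>> >>>>>>>> A minimal standalone check of the above, as a sketch (illustrative
>> >>>>>>>> sizes; needs a CUDA-enabled PETSc build and more than one MPI
>> >>>>>>>> rank): build a small SEQAIJCUSPARSE matrix on each rank,
>> >>>>>>>> concatenate it with MatCreateMPIMatConcatenateSeqMat(), and print
>> >>>>>>>> the type of the result to see whether the cuSparse type survives.
>> >>>>>>>>
>> >>>>>>>> #include <petscmat.h>
>> >>>>>>>>
>> >>>>>>>> int main(int argc,char **argv)
>> >>>>>>>> {
>> >>>>>>>>   Mat            seq,mpi;
>> >>>>>>>>   MatType        type;
>> >>>>>>>>   PetscInt       i,n = 4;
>> >>>>>>>>   PetscErrorCode ierr;
>> >>>>>>>>
>> >>>>>>>>   ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
>> >>>>>>>>   /* small diagonal SEQAIJCUSPARSE matrix on each rank */
>> >>>>>>>>   ierr = MatCreate(PETSC_COMM_SELF,&seq);CHKERRQ(ierr);
>> >>>>>>>>   ierr = MatSetSizes(seq,n,n,n,n);CHKERRQ(ierr);
>> >>>>>>>>   ierr = MatSetType(seq,MATSEQAIJCUSPARSE);CHKERRQ(ierr);
>> >>>>>>>>   ierr = MatSetUp(seq);CHKERRQ(ierr);
>> >>>>>>>>   for (i=0; i<n; i++) {ierr = MatSetValue(seq,i,i,1.0,INSERT_VALUES);CHKERRQ(ierr);}
>> >>>>>>>>   ierr = MatAssemblyBegin(seq,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>> >>>>>>>>   ierr = MatAssemblyEnd(seq,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>> >>>>>>>>   /* concatenate the per-rank blocks into one parallel matrix */
>> >>>>>>>>   ierr = MatCreateMPIMatConcatenateSeqMat(PETSC_COMM_WORLD,seq,PETSC_DECIDE,MAT_INITIAL_MATRIX,&mpi);CHKERRQ(ierr);
>> >>>>>>>>   ierr = MatGetType(mpi,&type);CHKERRQ(ierr);
>> >>>>>>>>   ierr = PetscPrintf(PETSC_COMM_WORLD,"concatenated Mat type: %s\n",type);CHKERRQ(ierr);
>> >>>>>>>>   ierr = MatDestroy(&seq);CHKERRQ(ierr);
>> >>>>>>>>   ierr = MatDestroy(&mpi);CHKERRQ(ierr);
>> >>>>>>>>   ierr = PetscFinalize();
>> >>>>>>>>   return ierr;
>> >>>>>>>> }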
>> >>>>>>>>> Chang
>> >>>>>>>>>
>> >>>>>>>>> On 10/14/21 1:35 AM, Pierre Jolivet wrote:
>> >>>>>>>>>> Maybe I’m missing something, but can’t you use PCTELESCOPE as
>> >>>>>>>>>> a subdomain solver, with a reduction factor equal to the
>> >>>>>>>>>> number of MPI processes you have per block?
>> >>>>>>>>>> -sub_pc_type telescope -sub_pc_telescope_reduction_factor X
>> >>>>>>>>>> -sub_telescope_pc_type lu
>> >>>>>>>>>> This does not work with MUMPS -mat_mumps_use_omp_threads
>> >>>>>>>>>> because not only does the Mat need to be redistributed, the
>> >>>>>>>>>> secondary processes also need to be “converted” to OpenMP
>> >>>>>>>>>> threads.
>> >>>>>>>>>> Thus the need for specific code in mumps.c.
>> >>>>>>>>>> Thanks,
>> >>>>>>>>>> Pierre
>> >>>>>>>>>>> On 14 Oct 2021, at 6:00 AM, Chang Liu via petsc-users
>> >>>>>>>>>>> <petsc-users at mcs.anl.gov> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>> Hi Junchao,
>> >>>>>>>>>>>
>> >>>>>>>>>>> Yes that is what I want.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Chang
>> >>>>>>>>>>>
>> >>>>>>>>>>> On 10/13/21 11:42 PM, Junchao Zhang wrote:
>> >>>>>>>>>>>> On Wed, Oct 13, 2021 at 8:58 PM Barry Smith
>> >>>>>>>>>>>> <bsmith at petsc.dev> wrote:
>> >>>>>>>>>>>>        Junchao,
>> >>>>>>>>>>>>           If I understand correctly Chang is using the block
>> >>>>>>>>>>>> Jacobi
>> >>>>>>>>>>>>     method with a single block for a number of MPI ranks and
>> >>>>>>>>>>>> a direct
>> >>>>>>>>>>>>     solver for each block so it uses
>> >>>>>>>>>>>> PCSetUp_BJacobi_Multiproc() which
>> >>>>>>>>>>>>     is code Hong Zhang wrote a number of years ago for CPUs.
>> >>>>>>>>>>>> For their
>> >>>>>>>>>>>>     particular problems this preconditioner works well, but
>> >>>>>>>>>>>> using an
>> >>>>>>>>>>>>     iterative solver on the blocks does not work well.
>> >>>>>>>>>>>>           If we had complete MPI-GPU direct solvers he could
>> >>>>>>>>>>>> just use
>> >>>>>>>>>>>>     the current code with MPIAIJCUSPARSE on each block but
>> >>>>>>>>>>>> since we do
>> >>>>>>>>>>>>     not he would like to use a single GPU for each block,
>> >>>>>>>>>>>> this means
>> >>>>>>>>>>>>     that diagonal blocks of the global parallel MPI matrix
>> >>>>>>>>>>>> need to be
>> >>>>>>>>>>>>     sent to a subset of the GPUs (one GPU per block, which
>> >>>>>>>>>>>> has multiple
>> >>>>>>>>>>>>     MPI ranks associated with the blocks). Similarly for the
>> >>>>>>>>>>>> triangular
>> >>>>>>>>>>>>     solves the blocks of the right hand side need to be
>> >>>>>>>>>>>> shipped to the
>> >>>>>>>>>>>>     appropriate GPU and the resulting solution shipped back
>> >>>>>>>>>>>> to the
>> >>>>>>>>>>>>     multiple GPUs. So Chang is absolutely correct, this is
>> >>>>>>>>>>>> somewhat like
>> >>>>>>>>>>>>     your code for MUMPS with OpenMP. OK, I now understand
>> >>>>>>>>>>>> the background..
>> >>>>>>>>>>>>     One could use PCSetUp_BJacobi_Multiproc() and get the
>> >>>>>>>>>>>> blocks on the
>> >>>>>>>>>>>>     MPI ranks and then shrink each block down to a single
>> >>>>>>>>>>>> GPU but this
>> >>>>>>>>>>>>     would be pretty inefficient, ideally one would go
>> >>>>>>>>>>>> directly from the
>> >>>>>>>>>>>>     big MPI matrix on all the GPUs to the sub matrices on
>> >>>>>>>>>>>> the subset of
>> >>>>>>>>>>>>     GPUs. But this may be a large coding project.
>> >>>>>>>>>>>> I don't understand these sentences. Why do you say "shrink"?
>> >>>>>>>>>>>> In my mind, we just need to move each block (submatrix)
>> >>>>>>>>>>>> living over multiple MPI ranks to one of them and solve
>> >>>>>>>>>>>> directly there.  In other words, we keep blocks' size, no
>> >>>>>>>>>>>> shrinking or expanding.
>> >>>>>>>>>>>> As mentioned before, cusparse does not provide LU
>> >>>>>>>>>>>> factorization. So the LU factorization would be done on CPU,
>> >>>>>>>>>>>> and the solve done on GPU. I assume Chang wants to gain
>> >>>>>>>>>>>> from the (potential) faster solve (instead of factorization)
>> >>>>>>>>>>>> on GPU.
>> >>>>>>>>>>>>        Barry
>> >>>>>>>>>>>>     Since the matrices being factored and solved directly
>> >>>>>>>>>>>> are relatively
>> >>>>>>>>>>>>     large it is possible that the cusparse code could be
>> >>>>>>>>>>>> reasonably
>> >>>>>>>>>>>>     efficient (they are not the tiny problems one gets at
>> >>>>>>>>>>>> the coarse
>> >>>>>>>>>>>>     level of multigrid). Of course, this is speculation, I
>> don't
>> >>>>>>>>>>>>     actually know how much better the cusparse code would be
>> >>>>>>>>>>>> on the
>> >>>>>>>>>>>>     direct solver than a good CPU direct sparse solver.
>> >>>>>>>>>>>>      > On Oct 13, 2021, at 9:32 PM, Chang Liu <cliu at pppl.gov> wrote:
>> >>>>>>>>>>>>      >
>> >>>>>>>>>>>>      > Sorry I am not familiar with the details either. Can
>> >>>>>>>>>>>> you please
>> >>>>>>>>>>>>     check the code in MatMumpsGatherNonzerosOnMaster in
>> mumps.c?
>> >>>>>>>>>>>>      >
>> >>>>>>>>>>>>      > Chang
>> >>>>>>>>>>>>      >
>> >>>>>>>>>>>>      > On 10/13/21 9:24 PM, Junchao Zhang wrote:
>> >>>>>>>>>>>>      >> Hi Chang,
>> >>>>>>>>>>>>      >>   I did the work in mumps. It is easy for me to understand gathering matrix rows to one process.
>> >>>>>>>>>>>>      >>   But how to gather blocks (submatrices) to form a large block? Can you draw a picture of that?
>> >>>>>>>>>>>>      >>   Thanks
>> >>>>>>>>>>>>      >> --Junchao Zhang
>> >>>>>>>>>>>>      >> On Wed, Oct 13, 2021 at 7:47 PM Chang Liu via petsc-users <petsc-users at mcs.anl.gov> wrote:
>> >>>>>>>>>>>>      >>    Hi Barry,
>> >>>>>>>>>>>>      >>    I think the mumps solver in petsc does support that. You can check the documentation on "-mat_mumps_use_omp_threads" at
>> >>>>>>>>>>>>      >>    https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html
>> >>>>>>>>>>>>      >>    and the code enclosed by #if defined(PETSC_HAVE_OPENMP_SUPPORT) in the functions MatMumpsSetUpDistRHSInfo and MatMumpsGatherNonzerosOnMaster in mumps.c
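As a concrete illustration of the MUMPS feature Chang is pointing to, a hedged usage sketch follows. The executable name ./app and the rank/thread counts are made up, and the description of the gathering behavior is taken from this thread rather than verified against the MUMPS interface code.

  # 16 MPI ranks; the matrix and right-hand-side data are gathered onto a
  # subset of "master" ranks (as described in this thread), each of which
  # then calls MUMPS with 4 OpenMP threads.
  OMP_NUM_THREADS=4 mpiexec -n 16 ./app -ksp_type preonly -pc_type lu \
      -pc_factor_mat_solver_type mumps -mat_mumps_use_omp_threads 4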
>> >>>>>>>>>>>>      >>    1. I understand it is ideal to do one MPI rank per GPU. However, I am working on an existing code that was developed based on MPI, and the # of mpi ranks is typically equal to the # of cpu cores. We don't want to change the whole structure of the code.
>> >>>>>>>>>>>>      >>    2. What you have suggested has been coded in mumps.c. See the function MatMumpsSetUpDistRHSInfo.
>> >>>>>>>>>>>>      >>    Regards,
>> >>>>>>>>>>>>      >>    Chang
>> >>>>>>>>>>>>      >>    On 10/13/21 7:53 PM, Barry Smith wrote:
>> >>>>>>>>>>>>      >>     >
>> >>>>>>>>>>>>      >>     >
>> >>>>>>>>>>>>      >>     >> On Oct 13, 2021, at 3:50 PM, Chang Liu <cliu at pppl.gov> wrote:
>> >>>>>>>>>>>>      >>     >>
>> >>>>>>>>>>>>      >>     >> Hi Barry,
>> >>>>>>>>>>>>      >>     >>
>> >>>>>>>>>>>>      >>     >> That is exactly what I want.
>> >>>>>>>>>>>>      >>     >>
>> >>>>>>>>>>>>      >>     >> Back to my original question, I am looking for an approach to transfer matrix data from many MPI processes to "master" MPI processes, each of which takes care of one GPU, and then upload the data to the GPU to solve.
>> >>>>>>>>>>>>      >>     >> One could just grab some code from mumps.c into aijcusparse.cu.
>> >>>>>>>>>>>>      >>     >
>> >>>>>>>>>>>>      >>     >    mumps.c doesn't actually do that. It never needs to copy the entire matrix to a single MPI rank.
>> >>>>>>>>>>>>      >>     >
>> >>>>>>>>>>>>      >>     >    It would be possible to write the code you suggest, but it is not clear that it makes sense.
>> >>>>>>>>>>>>      >>     >
>> >>>>>>>>>>>>      >>     > 1) For normal PETSc GPU usage there is one GPU per MPI rank, so while your one GPU per big domain is solving its systems, the other GPUs (with the other MPI ranks that share that domain) are doing nothing.
>> >>>>>>>>>>>>      >>     >
>> >>>>>>>>>>>>      >>     > 2) For each triangular solve you would have to gather the right-hand side from the multiple ranks to the single GPU to pass it to the GPU solver, and then scatter the resulting solution back to all of its subdomain ranks.
>> >>>>>>>>>>>>      >>     >
>> >>>>>>>>>>>>      >>     >    What I was suggesting was to assign an entire subdomain to a single MPI rank, so that it does everything on one GPU and can use the GPU solver directly. If all the major computations of a subdomain can fit and be done on a single GPU, then you would be utilizing all the GPUs you are using effectively.
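A hedged sketch of the configuration Barry is suggesting, assuming the application can simply be launched with one MPI rank per GPU; the executable name and the counts are illustrative, and the option names are the ones already used elsewhere in this thread.

  # One MPI rank (and hence one GPU) per subdomain: block Jacobi with one
  # block per rank, each block factored and solved through cuSPARSE.
  mpiexec -n 16 ./app -ksp_type fgmres -pc_type bjacobi -pc_bjacobi_blocks 16 \
      -mat_type aijcusparse -sub_ksp_type preonly -sub_pc_type lu \
      -sub_pc_factor_mat_solver_type cusparse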
>> >>>>>>>>>>>>      >>     >
>> >>>>>>>>>>>>      >>     >    Barry
>> >>>>>>>>>>>>      >>     >
>> >>>>>>>>>>>>      >>     >
>> >>>>>>>>>>>>      >>     >
>> >>>>>>>>>>>>      >>     >>
>> >>>>>>>>>>>>      >>     >> Chang
>> >>>>>>>>>>>>      >>     >>
>> >>>>>>>>>>>>      >>     >> On 10/13/21 1:53 PM, Barry Smith wrote:
>> >>>>>>>>>>>>      >>     >>>    Chang,
>> >>>>>>>>>>>>      >>     >>>      You are correct, there are no MPI + GPU direct solvers that currently do the triangular solves with MPI + GPU parallelism that I am aware of. You are limited in that each individual triangular solve must be done on a single GPU. I can only suggest making each subdomain as big as possible to utilize each GPU as much as possible for the direct triangular solves.
>> >>>>>>>>>>>>      >>     >>>     Barry
>> >>>>>>>>>>>>      >>     >>>> On Oct 13, 2021, at 12:16 PM, Chang Liu via petsc-users <petsc-users at mcs.anl.gov> wrote:
>> >>>>>>>>>>>>      >>     >>>>
>> >>>>>>>>>>>>      >>     >>>> Hi Mark,
>> >>>>>>>>>>>>      >>     >>>>
>> >>>>>>>>>>>>      >>     >>>> '-mat_type aijcusparse' works with mpiaijcusparse with other solvers, but with -pc_factor_mat_solver_type cusparse it will give an error.
>> >>>>>>>>>>>>      >>     >>>>
>> >>>>>>>>>>>>      >>     >>>> Yes, what I want is to have mumps or superlu do the factorization, and then do the rest, including the GMRES solver, on the gpu. Is that possible?
>> >>>>>>>>>>>>      >>     >>>>
>> >>>>>>>>>>>>      >>     >>>> I have tried to use aijcusparse with superlu_dist; it runs, but the iterative solver is still running on CPUs. I have contacted the superlu group and they confirmed that is the case right now. But if I set -pc_factor_mat_solver_type cusparse, it seems that the iterative solver is running on the GPU.
>> >>>>>>>>>>>>      >>     >>>>
>> >>>>>>>>>>>>      >>     >>>> Chang
>> >>>>>>>>>>>>      >>     >>>>
>> >>>>>>>>>>>>      >>     >>>> On 10/13/21 12:03 PM, Mark Adams wrote:
>> >>>>>>>>>>>>      >>     >>>>> On Wed, Oct 13, 2021 at 11:10 AM Chang Liu <cliu at pppl.gov> wrote:
>> >>>>>>>>>>>>      >>     >>>>>     Thank you Junchao for explaining this. I guess in my case the code is just calling a seq solver like superlu to do factorization on GPUs.
>> >>>>>>>>>>>>      >>     >>>>>     My idea is that I want to have a traditional MPI code utilize GPUs with cusparse. Right now cusparse does not support mpiaij matrix,
>> >>>>>>>>>>>>      >>     >>>>> Sure it does: '-mat_type aijcusparse' will give you an mpiaijcusparse matrix with > 1 processes.
>> >>>>>>>>>>>>      >>     >>>>> (-mat_type mpiaijcusparse might also work with >1 proc).
>> >>>>>>>>>>>>      >>     >>>>> However, I see in grepping the repo that all the mumps and superlu tests use the aij or sell matrix type.
>> >>>>>>>>>>>>      >>     >>>>> MUMPS and SuperLU provide their own solves, I assume .... but you might want to do other matrix operations on the GPU. Is that the issue?
>> >>>>>>>>>>>>      >>     >>>>> Did you try -mat_type aijcusparse with MUMPS and/or SuperLU and have a problem? (There is no test with it, so it probably does not work.)
>> >>>>>>>>>>>>      >>     >>>>> Thanks,
>> >>>>>>>>>>>>      >>     >>>>> Mark
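For reference, the combination Mark is asking about would look something like the options below. This is only an illustration of the question; per the thread there is no test coverage for it, so it may well not work.

  # aijcusparse matrix type combined with an external direct solver package
  -mat_type aijcusparse -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type mumps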
>> >>>>>>>>>>>>      >>     >>>>>     so I want the code to have a mpiaij matrix when adding all the matrix terms, and then transform the matrix to seqaij when doing the factorization and solve. This involves sending the data to the master process, and I think the petsc mumps solver has something similar already.
>> >>>>>>>>>>>>      >>     >>>>>     Chang
>> >>>>>>>>>>>>      >>     >>>>>     On 10/13/21 10:18 AM, Junchao Zhang wrote:
>> >>>>>>>>>>>>      >>     >>>>>      >
>> >>>>>>>>>>>>      >>     >>>>>      > On Tue, Oct 12, 2021 at 1:07 PM Mark Adams <mfadams at lbl.gov> wrote:
>> >>>>>>>>>>>>      >>     >>>>>      >
>> >>>>>>>>>>>>      >>     >>>>>      >     On Tue, Oct 12, 2021 at 1:45 PM Chang Liu <cliu at pppl.gov> wrote:
>> >>>>>>>>>>>>      >>     >>>>>      >
>> >>>>>>>>>>>>      >>     >>>>>      >         Hi Mark,
>> >>>>>>>>>>>>      >>     >>>>>      >
>> >>>>>>>>>>>>      >>     >>>>>      >         The option I use is like
>> >>>>>>>>>>>>      >>     >>>>>      >
>> >>>>>>>>>>>>      >>     >>>>>      >         -pc_type bjacobi -pc_bjacobi_blocks 16 -ksp_type fgmres -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -sub_ksp_type preonly -sub_pc_type lu -ksp_max_it 2000 -ksp_rtol 1.e-300 -ksp_atol 1.e-300
>> >>>>>>>>>>>>      >>     >>>>>      >
>> >>>>>>>>>>>>      >>     >>>>>      >     Note, if you use -log_view the last column (rows are the methods, like MatFactorNumeric) has the percent of work on the GPU.
>> >>>>>>>>>>>>      >>     >>>>>      >
>> >>>>>>>>>>>>      >>     >>>>>      >     Junchao: *This* implies that we have a cuSparse LU factorization. Is that correct? (I don't think we do)
>> >>>>>>>>>>>>      >>     >>>>>      >
>> >>>>>>>>>>>>      >>     >>>>>      > No, we don't have cuSparse LU factorization. If you check MatLUFactorSymbolic_SeqAIJCUSPARSE(), you will find it calls MatLUFactorSymbolic_SeqAIJ() instead.
>> >>>>>>>>>>>>      >>     >>>>>      > So I don't understand Chang's idea. Do you want to make bigger blocks?
>> >>>>>>>>>>>>      >>     >>>>>      >
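A quick way to confirm the code path Junchao mentions from a PETSc source checkout; the file path below is the usual location of the cuSPARSE AIJ implementation and may differ between releases.

  # Shows that the symbolic LU for seqaijcusparse dispatches to the CPU
  # SeqAIJ routine rather than to a cuSPARSE factorization.
  grep -n "MatLUFactorSymbolic_SeqAIJ" src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu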
>> >>>>>>>>>>>>      >>     >>>>>      >
>> >>>>>>>>>>>>      >>     >>>>>      >         I think this one does both factorization and solve on the gpu.
>> >>>>>>>>>>>>      >>     >>>>>      >
>> >>>>>>>>>>>>      >>     >>>>>      >         You can check the runex72_aijcusparse.sh file in the petsc install directory, and try it yourself (this is only lu factorization without an iterative solve).
>> >>>>>>>>>>>>      >>     >>>>>      >
>> >>>>>>>>>>>>      >>     >>>>>      >         Chang
>> >>>>>>>>>>>>      >>     >>>>>      >
>> >>>>>>>>>>>>      >>     >>>>>      >         On 10/12/21 1:17 PM, Mark Adams wrote:
>> >>>>>>>>>>>>      >>     >>>>>      >          >
>> >>>>>>>>>>>>      >>     >>>>>      >          >
>> >>>>>>>>>>>>      >>     >>>>>      >          > On Tue, Oct 12, 2021 at 11:19 AM Chang Liu <cliu at pppl.gov> wrote:
>> >>>>>>>>>>>>      >>     >>>>>      >          >
>> >>>>>>>>>>>>      >>     >>>>>      >          >     Hi Junchao,
>> >>>>>>>>>>>>      >>     >>>>>      >          >
>> >>>>>>>>>>>>      >>     >>>>>      >          >     No, I only need it to be transferred within a node. I use the block-Jacobi method and GMRES to solve the sparse matrix, so each direct solver will take care of a sub-block of the whole matrix. In this way, I can use one GPU to solve one sub-block, which is stored within one node.
>> >>>>>>>>>>>>      >>     >>>>>      >          >
>> >>>>>>>>>>>>      >>     >>>>>      >          >     It was stated in the documentation that the cusparse solver is slow. However, in my test using ex72.c, the cusparse solver is faster than mumps or superlu_dist on CPUs.
>> >>>>>>>>>>>>      >>     >>>>>      >          >
>> >>>>>>>>>>>>      >>     >>>>>      >          >
>> >>>>>>>>>>>>      >>     >>>>>      >          > Are we talking about the factorization, the solve, or both?
>> >>>>>>>>>>>>      >>     >>>>>      >          >
>> >>>>>>>>>>>>      >>     >>>>>      >          > We do not have an interface to cuSparse's LU factorization (I just learned that it exists a few weeks ago).
>> >>>>>>>>>>>>      >>     >>>>>      >          > Perhaps your fast "cusparse solver" is '-pc_type lu -mat_type aijcusparse'? This would be the CPU factorization, which is the dominant cost.
>> >>>>>>>>>>>>      >>     >>>>>      >          >
>> >>>>>>>>>>>>      >>     >>>>>      >          >
>> >>>>>>>>>>>>      >>     >>>>>      >          >     Chang
>> >>>>>>>>>>>>      >>     >>>>>      >          >
>> >>>>>>>>>>>>      >>     >>>>>      >          >     On 10/12/21 10:24 AM, Junchao Zhang wrote:
>> >>>>>>>>>>>>      >>     >>>>>      >          >      > Hi, Chang,
>> >>>>>>>>>>>>      >>     >>>>>      >          >      >     For the mumps solver, we usually transfer matrix and vector data within a compute node.  For the idea you propose, it looks like we need to gather data within MPI_COMM_WORLD, right?
>> >>>>>>>>>>>>      >>     >>>>>      >          >      >
>> >>>>>>>>>>>>      >>     >>>>>      >          >      >     Mark, I remember you said the cusparse solve is slow and you would rather do it on the CPU. Is that right?
>> >>>>>>>>>>>>      >>     >>>>>      >          >      >
>> >>>>>>>>>>>>      >>     >>>>>      >          >      > --Junchao Zhang
>> >>>>>>>>>>>>      >>     >>>>>      >          >      >
>> >>>>>>>>>>>>      >>     >>>>>      >          >      >
>> >>>>>>>>>>>>      >>     >>>>>      >          >      > On Mon, Oct 11, 2021 at 10:25 PM Chang Liu via petsc-users <petsc-users at mcs.anl.gov> wrote:
>> >>>>>>>>>>>>      >>     >>>>>      >          >      >
>> >>>>>>>>>>>>      >>     >>>>>      >          >      >     Hi,
>> >>>>>>>>>>>>      >>     >>>>>      >          >      >
>> >>>>>>>>>>>>      >>     >>>>>      >          >      >     Currently, it is possible to use the mumps solver in PETSC with the -mat_mumps_use_omp_threads option, so that multiple MPI processes will transfer the matrix and rhs data to the master rank, and then the master rank will call mumps with OpenMP to solve the matrix.
>> >>>>>>>>>>>>      >>     >>>>>      >          >      >
>> >>>>>>>>>>>>      >>     >>>>>      >          >      >     I wonder if someone can develop a similar option for the cusparse solver. Right now, this solver does not work with mpiaijcusparse. I think a possible workaround is to transfer all the matrix data to one MPI process, and then upload the data to the GPU to solve. In this way, one can use the cusparse solver for an MPI program.
>> >>>>>>>>>>>>      >>     >>>>>      >          >      >
>> >>>>>>>>>>>>      >>     >>>>>      >          >      >     Chang
>> >>>>>>>>>>>>      >>     >>>>>      >          >      >     --
>> >>>>>>>>>>>>      >>     >>>>>      >          >      >     Chang Liu
>> >>>>>>>>>>>>      >>     >>>>>      >          >      >     Staff Research Physicist
>> >>>>>>>>>>>>      >>     >>>>>      >          >      >     +1 609 243 3438
>> >>>>>>>>>>>>      >>     >>>>>      >          >      >     cliu at pppl.gov
>> >>>>>>>>>>>>      >>     >>>>>      >          >      >     Princeton Plasma Physics Laboratory
>> >>>>>>>>>>>>      >>     >>>>>      >          >      >     100 Stellarator Rd, Princeton NJ 08540, USA
>> >>>>>>>>>>>>      >>     >>>>>      >          >      >
>> >>>>>>>>>>>>      >>     >>>>>      >          >
>> >
>>
>> --
>> Chang Liu
>> Staff Research Physicist
>> +1 609 243 3438
>> cliu at pppl.gov
>> Princeton Plasma Physics Laboratory
>> 100 Stellarator Rd, Princeton NJ 08540, USA
>>
>