[petsc-dev] ASM for each field solve on GPUs

Mark Adams mfadams at lbl.gov
Thu Dec 31 15:26:44 CST 2020


Oh, and I use:

-fieldsplit_e_pc_type lu
-fieldsplit_i1_pc_type lu

and

-fieldsplit_pc_type lu

does not seem to work.
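
Each split has its own options prefix (fieldsplit_e_ and fieldsplit_i1_ in the -ksp_view output quoted below), so the option has to be repeated once per split. A minimal sketch of generating those per-split options for a run script follows. The helper name per_split_options is hypothetical and not part of PETSc, and the split names "e" and "i1" are simply the ones from this run:

```python
def per_split_options(split_names, option, value):
    """Build one command-line option string per named fieldsplit block.

    Hypothetical helper for assembling a run script; it only formats
    strings and does not call PETSc.
    """
    return [f"-fieldsplit_{name}_{option} {value}" for name in split_names]


# Split names taken from the option prefixes shown by -ksp_view in this run.
opts = per_split_options(["e", "i1"], "pc_type", "lu")
print("\n".join(opts))
```

With more species, this only needs a longer list of split names; whether a shorter catch-all prefix can be made to work is a separate question.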


On Thu, Dec 31, 2020 at 4:25 PM Mark Adams <mfadams at lbl.gov> wrote:

> Ha! This seems to work. It looks good as far as I can tell. I don't think
> this is just a direct solver.
> Thanks
>
>  ....
>     3 SNES Function norm 1.557752448861e-11
>       0 KSP Residual norm 1.557752448861e-11
>       1 KSP Residual norm 7.987154898471e-27
> KSP Object: 1 MPI processes
>   type: preonly
>   maximum iterations=10000, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>   left preconditioning
>   using NONE norm type for convergence test
> PC Object: 1 MPI processes
>   type: fieldsplit
>     FieldSplit with ADDITIVE composition: total splits = 2
>     Solver info for each split is in the following KSP objects:
>   Split number 0 Defined by IS
>   KSP Object: (fieldsplit_e_) 1 MPI processes
>     type: preonly
>     maximum iterations=10000, initial guess is zero
>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>     left preconditioning
>     using NONE norm type for convergence test
>   PC Object: (fieldsplit_e_) 1 MPI processes
>     type: lu
>       out-of-place factorization
>       tolerance for zero pivot 2.22045e-14
>       matrix ordering: nd
>       factor fill ratio given 5., needed 1.30805
>         Factored matrix follows:
>           Mat Object: 1 MPI processes
>             type: seqaij
>             rows=448, cols=448
>             package used to perform factorization: petsc
>             total: nonzeros=14038, allocated nonzeros=14038
>               using I-node routines: found 175 nodes, limit used is 5
>     linear system matrix = precond matrix:
>     Mat Object: (fieldsplit_e_) 1 MPI processes
>       type: seqaij
>       rows=448, cols=448
>       total: nonzeros=10732, allocated nonzeros=10732
>       total number of mallocs used during MatSetValues calls=0
>         using I-node routines: found 197 nodes, limit used is 5
>   Split number 1 Defined by IS
>   KSP Object: (fieldsplit_i1_) 1 MPI processes
>     type: preonly
>     maximum iterations=10000, initial guess is zero
>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>     left preconditioning
>     using NONE norm type for convergence test
>   PC Object: (fieldsplit_i1_) 1 MPI processes
>     type: lu
>       out-of-place factorization
>       tolerance for zero pivot 2.22045e-14
>       matrix ordering: nd
>       factor fill ratio given 5., needed 1.30805
>         Factored matrix follows:
>           Mat Object: 1 MPI processes
>             type: seqaij
>             rows=448, cols=448
>             package used to perform factorization: petsc
>             total: nonzeros=14038, allocated nonzeros=14038
>               using I-node routines: found 175 nodes, limit used is 5
>     linear system matrix = precond matrix:
>     Mat Object: (fieldsplit_i1_) 1 MPI processes
>       type: seqaij
>       rows=448, cols=448
>       total: nonzeros=10732, allocated nonzeros=10732
>       total number of mallocs used during MatSetValues calls=0
>         using I-node routines: found 197 nodes, limit used is 5
>   linear system matrix = precond matrix:
>   Mat Object: 1 MPI processes
>     type: seqaij
>     rows=896, cols=896
>     total: nonzeros=21464, allocated nonzeros=42928
>     total number of mallocs used during MatSetValues calls=0
>       using I-node routines: found 398 nodes, limit used is 5
>     4 SNES Function norm 5.154807006139e-14
>   Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 4
>       TSAdapt basic arkimex 0:1bee step   0 accepted t=0          + 1.000e-02 dt=1.100e-02  wlte=0.00161  wltea=   -1 wlter=   -1
> 1 TS dt 0.011 time 0.01
>   1) species-0: charge density= -1.6022862392985e+01 z-momentum=  1.9463853500826e-19 energy=  9.6063874253791e+04
>   1) species-1: charge density=  1.6029890912678e+01 z-momentum= -1.6132625562313e-18 energy=  9.6333617535820e+04
>  1) Total: charge density=  7.0285196929696e-03, momentum= -1.4186240212230e-18, energy=  1.9239749178961e+05 (m_i[0]/m_e = 1835.47, 50 cells)
> testSpitzer    1) time= 1.000e-02 n_e=  1.000e+00 E=  0.000e+00 J=  5.338e-08 J_re= -0.000e+00 -0 % Te_kev=  3.995e+00 Z_eff=1. E/J to eta ratio=0. (diff=-0.) constant E
> [0] parallel consistency check OK
> #PETSc Option Table entries:
> -dm_landau_amr_levels_max 8
> -dm_landau_amr_post_refine 0
> -dm_landau_device_type cpu
> -dm_landau_domain_radius 5
> -dm_landau_Ez 0
> -dm_landau_ion_charges 1
> -dm_landau_ion_masses 1
> -dm_landau_n 1,1
> -dm_landau_thermal_temps 4,4
> -dm_landau_type p4est
> -dm_preallocate_only
> -ex2_connor_e_field_units
> -ex2_impurity_source_type pulse
> -ex2_plot_dt 1
> -ex2_pulse_rate 2e+0
> -ex2_pulse_start_time 32
> -ex2_pulse_width_time 15
> -ex2_t_cold 1
> -ex2_test_type spitzer
> -fieldsplit_e_pc_type lu
> -fieldsplit_i1_pc_type lu
> -info :dm
> -ksp_monitor
> -ksp_type preonly
> -ksp_view
> -options_left
> -pc_fieldsplit_type additive
> -pc_type fieldsplit
> -petscspace_degree 3
> -snes_converged_reason
> -snes_max_it 15
> -snes_monitor
> -snes_rtol 1.e-14
> -snes_stol 1.e-14
> -ts_adapt_clip .25,1.1
> -ts_adapt_dt_max 1.
> -ts_adapt_dt_min 1e-4
> -ts_adapt_monitor
> -ts_adapt_scale_solve_failed 0.75
> -ts_adapt_time_step_increase_delay 5
> -ts_arkimex_type 1bee
> -ts_dt .1e-1
> -ts_exact_final_time stepover
> -ts_max_snes_failures -1
> -ts_max_steps 1
> -ts_max_time 100
> -ts_monitor
> -ts_rtol 1e-3
> -ts_type arkimex
> #End of PETSc Option Table entries
> There are no unused options.
> (base) 16:19 master= ~/Codes/petsc/src/ts/utils/dmplexlandau/tutorials$
>
>