[petsc-users] Error using GPU in Fortran code
Ramoni Z. Sedano Azevedo
ramoni.zsedano at gmail.com
Wed Aug 30 15:41:28 CDT 2023
Hello,
I am running a Fortran code that uses PETSc with MPI on CPUs, and I would
like to run it on GPUs.
PETSc is configured as follows:
./configure \
--prefix=${PWD}/installdir \
--with-fortran \
--with-fortran-kernels=true \
--with-cuda \
--download-fblaslapack \
--with-scalar-type=complex \
--with-precision=double \
--with-debugging=0 \
--with-x=0 \
--with-gnu-compilers=1 \
--with-cc=mpicc \
--with-cxx=mpicxx \
--with-fc=mpif90 \
--with-make-exec=make
The parameters for using MPI on CPU are:
mpirun -np $ntasks ./${executable} \
-A_mat_type mpiaij \
-P_mat_type mpiaij \
-em_ksp_monitor_true_residual \
-em_ksp_type bcgs \
-em_pc_type bjacobi \
-em_sub_pc_type ilu \
-em_sub_pc_factor_levels 3 \
-em_sub_pc_factor_fill 6 \
< ./Parameters.inp
Code output:
Solving for Hz fields
bnorm 3.7727507818834821E-005
xnorm 2.3407405211699372E-016
Residual norms for em_ solve.
0 KSP preconditioned resid norm 1.236208833927e-08 true resid norm 1.413045088306e-03 ||r(i)||/||b|| 3.745397377137e+01
1 KSP preconditioned resid norm 1.664973208594e-10 true resid norm 3.463939828700e+00 ||r(i)||/||b|| 9.181470043910e+04
2 KSP preconditioned resid norm 8.366983092820e-14 true resid norm 9.171051852915e-02 ||r(i)||/||b|| 2.430866066466e+03
3 KSP preconditioned resid norm 1.386354386207e-14 true resid norm 1.905770367881e-02 ||r(i)||/||b|| 5.051408052270e+02
4 KSP preconditioned resid norm 4.635883581096e-15 true resid norm 7.285180695640e-03 ||r(i)||/||b|| 1.930999717931e+02
5 KSP preconditioned resid norm 1.974093227402e-15 true resid norm 2.953370060898e-03 ||r(i)||/||b|| 7.828161020018e+01
6 KSP preconditioned resid norm 1.182781787023e-15 true resid norm 2.288756945462e-03 ||r(i)||/||b|| 6.066546871987e+01
7 KSP preconditioned resid norm 6.221244366707e-16 true resid norm 1.263339414861e-03 ||r(i)||/||b|| 3.348589631014e+01
8 KSP preconditioned resid norm 3.800488678870e-16 true resid norm 9.015738978063e-04 ||r(i)||/||b|| 2.389699054959e+01
9 KSP preconditioned resid norm 2.498733213989e-16 true resid norm 7.194509577987e-04 ||r(i)||/||b|| 1.906966559396e+01
10 KSP preconditioned resid norm 1.563017112250e-16 true resid norm 5.055208317846e-04 ||r(i)||/||b|| 1.339926385310e+01
11 KSP preconditioned resid norm 8.733803057628e-17 true resid norm 3.171941303660e-04 ||r(i)||/||b|| 8.407502872682e+00
12 KSP preconditioned resid norm 4.907010803529e-17 true resid norm 1.868311755294e-04 ||r(i)||/||b|| 4.952120782177e+00
13 KSP preconditioned resid norm 2.214070343700e-17 true resid norm 8.760421740830e-05 ||r(i)||/||b|| 2.322025028236e+00
14 KSP preconditioned resid norm 1.333171674446e-17 true resid norm 5.984548368534e-05 ||r(i)||/||b|| 1.586255948119e+00
15 KSP preconditioned resid norm 7.696778066646e-18 true resid norm 3.786809196913e-05 ||r(i)||/||b|| 1.003726303656e+00
16 KSP preconditioned resid norm 3.863008301366e-18 true resid norm 1.284864871601e-05 ||r(i)||/||b|| 3.405644702988e-01
17 KSP preconditioned resid norm 2.061402843494e-18 true resid norm 1.054741071688e-05 ||r(i)||/||b|| 2.795681805311e-01
18 KSP preconditioned resid norm 1.062033155108e-18 true resid norm 3.992776343462e-06 ||r(i)||/||b|| 1.058319664960e-01
converged reason 2
total number of relaxations 18
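As a sanity check on the CPU log, the last column can be recomputed from the printed bnorm. The sketch below is plain Python arithmetic, separate from my Fortran code; the variable and function names are made up for illustration, and only the values for iterations 0 and 18 are copied in.

```python
# Values copied from the CPU run log.
bnorm = 3.7727507818834821e-05                          # "bnorm" printed by the code
true_resid = [1.413045088306e-03, 3.992776343462e-06]   # true resid norm, iterations 0 and 18
reported = [3.745397377137e+01, 1.058319664960e-01]     # ||r(i)||/||b|| column, same iterations


def relative_residual(r_norm, b_norm):
    """Return ||r|| / ||b||, the quantity shown in the last monitor column."""
    return r_norm / b_norm


for r, rep in zip(true_resid, reported):
    rel = relative_residual(r, bnorm)
    print(f"recomputed {rel:.6e}   logged {rep:.6e}")
```

If the recomputed and logged values agree, the ||r(i)||/||b|| column is simply the true residual norm divided by bnorm, so a value of 3.745397377137e+01 at iteration 0 means the initial true residual is about 37 times larger than the right-hand side norm.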
========================================
The parameters for GPU usage are:
mpirun -np $ntasks ./${executable} \
-A_mat_type aijcusparse \
-P_mat_type aijcusparse \
-vec_type cuda \
-use_gpu_aware_mpi 0 \
-em_ksp_monitor_true_residual \
-em_ksp_type bcgs \
-em_pc_type bjacobi \
-em_sub_pc_type ilu \
-em_sub_pc_factor_levels 3 \
-em_sub_pc_factor_fill 6 \
< ./Parameters.inp
Code output:
Solving for Hz fields
bnorm 3.7727507818834821E-005
xnorm 2.3407405211699372E-016
Residual norms for em_ solve.
0 KSP preconditioned resid norm 1.236220954395e-08 true resid norm 3.772750781883e-05 ||r(i)||/||b|| 1.000000000000e+00
1 KSP preconditioned resid norm 0.000000000000e+00 true resid norm 3.772750781883e-05 ||r(i)||/||b|| 1.000000000000e+00
converged reason 3
total number of relaxations 1
========================================
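Clearly the GPU run is not converging correctly: after one iteration the
preconditioned residual norm is exactly zero while the true residual norm is
unchanged, and the solver stops with converged reason 3 (the CPU run stops
with converged reason 2 after 18 iterations).
Has anyone experienced this problem?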
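To make the symptom easy to check on other runs, here is a small Python sketch that parses -ksp_monitor_true_residual entries and flags the pattern seen in the GPU output. It is only a log-reading helper under my own assumptions: the names parse_monitor and looks_stalled are hypothetical, the sample text is the GPU log from this message, and the regular expression assumes the monitor format shown here.

```python
import re

# One monitor entry: iteration, preconditioned norm, true norm, relative norm.
MONITOR = re.compile(
    r"(\d+) KSP preconditioned resid norm (\S+) "
    r"true resid norm (\S+) \|\|r\(i\)\|\|/\|\|b\|\| (\S+)"
)


def parse_monitor(text):
    """Return (iteration, preconditioned norm, true norm, relative norm) tuples."""
    joined = " ".join(text.split())  # tolerate entries wrapped across two lines
    return [(int(i), float(p), float(t), float(rel))
            for i, p, t, rel in MONITOR.findall(joined)]


def looks_stalled(entries):
    """True if a later preconditioned norm is exactly zero while the true norm has not moved."""
    first_true = entries[0][2]
    return any(p == 0.0 and t == first_true for _, p, t, _ in entries[1:])


# GPU log from this message.
gpu_log = """
0 KSP preconditioned resid norm 1.236220954395e-08 true resid norm
3.772750781883e-05 ||r(i)||/||b|| 1.000000000000e+00
1 KSP preconditioned resid norm 0.000000000000e+00 true resid norm
3.772750781883e-05 ||r(i)||/||b|| 1.000000000000e+00
"""

print(looks_stalled(parse_monitor(gpu_log)))
```

The check is deliberately strict (exact zero, identical true norm), so it only flags this specific failure pattern and not ordinary slow convergence.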
Sincerely,
Ramoni Z. S. Azevedo