[petsc-users] overlap cpu and gpu?

nicola varini nicola.varini at gmail.com
Wed Aug 12 03:18:58 CDT 2020


Dear all, following the suggestions I resubmitted the simulation with the
petscrc below.
However, I get the following error:
========
 7362 [592]PETSC ERROR: #1 formProl0() line 748 in
/scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c
  7363 [339]PETSC ERROR: Petsc has generated inconsistent data
  7364 [339]PETSC ERROR: xGEQRF error
  7365 [339]PETSC ERROR: See
https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
  7366 [339]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020
  7367 [339]PETSC ERROR:
/users/nvarini/gbs_test_nicola/bin/gbs_daint_gpu_gnu on a  named nid05083
by nvarini Wed Aug 12 10:06:15 2020
  7368 [339]PETSC ERROR: Configure options --with-cc=cc --with-fc=ftn
--known-mpi-shared-libraries=1 --known-mpi-c-double-complex=1
--known-mpi-int64_t=1 --known-mpi-long-double=1 --with-batch=1
--known-64-bit-blas-indices=0 --LIBS=-lstdc++ --with-cxxlib-autodetect=0
--with-scalapack=1 --with-cxx=CC --with-debugging=0
--with-hypre-dir=/opt/cray/pe/tpsl/19.06.1/GNU/8.2/haswell
--prefix=/scratch/snx3000/nvarini/petsc3.13.3-gpu --with-cuda=1
--with-cuda-c=nvcc --with-cxxlib-autodetect=0
--COPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include
--with-cxx=CC
--CXXOPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include
  7369 [592]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in
/scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c
  7370 [592]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in
/scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c
  7371 [592]PETSC ERROR: #4 PCSetUp() line 898 in
/scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c
  7372 [592]PETSC ERROR: #5 KSPSetUp() line 376 in
/scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
  7373 [592]PETSC ERROR: #6 KSPSolve_Private() line 633 in
/scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
  7374 [316]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in
/scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c
  7375 [339]PETSC ERROR: #1 formProl0() line 748 in
/scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c
  7376 [339]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in
/scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c
  7377 [339]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in
/scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c
  7378 [339]PETSC ERROR: #4 PCSetUp() line 898 in
/scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c
  7379 [339]PETSC ERROR: #5 KSPSetUp() line 376 in
/scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
  7380 [592]PETSC ERROR: #7 KSPSolve() line 853 in
/scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
  7381 [339]PETSC ERROR: #6 KSPSolve_Private() line 633 in
/scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
  7382 [339]PETSC ERROR: #7 KSPSolve() line 853 in
/scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
  7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value
(info = -7)
  7384 [160]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in
/scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c
========

I did try other pc_gamg_type values, but they fail as well.


#PETSc Option Table entries:
-ampere_dm_mat_type aijcusparse
-ampere_dm_vec_type cuda
-ampere_ksp_atol 1e-15
-ampere_ksp_initial_guess_nonzero yes
-ampere_ksp_reuse_preconditioner yes
-ampere_ksp_rtol 1e-7
-ampere_ksp_type dgmres
-ampere_mg_levels_esteig_ksp_max_it 10
-ampere_mg_levels_esteig_ksp_type cg
-ampere_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-ampere_mg_levels_ksp_type chebyshev
-ampere_mg_levels_pc_type jacobi
-ampere_pc_gamg_agg_nsmooths 1
-ampere_pc_gamg_coarse_eq_limit 10
-ampere_pc_gamg_reuse_interpolation true
-ampere_pc_gamg_square_graph 1
-ampere_pc_gamg_threshold 0.05
-ampere_pc_gamg_threshold_scale .0
-ampere_pc_gamg_type agg
-ampere_pc_type gamg
-dm_mat_type aijcusparse
-dm_vec_type cuda
-log_view
-poisson_dm_mat_type aijcusparse
-poisson_dm_vec_type cuda
-poisson_ksp_atol 1e-15
-poisson_ksp_initial_guess_nonzero yes
-poisson_ksp_reuse_preconditioner yes
-poisson_ksp_rtol 1e-7
-poisson_ksp_type dgmres
-poisson_log_view
-poisson_mg_levels_esteig_ksp_max_it 10
-poisson_mg_levels_esteig_ksp_type cg
-poisson_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-poisson_mg_levels_ksp_max_it 1
-poisson_mg_levels_ksp_type chebyshev
-poisson_mg_levels_pc_type jacobi
-poisson_pc_gamg_agg_nsmooths 1
-poisson_pc_gamg_coarse_eq_limit 10
-poisson_pc_gamg_reuse_interpolation true
-poisson_pc_gamg_square_graph 1
-poisson_pc_gamg_threshold 0.05
-poisson_pc_gamg_threshold_scale .0
-poisson_pc_gamg_type agg
-poisson_pc_type gamg
-use_mat_nearnullspace true
#End of PETSc Option Table entries
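
For reference, a minimal C sketch (not the actual gbs code; the solver names
and overall structure here are assumptions for illustration) of how two KSP
objects would pick up the "poisson_"- and "ampere_"-prefixed options listed
above:

#include <petscksp.h>

int main(int argc, char **argv)
{
  KSP            poisson, ampere;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* Two independent solvers, distinguished only by their option prefixes */
  ierr = KSPCreate(PETSC_COMM_WORLD, &poisson);CHKERRQ(ierr);
  ierr = KSPSetOptionsPrefix(poisson, "poisson_");CHKERRQ(ierr);
  ierr = KSPSetFromOptions(poisson);CHKERRQ(ierr);  /* reads the -poisson_* options */

  ierr = KSPCreate(PETSC_COMM_WORLD, &ampere);CHKERRQ(ierr);
  ierr = KSPSetOptionsPrefix(ampere, "ampere_");CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ampere);CHKERRQ(ierr);   /* reads the -ampere_* options */

  /* ... KSPSetOperators() and KSPSolve() for each system would go here ... */

  ierr = KSPDestroy(&poisson);CHKERRQ(ierr);
  ierr = KSPDestroy(&ampere);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}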

Regards,

Nicola

On Tue, Aug 4, 2020 at 5:57 PM Mark Adams <mfadams at lbl.gov> wrote:

>
>
> On Tue, Aug 4, 2020 at 6:35 AM Stefano Zampini <stefano.zampini at gmail.com>
> wrote:
>
>> Nicola,
>>
>> You are actually not using the GPU properly, since you use HYPRE
>> preconditioning, which is CPU only. One of your solvers is actually slower
>> on “GPU”.
>> For a full AMG GPU, you can use PCGAMG, with cheby smoothers and with
>> Jacobi preconditioning. Mark can help you out with the specific command
>> line options.
>> When it works properly, everything related to the PC application is offloaded
>> to the GPU, and you should expect the well-known, much-advertised 10x
>> (maybe more) speedup from GPUs during KSPSolve.
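
A minimal sketch of the option set Stefano describes here (pure PETSc AMG with
Chebyshev/Jacobi smoothing and CUDA matrix/vector types so the preconditioner
is applied on the GPU); in this thread the same options appear with the
poisson_/ampere_ prefixes, and the tolerances and thresholds are
application-specific:

-pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1
-mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi
-dm_mat_type aijcusparse -dm_vec_type cuda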
>>
>>
> The speedup depends on the machine, but on SUMMIT, comparing enough CPUs to
> saturate the memory bus against all 6 GPUs, the speedup is a function of the
> problem size per subdomain. I saw 10x at about 100K equations/process.
>
>
>> Doing what you want to do is one of the last optimization steps of an
>> already optimized code before entering production. Yours is not even
>> optimized for proper GPU usage yet.
>> Also, is there any specific reason why you are using dgmres and fgmres?
>>
>> PETSc has not been designed with multi-threading in mind. You can achieve
>> “overlap” of the two solves by splitting the communicator. But then you
>> need communications to let the two solutions talk to each other.
>>
>> Thanks
>> Stefano
>>
>>
>> On Aug 4, 2020, at 12:04 PM, nicola varini <nicola.varini at gmail.com>
>> wrote:
>>
>> Dear all, thanks for your replies. The reason I asked whether it is
>> possible to overlap Poisson and Ampere is that they take roughly
>> the same amount of time. Please find attached the profiling
>> logs for CPU only and GPU only.
>> Of course it is possible to split the MPI communicator and run each
>> solver on a different subcommunicator; however, this would involve more
>> communication.
>> Has anyone ever tried to run 2 solvers with hyperthreading?
>> Thanks
>>
>>
>> On Sun, Aug 2, 2020 at 2:09 PM Mark Adams <mfadams at lbl.gov> wrote:
>>
>>> I suspect that the Poisson and Ampere's law solves are not coupled. You
>>> might be able to duplicate the communicator and use two threads. You would
>>> want to configure PETSc with threadsafety and threads, and I think it
>>> could/should work, but this mode is never used by anyone.
>>>
>>> That said, I would not recommend doing this unless you feel like playing
>>> in computer science, as opposed to doing application science. In the best-case
>>> scenario you get a speedup of 2x. That is a strict upper bound, and you
>>> will never come close to it. Your hardware has some balance of CPU to GPU
>>> processing rate, and your application has some balance of work between your
>>> two solves. Those two balances have to match, 1:1, to get close to a 2x
>>> speedup. To be concrete, from what little I can guess about your application,
>>> let's assume that the cost of each of the two solves is about the same
>>> (e.g., Laplacians on your domain, the best-case scenario). But these days
>>> GPU machines are configured so that the CPUs have only roughly 1-10% of the
>>> capacity of the GPUs, which gives you an upper bound of about a 10% speedup.
>>> That is noise. The upshot: unless you configure your hardware to match this
>>> problem, and the two solves have the same cost, you will not see anything
>>> close to a 2x speedup. Your time is better spent elsewhere.
>>>
>>> Mark
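
To make Mark's bound concrete, in round numbers: if the GPUs sustain a rate R
on this work and the CPUs only about 0.1 R, then however the work is divided,
the combined rate cannot exceed R + 0.1 R, so the speedup over doing
everything on the GPUs is at most

    (R + 0.1 R) / R = 1.1,

i.e. roughly the 10% figure quoted above.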
>>>
>>> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown <jed at jedbrown.org> wrote:
>>>
>>>> You can use MPI and split the communicator so n-1 ranks create a DMDA
>>>> for one part of your system and the other rank drives the GPU in the other
>>>> part.  They can all be part of the same coupled system on the full
>>>> communicator, but PETSc doesn't currently support some ranks having their
>>>> Vec arrays on GPU and others on host, so you'd be paying host-device
>>>> transfer costs on each iteration (and that might swamp any performance
>>>> benefit you would have gotten).
>>>>
>>>> In any case, be sure to think about the execution time of each part.
>>>> Load balancing with matching time-to-solution for each part can be really
>>>> hard.
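
A minimal sketch of the kind of split Jed describes, with the last rank
reserved to drive the GPU and the remaining n-1 ranks building a DMDA for the
CPU part (the grid sizes, the rank assignment, and the DMDA parameters are
placeholders, not taken from the original posts):

#include <petscdmda.h>

int main(int argc, char **argv)
{
  MPI_Comm       subcomm;
  PetscMPIInt    rank, size, color;
  DM             da = NULL;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
  ierr = MPI_Comm_size(PETSC_COMM_WORLD, &size);CHKERRQ(ierr);

  /* color 0: the n-1 CPU ranks; color 1: the single GPU-driving rank */
  color = (rank == size - 1) ? 1 : 0;
  ierr  = MPI_Comm_split(PETSC_COMM_WORLD, color, rank, &subcomm);CHKERRQ(ierr);

  if (color == 0) {
    /* CPU part: a 2D DMDA living only on the n-1 rank subcommunicator */
    ierr = DMDACreate2d(subcomm, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                        DMDA_STENCIL_STAR, 128, 128, PETSC_DECIDE, PETSC_DECIDE,
                        1, 1, NULL, NULL, &da);CHKERRQ(ierr);
    ierr = DMSetFromOptions(da);CHKERRQ(ierr);
    ierr = DMSetUp(da);CHKERRQ(ierr);
    /* ... assemble and solve the CPU-side system here ... */
    ierr = DMDestroy(&da);CHKERRQ(ierr);
  } else {
    /* GPU part: set up the other solve on its own single-rank communicator,
       e.g. with -dm_mat_type aijcusparse -dm_vec_type cuda */
  }

  ierr = MPI_Comm_free(&subcomm);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

As Jed notes, the two subproblems still have to exchange data at each
iteration, and with current PETSc that exchange goes through the host.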
>>>>
>>>>
>>>> Barry Smith <bsmith at petsc.dev> writes:
>>>>
>>>> >   Nicola,
>>>> >
>>>> >     This is not really viable or practical at this time with PETSc. It is
>>>> > not impossible, but it requires careful coding with threads. Another
>>>> > possibility is to use one half of the virtual GPUs for each solve; this is
>>>> > also not trivial. I would recommend first seeing what kind of performance
>>>> > you can get on the GPU for each type of solve and revisiting this idea in
>>>> > the future.
>>>> >
>>>> >    Barry
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >> On Jul 31, 2020, at 9:23 AM, nicola varini <nicola.varini at gmail.com>
>>>> >> wrote:
>>>> >>
>>>> >> Hello, I would like to know if it is possible to overlap CPU and GPU
>>>> >> with DMDA.
>>>> >> I have a machine where each node has 1 P100 + 1 Haswell.
>>>> >> I have to solve the Poisson and Ampere equations at each time step.
>>>> >> I'm using a 2D DMDA for each of them. Would it be possible to compute
>>>> >> the Poisson and Ampere equations at the same time, one on the CPU and
>>>> >> the other on the GPU?
>>>> >>
>>>> >> Thanks
>>>>
>>> [attachments: out_gpu, out_nogpu]
>>
>>
>>