<div dir="ltr">I did not configure hypre manually, so I guess it is not using GPUs.</div><br><div class="gmail_quote"><div dir="ltr">On Fri, Nov 2, 2018 at 2:40 PM Smith, Barry F. <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

<br>

> On Nov 2, 2018, at 1:25 PM, Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:<br>

> <br>

> And I just tested it with GAMG and it seems fine. And hypre ran, but it is not clear that it used GPUs...<br>

<br>

    Presumably hypre must be configured to use GPUs. Currently the PETSc download installer for hypre (hypre.py) doesn't have any options for building hypre with GPU support.<br>
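One way to see what a run actually used is to read the object types out of the -ksp_view output. The helper below is a minimal sketch, not part of PETSc (the name summarize_ksp_view is hypothetical). It assumes the raw text printed by -ksp_view, optionally carrying email "&gt;" quote markers. A GPU matrix type such as seqaijcusparse only tells you where PETSc stores the matrix. It does not show whether hypre itself computed on the GPU, which, as noted above, depends on how hypre was built.

```python
import re


def summarize_ksp_view(log: str) -> dict:
    """Collect the KSP, PC and Mat types reported in -ksp_view output.

    Hypothetical helper: it only reads the text PETSc printed. A GPU matrix
    type (e.g. seqaijcusparse) says where PETSc keeps the matrix, not whether
    an external package such as hypre computed on the GPU.
    """
    types = {"KSP": set(), "PC": set(), "Mat": set()}
    pending = None  # object kind whose "type:" line we expect next
    for line in log.splitlines():
        # Drop leading whitespace and any email quote markers ("> > ").
        text = re.sub(r"^[>\s]+", "", line)
        for kind in types:
            if text.startswith(kind + " Object:"):
                pending = kind
                break
        else:
            if pending is not None and text.startswith("type:"):
                types[pending].add(text.split(":", 1)[1].strip())
                pending = None
    return {kind: sorted(found) for kind, found in types.items()}


if __name__ == "__main__":
    sample = """KSP Object: 1 MPI processes
  type: fgmres
PC Object: 1 MPI processes
  type: hypre
    HYPRE BoomerAMG preconditioning
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: seqaijcusparse
"""
    print(summarize_ksp_view(sample))
```

For the hypre run quoted in this thread it reports fgmres, hypre and seqaijcusparse. That confirms the matrix type requested with -dm_mat_type aijcusparse but says nothing about where BoomerAMG ran.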

<br>

    Barry<br>

<br>

> <br>

> 14:13 master= ~/petsc/src/snes/examples/tutorials$ jsrun -n 1 ./ex19 -dm_vec_type cuda -dm_mat_type aijcusparse -pc_type hypre -ksp_type fgmres -snes_monitor_short -snes_rtol 1.e-5 -ksp_view<br>

> lid velocity = 0.0625, prandtl # = 1., grashof # = 1.<br>

>   0 SNES Function norm 0.239155 <br>

> KSP Object: 1 MPI processes<br>

>   type: fgmres<br>

>     restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement<br>

>     happy breakdown tolerance 1e-30<br>

>   maximum iterations=10000, initial guess is zero<br>

>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.<br>

>   right preconditioning<br>

>   using UNPRECONDITIONED norm type for convergence test<br>

> PC Object: 1 MPI processes<br>

>   type: hypre<br>

>     HYPRE BoomerAMG preconditioning<br>

>       Cycle type V<br>

>       Maximum number of levels 25<br>

>       Maximum number of iterations PER hypre call 1<br>

>       Convergence tolerance PER hypre call 0.<br>

>       Threshold for strong coupling 0.25<br>

>       Interpolation truncation factor 0.<br>

>       Interpolation: max elements per row 0<br>

>       Number of levels of aggressive coarsening 0<br>

>       Number of paths for aggressive coarsening 1<br>

>       Maximum row sums 0.9<br>

>       Sweeps down         1<br>

>       Sweeps up           1<br>

>       Sweeps on coarse    1<br>

>       Relax down          symmetric-SOR/Jacobi<br>

>       Relax up            symmetric-SOR/Jacobi<br>

>       Relax on coarse     Gaussian-elimination<br>

>       Relax weight  (all)      1.<br>

>       Outer relax weight (all) 1.<br>

>       Using CF-relaxation<br>

>       Not using more complex smoothers.<br>

>       Measure type        local<br>

>       Coarsen type        Falgout<br>

>       Interpolation type  classical<br>

>   linear system matrix = precond matrix:<br>

>   Mat Object: 1 MPI processes<br>

>     type: seqaijcusparse<br>

>     rows=64, cols=64, bs=4<br>

>     total: nonzeros=1024, allocated nonzeros=1024<br>

>     total number of mallocs used during MatSetValues calls =0<br>

>       using I-node routines: found 16 nodes, limit used is 5<br>

>   1 SNES Function norm 6.80716e-05 <br>

> KSP Object: 1 MPI processes<br>

>   type: fgmres<br>

>     restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement<br>

>     happy breakdown tolerance 1e-30<br>

>   maximum iterations=10000, initial guess is zero<br>

>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.<br>

>   right preconditioning<br>

>   using UNPRECONDITIONED norm type for convergence test<br>

> PC Object: 1 MPI processes<br>

>   type: hypre<br>

>     HYPRE BoomerAMG preconditioning<br>

>       Cycle type V<br>

>       Maximum number of levels 25<br>

>       Maximum number of iterations PER hypre call 1<br>

>       Convergence tolerance PER hypre call 0.<br>

>       Threshold for strong coupling 0.25<br>

>       Interpolation truncation factor 0.<br>

>       Interpolation: max elements per row 0<br>

>       Number of levels of aggressive coarsening 0<br>

>       Number of paths for aggressive coarsening 1<br>

>       Maximum row sums 0.9<br>

>       Sweeps down         1<br>

>       Sweeps up           1<br>

>       Sweeps on coarse    1<br>

>       Relax down          symmetric-SOR/Jacobi<br>

>       Relax up            symmetric-SOR/Jacobi<br>

>       Relax on coarse     Gaussian-elimination<br>

>       Relax weight  (all)      1.<br>

>       Outer relax weight (all) 1.<br>

>       Using CF-relaxation<br>

>       Not using more complex smoothers.<br>

>       Measure type        local<br>

>       Coarsen type        Falgout<br>

>       Interpolation type  classical<br>

>   linear system matrix = precond matrix:<br>

>   Mat Object: 1 MPI processes<br>

>     type: seqaijcusparse<br>

>     rows=64, cols=64, bs=4<br>

>     total: nonzeros=1024, allocated nonzeros=1024<br>

>     total number of mallocs used during MatSetValues calls =0<br>

>       using I-node routines: found 16 nodes, limit used is 5<br>

>   2 SNES Function norm 4.093e-11 <br>

> Number of SNES iterations = 2<br>

> <br>

> <br>

> On Fri, Nov 2, 2018 at 2:10 PM Smith, Barry F. <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>

> <br>

> <br>

> > On Nov 2, 2018, at 1:03 PM, Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:<br>

> > <br>

> > FYI, I seem to have the new GPU machine at ORNL (summitdev) working with GPUs. That is good enough for now.<br>

> > Thanks,<br>

> <br>

>    Excellent!<br>

> <br>

> > <br>

> > 14:00 master= ~/petsc/src/snes/examples/tutorials$ jsrun -n 1 ./ex19 -dm_vec_type cuda -dm_mat_type aijcusparse -pc_type none -ksp_type fgmres -snes_monitor_short -snes_rtol 1.e-5 -ksp_view<br>

> > lid velocity = 0.0625, prandtl # = 1., grashof # = 1.<br>

> >   0 SNES Function norm 0.239155 <br>

> > KSP Object: 1 MPI processes<br>

> >   type: fgmres<br>

> >     restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement<br>

> >     happy breakdown tolerance 1e-30<br>

> >   maximum iterations=10000, initial guess is zero<br>

> >   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.<br>

> >   right preconditioning<br>

> >   using UNPRECONDITIONED norm type for convergence test<br>

> > PC Object: 1 MPI processes<br>

> >   type: none<br>

> >   linear system matrix = precond matrix:<br>

> >   Mat Object: 1 MPI processes<br>

> >     type: seqaijcusparse<br>

> >     rows=64, cols=64, bs=4<br>

> >     total: nonzeros=1024, allocated nonzeros=1024<br>

> >     total number of mallocs used during MatSetValues calls =0<br>

> >       using I-node routines: found 16 nodes, limit used is 5<br>

> >   1 SNES Function norm 6.82338e-05 <br>

> > KSP Object: 1 MPI processes<br>

> >   type: fgmres<br>

> >     restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement<br>

> >     happy breakdown tolerance 1e-30<br>

> >   maximum iterations=10000, initial guess is zero<br>

> >   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.<br>

> >   right preconditioning<br>

> >   using UNPRECONDITIONED norm type for convergence test<br>

> > PC Object: 1 MPI processes<br>

> >   type: none<br>

> >   linear system matrix = precond matrix:<br>

> >   Mat Object: 1 MPI processes<br>

> >     type: seqaijcusparse<br>

> >     rows=64, cols=64, bs=4<br>

> >     total: nonzeros=1024, allocated nonzeros=1024<br>

> >     total number of mallocs used during MatSetValues calls =0<br>

> >       using I-node routines: found 16 nodes, limit used is 5<br>

> >   2 SNES Function norm 3.346e-10 <br>

> > Number of SNES iterations = 2<br>

> > 14:01 master= ~/petsc/src/snes/examples/tutorials$ <br>

> > <br>

> > <br>

> > <br>

> > On Thu, Nov 1, 2018 at 9:33 AM Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:<br>

> > <br>

> > <br>

> > On Wed, Oct 31, 2018 at 12:30 PM Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:<br>

> > <br>

> > <br>

> > On Wed, Oct 31, 2018 at 6:59 AM Karl Rupp <<a href="mailto:rupp@iue.tuwien.ac.at" target="_blank">rupp@iue.tuwien.ac.at</a>> wrote:<br>

> > Hi Mark,<br>

> > <br>

> > ah, I was confused by the Python information at the beginning of <br>

> > configure.log. So it is picking up the correct compiler.<br>

> > <br>

> > Have you tried uncommenting the check for GNU?<br>

> > <br>

> > Yes, but I am getting an error that the CUDA files cannot find mpi.h.<br>

> >  <br>

> > <br>

> > I'm getting a make error.<br>

> > <br>

> > Thanks, <br>

> <br>

<br>

</blockquote></div>
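The two ex19 runs quoted in this thread (-pc_type none and -pc_type hypre) can be compared by pulling the residual norms out of the -snes_monitor_short lines. The sketch below is a hypothetical helper, not a PETSc tool. It assumes lines of the form "N SNES Function norm X", possibly indented or prefixed with email quote markers. It only restates the logged numbers. Both runs took two Newton iterations and fell well below the requested -snes_rtol 1.e-5, so these logs alone do not show any GPU benefit from hypre.

```python
import re

# Matches lines such as "1 SNES Function norm 6.80716e-05" (after stripping prefixes).
SNES_NORM = re.compile(r"^(\d+) SNES Function norm\s+(\S+)")


def snes_history(log: str) -> list:
    """Return (iteration, norm) pairs from -snes_monitor_short output."""
    history = []
    for line in log.splitlines():
        text = re.sub(r"^[>\s]+", "", line)  # strip indentation and email quotes
        match = SNES_NORM.match(text)
        if match:
            history.append((int(match.group(1)), float(match.group(2))))
    return history


def relative_reduction(history: list) -> float:
    """Final residual norm divided by the initial one."""
    return history[-1][1] / history[0][1]


if __name__ == "__main__":
    # Norms copied from the two runs quoted in this thread.
    runs = {
        "pc_type none": "  0 SNES Function norm 0.239155\n"
                        "  1 SNES Function norm 6.82338e-05\n"
                        "  2 SNES Function norm 3.346e-10\n",
        "pc_type hypre": "  0 SNES Function norm 0.239155\n"
                         "  1 SNES Function norm 6.80716e-05\n"
                         "  2 SNES Function norm 4.093e-11\n",
    }
    for name, log in runs.items():
        hist = snes_history(log)
        print(name, len(hist) - 1, "iterations, reduction", relative_reduction(hist))
```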