[petsc-dev] Error running on Titan with GPUs & GNU

Fri Nov 2 16:14:25 CDT 2018

I did not configure hypre manually, so I guess it is not using GPUs.
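
For reading the quoted -ksp_view output below, here is a minimal Python sketch that lists the type reported for each KSP, PC and Mat object. The helper name object_types is hypothetical and not part of PETSc. It assumes the output has the "X Object:" / "type:" layout shown in this thread, with or without email quote markers. Note that a seqaijcusparse matrix type only shows how PETSc stores the matrix. It does not show whether hypre itself ran on the GPU.

```python
import re


def object_types(ksp_view_text):
    """Return (object, type) pairs such as ("PC", "hypre"), in order of appearance."""
    pairs = []
    current = None  # name of the most recent "X Object:" header still awaiting its type
    for raw in ksp_view_text.splitlines():
        # Drop email quote markers ("> > ") and surrounding whitespace.
        line = re.sub(r"^(>\s*)+", "", raw).strip()
        header = re.match(r"(\w+) Object:", line)
        if header:
            current = header.group(1)
            continue
        # Only lines that start with "type:" count; "Measure type local" does not.
        type_line = re.match(r"type:\s*(\S+)", line)
        if type_line and current is not None:
            pairs.append((current, type_line.group(1)))
            current = None
    return pairs


sample = """\
> > KSP Object: 1 MPI processes
> >   type: fgmres
> > PC Object: 1 MPI processes
> >   type: hypre
> >     HYPRE BoomerAMG preconditioning
> >       Measure type        local
> >   linear system matrix = precond matrix:
> >   Mat Object: 1 MPI processes
> >     type: seqaijcusparse
"""

for obj, typ in object_types(sample):
    print(obj, typ)
```

On the hypre run quoted below, this reports fgmres for the KSP, hypre for the PC and seqaijcusparse for the Mat, which matches the situation described above: the matrix is in a CUDA format, but nothing in the view says where the preconditioner executed.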

On Fri, Nov 2, 2018 at 2:40 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:

>
>
> > On Nov 2, 2018, at 1:25 PM, Mark Adams <mfadams at lbl.gov> wrote:
> >
> > And I just tested it with GAMG and it seems fine.  And hypre ran, but it is not clear that it used GPUs....
>
>     Presumably hypre must be configured to use GPUs. Currently the PETSc
> hypre download installer hypre.py doesn't have any options for getting
> hypre built for GPUs.
>
>     Barry
>
> >
> > 14:13 master= ~/petsc/src/snes/examples/tutorials$ jsrun -n 1 ./ex19 -dm_vec_type cuda -dm_mat_type aijcusparse -pc_type hypre -ksp_type fgmres -snes_monitor_short -snes_rtol 1.e-5 -ksp_view
> > lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
> >   0 SNES Function norm 0.239155
> > KSP Object: 1 MPI processes
> >   type: fgmres
> >     restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> >     happy breakdown tolerance 1e-30
> >   maximum iterations=10000, initial guess is zero
> >   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >   right preconditioning
> >   using UNPRECONDITIONED norm type for convergence test
> > PC Object: 1 MPI processes
> >   type: hypre
> >     HYPRE BoomerAMG preconditioning
> >       Cycle type V
> >       Maximum number of levels 25
> >       Maximum number of iterations PER hypre call 1
> >       Convergence tolerance PER hypre call 0.
> >       Threshold for strong coupling 0.25
> >       Interpolation truncation factor 0.
> >       Interpolation: max elements per row 0
> >       Number of levels of aggressive coarsening 0
> >       Number of paths for aggressive coarsening 1
> >       Maximum row sums 0.9
> >       Sweeps down         1
> >       Sweeps up           1
> >       Sweeps on coarse    1
> >       Relax down          symmetric-SOR/Jacobi
> >       Relax up            symmetric-SOR/Jacobi
> >       Relax on coarse     Gaussian-elimination
> >       Relax weight  (all)      1.
> >       Outer relax weight (all) 1.
> >       Using CF-relaxation
> >       Not using more complex smoothers.
> >       Measure type        local
> >       Coarsen type        Falgout
> >       Interpolation type  classical
> >   linear system matrix = precond matrix:
> >   Mat Object: 1 MPI processes
> >     type: seqaijcusparse
> >     rows=64, cols=64, bs=4
> >     total: nonzeros=1024, allocated nonzeros=1024
> >     total number of mallocs used during MatSetValues calls =0
> >       using I-node routines: found 16 nodes, limit used is 5
> >   1 SNES Function norm 6.80716e-05
> > KSP Object: 1 MPI processes
> >   type: fgmres
> >     restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> >     happy breakdown tolerance 1e-30
> >   maximum iterations=10000, initial guess is zero
> >   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >   right preconditioning
> >   using UNPRECONDITIONED norm type for convergence test
> > PC Object: 1 MPI processes
> >   type: hypre
> >     HYPRE BoomerAMG preconditioning
> >       Cycle type V
> >       Maximum number of levels 25
> >       Maximum number of iterations PER hypre call 1
> >       Convergence tolerance PER hypre call 0.
> >       Threshold for strong coupling 0.25
> >       Interpolation truncation factor 0.
> >       Interpolation: max elements per row 0
> >       Number of levels of aggressive coarsening 0
> >       Number of paths for aggressive coarsening 1
> >       Maximum row sums 0.9
> >       Sweeps down         1
> >       Sweeps up           1
> >       Sweeps on coarse    1
> >       Relax down          symmetric-SOR/Jacobi
> >       Relax up            symmetric-SOR/Jacobi
> >       Relax on coarse     Gaussian-elimination
> >       Relax weight  (all)      1.
> >       Outer relax weight (all) 1.
> >       Using CF-relaxation
> >       Not using more complex smoothers.
> >       Measure type        local
> >       Coarsen type        Falgout
> >       Interpolation type  classical
> >   linear system matrix = precond matrix:
> >   Mat Object: 1 MPI processes
> >     type: seqaijcusparse
> >     rows=64, cols=64, bs=4
> >     total: nonzeros=1024, allocated nonzeros=1024
> >     total number of mallocs used during MatSetValues calls =0
> >       using I-node routines: found 16 nodes, limit used is 5
> >   2 SNES Function norm 4.093e-11
> > Number of SNES iterations = 2
> >
> >
> > On Fri, Nov 2, 2018 at 2:10 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> >
> >
> > > On Nov 2, 2018, at 1:03 PM, Mark Adams <mfadams at lbl.gov> wrote:
> > >
> > > FYI, I seem to have the new GPU machine at ORNL (summitdev) working with GPUs. That is good enough for now.
> > > Thanks,
> >
> >    Excellent!
> >
> > >
> > > 14:00 master= ~/petsc/src/snes/examples/tutorials$ jsrun -n 1 ./ex19 -dm_vec_type cuda -dm_mat_type aijcusparse -pc_type none -ksp_type fgmres -snes_monitor_short -snes_rtol 1.e-5 -ksp_view
> > > lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
> > >   0 SNES Function norm 0.239155
> > > KSP Object: 1 MPI processes
> > >   type: fgmres
> > >     restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> > >     happy breakdown tolerance 1e-30
> > >   maximum iterations=10000, initial guess is zero
> > >   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> > >   right preconditioning
> > >   using UNPRECONDITIONED norm type for convergence test
> > > PC Object: 1 MPI processes
> > >   type: none
> > >   linear system matrix = precond matrix:
> > >   Mat Object: 1 MPI processes
> > >     type: seqaijcusparse
> > >     rows=64, cols=64, bs=4
> > >     total: nonzeros=1024, allocated nonzeros=1024
> > >     total number of mallocs used during MatSetValues calls =0
> > >       using I-node routines: found 16 nodes, limit used is 5
> > >   1 SNES Function norm 6.82338e-05
> > > KSP Object: 1 MPI processes
> > >   type: fgmres
> > >     restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> > >     happy breakdown tolerance 1e-30
> > >   maximum iterations=10000, initial guess is zero
> > >   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> > >   right preconditioning
> > >   using UNPRECONDITIONED norm type for convergence test
> > > PC Object: 1 MPI processes
> > >   type: none
> > >   linear system matrix = precond matrix:
> > >   Mat Object: 1 MPI processes
> > >     type: seqaijcusparse
> > >     rows=64, cols=64, bs=4
> > >     total: nonzeros=1024, allocated nonzeros=1024
> > >     total number of mallocs used during MatSetValues calls =0
> > >       using I-node routines: found 16 nodes, limit used is 5
> > >   2 SNES Function norm 3.346e-10
> > > Number of SNES iterations = 2
> > > 14:01 master= ~/petsc/src/snes/examples/tutorials$
> > >
> > >
> > >
> > > On Thu, Nov 1, 2018 at 9:33 AM Mark Adams <mfadams at lbl.gov> wrote:
> > >
> > >
> > > On Wed, Oct 31, 2018 at 12:30 PM Mark Adams <mfadams at lbl.gov> wrote:
> > >
> > >
> > > On Wed, Oct 31, 2018 at 6:59 AM Karl Rupp <rupp at iue.tuwien.ac.at> wrote:
> > > Hi Mark,
> > >
> > > ah, I was confused by the Python information at the beginning of
> > > configure.log. So it is picking up the correct compiler.
> > >
> > > Have you tried uncommenting the check for GNU?
> > >
> > > Yes, but I am getting an error that the cuda files do not find mpi.h.
> > >
> > >
> > > I'm getting a make error.
> > >
> > > Thanks,
> >
>
>