[petsc-dev] Error running on Titan with GPUs & GNU

Smith, Barry F. bsmith at mcs.anl.gov
Fri Nov 2 13:40:36 CDT 2018



> On Nov 2, 2018, at 1:25 PM, Mark Adams <mfadams at lbl.gov> wrote:
> 
> And I just tested it with GAMG and it seems fine.  And hypre ran, but it is not clear that it used GPUs....

    Presumably hypre must be configured to use GPUs. Currently hypre.py, the PETSc download installer for hypre, doesn't have any options for building hypre for GPUs.

    Barry
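
    For anyone comparing runs like the ones quoted below, here is a minimal sketch (not part of PETSc; the function name summarize_ksp_view is hypothetical) that pulls the KSP, PC, and Mat types out of saved -ksp_view output. It assumes the "KSP Object:", "PC Object:", "Mat Object:" and "type:" line layout shown in this thread. It only reports which types PETSc selected; it cannot tell whether hypre itself executed on the GPU.

```python
import re

# Matches the object headers printed by -ksp_view, e.g. "PC Object: 1 MPI processes".
OBJECT_RE = re.compile(r"^(KSP|PC|Mat) Object:")
# Matches a "type: <name>" line; lines such as "Cycle type V" do not match.
TYPE_RE = re.compile(r"^type:\s*(\S+)")


def summarize_ksp_view(text):
    """Return one dict per KSP view block, mapping KSP/PC/Mat to its type."""
    summaries = []
    current = None   # dict for the KSP block being read
    section = None   # which object the next "type:" line belongs to
    for raw in text.splitlines():
        # Drop email quote markers ("> > ") and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        header = OBJECT_RE.match(line)
        if header:
            section = header.group(1)
            if section == "KSP":
                current = {}
                summaries.append(current)
            continue
        found = TYPE_RE.match(line)
        if found and current is not None and section is not None:
            # Keep only the first type reported for each object.
            current.setdefault(section, found.group(1))
    return summaries
```

    Usage: save the program output to a file, then call summarize_ksp_view(open("run.log").read()), where run.log is a placeholder file name. A Mat type such as seqaijcusparse shows that the matrix is a CUDA type, but says nothing about where the preconditioner ran.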

> 
> 14:13 master= ~/petsc/src/snes/examples/tutorials$ jsrun -n 1 ./ex19 -dm_vec_type cuda -dm_mat_type aijcusparse -pc_type hypre -ksp_type fgmres -snes_monitor_short -snes_rtol 1.e-5 -ksp_view
> lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
>   0 SNES Function norm 0.239155 
> KSP Object: 1 MPI processes
>   type: fgmres
>     restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>     happy breakdown tolerance 1e-30
>   maximum iterations=10000, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>   right preconditioning
>   using UNPRECONDITIONED norm type for convergence test
> PC Object: 1 MPI processes
>   type: hypre
>     HYPRE BoomerAMG preconditioning
>       Cycle type V
>       Maximum number of levels 25
>       Maximum number of iterations PER hypre call 1
>       Convergence tolerance PER hypre call 0.
>       Threshold for strong coupling 0.25
>       Interpolation truncation factor 0.
>       Interpolation: max elements per row 0
>       Number of levels of aggressive coarsening 0
>       Number of paths for aggressive coarsening 1
>       Maximum row sums 0.9
>       Sweeps down         1
>       Sweeps up           1
>       Sweeps on coarse    1
>       Relax down          symmetric-SOR/Jacobi
>       Relax up            symmetric-SOR/Jacobi
>       Relax on coarse     Gaussian-elimination
>       Relax weight  (all)      1.
>       Outer relax weight (all) 1.
>       Using CF-relaxation
>       Not using more complex smoothers.
>       Measure type        local
>       Coarsen type        Falgout
>       Interpolation type  classical
>   linear system matrix = precond matrix:
>   Mat Object: 1 MPI processes
>     type: seqaijcusparse
>     rows=64, cols=64, bs=4
>     total: nonzeros=1024, allocated nonzeros=1024
>     total number of mallocs used during MatSetValues calls =0
>       using I-node routines: found 16 nodes, limit used is 5
>   1 SNES Function norm 6.80716e-05 
> KSP Object: 1 MPI processes
>   type: fgmres
>     restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>     happy breakdown tolerance 1e-30
>   maximum iterations=10000, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>   right preconditioning
>   using UNPRECONDITIONED norm type for convergence test
> PC Object: 1 MPI processes
>   type: hypre
>     HYPRE BoomerAMG preconditioning
>       Cycle type V
>       Maximum number of levels 25
>       Maximum number of iterations PER hypre call 1
>       Convergence tolerance PER hypre call 0.
>       Threshold for strong coupling 0.25
>       Interpolation truncation factor 0.
>       Interpolation: max elements per row 0
>       Number of levels of aggressive coarsening 0
>       Number of paths for aggressive coarsening 1
>       Maximum row sums 0.9
>       Sweeps down         1
>       Sweeps up           1
>       Sweeps on coarse    1
>       Relax down          symmetric-SOR/Jacobi
>       Relax up            symmetric-SOR/Jacobi
>       Relax on coarse     Gaussian-elimination
>       Relax weight  (all)      1.
>       Outer relax weight (all) 1.
>       Using CF-relaxation
>       Not using more complex smoothers.
>       Measure type        local
>       Coarsen type        Falgout
>       Interpolation type  classical
>   linear system matrix = precond matrix:
>   Mat Object: 1 MPI processes
>     type: seqaijcusparse
>     rows=64, cols=64, bs=4
>     total: nonzeros=1024, allocated nonzeros=1024
>     total number of mallocs used during MatSetValues calls =0
>       using I-node routines: found 16 nodes, limit used is 5
>   2 SNES Function norm 4.093e-11 
> Number of SNES iterations = 2
> 
> 
> On Fri, Nov 2, 2018 at 2:10 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> 
> 
> > On Nov 2, 2018, at 1:03 PM, Mark Adams <mfadams at lbl.gov> wrote:
> > 
> > FYI, I seem to have the new GPU machine at ORNL (summitdev) working with GPUs. That is good enough for now.
> > Thanks,
> 
>    Excellent!
> 
> > 
> > 14:00 master= ~/petsc/src/snes/examples/tutorials$ jsrun -n 1 ./ex19 -dm_vec_type cuda -dm_mat_type aijcusparse -pc_type none -ksp_type fgmres -snes_monitor_short -snes_rtol 1.e-5 -ksp_view
> > lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
> >   0 SNES Function norm 0.239155 
> > KSP Object: 1 MPI processes
> >   type: fgmres
> >     restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> >     happy breakdown tolerance 1e-30
> >   maximum iterations=10000, initial guess is zero
> >   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >   right preconditioning
> >   using UNPRECONDITIONED norm type for convergence test
> > PC Object: 1 MPI processes
> >   type: none
> >   linear system matrix = precond matrix:
> >   Mat Object: 1 MPI processes
> >     type: seqaijcusparse
> >     rows=64, cols=64, bs=4
> >     total: nonzeros=1024, allocated nonzeros=1024
> >     total number of mallocs used during MatSetValues calls =0
> >       using I-node routines: found 16 nodes, limit used is 5
> >   1 SNES Function norm 6.82338e-05 
> > KSP Object: 1 MPI processes
> >   type: fgmres
> >     restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> >     happy breakdown tolerance 1e-30
> >   maximum iterations=10000, initial guess is zero
> >   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >   right preconditioning
> >   using UNPRECONDITIONED norm type for convergence test
> > PC Object: 1 MPI processes
> >   type: none
> >   linear system matrix = precond matrix:
> >   Mat Object: 1 MPI processes
> >     type: seqaijcusparse
> >     rows=64, cols=64, bs=4
> >     total: nonzeros=1024, allocated nonzeros=1024
> >     total number of mallocs used during MatSetValues calls =0
> >       using I-node routines: found 16 nodes, limit used is 5
> >   2 SNES Function norm 3.346e-10 
> > Number of SNES iterations = 2
> > 14:01 master= ~/petsc/src/snes/examples/tutorials$ 
> > 
> > 
> > 
> > On Thu, Nov 1, 2018 at 9:33 AM Mark Adams <mfadams at lbl.gov> wrote:
> > 
> > 
> > On Wed, Oct 31, 2018 at 12:30 PM Mark Adams <mfadams at lbl.gov> wrote:
> > 
> > 
> > On Wed, Oct 31, 2018 at 6:59 AM Karl Rupp <rupp at iue.tuwien.ac.at> wrote:
> > Hi Mark,
> > 
> > ah, I was confused by the Python information at the beginning of 
> > configure.log. So it is picking up the correct compiler.
> > 
> > Have you tried uncommenting the check for GNU?
> > 
> > Yes, but I am getting an error that the cuda files do not find mpi.h.
> >  
> > 
> > I'm getting a make error.
> > 
> > Thanks, 
> 


