[petsc-dev] [petsc-maint] running CUDA on SUMMIT

Mark Adams mfadams at lbl.gov
Fri Aug 30 13:56:59 CDT 2019


Here is some more weak scaling data, this time with a fixed number of
iterations. (I have given ORNL a test case that reproduces the numerical
problems, and they said they would pass it on to Nvidia.)

I implemented an option to "spread" the reduced coarse grids across the
whole machine as opposed to a "compact" layout where active processes are
laid out in simple lexicographical order. This spread approach looks a
little better.
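The two coarse-grid layouts can be sketched as a mapping from the i-th active coarse-grid process to a global MPI rank. This is a minimal sketch assuming that `nactive` of `nranks` processes stay active and that "spread" means an even stride across all ranks; `CoarseLayout` and `coarse_rank` are hypothetical names, and this is not the option Mark implemented in GAMG.

```c
#include <assert.h>

/* Hypothetical layouts for the processes kept active on a reduced coarse grid. */
typedef enum { COARSE_COMPACT, COARSE_SPREAD } CoarseLayout;

/* Return the global rank hosting the i-th of nactive coarse-grid processes,
 * out of nranks total.
 *   COARSE_COMPACT: ranks 0, 1, ..., nactive-1 (simple lexicographic order)
 *   COARSE_SPREAD:  ranks floor(i * nranks / nactive), evenly spaced over the
 *                   whole machine, so active ranks tend to land on different nodes */
static int coarse_rank(int i, int nactive, int nranks, CoarseLayout layout)
{
  assert(0 < nactive && nactive <= nranks && 0 <= i && i < nactive);
  if (layout == COARSE_COMPACT) return i;
  /* 64-bit intermediate avoids overflowing i * nranks on large runs */
  return (int)(((long long)i * nranks) / nactive);
}
```

For example, with 24 ranks (4 Summit nodes at one rank per GPU, assuming ranks are numbered node by node) and 4 active coarse-grid processes, the compact layout uses ranks 0 through 3, all on the first node, while the spread layout uses ranks 0, 6, 12 and 18, one per node.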

Mark

On Wed, Aug 14, 2019 at 10:46 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:

>
>   Ahh, PGI compiler, that explains it :-)
>
>   Ok, thanks. Don't worry about the runs right now. We'll figure out the
> fix. The code is just
>
>   *a = (PetscReal)strtod(name,endptr);
>
>   It could be a compiler bug.
>
>
>
>
> > On Aug 14, 2019, at 9:23 PM, Mark Adams <mfadams at lbl.gov> wrote:
> >
> > I am getting this error with single:
> >
> > 22:21  /gpfs/alpine/geo127/scratch/adams$ jsrun -n 1 -a 1 -c 1 -g 1 ./ex56_single -cells 2,2,2 -ex56_dm_vec_type cuda -ex56_dm_mat_type aijcusparse -fp_trap
> > [0] 81 global equations, 27 vertices
> > [0]PETSC ERROR: *** unknown floating point error occurred ***
> > [0]PETSC ERROR: The specific exception can be determined by running in a debugger.  When the
> > [0]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3e000000)
> > [0]PETSC ERROR: where the result is a bitwise OR of the following flags:
> > [0]PETSC ERROR: FE_INVALID=0x20000000 FE_DIVBYZERO=0x4000000 FE_OVERFLOW=0x10000000 FE_UNDERFLOW=0x8000000 FE_INEXACT=0x2000000
> > [0]PETSC ERROR: Try option -start_in_debugger
> > [0]PETSC ERROR: likely location of problem given in stack below
> > [0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
> > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> > [0]PETSC ERROR:       INSTEAD the line number of the start of the function
> > [0]PETSC ERROR:       is given.
> > [0]PETSC ERROR: [0] PetscDefaultFPTrap line 355 /autofs/nccs-svm1_home1/adams/petsc/src/sys/error/fp.c
> > [0]PETSC ERROR: [0] PetscStrtod line 1964 /autofs/nccs-svm1_home1/adams/petsc/src/sys/objects/options.c
> > [0]PETSC ERROR: [0] PetscOptionsStringToReal line 2021 /autofs/nccs-svm1_home1/adams/petsc/src/sys/objects/options.c
> > [0]PETSC ERROR: [0] PetscOptionsGetReal line 2321 /autofs/nccs-svm1_home1/adams/petsc/src/sys/objects/options.c
> > [0]PETSC ERROR: [0] PetscOptionsReal_Private line 1015 /autofs/nccs-svm1_home1/adams/petsc/src/sys/objects/aoptions.c
> > [0]PETSC ERROR: [0] KSPSetFromOptions line 329 /autofs/nccs-svm1_home1/adams/petsc/src/ksp/ksp/interface/itcl.c
> > [0]PETSC ERROR: [0] SNESSetFromOptions line 869 /autofs/nccs-svm1_home1/adams/petsc/src/snes/interface/snes.c
> > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > [0]PETSC ERROR: Floating point exception
> > [0]PETSC ERROR: trapped floating point error
> > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > [0]PETSC ERROR: Petsc Development GIT revision: v3.11.3-1685-gd3eb2e1 GIT Date: 2019-08-13 06:33:29 -0400
> > [0]PETSC ERROR: ./ex56_single on a arch-summit-dbg-single-pgi-cuda named h36n11 by adams Wed Aug 14 22:21:56 2019
> > [0]PETSC ERROR: Configure options --with-cc=mpicc --with-cxx=mpiCC --with-fc=mpif90 COPTFLAGS="-g -Mfcon" CXXOPTFLAGS="-g -Mfcon" FOPTFLAGS="-g -Mfcon" --with-precision=single --with-ssl=0 --with-batch=0 --with-mpiexec="jsrun -g 1" --with-cuda=1 --with-cudac=nvcc CUDAFLAGS="-ccbin pgc++" --download-metis --download-parmetis --download-fblaslapack --with-x=0 --with-64-bit-indices=0 --with-debugging=1 PETSC_ARCH=arch-summit-dbg-single-pgi-cuda
> > [0]PETSC ERROR: #1 User provided function() line 0 in Unknown file
> > --------------------------------------------------------------------------
> >
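The mask in the log is the bitwise OR of the five flag values PETSc prints, which are specific to this POWER9 (Summit) build; the `FE_*` macro values in `<fenv.h>` differ between architectures. Worked out: 0x20000000 | 0x04000000 | 0x10000000 | 0x08000000 | 0x02000000 = 0x3e000000. A small decoder can turn a value returned by `fetestexcept()` in the debugger into flag names. This is a sketch: `logged_fe_flags`, `all_logged_flags` and `decode_fe_flags` are hypothetical names, and the values are copied from the log rather than taken from `<fenv.h>`.

```c
#include <assert.h>
#include <string.h>

/* FE_* values exactly as printed in the PETSc log for this POWER9 build.
 * Copied here rather than taken from <fenv.h>, whose values differ
 * between architectures. */
static const struct {
  unsigned int bit;
  const char *name;
} logged_fe_flags[] = {
  {0x20000000u, "FE_INVALID"},
  {0x04000000u, "FE_DIVBYZERO"},
  {0x10000000u, "FE_OVERFLOW"},
  {0x08000000u, "FE_UNDERFLOW"},
  {0x02000000u, "FE_INEXACT"},
};
#define NUM_LOGGED_FLAGS (sizeof logged_fe_flags / sizeof logged_fe_flags[0])

/* OR of all logged flags: the mask shown in fetestexcept(0x3e000000). */
static unsigned int all_logged_flags(void)
{
  unsigned int mask = 0;
  for (size_t i = 0; i < NUM_LOGGED_FLAGS; i++) mask |= logged_fe_flags[i].bit;
  return mask;
}

/* Write the names of the flags set in `raised` into buf as "A|B|...".
 * Returns how many flags were set; output is truncated safely if buf is short. */
static int decode_fe_flags(unsigned int raised, char *buf, size_t len)
{
  int n = 0;
  assert(buf != NULL && len > 0);
  buf[0] = '\0';
  for (size_t i = 0; i < NUM_LOGGED_FLAGS; i++) {
    if (!(raised & logged_fe_flags[i].bit)) continue;
    if (n++ > 0) strncat(buf, "|", len - strlen(buf) - 1);
    strncat(buf, logged_fe_flags[i].name, len - strlen(buf) - 1);
  }
  return n;
}
```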
> > On Wed, Aug 14, 2019 at 9:51 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> >
> >   Oh, it doesn't even have to be that large. We just need to be able to
> > look at the flop rates (as a surrogate for run times) and compare them
> > with the previous runs. So long as the problem size per process is about
> > the same, that is good enough.
> >
> >    Barry
> >
> >
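Using flop rates as a stand-in for run times rests on the work being identical. With a fixed iteration count and the same problem, the single- and double-precision runs perform the same number of floating-point operations $F$ (an assumption: the same algorithm and code path in both precisions), so the ratio of measured flop rates $R$ equals the inverse ratio of run times $T$:

```latex
T_{\mathrm{double}} = \frac{F}{R_{\mathrm{double}}}, \qquad
T_{\mathrm{single}} = \frac{F}{R_{\mathrm{single}}}
\qquad\Longrightarrow\qquad
\frac{T_{\mathrm{double}}}{T_{\mathrm{single}}} = \frac{R_{\mathrm{single}}}{R_{\mathrm{double}}}
```

Under that assumption, a single-precision flop rate three times the double-precision rate is the same thing as the factor of 3 Barry gives as an example of serious motivation. If iteration counts differ between runs (as with a relative tolerance), $F$ differs too and the flop-rate ratio no longer equals the time ratio, which is one reason to fix the number of iterations.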
> > > On Aug 14, 2019, at 8:45 PM, Mark Adams <mfadams at lbl.gov> wrote:
> > >
> > > I can run single; I just can't scale up. But I can use about 1500
> > > processors.
> > >
> > > On Wed, Aug 14, 2019 at 9:31 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> > >
> > >   Oh, are all your integers 8 bytes? Even on one node?
> > >
> > >   Once Karl's new middleware is in place, we should look at reducing
> > > them to 4 bytes on the GPU.
> > >
> > >    Barry
> > >
> > >
> > > > On Aug 14, 2019, at 7:44 PM, Mark Adams <mfadams at lbl.gov> wrote:
> > > >
> > > > OK, I'll run single. It's a bit perverse to run with 4-byte floats and
> > > > 8-byte integers ... I could use 32-bit ints and just not scale out.
> > > >
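The cost of mixing 4-byte floats with 8-byte integers can be estimated from the storage of a matrix in compressed sparse row (CSR) form, which keeps one value and one column index per nonzero plus one row pointer per row. This is a minimal sketch; `csr_bytes` is a hypothetical helper and it ignores the extra arrays a real AIJ implementation keeps.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes for a matrix in plain CSR form (values, column indices, row pointers)
 * for a given scalar size and index size in bytes. Auxiliary arrays that a
 * real library stores (diagonal offsets, row lengths, ...) are ignored. */
static size_t csr_bytes(size_t nrows, size_t nnz, size_t scalar_size, size_t index_size)
{
  assert(scalar_size > 0 && index_size > 0);
  return nnz * scalar_size            /* values */
       + nnz * index_size             /* column indices */
       + (nrows + 1) * index_size;    /* row pointers */
}
```

Per nonzero this is 16 bytes for double with 64-bit indices, 12 bytes for single with 64-bit indices, and 8 bytes for single with 32-bit indices. Switching only the scalars to single therefore saves about a quarter of the matrix storage (and, roughly, of the data streamed by a sparse matrix-vector product), while also using 32-bit indices halves it.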
> > > > On Wed, Aug 14, 2019 at 6:48 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> > > >
> > > >  Mark,
> > > >
> > > >    Oh, I don't even care if it converges; just run a fixed number of
> > > > iterations. The idea is just to get a baseline for the possible
> > > > improvement.
> > > >
> > > >     ECP is literally dropping millions into research on "multi
> > > > precision" computations on GPUs. We need some actual numbers for the
> > > > best potential benefit to decide how much we invest in investigating
> > > > it further, or not.
> > > >
> > > >     I am not expressing any opinions on the approach; we are just in
> > > > the fact-gathering stage.
> > > >
> > > >
> > > >    Barry
> > > >
> > > >
> > > > > On Aug 14, 2019, at 2:27 PM, Mark Adams <mfadams at lbl.gov> wrote:
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> > > > >
> > > > >   Mark,
> > > > >
> > > > >    Would you be able to make one run using single precision? Just
> > > > > single everywhere, since that is all we currently support?
> > > > >
> > > > >
> > > > > My experience, in engineering at least, is that single does not
> > > > > work for FE elasticity. I tried it many years ago and have heard the
> > > > > same from others. This problem is pretty simple, other than using Q2.
> > > > > I suppose I could try it, but just be aware that the FE people might
> > > > > say that single sucks.
> > > > >
> > > > >    The results will give us motivation (or anti-motivation) to add
> > > > > support for running KSP (or PC, or Mat) in single precision while
> > > > > the simulation is in double.
> > > > >
> > > > >    Thanks.
> > > > >
> > > > >      Barry
> > > > >
> > > > > For example, if KSP in single precision on GPUs is a factor of 3
> > > > > faster than in double on GPUs, that is serious motivation.
> > > > >
> > > > >
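The idea of running the PC (or the whole KSP) in single precision while the simulation stays in double can be illustrated with a minimal sketch. Assumptions: a Jacobi (diagonal) preconditioner and Richardson iteration, chosen purely for brevity, dense storage, and hypothetical function names; this is not how PETSc implements, or plans to implement, mixed precision. It also runs a fixed number of iterations with no convergence test, as suggested for the timing baseline.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical sketch (not PETSc code): Richardson iteration
 *     x <- x + M^{-1} (b - A x)
 * with x and the residual in double, but the preconditioner M^{-1}
 * (the inverse of diag(A)) stored and applied in float.
 * A is dense, row-major, n x n, only to keep the sketch short. */

/* Store the Jacobi preconditioner in single precision. */
static void jacobi_setup_float(size_t n, const double *A, float *dinv)
{
  for (size_t i = 0; i < n; i++) {
    assert(A[i * n + i] != 0.0);
    dinv[i] = (float)(1.0 / A[i * n + i]);
  }
}

/* Run exactly maxit iterations. r is workspace of length n; on return it holds
 * the residual b - A x computed at the start of the last iteration. */
static void richardson_mixed(size_t n, const double *A, const double *b,
                             const float *dinv, int maxit, double *x, double *r)
{
  for (int it = 0; it < maxit; it++) {
    for (size_t i = 0; i < n; i++) {          /* r = b - A x, in double */
      double s = b[i];
      for (size_t j = 0; j < n; j++) s -= A[i * n + j] * x[j];
      r[i] = s;
    }
    for (size_t i = 0; i < n; i++)            /* x += M^{-1} r, applied in float */
      x[i] += (double)(dinv[i] * (float)r[i]);
  }
}

/* Max-norm of a vector, for checking the result. */
static double max_abs(size_t n, const double *v)
{
  double m = 0.0;
  for (size_t i = 0; i < n; i++)
    if (fabs(v[i]) > m) m = fabs(v[i]);
  return m;
}
```

Because the residual is recomputed in double every iteration, in this small example the iterate still converges to double-precision accuracy even though M^{-1} is only accurate to single precision; only the preconditioner application could get cheaper, which is the kind of trade-off the single-precision timing runs are meant to bound.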
> > > > > > On Aug 14, 2019, at 12:45 PM, Mark Adams <mfadams at lbl.gov> wrote:
> > > > > >
> > > > > > FYI, here is some scaling data for GAMG on SUMMIT. I am getting
> > > > > > about a 4x GPU speedup with 98K dof/proc (3D Q2 elasticity).
> > > > > >
> > > > > > This is weak scaling of a solve, and the growth in iteration count
> > > > > > is folded in here. I should put rtol in the title and/or run a
> > > > > > fixed number of iterations and make that clear in the title.
> > > > > >
> > > > > > Comments welcome.
> > > > > >
> > > > > > <out_cpu_012288><out_cpu_001536><out_cuda_012288><out_cpu_000024><out_cpu_000192><out_cuda_001536><out_cuda_000192><out_cuda_000024><weak_scaling_cpu.png><weak_scaling_cuda.png>
> > > > >
> > > >
> > >
> >
>
>
-------------- next part --------------
Non-text attachments (scrubbed by the archive):
  weak_scaling_gpu_compact_spread.png (image/png, 80121 bytes)
    URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20190830/c0fca5fa/attachment-0002.png>
  weak_scaling_cpu.png (image/png, 45433 bytes)
    URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20190830/c0fca5fa/attachment-0003.png>
  spread.tar (application/x-tar, 532480 bytes)
    URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20190830/c0fca5fa/attachment-0002.tar>
  compact.tar (application/x-tar, 532480 bytes)
    URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20190830/c0fca5fa/attachment-0003.tar>

