[petsc-dev] [petsc-maint] running CUDA on SUMMIT

Smith, Barry F. bsmith at mcs.anl.gov
Wed Aug 14 21:46:07 CDT 2019


  Ahh, PGI compiler, that explains it :-)

  Ok, thanks. Don't worry about the runs right now. We'll figure out the fix. The code is just

  *a = (PetscReal)strtod(name,endptr);

  It could be a compiler bug.


  

> On Aug 14, 2019, at 9:23 PM, Mark Adams <mfadams at lbl.gov> wrote:
> 
> I am getting this error with single:
> 
> 22:21  /gpfs/alpine/geo127/scratch/adams$ jsrun -n 1 -a 1 -c 1 -g 1 ./ex56_single -cells 2,2,2 -ex56_dm_vec_type cuda -ex56_dm_mat_type aijcusparse -fp_trap 
> [0] 81 global equations, 27 vertices
> [0]PETSC ERROR: *** unknown floating point error occurred ***
> [0]PETSC ERROR: The specific exception can be determined by running in a debugger.  When the
> [0]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3e000000)
> [0]PETSC ERROR: where the result is a bitwise OR of the following flags:
> [0]PETSC ERROR: FE_INVALID=0x20000000 FE_DIVBYZERO=0x4000000 FE_OVERFLOW=0x10000000 FE_UNDERFLOW=0x8000000 FE_INEXACT=0x2000000
> [0]PETSC ERROR: Try option -start_in_debugger
> [0]PETSC ERROR: likely location of problem given in stack below
> [0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [0]PETSC ERROR:       INSTEAD the line number of the start of the function
> [0]PETSC ERROR:       is given.
> [0]PETSC ERROR: [0] PetscDefaultFPTrap line 355 /autofs/nccs-svm1_home1/adams/petsc/src/sys/error/fp.c
> [0]PETSC ERROR: [0] PetscStrtod line 1964 /autofs/nccs-svm1_home1/adams/petsc/src/sys/objects/options.c
> [0]PETSC ERROR: [0] PetscOptionsStringToReal line 2021 /autofs/nccs-svm1_home1/adams/petsc/src/sys/objects/options.c
> [0]PETSC ERROR: [0] PetscOptionsGetReal line 2321 /autofs/nccs-svm1_home1/adams/petsc/src/sys/objects/options.c
> [0]PETSC ERROR: [0] PetscOptionsReal_Private line 1015 /autofs/nccs-svm1_home1/adams/petsc/src/sys/objects/aoptions.c
> [0]PETSC ERROR: [0] KSPSetFromOptions line 329 /autofs/nccs-svm1_home1/adams/petsc/src/ksp/ksp/interface/itcl.c
> [0]PETSC ERROR: [0] SNESSetFromOptions line 869 /autofs/nccs-svm1_home1/adams/petsc/src/snes/interface/snes.c
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Floating point exception
> [0]PETSC ERROR: trapped floating point error
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.11.3-1685-gd3eb2e1  GIT Date: 2019-08-13 06:33:29 -0400
> [0]PETSC ERROR: ./ex56_single on a arch-summit-dbg-single-pgi-cuda named h36n11 by adams Wed Aug 14 22:21:56 2019
> [0]PETSC ERROR: Configure options --with-cc=mpicc --with-cxx=mpiCC --with-fc=mpif90 COPTFLAGS="-g -Mfcon" CXXOPTFLAGS="-g -Mfcon" FOPTFLAGS="-g -Mfcon" --with-precision=single --with-ssl=0 --with-batch=0 --with-mpiexec="jsrun -g 1" --with-cuda=1 --with-cudac=nvcc CUDAFLAGS="-ccbin pgc++" --download-metis --download-parmetis --download-fblaslapack --with-x=0 --with-64-bit-indices=0 --with-debugging=1 PETSC_ARCH=arch-summit-dbg-single-pgi-cuda
> [0]PETSC ERROR: #1 User provided function() line 0 in Unknown file
> --------------------------------------------------------------------------
> 
> On Wed, Aug 14, 2019 at 9:51 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> 
>   Oh, it doesn't even have to be that large. We just need to be able to look at the flop rates (as a surrogate for run times) and compare with the previous runs. As long as the size per process is pretty much the same, that is good enough.
> 
>    Barry
> 
> 
> > On Aug 14, 2019, at 8:45 PM, Mark Adams <mfadams at lbl.gov> wrote:
> > 
> > I can run single; I just can't scale up. But I can use about 1,500 processors.
> > 
> > On Wed, Aug 14, 2019 at 9:31 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> > 
> >   Oh, are all your integers 8 bytes? Even on one node?
> > 
> >   Once Karl's new middleware is in place we should see about reducing to 4 bytes on the GPU.
> > 
> >    Barry
> > 
> > 
> > > On Aug 14, 2019, at 7:44 PM, Mark Adams <mfadams at lbl.gov> wrote:
> > > 
> > > OK, I'll run single. It's a bit perverse to run with 4-byte floats and 8-byte integers ... I could use 32-bit ints and just not scale out.
> > > 
> > > On Wed, Aug 14, 2019 at 6:48 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> > > 
> > >  Mark,
> > > 
> > >    Oh, I don't even care if it converges, just put in a fixed number of iterations. The idea is to just get a baseline of the possible improvement. 
> > > 
> > >     ECP is literally dropping millions into research on "multi precision" computations on GPUs; we need some actual numbers for the best potential benefit to determine how much we invest in investigating it further, or not.
> > > 
> > >     I am not expressing any opinions on the approach; we are just in the fact-gathering stage.
> > > 
> > > 
> > >    Barry
> > > 
> > > 
> > > > On Aug 14, 2019, at 2:27 PM, Mark Adams <mfadams at lbl.gov> wrote:
> > > > 
> > > > 
> > > > 
> > > > On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> > > > 
> > > >   Mark,
> > > > 
> > > >    Would you be able to make one run using single precision? Just single everywhere since that is all we support currently? 
> > > > 
> > > > 
> > > > Experience in engineering, at least, is that single precision does not work for FE elasticity. I tried it many years ago and have heard the same from others. This problem is pretty simple other than using Q2. I suppose I could try it, but just be aware that the FE people might say single sucks.
> > > >  
> > > >    The results will give us motivation (or anti-motivation) to support running KSP (or PC, or Mat) in single precision while the simulation is double.
> > > > 
> > > >    Thanks.
> > > > 
> > > >      Barry
> > > > 
> > > > For example, if the KSP speed in single on the GPU is a factor of 3 over double on the GPU, that is serious motivation.
> > > > 
> > > > 
> > > > > On Aug 14, 2019, at 12:45 PM, Mark Adams <mfadams at lbl.gov> wrote:
> > > > > 
> > > > > FYI, Here is some scaling data of GAMG on SUMMIT. Getting about 4x GPU speedup with 98K dof/proc (3D Q2 elasticity).
> > > > > 
> > > > > This is weak scaling of a solve. There is growth in iteration count folded in here. I should put rtol in the title and/or run a fixed number of iterations and make it clear in the title.
> > > > > 
> > > > > Comments welcome.
> > > > > <out_cpu_012288><out_cpu_001536><out_cuda_012288><out_cpu_000024><out_cpu_000192><out_cuda_001536><out_cuda_000192><out_cuda_000024><weak_scaling_cpu.png><weak_scaling_cuda.png>
> > > > 
> > > 
> > 
> 



More information about the petsc-dev mailing list