[petsc-users] Errors about PETSc MPI+GPU

Matthew Knepley knepley at gmail.com
Mon May 29 14:06:14 CDT 2017


On Mon, May 29, 2017 at 11:19 AM, Xinzhe Wu <xinzhe.wu1990 at gmail.com> wrote:

> Dear all,
>
> We have developed code with PETSc + SLEPc that works well on CPUs. Now we
> want to run the same code with GPU + MPI, but we get some strange errors,
> shown below.
>
> I found a discussion of this problem here
> http://lists.mcs.anl.gov/pipermail/petsc-dev/2016-March/018836.html , but
> I can hardly understand it. Can anyone help me with these issues?
>

The answer is here:

>>>> I think the error messages you get are pretty descriptive regarding the
>>>> root cause. You are probably running out of GPU memory. Since you are
>>>> running on a GTX 285 you can't use MPS [1], so each MPI process has its
>>>> own context on the GPU. Each context needs to initialize some data on
>>>> the GPU (used for local variables and so on). The amount of memory
>>>> required for this depends on the size of the GPU (it essentially
>>>> correlates with the maximum number of concurrently active threads) and
>>>> can easily be 50-100MB. So with only 1GB of GPU memory you are probably
>>>> using all of the GPU's memory for context data, and nothing is left for
>>>> your application. Unfortunately there is no good way to debug this with
>>>> GeForce. On Tesla, nvidia-smi shows you all processes that have a
>>>> context on a GPU together with their memory consumption.

It appears that you are running out of GPU memory. This can happen if you
use too many MPI processes for a single GPU.
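
As a rough illustration of that arithmetic (a back-of-the-envelope sketch,
not a measurement): the 1GB card size and the 50-100MB per-context figure
come from the quoted message, while the function names and the per-rank
application footprint are hypothetical. The real context size depends on the
GPU and driver, and library workspaces such as cuBLAS are not modeled here.

```python
def remaining_gpu_memory_mb(total_mb, ranks_per_gpu, context_mb):
    """Memory left for application data once every rank has its own context.

    A negative result means the contexts alone exceed the card's memory.
    """
    if ranks_per_gpu < 0 or context_mb < 0:
        raise ValueError("ranks_per_gpu and context_mb must be non-negative")
    return total_mb - ranks_per_gpu * context_mb


def max_ranks_per_gpu(total_mb, context_mb, app_mb_per_rank):
    """Largest rank count whose contexts plus application data fit on one GPU."""
    per_rank_mb = context_mb + app_mb_per_rank
    if per_rank_mb <= 0:
        raise ValueError("per-rank memory use must be positive")
    return int(total_mb // per_rank_mb)


if __name__ == "__main__":
    total_mb = 1024  # 1GB card, as in the quoted message
    for ranks in (1, 4, 8, 16):
        best = remaining_gpu_memory_mb(total_mb, ranks, 50)    # 50MB contexts
        worst = remaining_gpu_memory_mb(total_mb, ranks, 100)  # 100MB contexts
        print(f"{ranks:2d} ranks: {worst} to {best} MB left for the application")
```

If the remaining memory is close to zero or negative for the number of ranks
you launch per GPU, reducing the number of MPI processes that share the card
is the first thing to try.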

  Thanks,

     Matt


> Thank you in advance!
>
>
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Error in external library
> [0]PETSC ERROR: CUBLAS error 1
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [0]PETSC ERROR: [2]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [2]PETSC ERROR: Error in external library
> [2]PETSC ERROR: CUBLAS error 1
> [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [2]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3965-gf375733  GIT Date: 2017-05-28 10:32:02 -0500
> [2]PETSC ERROR: ./hyperh on a arch-linux2-c-debug named romeo44 by xinzhewu Mon May 29 18:03:58 2017
> [2]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack --with-visibility=0 --with-shared-libraries=0 --with-cuda=1 --with-thrust=1 --with-precision=double --with-clanguage=c --with-pestc-arch=linux-c-no-debug-complex --with-scalar-type=complex
> [2]PETSC ERROR: #1 PetscInitialize() line 906 in /home/xinzhewu/Petsc-GPUs/petsc/src/sys/objects/pinit.c
> [2]PETSC ERROR: #2 SlepcInitialize() line 259 in /home/xinzhewu/Petsc-GPUs/slepc/src/sys/slepcinit.c
>
>
> --
> Xinzhe WU
> Ph.D Student of Computer Science
> Maison de la Simulation, CNRS USR3441
> Building 565, CEA Saclay
> 91191, Gif-sur-Yvette, France
> Tel: +33 (0) 1 69 08 59 93
>



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

http://www.caam.rice.edu/~mk51/