[petsc-users] Using PETSc with GPU supported SuperLU_Dist

Satish Balay balay at mcs.anl.gov
Mon Feb 24 09:18:51 CST 2020


nvidia-smi gives some relevant info. I'm not sure what exactly the CUDA version listed here refers to.

[Is it the maximum version of CUDA this driver is compatible with?]

Satish

-----

[balay at p1 ~]$ nvidia-smi 
Mon Feb 24 09:15:26 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro T2000        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   45C    P8     4W /  N/A |    182MiB /  3911MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1372      G   /usr/libexec/Xorg                            180MiB |
+-----------------------------------------------------------------------------+
[balay at p1 ~]$ 
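The two version fields in that header can be pulled out of the `nvidia-smi` text programmatically. A minimal sketch (the regex and helper name are illustrative, not part of the original mail):

```python
import re

def smi_versions(text):
    """Extract (driver_version, cuda_version) from nvidia-smi header text."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
    return (driver.group(1) if driver else None,
            cuda.group(1) if cuda else None)

# Header line as printed by the nvidia-smi run above.
header = "| NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |"
print(smi_versions(header))  # ('440.59', '10.2')
```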


On Mon, 24 Feb 2020, Junchao Zhang via petsc-users wrote:

> [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient
> for CUDA runtime version
> 
> That means you need to update your CUDA driver for CUDA 10.2. See the minimum
> requirement in Table 1 at
> https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#major-components
> 
> --Junchao Zhang
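The toolkit-to-minimum-driver mapping from Table 1 can be encoded as a small lookup. A minimal sketch; the minimum-driver numbers below are transcribed as an assumption and should be re-checked against the NVIDIA release notes linked above:

```python
# Minimum Linux driver version required by each CUDA toolkit release.
# Values are assumed from NVIDIA's Table 1 -- verify against the
# current release notes before relying on them.
MIN_DRIVER = {
    "10.0": (410, 48),
    "10.1": (418, 39),
    "10.2": (440, 33),
}

def driver_sufficient(driver_version, toolkit):
    """True if the installed driver meets the toolkit's minimum requirement."""
    installed = tuple(int(part) for part in driver_version.split("."))
    return installed >= MIN_DRIVER[toolkit]

print(driver_sufficient("440.59", "10.2"))  # True: 440.59 meets the 440.33 minimum
print(driver_sufficient("396.37", "10.2"))  # False: driver too old for CUDA 10.2
```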
> 
> 
> On Sun, Feb 23, 2020 at 3:33 PM Abhyankar, Shrirang G <
> shrirang.abhyankar at pnnl.gov> wrote:
> 
> > I was using CUDA v10.2. Switching to 9.2 gives a clean make test.
> >
> >
> >
> > Thanks,
> >
> > Shri
> >
> >
> >
> >
> >
> > *From: *petsc-users <petsc-users-bounces at mcs.anl.gov> on behalf of
> > "Abhyankar, Shrirang G via petsc-users" <petsc-users at mcs.anl.gov>
> > *Reply-To: *"Abhyankar, Shrirang G" <shrirang.abhyankar at pnnl.gov>
> > *Date: *Sunday, February 23, 2020 at 3:10 PM
> > *To: *petsc-users <petsc-users at mcs.anl.gov>, Junchao Zhang <
> > jczhang at mcs.anl.gov>
> > *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist
> >
> > I am getting an error now for CUDA driver version. Any suggestions?
> >
> > petsc:maint$ make test
> >
> > Running test examples to verify correct installation
> >
> > Using PETSC_DIR=/people/abhy245/software/petsc and
> > PETSC_ARCH=debug-mode-newell
> >
> > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI
> > process
> >
> > See http://www.mcs.anl.gov/petsc/documentation/faq.html
> >
> > [0]PETSC ERROR: --------------------- Error Message
> > --------------------------------------------------------------
> >
> > [0]PETSC ERROR: Error in system call
> >
> > [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient
> > for CUDA runtime version
> >
> > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
> > for trouble shooting.
> >
> > [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown
> >
> > [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by
> > abhy245 Sun Feb 23 12:49:55 2020
> >
> > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make
> > --download-metis --download-parmetis --download-scalapack
> > --download-suitesparse --download-superlu_dist-gpu=1
> > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++
> > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1
> > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1
> > PETSC_ARCH=debug-mode-newell
> >
> > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in
> > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c
> >
> > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in
> > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c
> >
> > [0]PETSC ERROR: #3 PetscInitialize() line 1010 in
> > /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c
> >
> > --------------------------------------------------------------------------
> >
> > Primary job  terminated normally, but 1 process returned
> >
> > a non-zero exit code. Per user-direction, the job has been aborted.
> >
> > --------------------------------------------------------------------------
> >
> > --------------------------------------------------------------------------
> >
> > mpiexec detected that one or more processes exited with non-zero status,
> > thus causing
> >
> > the job to be terminated. The first process to do so was:
> >
> >
> >
> >   Process name: [[46518,1],0]
> >
> >   Exit code:    88
> >
> > --------------------------------------------------------------------------
> >
> > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 MPI
> > processes
> >
> > See http://www.mcs.anl.gov/petsc/documentation/faq.html
> >
> > [0]PETSC ERROR: --------------------- Error Message
> > --------------------------------------------------------------
> >
> > [1]PETSC ERROR: --------------------- Error Message
> > --------------------------------------------------------------
> >
> > [1]PETSC ERROR: Error in system call
> >
> > [1]PETSC ERROR: [0]PETSC ERROR: Error in system call
> >
> > [0]PETSC ERROR: error in cudaGetDeviceCount CUDA driver version is
> > insufficient for CUDA runtime version
> >
> > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
> > for trouble shooting.
> >
> > error in cudaGetDeviceCount CUDA driver version is insufficient for CUDA
> > runtime version
> >
> > [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
> > for trouble shooting.
> >
> > [1]PETSC ERROR: [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown
> >
> > [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by
> > abhy245 Sun Feb 23 12:49:57 2020
> >
> > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make
> > --download-metis --download-parmetis --download-scalapack
> > --download-suitesparse --download-superlu_dist-gpu=1
> > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++
> > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1
> > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1
> > PETSC_ARCH=debug-mode-newell
> >
> > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 254 in
> > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c
> >
> > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in
> > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c
> >
> > [0]PETSC ERROR: #3 PetscInitialize() line 1010 in
> > /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c
> >
> > Petsc Release Version 3.12.4, unknown
> >
> > [1]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by
> > abhy245 Sun Feb 23 12:49:57 2020
> >
> > [1]PETSC ERROR: Configure options --download-fblaslapack --download-make
> > --download-metis --download-parmetis --download-scalapack
> > --download-suitesparse --download-superlu_dist-gpu=1
> > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++
> > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1
> > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1
> > PETSC_ARCH=debug-mode-newell
> >
> > [1]PETSC ERROR: #1 PetscCUDAInitialize() line 254 in
> > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c
> >
> > [1]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in
> > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c
> >
> > [1]PETSC ERROR: #3 PetscInitialize() line 1010 in
> > /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c
> >
> > --------------------------------------------------------------------------
> >
> > Primary job  terminated normally, but 1 process returned
> >
> > a non-zero exit code. Per user-direction, the job has been aborted.
> >
> > --------------------------------------------------------------------------
> >
> > --------------------------------------------------------------------------
> >
> > mpiexec detected that one or more processes exited with non-zero status,
> > thus causing
> >
> > the job to be terminated. The first process to do so was:
> >
> >
> >
> >   Process name: [[46522,1],0]
> >
> >   Exit code:    88
> >
> > --------------------------------------------------------------------------
> >
> > 1,2c1,21
> >
> > < lid velocity = 0.0025, prandtl # = 1., grashof # = 1.
> >
> > < Number of SNES iterations = 2
> >
> > ---
> >
> > > [0]PETSC ERROR: --------------------- Error Message
> > --------------------------------------------------------------
> >
> > > [0]PETSC ERROR: Error in system call
> >
> > > [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is
> > insufficient for CUDA runtime version
> >
> > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
> > for trouble shooting.
> >
> > > [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown
> >
> > > [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by
> > abhy245 Sun Feb 23 12:50:00 2020
> >
> > > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make
> > --download-metis --download-parmetis --download-scalapack
> > --download-suitesparse --download-superlu_dist-gpu=1
> > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++
> > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1
> > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1
> > PETSC_ARCH=debug-mode-newell
> >
> > > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in
> > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c
> >
> > > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in
> > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c
> >
> > > [0]PETSC ERROR: #3 PetscInitialize() line 1010 in
> > /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c
> >
> > >
> > --------------------------------------------------------------------------
> >
> > > Primary job  terminated normally, but 1 process returned
> >
> > > a non-zero exit code. Per user-direction, the job has been aborted.
> >
> > >
> > --------------------------------------------------------------------------
> >
> > >
> > --------------------------------------------------------------------------
> >
> > > mpiexec detected that one or more processes exited with non-zero status,
> > thus causing
> >
> > > the job to be terminated. The first process to do so was:
> >
> > >
> >
> > >   Process name: [[46545,1],0]
> >
> > >   Exit code:    88
> >
> > >
> > --------------------------------------------------------------------------
> >
> > /people/abhy245/software/petsc/src/snes/examples/tutorials
> >
> > Possible problem with ex19 running with superlu_dist, diffs above
> >
> > =========================================
> >
> > Possible error running Fortran example src/snes/examples/tutorials/ex5f
> > with 1 MPI process
> >
> > See http://www.mcs.anl.gov/petsc/documentation/faq.html
> >
> > [0]PETSC ERROR: --------------------- Error Message
> > --------------------------------------------------------------
> >
> > [0]PETSC ERROR: Error in system call
> >
> > [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient
> > for CUDA runtime version
> >
> > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
> > for trouble shooting.
> >
> > [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown
> >
> > [0]PETSC ERROR: ./ex5f on a debug-mode-newell named newell01.pnl.gov by
> > abhy245 Sun Feb 23 12:50:04 2020
> >
> > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make
> > --download-metis --download-parmetis --download-scalapack
> > --download-suitesparse --download-superlu_dist-gpu=1
> > --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++
> > --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1
> > --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1
> > PETSC_ARCH=debug-mode-newell
> >
> > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in
> > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c
> >
> > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in
> > /qfs/people/abhy245/software/petsc/src/sys/objects/init.c
> >
> > [0]PETSC ERROR: PetscInitialize:Checking initial options
> >
> >  Unable to initialize PETSc
> >
> > --------------------------------------------------------------------------
> >
> > mpiexec has exited due to process rank 0 with PID 0 on
> >
> > node newell01 exiting improperly. There are three reasons this could occur:
> >
> >
> >
> > 1. this process did not call "init" before exiting, but others in
> >
> > the job did. This can cause a job to hang indefinitely while it waits
> >
> > for all processes to call "init". By rule, if one process calls "init",
> >
> > then ALL processes must call "init" prior to termination.
> >
> >
> >
> > 2. this process called "init", but exited without calling "finalize".
> >
> > By rule, all processes that call "init" MUST call "finalize" prior to
> >
> > exiting or it will be considered an "abnormal termination"
> >
> >
> >
> > 3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
> >
> > orte_create_session_dirs is set to false. In this case, the run-time cannot
> >
> > detect that the abort call was an abnormal termination. Hence, the only
> >
> > error message you will receive is this one.
> >
> >
> >
> > This may have caused other processes in the application to be
> >
> > terminated by signals sent by mpiexec (as reported here).
> >
> >
> >
> > You can avoid this message by specifying -quiet on the mpiexec command
> > line.
> >
> > --------------------------------------------------------------------------
> >
> > Completed test examples
> >
> > *From: *Satish Balay <balay at mcs.anl.gov>
> > *Reply-To: *petsc-users <petsc-users at mcs.anl.gov>
> > *Date: *Saturday, February 22, 2020 at 9:00 PM
> > *To: *Junchao Zhang <jczhang at mcs.anl.gov>
> > *Cc: *"Abhyankar, Shrirang G" <shrirang.abhyankar at pnnl.gov>, petsc-users <
> > petsc-users at mcs.anl.gov>
> > *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist
> >
> >
> > The fix is now in both maint and master:
> >
> > https://gitlab.com/petsc/petsc/-/merge_requests/2555
> >
> > Satish
> >
> >
> > On Sat, 22 Feb 2020, Junchao Zhang via petsc-users wrote:
> >
> > We have met this error before and know why. Will fix it soon.
> >
> > --Junchao Zhang
> >
> > On Sat, Feb 22, 2020 at 11:43 AM Abhyankar, Shrirang G via petsc-users <
> > petsc-users at mcs.anl.gov> wrote:
> >
> > > Thanks, Satish. Configure and make go through fine. Getting an undefined
> > > reference error for VecGetArrayWrite_SeqCUDA.
> > >
> > > Shri
> > >
> >
> > > *From: *Satish Balay <balay at mcs.anl.gov>
> > > *Reply-To: *petsc-users <petsc-users at mcs.anl.gov>
> > > *Date: *Saturday, February 22, 2020 at 8:25 AM
> > > *To: *"Abhyankar, Shrirang G" <shrirang.abhyankar at pnnl.gov>
> > > *Cc: *"petsc-users at mcs.anl.gov" <petsc-users at mcs.anl.gov>
> > > *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist
> >
> > >
> > > On Sat, 22 Feb 2020, Abhyankar, Shrirang G via petsc-users wrote:
> > >
> > > Hi,
> > >     I want to install PETSc with GPU supported SuperLU_Dist. What are the
> > > configure options I should be using?
> >
> > >
> > > Shri,
> >
> > >     if self.framework.argDB['download-superlu_dist-gpu']:
> > >       self.cuda           = framework.require('config.packages.cuda',self)
> > >       self.openmp         = framework.require('config.packages.openmp',self)
> > >       self.deps           = [self.mpi,self.blasLapack,self.cuda,self.openmp]
> > > <<<<<
> >
> > >
> > > So try:
> > >
> > > --with-cuda=1 --download-superlu_dist=1 --download-superlu_dist-gpu=1
> > > --with-openmp=1 [and usual MPI, blaslapack]
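Combined with the other options seen in this thread, a full configure line might look like the following sketch. The compiler wrappers and the CUDA path are placeholders taken from the logs above; substitute your own site's values:

```shell
# Sketch of a PETSc configure invocation for GPU-enabled SuperLU_Dist.
# mpicc/mpicxx/mpif77 and the CUDA directory are assumptions -- adjust
# them for your system before running ./configure with these options.
OPTS="--with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 \
 --with-cuda=1 --with-cuda-dir=/share/apps/cuda/10.2 \
 --download-superlu_dist=1 --download-superlu_dist-gpu=1 \
 --with-openmp=1 --download-fblaslapack"
echo ./configure $OPTS
```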
> >
> > >
> > > Satish
> > >
> 


