[petsc-users] Using PETSc with GPU supported SuperLU_Dist

Junchao Zhang jczhang at mcs.anl.gov
Mon Feb 24 09:01:05 CST 2020


[0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient
for CUDA runtime version

That means you need to update your CUDA driver so that it supports CUDA 10.2.
See the minimum driver version required for each toolkit release in Table 1 at
https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#major-components
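
A quick way to compare the two on the compute node is sketched below. It
assumes nvidia-smi is available there and uses the CUDA 10.2 toolkit path
from your configure line; the exact banner layout can differ between driver
releases.

  # "CUDA Version" in the nvidia-smi banner is the newest CUDA release the
  # installed driver supports; it must be >= the toolkit version below.
  nvidia-smi | head -n 4

  # The toolkit (runtime) version PETSc was configured against, 10.2 here.
  /share/apps/cuda/10.2/bin/nvcc --version

If the driver's supported version is below 10.2, either update the driver or
build against a toolkit version the driver already supports.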

--Junchao Zhang


On Sun, Feb 23, 2020 at 3:33 PM Abhyankar, Shrirang G <
shrirang.abhyankar at pnnl.gov> wrote:

> I was using CUDA v10.2. Switching to 9.2 gives a clean make test.
>
>
>
> Thanks,
>
> Shri
>
>
>
>
>
> *From: *petsc-users <petsc-users-bounces at mcs.anl.gov> on behalf of
> "Abhyankar, Shrirang G via petsc-users" <petsc-users at mcs.anl.gov>
> *Reply-To: *"Abhyankar, Shrirang G" <shrirang.abhyankar at pnnl.gov>
> *Date: *Sunday, February 23, 2020 at 3:10 PM
> *To: *petsc-users <petsc-users at mcs.anl.gov>, Junchao Zhang <
> jczhang at mcs.anl.gov>
> *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist
>
>
>
> I am getting an error now for CUDA driver version. Any suggestions?
>
>
>
> petsc:maint$ make test
>
> Running test examples to verify correct installation
>
> Using PETSC_DIR=/people/abhy245/software/petsc and
> PETSC_ARCH=debug-mode-newell
>
> Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI
> process
>
> See http://www.mcs.anl.gov/petsc/documentation/faq.html
>
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
>
> [0]PETSC ERROR: Error in system call
>
> [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient
> for CUDA runtime version
>
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
>
> [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown
>
> [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by
> abhy245 Sun Feb 23 12:49:55 2020
>
> [0]PETSC ERROR: Configure options --download-fblaslapack --download-make
> --download-metis --download-parmetis --download-scalapack
> --download-suitesparse --download-superlu_dist-gpu=1
> --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++
> --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1
> --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1
> PETSC_ARCH=debug-mode-newell
>
> [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in
> /qfs/people/abhy245/software/petsc/src/sys/objects/init.c
>
> [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in
> /qfs/people/abhy245/software/petsc/src/sys/objects/init.c
>
> [0]PETSC ERROR: #3 PetscInitialize() line 1010 in
> /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c
>
> --------------------------------------------------------------------------
>
> Primary job  terminated normally, but 1 process returned
>
> a non-zero exit code. Per user-direction, the job has been aborted.
>
> --------------------------------------------------------------------------
>
> --------------------------------------------------------------------------
>
> mpiexec detected that one or more processes exited with non-zero status,
> thus causing
>
> the job to be terminated. The first process to do so was:
>
>
>
>   Process name: [[46518,1],0]
>
>   Exit code:    88
>
> --------------------------------------------------------------------------
>
> Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 MPI
> processes
>
> See http://www.mcs.anl.gov/petsc/documentation/faq.html
>
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
>
> [1]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
>
> [1]PETSC ERROR: Error in system call
>
> [1]PETSC ERROR: [0]PETSC ERROR: Error in system call
>
> [0]PETSC ERROR: error in cudaGetDeviceCount CUDA driver version is
> insufficient for CUDA runtime version
>
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
>
> error in cudaGetDeviceCount CUDA driver version is insufficient for CUDA
> runtime version
>
> [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
>
> [1]PETSC ERROR: [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown
>
> [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by
> abhy245 Sun Feb 23 12:49:57 2020
>
> [0]PETSC ERROR: Configure options --download-fblaslapack --download-make
> --download-metis --download-parmetis --download-scalapack
> --download-suitesparse --download-superlu_dist-gpu=1
> --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++
> --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1
> --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1
> PETSC_ARCH=debug-mode-newell
>
> [0]PETSC ERROR: #1 PetscCUDAInitialize() line 254 in
> /qfs/people/abhy245/software/petsc/src/sys/objects/init.c
>
> [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in
> /qfs/people/abhy245/software/petsc/src/sys/objects/init.c
>
> [0]PETSC ERROR: #3 PetscInitialize() line 1010 in
> /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c
>
> Petsc Release Version 3.12.4, unknown
>
> [1]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by
> abhy245 Sun Feb 23 12:49:57 2020
>
> [1]PETSC ERROR: Configure options --download-fblaslapack --download-make
> --download-metis --download-parmetis --download-scalapack
> --download-suitesparse --download-superlu_dist-gpu=1
> --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++
> --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1
> --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1
> PETSC_ARCH=debug-mode-newell
>
> [1]PETSC ERROR: #1 PetscCUDAInitialize() line 254 in
> /qfs/people/abhy245/software/petsc/src/sys/objects/init.c
>
> [1]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in
> /qfs/people/abhy245/software/petsc/src/sys/objects/init.c
>
> [1]PETSC ERROR: #3 PetscInitialize() line 1010 in
> /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c
>
> --------------------------------------------------------------------------
>
> Primary job  terminated normally, but 1 process returned
>
> a non-zero exit code. Per user-direction, the job has been aborted.
>
> --------------------------------------------------------------------------
>
> --------------------------------------------------------------------------
>
> mpiexec detected that one or more processes exited with non-zero status,
> thus causing
>
> the job to be terminated. The first process to do so was:
>
>
>
>   Process name: [[46522,1],0]
>
>   Exit code:    88
>
> --------------------------------------------------------------------------
>
> 1,2c1,21
>
> < lid velocity = 0.0025, prandtl # = 1., grashof # = 1.
>
> < Number of SNES iterations = 2
>
> ---
>
> > [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
>
> > [0]PETSC ERROR: Error in system call
>
> > [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is
> insufficient for CUDA runtime version
>
> > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
>
> > [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown
>
> > [0]PETSC ERROR: ./ex19 on a debug-mode-newell named newell01.pnl.gov by
> abhy245 Sun Feb 23 12:50:00 2020
>
> > [0]PETSC ERROR: Configure options --download-fblaslapack --download-make
> --download-metis --download-parmetis --download-scalapack
> --download-suitesparse --download-superlu_dist-gpu=1
> --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++
> --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1
> --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1
> PETSC_ARCH=debug-mode-newell
>
> > [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in
> /qfs/people/abhy245/software/petsc/src/sys/objects/init.c
>
> > [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in
> /qfs/people/abhy245/software/petsc/src/sys/objects/init.c
>
> > [0]PETSC ERROR: #3 PetscInitialize() line 1010 in
> /qfs/people/abhy245/software/petsc/src/sys/objects/pinit.c
>
> >
> --------------------------------------------------------------------------
>
> > Primary job  terminated normally, but 1 process returned
>
> > a non-zero exit code. Per user-direction, the job has been aborted.
>
> >
> --------------------------------------------------------------------------
>
> >
> --------------------------------------------------------------------------
>
> > mpiexec detected that one or more processes exited with non-zero status,
> thus causing
>
> > the job to be terminated. The first process to do so was:
>
> >
>
> >   Process name: [[46545,1],0]
>
> >   Exit code:    88
>
> >
> --------------------------------------------------------------------------
>
> /people/abhy245/software/petsc/src/snes/examples/tutorials
>
> Possible problem with ex19 running with superlu_dist, diffs above
>
> =========================================
>
> Possible error running Fortran example src/snes/examples/tutorials/ex5f
> with 1 MPI process
>
> See http://www.mcs.anl.gov/petsc/documentation/faq.html
>
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
>
> [0]PETSC ERROR: Error in system call
>
> [0]PETSC ERROR: error in cudaSetDevice CUDA driver version is insufficient
> for CUDA runtime version
>
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
>
> [0]PETSC ERROR: Petsc Release Version 3.12.4, unknown
>
> [0]PETSC ERROR: ./ex5f on a debug-mode-newell named newell01.pnl.gov by
> abhy245 Sun Feb 23 12:50:04 2020
>
> [0]PETSC ERROR: Configure options --download-fblaslapack --download-make
> --download-metis --download-parmetis --download-scalapack
> --download-suitesparse --download-superlu_dist-gpu=1
> --download-superlu_dist=1 --with-cc=mpicc --with-clanguage=c++
> --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1
> --with-cxx-dialect=C++11 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1
> PETSC_ARCH=debug-mode-newell
>
> [0]PETSC ERROR: #1 PetscCUDAInitialize() line 261 in
> /qfs/people/abhy245/software/petsc/src/sys/objects/init.c
>
> [0]PETSC ERROR: #2 PetscOptionsCheckInitial_Private() line 652 in
> /qfs/people/abhy245/software/petsc/src/sys/objects/init.c
>
> [0]PETSC ERROR: PetscInitialize:Checking initial options
>
>  Unable to initialize PETSc
>
> --------------------------------------------------------------------------
>
> mpiexec has exited due to process rank 0 with PID 0 on
>
> node newell01 exiting improperly. There are three reasons this could occur:
>
>
>
> 1. this process did not call "init" before exiting, but others in
>
> the job did. This can cause a job to hang indefinitely while it waits
>
> for all processes to call "init". By rule, if one process calls "init",
>
> then ALL processes must call "init" prior to termination.
>
>
>
> 2. this process called "init", but exited without calling "finalize".
>
> By rule, all processes that call "init" MUST call "finalize" prior to
>
> exiting or it will be considered an "abnormal termination"
>
>
>
> 3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
>
> orte_create_session_dirs is set to false. In this case, the run-time cannot
>
> detect that the abort call was an abnormal termination. Hence, the only
>
> error message you will receive is this one.
>
>
>
> This may have caused other processes in the application to be
>
> terminated by signals sent by mpiexec (as reported here).
>
>
>
> You can avoid this message by specifying -quiet on the mpiexec command
> line.
>
> --------------------------------------------------------------------------
>
> Completed test examples
>
> *From: *Satish Balay <balay at mcs.anl.gov>
> *Reply-To: *petsc-users <petsc-users at mcs.anl.gov>
> *Date: *Saturday, February 22, 2020 at 9:00 PM
> *To: *Junchao Zhang <jczhang at mcs.anl.gov>
> *Cc: *"Abhyankar, Shrirang G" <shrirang.abhyankar at pnnl.gov>, petsc-users <
> petsc-users at mcs.anl.gov>
> *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist
>
>
>
> The fix is now in both  maint and master
>
>
>
> https://gitlab.com/petsc/petsc/-/merge_requests/2555
>
>
>
> Satish
>
>
>
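For reference, a minimal sketch of picking up that maint fix in an existing
git clone (this assumes the source tree is a git checkout, as the "unknown"
version string in the logs above suggests, and that configure is re-run with
the same options afterwards):

  cd /people/abhy245/software/petsc      # PETSC_DIR from the logs above
  git checkout maint
  git pull                               # brings in the merged fix
  # re-run the previous ./configure line, then rebuild and retest:
  make PETSC_DIR=/people/abhy245/software/petsc PETSC_ARCH=debug-mode-newell all
  make PETSC_DIR=/people/abhy245/software/petsc PETSC_ARCH=debug-mode-newell test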
> On Sat, 22 Feb 2020, Junchao Zhang via petsc-users wrote:
>
> We met the error before and knew why. Will fix it soon.
>
> --Junchao Zhang
>
> On Sat, Feb 22, 2020 at 11:43 AM Abhyankar, Shrirang G via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
> > Thanks, Satish. Configure and make go through fine. Getting an undefined
> > reference error for VecGetArrayWrite_SeqCUDA.
> >
> > Shri
> >
> > *From: *Satish Balay <balay at mcs.anl.gov>
> > *Reply-To: *petsc-users <petsc-users at mcs.anl.gov>
> > *Date: *Saturday, February 22, 2020 at 8:25 AM
> > *To: *"Abhyankar, Shrirang G" <shrirang.abhyankar at pnnl.gov>
> > *Cc: *"petsc-users at mcs.anl.gov" <petsc-users at mcs.anl.gov>
> > *Subject: *Re: [petsc-users] Using PETSc with GPU supported SuperLU_Dist
> >
> > On Sat, 22 Feb 2020, Abhyankar, Shrirang G via petsc-users wrote:
> >
> > Hi,
> >     I want to install PETSc with GPU supported SuperLU_Dist. What are the
> > configure options I should be using?
> >
> > Shri,
> >
> >     if self.framework.argDB['download-superlu_dist-gpu']:
> >       self.cuda           = framework.require('config.packages.cuda',self)
> >       self.openmp         = framework.require('config.packages.openmp',self)
> >       self.deps           = [self.mpi,self.blasLapack,self.cuda,self.openmp]
> > <<<<<
> >
> > So try:
> >
> > --with-cuda=1 --download-superlu_dist=1 --download-superlu_dist-gpu=1
> > --with-openmp=1 [and usual MPI, blaslapack]
> >
> > Satish
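
Putting Satish's suggestion together with the site-specific settings from the
logs above, a complete configure line for this kind of build could look like
the following sketch (the MPI wrappers, CUDA path, and PETSC_ARCH name are
values taken from this particular machine, not requirements):

  ./configure --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 \
    --with-clanguage=c++ --with-cxx-dialect=C++11 \
    --with-cuda=1 --with-cuda-dir=/share/apps/cuda/10.2 --with-openmp=1 \
    --download-superlu_dist=1 --download-superlu_dist-gpu=1 \
    --download-metis --download-parmetis --download-fblaslapack \
    PETSC_ARCH=debug-mode-newell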