[petsc-dev] Floating point exception in ex47cu.cu

David Fuentes fuentesdt at gmail.com
Wed Mar 21 09:11:48 CDT 2012


I'm having a similar problem with ex47cu

with petsc 3.2-p7
       cuda 4.0
       thrust 1.4
       cusp 0.2

Unfortunately, the failure is pretty random: sometimes the executable runs
fine, and other times it crashes with floating point exceptions as below.
(The function evaluation returns infs and NaNs when using
-da_vec_type mpicusp.)

The -arch=sm_13 flag is being used.

My card has compute capability 2.0; would using -arch=sm_20 possibly help?
Also, is there a particular version of cuda/thrust/cusp that should be used?
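
In case it helps, this is roughly how I am building and running it (a
sketch with paths trimmed; sm_20 here is the variant I'm asking about,
not something I've verified):

    # configure with GPU support; sm_20 would match the card's
    # compute capability 2.0 (I currently build with sm_13)
    ./configure --with-cuda=1 --with-thrust=1 --with-cusp=1 \
                --with-cuda-arch=sm_20

    # build and run the SNES tutorial with CUSP vectors on the GPU
    cd src/snes/examples/tutorials
    make ex47cu
    ./ex47cu -da_vec_type mpicusp -snes_monitor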


Thanks,
David


On Mon, Apr 25, 2011 at 8:46 AM, Satish Balay <balay at mcs.anl.gov> wrote:

> That would be weird. Currently configure defaults to sm_10 for single
> precision, and sm_13 for double precision. [I've verified this again.]
>
> Can you send configure.log [without the cuda-arch option] to
> petsc-maint?
>
>
> Satish
>
>
> On Mon, 25 Apr 2011, Евгений Козлов wrote:
>
> > Yes, this was the problem. Now it works.
> >
> > Thank you.
> >
> > 2011/4/25 Victor Minden <victorminden at gmail.com>:
> > > Eugene,
> > > Based off of
> > >  Configure options --prefix=/home/kukushkinav
> > > --with-blas-lapack-dir=/opt/intel/composerxe-2011.0.084/mkl
> > > --with-mpi-dir=/opt/intel/impi/4.0.1.007/intel64/bin --with-cuda=1
> > > --with-cusp=1 --with-thrust=1
> > > --with-thrust-dir=/home/kukushkinav/include
> > > --with-cusp-dir=/home/kukushkinav/include
> > > It looks like maybe you're not setting the configure flag
> > > --with-cuda-arch=XXX.  Without this, PETSc uses the nvcc default,
> > > which when I last looked was sm_10.  The problem with this is that
> > > sm_10 doesn't support double precision.  Maybe this is your problem?
> > > For example, I use --with-cuda-arch=sm_13, which corresponds to
> > > NVIDIA compute capability 1.3.
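> > >
> > > A quick way to see the effect outside of PETSc (just a sketch, not
> > > from the example source): compile a trivial kernel that uses double
> > > with both arch flags and compare.
> > >
> > >   // demote_check.cu -- hypothetical file name; any double kernel works
> > >   __global__ void scale(double a, double *x, int n)
> > >   {
> > >     int i = blockIdx.x * blockDim.x + threadIdx.x;
> > >     if (i < n) x[i] = a * x[i];
> > >   }
> > >   // nvcc -arch=sm_10 -c demote_check.cu  -> nvcc warns that double is
> > >   //   demoted to float, so the math silently runs in single precision
> > >   // nvcc -arch=sm_13 -c demote_check.cu  -> real double precision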
> > > Cheers,
> > > Victor
> > > ---
> > > Victor L. Minden
> > >
> > > Tufts University
> > > School of Engineering
> > > Class of 2012
> > >
> > >
> > > On Fri, Apr 22, 2011 at 12:22 PM, Евгений Козлов
> > > <neoveneficus at gmail.com> wrote:
> > >>
> > >> Hello,
> > >>
> > >> I am interested in using PETSc for iteratively solving sparse linear
> > >> systems on multi-GPU systems.
> > >>
> > >> First of all, I compiled PETSc-dev and tried to run some examples
> > >> that run on the GPU.
> > >>
> > >> I found src/snes/examples/tutorials/ex47cu.cu, compiled it, and ran it.
> > >>
> > >> Output of the original program src/snes/examples/tutorials/ex47cu.cu:
> > >>
> > >> [0]PETSC ERROR: --------------------- Error Message
> > >> ------------------------------------
> > >> [0]PETSC ERROR: Floating point exception!
> > >> [0]PETSC ERROR: User provided compute function generated a Not-a-Number!
> > >> [0]PETSC ERROR: ------------------------------------------------------------------------
> > >> [0]PETSC ERROR: Petsc Development HG revision:
> > >> d3e10315d68b1dd5481adb2889c7d354880da362  HG Date: Wed Apr 20 21:03:56
> > >> 2011 -0500
> > >> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> > >> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> > >> [0]PETSC ERROR: See docs/index.html for manual pages.
> > >> [0]PETSC ERROR: ------------------------------------------------------------------------
> > >> [0]PETSC ERROR: ex47cu on a arch-linu named cn03 by kukushkinav Fri
> > >> Apr 22 18:33:32 2011
> > >> [0]PETSC ERROR: Libraries linked from /home/kukushkinav/lib
> > >> [0]PETSC ERROR: Configure run at Thu Apr 21 19:18:22 2011
> > >> [0]PETSC ERROR: Configure options --prefix=/home/kukushkinav
> > >> --with-blas-lapack-dir=/opt/intel/composerxe-2011.0.084/mkl
> > >> --with-mpi-dir=/opt/intel/impi/4.0.1.007/intel64/bin --with-cuda=1
> > >> --with-cusp=1 --with-thrust=1
> > >> --with-thrust-dir=/home/kukushkinav/include
> > >> --with-cusp-dir=/home/kukushkinav/include
> > >> [0]PETSC ERROR: ------------------------------------------------------------------------
> > >> [0]PETSC ERROR: SNESSolve_LS() line 167 in src/snes/impls/ls/ls.c
> > >> [0]PETSC ERROR: SNESSolve() line 2407 in src/snes/interface/snes.c
> > >> [0]PETSC ERROR: main() line 38 in src/snes/examples/tutorials/ex47cu.cu
> > >> application called MPI_Abort(MPI_COMM_WORLD, 72) - process 0
> > >>
> > >> Then I tried to locate the problem in the source, so I changed the
> > >> operator in struct ApplyStencil to
> > >>
> > >> void operator()(Tuple t) { thrust::get<0>(t) = 1; }
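> > >>
> > >> (For context, a self-contained sketch of the same pattern -- not the
> > >> exact ex47cu.cu source -- showing how such a functor is applied over a
> > >> zip iterator with thrust::for_each:)
> > >>
> > >> #include <thrust/device_vector.h>
> > >> #include <thrust/for_each.h>
> > >> #include <thrust/iterator/zip_iterator.h>
> > >> #include <thrust/tuple.h>
> > >>
> > >> struct SetOne {
> > >>   // writes 1 into the first element of each zipped tuple, i.e. f[i] = 1
> > >>   template <typename Tuple>
> > >>   __host__ __device__ void operator()(Tuple t) { thrust::get<0>(t) = 1; }
> > >> };
> > >>
> > >> int main() {
> > >>   thrust::device_vector<double> f(8), x(8, 2.0);  // f: residual, x: state
> > >>   thrust::for_each(
> > >>     thrust::make_zip_iterator(thrust::make_tuple(f.begin(), x.begin())),
> > >>     thrust::make_zip_iterator(thrust::make_tuple(f.end(),   x.end())),
> > >>     SetOne());                                    // every f[i] becomes 1
> > >>   return 0;
> > >> }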
> > >>
> > >> Result:
> > >> [0]PETSC ERROR: --------------------- Error Message
> > >> ------------------------------------
> > >> [0]PETSC ERROR: Floating point exception!
> > >> [0]PETSC ERROR: Infinite or not-a-number generated in mdot, entry 0!
> > >> [0]PETSC ERROR: ------------------------------------------------------------------------
> > >> [0]PETSC ERROR: Petsc Development HG revision:
> > >> d3e10315d68b1dd5481adb2889c7d354880da362  HG Date: Wed Apr 20 21:03:56
> > >> 2011 -0500
> > >> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> > >> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> > >> [0]PETSC ERROR: See docs/index.html for manual pages.
> > >> [0]PETSC ERROR: ------------------------------------------------------------------------
> > >> [0]PETSC ERROR: ex47cu on a arch-linu named cn11 by kukushkinav Fri
> > >> Apr 22 18:58:04 2011
> > >> [0]PETSC ERROR: Libraries linked from /home/kukushkinav/lib
> > >> [0]PETSC ERROR: Configure run at Thu Apr 21 19:18:22 2011
> > >> [0]PETSC ERROR: Configure options --prefix=/home/kukushkinav
> > >> --with-blas-lapack-dir=/opt/intel/composerxe-2011.0.084/mkl
> > >> --with-mpi-dir=/opt/intel/impi/4.0.1.007/intel64/bin --with-cuda=1
> > >> --with-cusp=1 --with-thrust=1
> > >> --with-thrust-dir=/home/kukushkinav/include
> > >> --with-cusp-dir=/home/kukushkinav/include
> > >> [0]PETSC ERROR: ------------------------------------------------------------------------
> > >> [0]PETSC ERROR: VecMDot() line 1146 in src/vec/vec/interface/rvector.c
> > >> [0]PETSC ERROR: KSPGMRESClassicalGramSchmidtOrthogonalization() line
> > >> 66 in src/ksp/ksp/impls/gmres/borthog2.c
> > >> [0]PETSC ERROR: GMREScycle() line 161 in src/ksp/ksp/impls/gmres/gmres.c
> > >> [0]PETSC ERROR: KSPSolve_GMRES() line 244 in
> > >> src/ksp/ksp/impls/gmres/gmres.c
> > >> [0]PETSC ERROR: KSPSolve() line 426 in src/ksp/ksp/interface/itfunc.c
> > >> [0]PETSC ERROR: SNES_KSPSolve() line 3107 in src/snes/interface/snes.c
> > >> [0]PETSC ERROR: SNESSolve_LS() line 190 in src/snes/impls/ls/ls.c
> > >> [0]PETSC ERROR: SNESSolve() line 2407 in src/snes/interface/snes.c
> > >> [0]PETSC ERROR: main() line 38 in src/snes/examples/tutorials/ex47cu.cu
> > >>
> > >> RedHat 5.5, Cuda 3.2
> > >>
> > >> Question 1: Is this my problem or a bug in the algorithm?
> > >>
> > >> Question 2: Where can I find a simple document or example that describes
> > >> how to solve sparse linear systems on multi-GPU systems using PETSc?
> > >>
> > >>
> > >> --
> > >> Best regards,
> > >> Eugene
> > >
> > >
> >
> >
> >
> >
>

