[petsc-dev] [GPU - slepc] Hands-on exercise 4 (SVD) not working with GPU and default configurations

Karl Rupp rupp at iue.tuwien.ac.at
Mon Aug 10 05:54:16 CDT 2015


Hi Massimiliano,

On 08/10/2015 12:45 PM, Leoni, Massimiliano wrote:
> Good, it is running now, but performance is really poor: I tried on 3
> nodes with 2 GPUs and 12 CPU threads each, and MPI+CUDA performs much
> worse than pure MPI.
>
> I have a few thoughts on why this might be happening:
>
> - My problem has dense matrices, but on GPU I use -mat_type aijcusp
>
> - I did something wrong in setting the problem up, causing a bunch of
> useless memory traffic that I could avoid
>
> - The real power of GPUs might only show up as the problem gets bigger
> [I tested a 100000x200 matrix]
>
> Is any of these likely to be causing this poor performance?

The use of aijcusp instead of a dense matrix type certainly adds to the 
issue. Please send the output of -log_summary so that we can see where 
most time is spent.
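
If it helps, here is a minimal sketch (assuming the PETSc 3.6-era API;
error handling abbreviated, SVD setup elided) of loading the matrix
with a dense type and getting the -log_summary table at exit. As far as
I know there is no dense GPU matrix type in the cusp backend, so this
is the CPU dense path:

  #include <petscmat.h>

  int main(int argc, char **argv)
  {
    Mat            A;
    PetscViewer    viewer;
    char           file[PETSC_MAX_PATH_LEN];
    PetscBool      flg;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
    /* -file <matrix.petsc>, as in the hands-on exercise */
    ierr = PetscOptionsGetString(NULL, "-file", file, PETSC_MAX_PATH_LEN, &flg);CHKERRQ(ierr);

    /* load the matrix with a dense storage format instead of aijcusp */
    ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, file, FILE_MODE_READ, &viewer);CHKERRQ(ierr);
    ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
    ierr = MatSetType(A, MATDENSE);CHKERRQ(ierr);
    ierr = MatLoad(A, viewer);CHKERRQ(ierr);
    ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

    /* ... SVD setup and solve as in the exercise ... */

    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = PetscFinalize();  /* -log_summary prints its timing table here */
    return ierr;
  }

Run with, e.g.

  $ mpirun -np 1 ./slepcSVD -file <matrix.petsc> -svd_type trlanczos -log_summary

and send us the resulting table.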


> There is also the banner on PETSc's website saying that using GPUs
> effectively is not that easy. Can anyone point me to further references
> on how to improve this?
>
> I suspect that I can achieve more by choosing proper solution methods
> and other options of PETSc/SLEPc.

If you have good (recent) CPUs in a dual-socket configuration, it is 
very unlikely that you will gain anything beyond ~2x with an optimized 
GPU setup. Even that ~2x may only be possible by heavily tweaking the 
current SVD implementation in SLEPc, of which I don't know the details.
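
As a back-of-the-envelope estimate (illustrative numbers, not measured 
on your machine): these kernels are limited by memory bandwidth, so the 
best-case speedup is roughly the bandwidth ratio, e.g.

  ~250 GB/s (one GPU) / ~120 GB/s (dual-socket CPU, 2 x ~60 GB/s) ~= 2x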

Best regards,
Karli





>  > -----Original Message-----
>  > From: Jose E. Roman [mailto:jroman at dsic.upv.es]
>  > Sent: 07 August 2015 18:44
>  > To: Leoni, Massimiliano
>  > Cc: petsc-dev at mcs.anl.gov; slepc-maint at upv.es
>  > Subject: Re: [petsc-dev] [GPU - slepc] Hands-on exercise 4 (SVD) not
>  > working with GPU and default configurations
>  >
>  > Yes, there seems to be a problem with the default SVD solver
>  > (SVDCROSS). I will fix it in the master branch in the next few days.
>  > Meanwhile, you can run the example with -svd_type trlanczos
>  >
>  > Thanks for reporting this.
>  > Jose
>
>  > > On 7/8/2015, at 16:31, Leoni, Massimiliano <Massimiliano.Leoni at Rolls-Royce.com> wrote:
>  > >
>  > > Hi everybody!
>  > >
>  > > I kept experimenting with slepc and GPUs, and when I turned to SVD I
>  > > found out that the hands-on exercise on SVD [#4] doesn't run properly.
>  > >
>  > > If I run it on CPU it works fine, whereas...
>  > >
>  > > $ mpirun -np 1 slepcSVD -file $SLEPC_DIR/share/slepc/datafiles/matrices/rdb200.petsc -bv_type vecs -mat_type aijcusp -on_error_abort
>  > >
>  > > Singular value problem stored in file.
>  > >
>  > > Reading REAL matrix from a binary file...
>  > > [0]PETSC ERROR: BVScaleColumn() line 380 in /gpfs/rrcfd/rruk-students/apps/slepc/src/sys/classes/bv/interface/bvops.c Scalar value must be same on all processes, argument # 3
>  > > [0]PETSC ERROR: ------------------------------------------------------------------------
>  > > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>  > > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>  > > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>  > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>  > > [0]PETSC ERROR: likely location of problem given in stack below
>  > > [0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>  > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>  > > [0]PETSC ERROR:       INSTEAD the line number of the start of the function
>  > > [0]PETSC ERROR:       is given.
>  > > [0]PETSC ERROR: [0] PetscAbortErrorHandler line 56 /gpfs/rrcfd/rruk-students/apps/petsc/src/sys/error/errabort.c
>  > > [0]PETSC ERROR: [0] PetscError line 363 /gpfs/rrcfd/rruk-students/apps/petsc/src/sys/error/err.c
>  > > [0]PETSC ERROR: [0] BVScaleColumn line 377 /gpfs/rrcfd/rruk-students/apps/slepc/src/sys/classes/bv/interface/bvops.c
>  > > [0]PETSC ERROR: [0] EPSFullLanczos line 357 /gpfs/rrcfd/rruk-students/apps/slepc/src/eps/impls/krylov/epskrylov.c
>  > > [0]PETSC ERROR: [0] EPSSolve_KrylovSchur_Symm line 41 /gpfs/rrcfd/rruk-students/apps/slepc/src/eps/impls/krylov/krylovschur/ks-symm.c
>  > > [0]PETSC ERROR: [0] EPSSolve line 83 /gpfs/rrcfd/rruk-students/apps/slepc/src/eps/interface/epssolve.c
>  > > [0]PETSC ERROR: [0] SVDSolve_Cross line 155 /gpfs/rrcfd/rruk-students/apps/slepc/src/svd/impls/cross/cross.c
>  > > [0]PETSC ERROR: [0] SVDSolve line 92 /gpfs/rrcfd/rruk-students/apps/slepc/src/svd/interface/svdsolve.c
>  > > [0]PETSC ERROR: User provided function() line 0 in unknown file (null)
>  > > --------------------------------------------------------------------------
>  > > mpirun noticed that process rank 0 with PID 11264 on node gpu3 exited on signal 11 (Segmentation fault).
>  > > --------------------------------------------------------------------------
>  > >
>  > > I am using the same command line options that work just fine on
>  > > hands-on exercises 1 and 2, which feature EPS solvers.
>  > >
>  > > Any hint appreciated.
>  > >
>  > > Best regards,
>  > > Massimiliano Leoni



