[petsc-dev] SuiteSparse 4.0

dnystrom1 at comcast.net dnystrom1 at comcast.net
Wed Jun 13 09:38:17 CDT 2012

I understand the issue of some uses of blas being too small to justify shipping the work to the gpu. 
So I was thinking maybe petsc could choose at runtime, based on problem size, whether 
to use a cpu implementation of the blas or a gpu implementation. That seems 
feasible, but more complex than just having a petsc interface that substitutes 
cublas for everything. 
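The runtime-dispatch idea above could look something like the following sketch. This is purely illustrative, not PETSc or cuBLAS code: the crossover threshold, the backend functions, and their names are all made up for the example.

```python
# Hypothetical runtime dispatch between a CPU and a GPU BLAS backend
# based on problem size. The threshold and both backends are stand-ins;
# a real GPU path would call cuBLAS and pay a host-device transfer cost.

GPU_CROSSOVER = 10_000  # made-up vector length below which transfer cost dominates

def cpu_dot(x, y):
    """Stand-in for a host BLAS dot product."""
    return sum(a * b for a, b in zip(x, y))

def gpu_dot(x, y):
    """Stand-in for a cuBLAS dot product (same math, different backend)."""
    return sum(a * b for a, b in zip(x, y))

def dot(x, y):
    """Pick a backend at runtime based on vector length."""
    if len(x) < GPU_CROSSOVER:
        return cpu_dot(x, y)
    return gpu_dot(x, y)

print(dot([1.0, 2.0], [3.0, 4.0]))  # small problem takes the CPU path
```

The point of the sketch is only the dispatch structure: the small-size case stays on the host, which is the concern Jed raises below about tiny BLAS calls not being worth shipping to the device.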

Yes, I will back off tonight and try to work with a petsc example problem. I had to add 
an additional library to CHOLMOD.py to satisfy an unsatisfied external symbol. I think it was 
named libsuitesparseconfig.a and was needed to resolve the SuiteSparse_time symbol. 
It would probably be good to have a SuiteSparse.py module. I'll send more info if I can 
reproduce the problem with a petsc example. 
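The kind of change described above might look like the following sketch. The list contents and the manipulation here are illustrative only; they mimic the library-list convention in PETSc's BuildSystem configure modules but are not the actual contents of CHOLMOD.py.

```python
# Hypothetical sketch: appending libsuitesparseconfig.a to a configure
# module's candidate library list so that the SuiteSparse_time symbol
# (provided by the SuiteSparse_config package in SuiteSparse 4.0)
# resolves at link time. Names and list contents are assumptions.

liblist = [['libcholmod.a', 'libamd.a', 'libcolamd.a']]

# Add the extra archive that provides SuiteSparse_time to each candidate set:
liblist = [libs + ['libsuitesparseconfig.a'] for libs in liblist]

print(liblist[0][-1])
```

Since SuiteSparse 4.0 moved shared utilities like SuiteSparse_time into the common SuiteSparse_config package, any consumer of CHOLMOD 2.0 would need that archive on its link line, which is consistent with the suggestion of a dedicated SuiteSparse.py module.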



----- Original Message -----
From: "Jed Brown" <jedbrown at mcs.anl.gov> 
To: "Dave Nystrom" <dnystrom1 at comcast.net> 
Cc: "For users of the development version of PETSc" <petsc-dev at mcs.anl.gov> 
Sent: Wednesday, June 13, 2012 7:41:05 AM 
Subject: Re: [petsc-dev] SuiteSparse 4.0 

You can't use cublas everywhere because small sizes are better done on the CPU, and different threads need to be able to operate independently (without multiple devices). 

Yes, try a PETSc example before your own code. Did petsc build without any changes to the cholmod interface? ALWAYS send the error message/stack trace---even if we don't have an answer, it usually tells us how difficult the problem is likely to be to fix. 
On Jun 13, 2012 8:09 AM, "Dave Nystrom" <dnystrom1 at comcast.net> wrote: 

Well, I tried it last night but without success. I got a memory corruption 
error when I tried running my app. I suppose I should try first to get it 
working with a petsc example. I'm building SuiteSparse but only using 
cholmod out of that package. Several of the SuiteSparse packages use blas 
but cholmod is the only one so far that has the option to use cublas. Tim 
gets about a 10x speedup over the single core result when using cublas on an 
example problem that spends a lot of time in blas. So seems worth trying. 
Anyway, I think I must be doing something wrong so far. But for the same 
problem, Tim gets about a 20+ percent speedup over using multi-threaded Goto. 

Probably a better solution for petsc would be support for using cublas for 
all of petsc's blas needs. Why should just one petsc blas client have 
access to cublas? But I'm not sure how much work is involved in that - I see it 
is on the petsc ToDo list. I will try again tonight but would welcome advice 
or experiences from anyone else who has tried the new cholmod. 


Jed Brown writes: 
> Nope, why don't you try it and send us a patch if you get it working. 
> On Wed, Jun 13, 2012 at 12:49 AM, Dave Nystrom <dnystrom1 at comcast.net> wrote: 
> > Has anyone tried building petsc-dev to use cholmod-2.0 which is part of 
> > SuiteSparse-4.0 with the cholmod support for using cublas enabled? I am 
> > interested in trying the cublas support for cholmod to see how it compares 
> > with mkl and goto for my solves. 
> > 
> > Thanks, 
> > 
> > Dave 
> > 

