[petsc-users] Some papers for additive schwarz and blocked jacobi?

Dürrwang, Jürgen Juergen.Duerrwang at iosb.fraunhofer.de
Mon Jun 6 10:16:19 CDT 2011


@Jed Brown

I copy the whole matrix and the solved vectors from each ILU block (= preconditioner) to the GPU, where I can solve with CG.

At the moment I have finished a CG solver on the GPU using an algorithm from Saad. It is very fast: for a matrix of size 640000x640000 with about 4,500,000 nonzero elements, I need only 900 ms to reach an error tolerance of 10e-3. But I want a solver that is both stable and fast, so I implemented a CG solver with ILU(0) preconditioning. Unfortunately, the ILU is a serial CPU implementation (ILU decomposition and solve on the CPU, CG operations on the GPU). It computes the solution for the same matrix size in 2.6 s. So I thought it would be nice to use all of my CPU cores instead of only one, and perhaps I can get the computing time down to 1.5 s.

This is the approach I want to take:

1. Load the matrix to be solved onto the CPU and the GPU.
2. Decompose it into blocks, so that an ILU(0) can run on each block in “parallel”. : CPU
3. Loop until the tolerance is reached:
4. Solve each block in parallel to get a preconditioner. : CPU
5. Solve CG with the preconditioner to reduce the number of iterations. : GPU
6. End loop.

There are about 4 copies between CPU and GPU per step, but that isn’t a problem.
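
The steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the solver described in this message: everything runs on the CPU, the matrix is a small dense 1D Laplacian, a dense Cholesky factorization of each diagonal block stands in for ILU(0), and all function names are hypothetical. The sketch factors each block once before the loop and only applies the block solves inside the CG iteration.

```python
import math

def laplacian_1d(n):
    """Dense SPD test matrix: tridiagonal (-1, 2, -1)."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cholesky(B):
    """Lower-triangular L with B = L L^T (assumed stand-in for ILU(0))."""
    n = len(B)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = B[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve L L^T z = b by forward, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (y[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z

def block_jacobi_setup(A, nblocks):
    """Factor every diagonal block once (step 2 of the list)."""
    n = len(A)
    bounds = [n * b // nblocks for b in range(nblocks + 1)]
    return [(lo, hi, cholesky([row[lo:hi] for row in A[lo:hi]]))
            for lo, hi in zip(bounds, bounds[1:])]

def block_jacobi_apply(blocks, r):
    """z = M^{-1} r: independent block solves, results concatenated (step 4)."""
    z = [0.0] * len(r)
    for lo, hi, L in blocks:
        z[lo:hi] = chol_solve(L, r[lo:hi])
    return z

def pcg(A, b, apply_prec, rtol=1e-3, maxit=1000):
    """Preconditioned CG on the full matrix (steps 3, 5 and 6)."""
    x = [0.0] * len(b)
    r = b[:]
    z = apply_prec(r)
    p = z[:]
    rz = dot(r, z)
    bnorm = math.sqrt(dot(b, b))
    for it in range(1, maxit + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= rtol * bnorm:
            return x, it
        z = apply_prec(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, maxit

A = laplacian_1d(64)
b = [1.0] * 64
blocks = block_jacobi_setup(A, 4)  # four blocks, e.g. one per CPU core
x_pc, its_pc = pcg(A, b, lambda r: block_jacobi_apply(blocks, r))
x_plain, its_plain = pcg(A, b, lambda r: r[:])  # no preconditioner
```

Comparing `its_pc` with `its_plain` shows how much the block preconditioner shortens the iteration on this toy problem. In a mixed CPU/GPU setup, `block_jacobi_apply` is the part that would need a residual copied to the CPU and a preconditioned vector copied back in every iteration.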

I hadn’t seen the PETSc GPU manual until now.


Yes, I tried some PETSc examples and modified one for my purposes. It works very well on my quad-core Xeon, but my intention is to mix CPU and GPU code. I want a parallel domain decomposition using the block Jacobi method, running ILU(0) on each block (number of blocks = number of CPU cores). Then I want to use the result of each block solve as a preconditioner for a CG solver on the GPU.

What is the GPU going to do while this is taking place on the CPU? I don't see much point doing CG on the GPU if you don't also move the matrix and preconditioner there. (The performance may even be worse than doing everything on the CPU.)

Have you read the docs on running PETSc on GPUs?

http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#gpus
http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-dev/docs/installation.html#CUDA

There is no ILU on the GPU because nobody has written it (because it seems to be ill-suited to the execution model).


At the moment I can decompose my matrix into four Jacobi block matrices. I compared my results with PETSc and they are the same. But now I don’t know whether I have to run my CG solver on each block, or whether I can put the results of the blocked ILUs together and then use this as a preconditioner for the non-blocked matrix (my large input matrix).

You can do either of these; -pc_type asm -sub_ksp_type cg -sub_pc_type icc, for example. Be careful about symmetry and remember to use FGMRES if you make the preconditioner nonlinear.
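
To illustrate the remark about nonlinear preconditioners: if each block is solved by an inner CG that stops after a few iterations, the preconditioner is no longer a fixed linear operator, which is why a flexible outer method such as FGMRES is recommended. The following is a small sketch in plain Python, with a made-up 2x2 matrix and a hypothetical helper `cg_fixed_steps`, that checks whether M(r1 + r2) equals M(r1) + M(r2).

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg_fixed_steps(A, b, steps):
    """Run at most `steps` iterations of unpreconditioned CG from x = 0."""
    x = [0.0] * len(b)
    r = b[:]
    p = r[:]
    rr = dot(r, r)
    for _ in range(steps):
        if rr == 0.0:
            break
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

# Small SPD matrix standing in for one subdomain block (illustrative values).
A = [[4.0, 1.0], [1.0, 3.0]]
r1, r2 = [1.0, 0.0], [0.0, 1.0]
r12 = [a + b for a, b in zip(r1, r2)]

def linearity_gap(apply):
    """Largest entry of apply(r1 + r2) - (apply(r1) + apply(r2))."""
    lhs = apply(r12)
    rhs = [a + b for a, b in zip(apply(r1), apply(r2))]
    return max(abs(u - v) for u, v in zip(lhs, rhs))

# One inner CG step: the result depends nonlinearly on the input vector.
gap_truncated = linearity_gap(lambda r: cg_fixed_steps(A, r, 1))
# Two steps solve a 2x2 system exactly (up to rounding), so it acts linearly.
gap_converged = linearity_gap(lambda r: cg_fixed_steps(A, r, 2))
```

Here `gap_truncated` is clearly nonzero, while `gap_converged` is at rounding level. An inner solve that is exact (or a fixed factorization such as ICC or ILU applied once per iteration) keeps the preconditioner linear, whereas an inexact inner Krylov solve generally does not.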