[petsc-dev] Parallel calculation on GPU

Projet_TRIOU triou at cea.fr
Wed Aug 20 05:41:38 CDT 2014

On 08/20/14 12:11, Karl Rupp wrote:
> Hi Pierre,
>> I have a cluster with nodes of 2 sockets of 4 cores + 1 GPU.
>> Is there a way to run a calculation with 4*N MPI tasks where
>> my matrix is first built outside PETSc, then to solve the
>> linear system using PETSc Mat, Vec, KSP on only N MPI
>> tasks to address efficiently the N GPUs?
> as far as I can tell, this should be possible with a suitable 
> subcommunicator. The tricky piece, however, is to select the right MPI 
> ranks for this. Note that you generally have no guarantee on how the 
> MPI ranks are distributed across the nodes, so be prepared for 
> something fairly specific to your MPI installation.
Yes, I am ready to face this point too.
>> I am playing with the communicators without success, but I
>> am surely confusing things...
> To keep matters simple, try to get this scenario working with a purely 
> CPU-based solve. Once this works, the switch to GPUs should be just a 
> matter of passing the right flags. Have a look at PetscInitialize() here:
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscInitialize.html 
> which mentions that you need to create the subcommunicator of
> MPI_COMM_WORLD first and assign it to PETSC_COMM_WORLD.
I also started with a purely CPU-based solve to test, but without 
success. When I read this:

"If you wish PETSc code to run ONLY on a subcommunicator of 
MPI_COMM_WORLD, create that communicator first and assign it to 
PETSC_COMM_WORLD BEFORE calling PetscInitialize().

Thus if you are running a four process job and two processes will run 
PETSc and have PetscInitialize() and PetscFinalize() and two processes 
will not, then do this. If ALL processes in the job are using 
PetscInitialize() and PetscFinalize() then you don't need to do this, 
even if different subcommunicators of the job are doing different 
things with PETSc."
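
The scenario described in that quote might look roughly like the sketch below. The color rule (every 4th rank drives a GPU, matching a 4-cores-per-node layout with block rank placement) is my assumption, not something from the docs, and the sketch omits all error checking:

```c
/* Sketch only: run PETSc on a subcommunicator (one rank per GPU).
 * Assumes block placement of MPI ranks across the nodes. */
#include <petscsys.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Put every 4th rank (one per node in a 4-cores-per-node layout)
     * into the communicator that will drive PETSc and the GPUs. */
    int color = (rank % 4 == 0) ? 0 : MPI_UNDEFINED;
    MPI_Comm subcomm;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &subcomm);

    if (subcomm != MPI_COMM_NULL) {
        PETSC_COMM_WORLD = subcomm;   /* must happen BEFORE PetscInitialize() */
        PetscInitialize(&argc, &argv, NULL, NULL);
        /* ... Mat/Vec/KSP work on the GPU-driving ranks only ... */
        PetscFinalize();
        MPI_Comm_free(&subcomm);
    }

    MPI_Finalize();
    return 0;
}
```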

I think I am not in this special scenario: as my matrix is initially 
partitioned across 4 processes, I need to call PetscInitialize() on 
all 4 processes in order to build the PETSc matrix with 
MatSetValues(). And my goal is then to solve the linear system on 
only 2 processes... So will building a sub-communicator really do 
the trick? Or am I missing something?

Thanks Karli for your answer,

> Best regards,
> Karli

*Trio_U support team*
Marthe ROUX (01 69 08 00 02) Saclay
Pierre LEDAC (04 38 78 91 49) Grenoble