[petsc-users] Parallelization of the code

Smith, Barry F. bsmith at mcs.anl.gov
Tue Feb 12 20:03:05 CST 2019


  Sounds like you have a single structured grid, so you should use https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMDA/DMDACreate2d.html and start from something like src/ksp/ksp/examples/tutorials/ex29.c and ex46.c.

  The DMDA manages dividing up the grid among processes and simplifies generating the matrix in parallel.
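
  In Fortran the setup looks roughly like the sketch below (untested; mx and my stand for your global grid dimensions, and ex29.c/ex46.c contain the complete, working version in C):

      program dmda_sketch
#include <petsc/finclude/petscksp.h>
#include <petsc/finclude/petscdmda.h>
      use petscksp
      use petscdmda
      implicit none

      DM             da
      Mat            A
      PetscInt       mx, my, xs, ys, xm, ym, i1
      PetscErrorCode ierr

      call PetscInitialize(PETSC_NULL_CHARACTER, ierr)

      mx = 10                  ! replace with your global grid dimensions
      my = 10
      i1 = 1

      ! distributed mx by my structured grid, box (nine-point) stencil,
      ! one unknown per grid point, stencil width 1
      call DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,   &
                        DMDA_STENCIL_BOX, mx, my, PETSC_DECIDE, PETSC_DECIDE,   &
                        i1, i1, PETSC_NULL_INTEGER, PETSC_NULL_INTEGER, da, ierr)
      call DMSetFromOptions(da, ierr)
      call DMSetUp(da, ierr)

      ! the DMDA knows the parallel layout, so it builds a correctly
      ! preallocated sparse matrix; no dense Q() is ever formed
      call DMCreateMatrix(da, A, ierr)

      ! each process owns a rectangular patch of the grid; loop over that patch
      ! only and insert your stencil entries with MatSetValuesStencil()
      call DMDAGetCorners(da, xs, ys, PETSC_NULL_INTEGER, xm, ym,               &
                          PETSC_NULL_INTEGER, ierr)

      call MatDestroy(A, ierr)
      call DMDestroy(da, ierr)
      call PetscFinalize(ierr)
      end program dmda_sketch

  With a DMDA you insert matrix entries by their (i,j) grid indices using MatSetValuesStencil(), so the stencil formulas you already have carry over directly and you never build the dense Q().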

  Good luck,

   Barry


> On Feb 12, 2019, at 7:43 PM, Maahi Talukder <maahi.buet at gmail.com> wrote:
> 
> I am using finite differences to discretize my system, and I am working with a nine-point stencil. So I am populating all the values in Q(). The code that does that is the following:
> 
> Do i = 2,ymax-1
>   Do j = 2,xmax-1
>
>     ! metric terms for the nine-point stencil at grid point (i,j)
>     g11(i-1,j-1) = 0.25*(x(i,j+1)-x(i,j-1))*(x(i,j+1)-x(i,j-1)) + 0.25*(y(i,j+1)-y(i,j-1))*(y(i,j+1)-y(i,j-1))
>     g12(i-1,j-1) = 0.25*(x(i,j+1)-x(i,j-1))*(x(i+1,j)-x(i-1,j)) + 0.25*(y(i,j+1)-y(i,j-1))*(y(i+1,j)-y(i-1,j))
>     g22(i-1,j-1) = 0.25*(x(i+1,j)-x(i-1,j))*(x(i+1,j)-x(i-1,j)) + 0.25*(y(i+1,j)-y(i-1,j))*(y(i+1,j)-y(i-1,j))
>
>     ! the nine stencil coefficients of one row, stored in the dense block M
>     M(j-1,j-1+xmax*(i-2))        = -g12(i-1,j-1)/2
>     M(j-1,j+xmax*(i-2))          = g11(i-1,j-1)
>     M(j-1,j+1+xmax*(i-2))        = g12(i-1,j-1)/2
>     M(j-1,j-1+xmax*(i-2)+xmax)   = g22(i-1,j-1)
>     M(j-1,j+xmax*(i-2)+xmax)     = -2*(g22(i-1,j-1)+g11(i-1,j-1))
>     M(j-1,j+1+xmax*(i-2)+xmax)   = g22(i-1,j-1)
>     M(j-1,j-1+xmax*(i-2)+2*xmax) = g12(i-1,j-1)/2
>     M(j-1,j+xmax*(i-2)+2*xmax)   = g11(i-1,j-1)
>     M(j-1,j+1+xmax*(i-2)+2*xmax) = -g12(i-1,j-1)/2
>
>   end Do
>
>   ! copy the finished block of rows into N and reset M for the next i
>   N(1+(xmax-2)*(i-2):(xmax-2)*(i-1),1:xmax*ymax) = M(1:xmax-2,1:xmax*ymax)
>   M = 0
>
> end Do
> 
> E(1:(xmax-2)*(ymax-2),1:xmax*ymax-2*xmax) = N(1:(xmax-2)*(ymax-2),xmax+1:xmax*ymax-xmax)
> 
> Do i = 1,ymax-2
>   Q(1:(xmax-2)*(ymax-2),1+(xmax-2)*(i-1):(xmax-2)*i) = E(1:(xmax-2)*(ymax-2),2+xmax*(i-1):(xmax-1)+xmax*(i-1))
> end Do
> 
> So how do I go about decomposing this Q() matrix across processes? What PETSc function do I use for that?
> 
> 
> Regards,
> Maahi
> 
> On Tue, Feb 12, 2019 at 8:25 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> 
>   With MPI parallelism you need to decompose your data across the processes; for example, each process generates a subset of the matrix entries. In addition, for large problems (which is what parallelism is for) you cannot use "dense" data structures such as your Q() to store sparse matrices: the dense Q() will not fit on your machine, and it consists almost entirely of zeros that there is no point in keeping in memory.
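>
>   For a nine-point stencil each row has at most 9 nonzeros, so the matrix should be created as a sparse AIJ matrix and preallocated. Roughly (an untested sketch, reusing the names from your code below):
>
>    call MatCreate(PETSC_COMM_WORLD, Mp, ierr)
>    call MatSetSizes(Mp, PETSC_DECIDE, PETSC_DECIDE, u*v, u*v, ierr)
>    call MatSetFromOptions(Mp, ierr)
>    ! at most 9 nonzeros per row; these are only upper bounds that let PETSc
>    ! allocate the sparse storage once instead of repeatedly during assembly
>    call MatSeqAIJSetPreallocation(Mp, 9, PETSC_NULL_INTEGER, ierr)
>    call MatMPIAIJSetPreallocation(Mp, 9, PETSC_NULL_INTEGER, 9, PETSC_NULL_INTEGER, ierr)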
> 
>    How are your Q() entries being generated? This will determine the decomposition of the data you need to make. For example, with the finite element method each process would be assigned a subset of the elements (generally for a subregion of the entire domain).
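>
>   Each process then generates only the rows it owns. Conceptually it looks like this (sketch only; row, cols, vals and nc stand for whatever your stencil computation produces for one matrix row):
>
>    call MatGetOwnershipRange(Mp, Istart, Iend, ierr)
>    Do p = Istart, Iend-1                ! global rows owned by this process
>       row(1) = p
>       ! fill cols(1:nc) and vals(1:nc) with the nonzero entries of row p
>       call MatSetValues(Mp, ione, row, nc, cols, vals, INSERT_VALUES, ierr)
>    end Do
>    call MatAssemblyBegin(Mp, MAT_FINAL_ASSEMBLY, ierr)
>    call MatAssemblyEnd(Mp, MAT_FINAL_ASSEMBLY, ierr)
>
>   The same ownership idea applies to the right-hand-side vector via VecGetOwnershipRange().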
> 
>    Barry
> 
> 
> 
> > On Feb 12, 2019, at 5:42 PM, Maahi Talukder via petsc-users <petsc-users at mcs.anl.gov> wrote:
> > 
> > 
> > Dear All,
> > 
> > 
> > I am trying to solve a linear system using the KSP solvers. I have managed to solve the system with a sequential code. The part of my sequential code that creates the matrix and sets its values is the following:
> > 
> > call MatCreate(PETSC_COMM_WORLD,Mp,ierr)
> > call MatSetSizes(Mp,PETSC_DECIDE,PETSC_DECIDE,u*v,u*v,ierr)
> > call MatSetFromOptions(Mp,ierr)
> > call MatSetUp(Mp,ierr)
> > 
> > ! insert only the nonzero entries of the dense Q() into the PETSc matrix
> > Do p = 1,29008
> >   Do r = 1,29008
> >     if (Q(p,r) /= 0.0) then
> >       val(1) = Q(p,r)
> >       col(1) = r-1
> >       call MatSetValues(Mp,ione,p-1,ione,col,val,INSERT_VALUES,ierr)
> >     endif
> >   end Do
> > end Do
> > 
> > call MatAssemblyBegin(Mp,MAT_FINAL_ASSEMBLY,ierr)
> > call MatAssemblyEnd(Mp,MAT_FINAL_ASSEMBLY,ierr)
> > 
> > And the part of my sequential code that creates the vectors is the following:
> > 
> > call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,u*v,Bx,ierr)
> > call VecSetFromOptions(Bx,ierr)
> > call VecDuplicate(Bx,Xp,ierr)
> > call VecSet(Bx,zero,ierr) 
> > 
> > ! insert only the nonzero entries of Fx into the PETSc vector
> > Do p = 1,29008
> >   if (Fx(p,1) /= 0.0) then
> >     val(1) = Fx(p,1)
> >     call VecSetValues(Bx,ione,p-1,val,INSERT_VALUES,ierr)
> >   endif
> > end Do
> > 
> > call VecAssemblyBegin(Bx,ierr)
> > call VecAssemblyEnd(Bx,ierr)
> > 
> > So when I run the code on a single processor, it runs fine, but when I try to run it on more than one processor, it fails. From what I understood by going through the manual, if I use MatCreate to create my matrix, then depending on the number of processors I pass to 'mpiexec -n ...' it acts either as a sequential or a parallel code, and I don't need to do anything extra to make it work in parallel, since PETSc handles that internally.
> > 
> > So would you please let me know what modifications I need to make to my existing sequential code to get it working in parallel, for example using MatGetOwnershipRange?
> > 
> > Regards,
> > Maahi Talukder
> > MSc Student
> > Clarkson University
> > 
> > 
> > 
> 


