[petsc-users] efficiency of parallel convolution

LikunTan tlk0812 at hotmail.com
Wed Aug 6 13:39:54 CDT 2014


Hi Barry,

Thanks for your email. Sorry, I did not make it clear. Here is a more detailed version:

int dim, i, j;
int NDOF = 3, NX = 5, NY = 5;

for (dim = 0; dim < NDOF; dim++)
{
    for (i = 0; i < NX; i++)
    {
        for (j = 0; j < NY; j++)
        {
            // compute inpx: set values for Vec inpx, which has length 256*256
            // compute inpw: set values for Vec inpw, which has length 256*256
            // fast convolution: here I follow ex158 in src/mat/examples,
            // using the PETSc/FFTW interface; the Mat is created with
            // MatCreateFFT()
        }
    }
}
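For reference, the operation the fast-convolution step computes is a circular convolution. A direct O(n^2) version (a minimal plain-C sketch with illustrative sizes, not the PETSc/FFTW code path itself) that the FFT-based route must reproduce is:

```c
#include <stddef.h>

/* Direct circular convolution: y[i] = sum_j x[j] * w[(i - j) mod n].
 * The FFT route (forward transform of x and w, pointwise multiply in
 * frequency space, inverse transform) computes the same result in
 * O(n log n), up to the scaling factor FFTW's unnormalized transforms
 * introduce. */
void circular_convolution(const double *x, const double *w, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = 0.0;
        for (size_t j = 0; j < n; j++)
            y[i] += x[j] * w[(i + n - j) % n];
    }
}
```

In the loop above this direct form would run once per (dim, i, j) with n = 256*256.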

The values of inpx and inpw change with the indices dim, i, and j, but their lengths stay the same throughout, and each convolution can be computed independently. I am considering two options:
Option 1: use MPI to run the fast convolutions for all (inpx, inpw) pairs simultaneously, i.e., do the NDOF*NX*NY convolutions in parallel.
Option 2: inside the convolution, define an extended matrix and vector that hold the values of all NDOF*NX*NY convolutions, and apply MatMult(), VecPointwiseMult(), and MatMultTranspose() to the extended objects in a single pass.
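To make option 2 concrete, here is a minimal self-contained sketch (plain C, no PETSc; the function name and sizes are illustrative) of the batching idea: the NDOF*NX*NY frequency-space products are stored contiguously in one extended array, so a single loop, or equivalently a single VecPointwiseMult() on an extended Vec, replaces many small ones.

```c
#include <stddef.h>

/* Multiply nconv independent (x_k, w_k) pairs, each of length len,
 * laid out contiguously in one extended array: y[n] = x[n] * w[n].
 * One big pointwise multiply replaces nconv small ones, which amortizes
 * per-call overhead and gives each MPI rank a larger local chunk. */
void batched_pointwise_mult(const double *x, const double *w, double *y,
                            size_t nconv, size_t len)
{
    for (size_t n = 0; n < nconv * len; n++)
        y[n] = x[n] * w[n];
}
```

In PETSc terms, block k of the extended Vec would hold the transformed inpx for one (dim, i, j) triple, with len = 256*256 and nconv = NDOF*NX*NY.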

I would very much appreciate your comments. Thanks.



> Subject: Re: [petsc-users] efficiency of parallel convolution
> From: bsmith at mcs.anl.gov
> Date: Wed, 6 Aug 2014 10:13:34 -0500
> CC: petsc-users at mcs.anl.gov
> To: tlk0812 at hotmail.com
> 
> 
>   It is difficult to understand what you are doing here. What is dim? What is NX and NY?   Is the length of inpx and inpw 256*256 ?  Are you using a PETSc Mat like AIJ to apply the “fast convolution” or some custom MATSHELL?  Is the “fast convolution” the same for each dim, i and j or is it different ?
> 
>   Barry
> 
> On Aug 5, 2014, at 1:24 AM, LikunTan <tlk0812 at hotmail.com> wrote:
> 
> > Hi all,
> > 
> > I am calculating the multiplication of matrix and vector using fast convolution, but this has to be done for many times. Here is a brief framework of my code:
> > 
> > for(dim=0; dim<NDOF; dim++)
> > {
> >      for(i=0; i<NX; i++)
> >      {
> >          for(j=0; j<NY; j++)
> >          {
> >                //compute inpx
> >                //compute inpw
> >                //fast convolution
> >           }
> >      }
> > }
> > 
> > The fast convolution is computed many times within the for loops. The dimension of the input vector is 256*256. The most time-consuming parts are MatMult(), VecPointwiseMult() and MatMultTranspose() during the fast convolution. The optimal number of processors is 2; further increasing the processor count reduces efficiency. In this case, would you please suggest a way to improve efficiency and fully make use of parallelization?  Thanks.
> 

