[petsc-users] efficiency of parallel convolution

Wed Aug 6 10:13:34 CDT 2014

  It is difficult to understand what you are doing here. What is dim? What are NX and NY? Is the length of inpx and inpw 256*256? Are you using a PETSc Mat like AIJ to apply the “fast convolution”, or some custom MATSHELL? Is the “fast convolution” the same for each dim, i, and j, or is it different?

  Barry

On Aug 5, 2014, at 1:24 AM, LikunTan <tlk0812 at hotmail.com> wrote:

> Hi all,
> 
> I am computing a matrix-vector product using fast convolution, and this has to be done many times. Here is a brief outline of my code:
> 
> for (dim = 0; dim < NDOF; dim++)
> {
>      for (i = 0; i < NX; i++)
>      {
>          for (j = 0; j < NY; j++)
>          {
>               // compute inpx
>               // compute inpw
>               // fast convolution
>          }
>      }
> }
> 
> The fast convolution has to be computed many times inside the for loops. The length of the input vector is 256*256. The most time-consuming parts are MatMult(), VecPointwiseMult() and MatMultTranspose() during the fast convolution. The optimal number of processors is 2; using more than that reduces efficiency. In this case, could you please suggest a way to improve efficiency and make full use of parallelization? Thanks.
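
For readers following the thread, here is a minimal serial sketch of the three-step pattern the original message names. It assumes the "fast convolution" has the form y = A^T (w .* (A x)); the post does not actually say this, and the order of the calls is inferred only from the list MatMult(), VecPointwiseMult(), MatMultTranspose(). The plain C helpers below (mat_mult, vec_pointwise_mult, mat_mult_transpose, fast_convolution) are hypothetical stand-ins operating on dense row-major arrays, not PETSc calls.

```c
#include <assert.h>
#include <stddef.h>

/* y = A x for a dense, row-major n-by-n matrix (stand-in for MatMult). */
void mat_mult(size_t n, const double *A, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < n; j++)
            sum += A[i * n + j] * x[j];
        y[i] = sum;
    }
}

/* y = A^T x, reading A by columns (stand-in for MatMultTranspose). */
void mat_mult_transpose(size_t n, const double *A, const double *x, double *y)
{
    for (size_t j = 0; j < n; j++) {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++)
            sum += A[i * n + j] * x[i];
        y[j] = sum;
    }
}

/* out[i] = a[i] * b[i] (stand-in for VecPointwiseMult). */
void vec_pointwise_mult(size_t n, const double *a, const double *b, double *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}

/*
 * Assumed form of one "fast convolution": out = A^T (w .* (A x)).
 * tmp is caller-provided scratch space of length n, so nothing is
 * allocated inside the hot loop.
 */
void fast_convolution(size_t n, const double *A, const double *w,
                      const double *x, double *tmp, double *out)
{
    assert(n > 0);
    mat_mult(n, A, x, tmp);               /* tmp = A x        */
    vec_pointwise_mult(n, tmp, w, tmp);   /* tmp = w .* tmp   */
    mat_mult_transpose(n, A, tmp, out);   /* out = A^T tmp    */
}
```

In the loop nest from the original message, inpx would play the role of x and inpw the role of w, with fast_convolution called once per (dim, i, j). Whether A is the same for every iteration is exactly what the reply above asks about.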