[petsc-users] coding VecMDot_Seq as gemv

Wed Dec 27 17:32:23 CST 2017

> On Dec 27, 2017, at 4:24 PM, Kokron, Daniel S. (ARC-606.2)[CSRA, LLC] <daniel.s.kokron at nasa.gov> wrote:
> 
> Barry,
> 
> Thanks for the reply.
> 
> Here is my attempt.  Do you see anything wrong with it?
> 
> #if 1
> #include <mkl.h>
>  PetscInt n = xin->map->n,i,j;
>  const PetscScalar *xbase,*a;
>  PetscScalar *b;
>  Vec         *yy;
>  yy   = (Vec*)y;
> 
>  ierr = PetscMalloc(n*nv*sizeof(PetscScalar),&b);CHKERRQ(ierr);
>  for(i=0; i<nv; i++) {
>     ierr = VecGetArrayRead(yy[i],&a);CHKERRQ(ierr);
>     for(j=0; j<n; j++) {
>        b[j] = a[j];
>     }
>     b += n;
>     ierr = VecRestoreArrayRead(yy[i],&a);CHKERRQ(ierr);
>  }
>  ierr = VecGetArrayRead(xin,&xbase);CHKERRQ(ierr);

You've been incrementing b but now you use it as if it was the original value you malloced. You need to, for example, decrement it by n*nv

>    cblas_dgemv(CblasRowMajor, CblasNoTrans, nv, n, 1., b, n, xbase, 1, 0., work, 1);
>  ierr = VecRestoreArrayRead(xin,&xbase);CHKERRQ(ierr);
>  for(i=0; i<nv; i++) {
>     PetscPrintf(PETSC_COMM_SELF,"VecMDot_MPI: %D %.16f\n",i,work[i]);
>  }
>  ierr = PetscFree(b);CHKERRQ(ierr);
> #else
>  ierr = VecMDot_Seq(xin,nv,y,work);CHKERRQ(ierr);
>  PetscInt i;
>  for(i=0; i<nv; i++) {
>     PetscPrintf(PETSC_COMM_SELF,"VecMDot_MPI: %D %.16f\n",i,work[i]);
>  }
> #endif
> 
> 
> Daniel Kokron
> NASA Ames (ARC-TN)
> SciCon group
> ________________________________________
> From: Smith, Barry F. [bsmith at mcs.anl.gov]
> Sent: Wednesday, December 27, 2017 14:58
> To: Kokron, Daniel S. (ARC-606.2)[CSRA, LLC]
> Cc: petsc-users at mcs.anl.gov
> Subject: Re: [petsc-users] coding VecMDot_Seq as gemv
> 
>> On Dec 27, 2017, at 2:53 PM, Kokron, Daniel S. (ARC-606.2)[CSRA, LLC] <daniel.s.kokron at nasa.gov> wrote:
>> 
>> I am looking into ways to improve performance of the VecMDot_Seq routine.  I am focusing on the variant that gets called when PETSC_THREADCOMM_ACTIVE and PETSC_USE_FORTRAN_KERNEL_MDOT are NOT defined.
>> 
>> My current version of PETSc is 3.4.5 due solely to user requirement.  I am linking against MKL.
>> 
>> I tried and failed to implement VecMDot_Seq as a call to cblas_dgemv in ~/mpi/pvec2.c
>> 
>> cblas_dgemv(CblasRowMajor, CblasNoTrans, nv, n, 1., b, n, xbase, 1, 0., work, 1);
>> 
>> I could not figure out a way to extract the vectors from 'Vec y[]' and store them as rows of an allocated array.
> 
>  Use VecGetArray on each one then do a memcpy of those values into the big allocated array (or a loop copy or whatever).
>> 
>> This user post starts off with a similar request (how to construct a matrix from many vectors)
>> https://lists.mcs.anl.gov/pipermail/petsc-users/2015-August/026848.html
>> 
>> I understand that this sort of memory shuffling is expensive.  I was just hoping to prove the point to myself that it's possible.
> 
>   Yes the memory shuffling is possible and not that difficult, perhaps you just have the rows and columns backwards?
> 
>> 
>> The action performed by VecMDot_Seq is the same as matrix-vector multiplication, so I was wondering why it wasn't implemented as a call ?gemv?
> 
>   The reason is that we want each of the vectors to be independent of the other ones; to use gemv they have be shared in a single common array.
> 
>   I doubt that copying into a single array and calling the gemv will ever be faster. Better to just try to optimize/vectorize better the current code.
> 
> 
>> 
>> Daniel Kokron
>> NASA Ames (ARC-TN)
>> SciCon group
>