[petsc-users] coding VecMDot_Seq as gemv
Smith, Barry F.
bsmith at mcs.anl.gov
Wed Dec 27 17:32:23 CST 2017
> On Dec 27, 2017, at 4:24 PM, Kokron, Daniel S. (ARC-606.2)[CSRA, LLC] <daniel.s.kokron at nasa.gov> wrote:
>
> Barry,
>
> Thanks for the reply.
>
> Here is my attempt. Do you see anything wrong with it?
>
> #if 1
> #include <mkl.h>
> PetscInt n = xin->map->n,i,j;
> const PetscScalar *xbase,*a;
> PetscScalar *b;
> Vec *yy;
> yy = (Vec*)y;
>
> ierr = PetscMalloc(n*nv*sizeof(PetscScalar),&b);CHKERRQ(ierr);
> for(i=0; i<nv; i++) {
> ierr = VecGetArrayRead(yy[i],&a);CHKERRQ(ierr);
> for(j=0; j<n; j++) {
> b[j] = a[j];
> }
> b += n;
> ierr = VecRestoreArrayRead(yy[i],&a);CHKERRQ(ierr);
> }
> ierr = VecGetArrayRead(xin,&xbase);CHKERRQ(ierr);
You've been incrementing b but now you use it as if it was the original value you malloced. You need to, for example, decrement it by n*nv
> cblas_dgemv(CblasRowMajor, CblasNoTrans, nv, n, 1., b, n, xbase, 1, 0., work, 1);
> ierr = VecRestoreArrayRead(xin,&xbase);CHKERRQ(ierr);
> for(i=0; i<nv; i++) {
> PetscPrintf(PETSC_COMM_SELF,"VecMDot_MPI: %D %.16f\n",i,work[i]);
> }
> ierr = PetscFree(b);CHKERRQ(ierr);
> #else
> ierr = VecMDot_Seq(xin,nv,y,work);CHKERRQ(ierr);
> PetscInt i;
> for(i=0; i<nv; i++) {
> PetscPrintf(PETSC_COMM_SELF,"VecMDot_MPI: %D %.16f\n",i,work[i]);
> }
> #endif
>
>
> Daniel Kokron
> NASA Ames (ARC-TN)
> SciCon group
> ________________________________________
> From: Smith, Barry F. [bsmith at mcs.anl.gov]
> Sent: Wednesday, December 27, 2017 14:58
> To: Kokron, Daniel S. (ARC-606.2)[CSRA, LLC]
> Cc: petsc-users at mcs.anl.gov
> Subject: Re: [petsc-users] coding VecMDot_Seq as gemv
>
>> On Dec 27, 2017, at 2:53 PM, Kokron, Daniel S. (ARC-606.2)[CSRA, LLC] <daniel.s.kokron at nasa.gov> wrote:
>>
>> I am looking into ways to improve performance of the VecMDot_Seq routine. I am focusing on the variant that gets called when PETSC_THREADCOMM_ACTIVE and PETSC_USE_FORTRAN_KERNEL_MDOT are NOT defined.
>>
>> My current version of PETSc is 3.4.5 due solely to user requirement. I am linking against MKL.
>>
>> I tried and failed to implement VecMDot_Seq as a call to cblas_dgemv in ~/mpi/pvec2.c
>>
>> cblas_dgemv(CblasRowMajor, CblasNoTrans, nv, n, 1., b, n, xbase, 1, 0., work, 1);
>>
>> I could not figure out a way to extract the vectors from 'Vec y[]' and store them as rows of an allocated array.
>
> Use VecGetArray on each one then do a memcpy of those values into the big allocated array (or a loop copy or whatever).
>>
>> This user post starts off with a similar request (how to construct a matrix from many vectors)
>> https://lists.mcs.anl.gov/pipermail/petsc-users/2015-August/026848.html
>>
>> I understand that this sort of memory shuffling is expensive. I was just hoping to prove the point to myself that it's possible.
>
> Yes the memory shuffling is possible and not that difficult, perhaps you just have the rows and columns backwards?
>
>>
>> The action performed by VecMDot_Seq is the same as matrix-vector multiplication, so I was wondering why it wasn't implemented as a call ?gemv?
>
> The reason is that we want each of the vectors to be independent of the other ones; to use gemv they have be shared in a single common array.
>
> I doubt that copying into a single array and calling the gemv will ever be faster. Better to just try to optimize/vectorize better the current code.
>
>
>>
>> Daniel Kokron
>> NASA Ames (ARC-TN)
>> SciCon group
>
More information about the petsc-users
mailing list