[petsc-users] Can I use TAO for embarrassingly parallel problems with multiple threads?

Jed Brown jed at jedbrown.org
Thu Dec 20 11:27:49 CST 2018


This isn't thread-safe, but in a "simple" (if these things ever are) way because the operation could complete eagerly.

"Dener, Alp via petsc-users" <petsc-users at mcs.anl.gov> writes:

> Hi Krys,
>
> On Dec 20, 2018, at 10:59 AM, Krzysztof Kamieniecki via petsc-users <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>> wrote:
>
> That example seems to have critical sections around certain Vec calls, and it looks like my problem occurs in VecDotBegin/VecDotEnd which is called by TAO/BLMVM.
>
> The quasi-Newton matrix objects in BLMVM have asynchronous dot products in the matrix-free forward and inverse product formulations. This is a relatively recent performance optimization. If avoiding this split phase communication would solve the problem, and you don’t need other recent PETSc features, you could revert to 3.9 and use the old version of BLMVM that will use straight VecDot operations instead.
>
> Unfortunately I don’t know enough about multithreading to definitively say whether that will actually solve the problem or not. Other members of the community can probably provide a more complete answer on that.
>
>
> I assume  PetscSplitReductionGet is pulling the PetscSplitReduction for PETSC_COMM_SELF which is shared across the whole process?
>
> I tried PetscCommDuplicate/PetscCommDestroy but that does not seem to help.
>
> PetscErrorCode  VecDotBegin(Vec x,Vec y,PetscScalar *result)
> {
>   PetscErrorCode      ierr;
>   PetscSplitReduction *sr;
>   MPI_Comm            comm;
>
>   PetscFunctionBegin;
>   PetscValidHeaderSpecific(x,VEC_CLASSID,1);
>   PetscValidHeaderSpecific(y,VEC_CLASSID,1);
>   ierr = PetscObjectGetComm((PetscObject)x,&comm);CHKERRQ(ierr);
>   ierr = PetscSplitReductionGet(comm,&sr);CHKERRQ(ierr);
>   if (sr->state != STATE_BEGIN) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ORDER,"Called before all VecxxxEnd() called");
>   if (sr->numopsbegin >= sr->maxops) {
>     ierr = PetscSplitReductionExtend(sr);CHKERRQ(ierr);
>   }
>   sr->reducetype[sr->numopsbegin] = PETSC_SR_REDUCE_SUM;
>   sr->invecs[sr->numopsbegin]     = (void*)x;
>   if (!x->ops->dot_local) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_SUP,"Vector does not suppport local dots");
>   ierr = PetscLogEventBegin(VEC_ReduceArithmetic,0,0,0,0);CHKERRQ(ierr);
>   ierr = (*x->ops->dot_local)(x,y,sr->lvalues+sr->numopsbegin++);CHKERRQ(ierr);
>   ierr = PetscLogEventEnd(VEC_ReduceArithmetic,0,0,0,0);CHKERRQ(ierr);
>   PetscFunctionReturn(0);
> }
>
>
>
>
> On Thu, Dec 20, 2018 at 11:26 AM Smith, Barry F. <bsmith at mcs.anl.gov<mailto:bsmith at mcs.anl.gov>> wrote:
>
>    The code src/ksp/ksp/examples/tutorials/ex61f.F90 demonstrates working with multiple threads each managing their own collection of PETSc objects. Hope this helps.
>
>     Barry
>
>
>> On Dec 20, 2018, at 9:28 AM, Krzysztof Kamieniecki via petsc-users <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>> wrote:
>>
>> Hello All,
>>
>> I have an embarrassingly parallel problem that I would like to use TAO on, is there some way to do this with threads as opposed to multiple processes?
>>
>>  I compiled PETSc with the following flags
>> ./configure \
>> --prefix=${DEP_INSTALL_DIR} \
>> --with-threadsafety --with-log=0 --download-concurrencykit \
>> --with-openblas=1 \
>> --with-openblas-dir=${DEP_INSTALL_DIR} \
>> --with-mpi=0 \
>> --with-shared=0 \
>> --with-debugging=0 COPTFLAGS='-O3' CXXOPTFLAGS='-O3' FOPTFLAGS='-O3'
>>
>> When I run TAO in multiple threads I get the error "Called VecxxxEnd() in a different order or with a different vector than VecxxxBegin()"
>>
>> Thanks,
>> Krys
>>
>
>
>> Alp


More information about the petsc-users mailing list