[petsc-users] Can I use TAO for embarrassingly parallel problems with multiple threads?

Thu Dec 20 16:57:50 CST 2018

   Great, thanks for letting us know. We should change our example to use individual MPI communicators as well.

    Barry

> On Dec 20, 2018, at 3:52 PM, Krzysztof Kamieniecki <krys at kamieniecki.com> wrote:
> 
> Based on Barry's encouragement I spent the extra time to get back to v3.10.2 and this code seems to work:
> 
> int mpierr;
> MPI_Comm mpiCommRaw;
> int tag = 0;
> mpierr = MPI_Comm_dup(MPI_COMM_SELF, &mpiCommRaw); if(mpierr != MPI_SUCCESS) throw PSI::Exception("MPI ERROR: MPI_Comm_dup");
> 
> MPI_Comm mpiComm;
> ierr = PetscCommDuplicate(mpiCommRaw, &mpiComm, &tag); CHKERRABORT(PETSC_COMM_SELF, ierr);
> 
> Tao tao;
> ierr = TaoCreate(mpiComm, &tao); CHKERRABORT(mpiComm, ierr);
> 
> ...
> ierr = TaoSolve(tao); CHKERRABORT(mpiComm, ierr);
> ...
> 
> ierr = TaoDestroy(&tao); CHKERRABORT(mpiComm, ierr);
> 
> ierr = PetscCommDestroy(&mpiComm); CHKERRABORT(PETSC_COMM_SELF, ierr);
> 
> mpierr = MPI_Comm_free(&mpiCommRaw); if(mpierr != MPI_SUCCESS) throw PSI::Exception("MPI ERROR: MPI_Comm_free");
> 
> 
> On Thu, Dec 20, 2018 at 4:06 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> 
> 
> > On Dec 20, 2018, at 2:24 PM, Krzysztof Kamieniecki via petsc-users <petsc-users at mcs.anl.gov> wrote:
> > 
> > Hi Alp,
> > 
> > Thanks! This worked, I reverted back to v3.9.4 and after removing the monitors (which caused an error in PetscViewerASCIIPopTab) it seems to be passing tests for now.
> > 
> > (For the future peanut gallery) I misread what PetscCommDuplicate does, it does not duplicate Petsc communicators that already "wrap" MPI communicators, so I may look into MPI and creating a completely independent MPI_Comm for each thread.
> 
>    Yes, this should work.
> > 
> > Best Regards,
> > Krys
> > 
> > On Thu, Dec 20, 2018 at 12:16 PM Dener, Alp <adener at anl.gov> wrote:
> > Hi Krys,
> > 
> >> On Dec 20, 2018, at 10:59 AM, Krzysztof Kamieniecki via petsc-users <petsc-users at mcs.anl.gov> wrote:
> >> 
> >> That example seems to have critical sections around certain Vec calls, and it looks like my problem occurs in VecDotBegin/VecDotEnd which is called by TAO/BLMVM.
> > 
> > The quasi-Newton matrix objects in BLMVM have asynchronous dot products in the matrix-free forward and inverse product formulations. This is a relatively recent performance optimization. If avoiding this split phase communication would solve the problem, and you don’t need other recent PETSc features, you could revert to 3.9 and use the old version of BLMVM that will use straight VecDot operations instead.
> > 
> > Unfortunately I don’t know enough about multithreading to definitively say whether that will actually solve the problem or not. Other members of the community can probably provide a more complete answer on that.
> > 
> >> 
> >> I assume  PetscSplitReductionGet is pulling the PetscSplitReduction for PETSC_COMM_SELF which is shared across the whole process?
> >> 
> >> I tried PetscCommDuplicate/PetscCommDestroy but that does not seem to help.
> >> 
> >> PetscErrorCode  VecDotBegin(Vec x,Vec y,PetscScalar *result)
> >> {
> >>   PetscErrorCode      ierr;
> >>   PetscSplitReduction *sr;
> >>   MPI_Comm            comm;
> >> 
> >>   PetscFunctionBegin;
> >>   PetscValidHeaderSpecific(x,VEC_CLASSID,1);
> >>   PetscValidHeaderSpecific(y,VEC_CLASSID,1);
> >>   ierr = PetscObjectGetComm((PetscObject)x,&comm);CHKERRQ(ierr);
> >>   ierr = PetscSplitReductionGet(comm,&sr);CHKERRQ(ierr);
> >>   if (sr->state != STATE_BEGIN) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ORDER,"Called before all VecxxxEnd() called");
> >>   if (sr->numopsbegin >= sr->maxops) {
> >>     ierr = PetscSplitReductionExtend(sr);CHKERRQ(ierr);
> >>   }
> >>   sr->reducetype[sr->numopsbegin] = PETSC_SR_REDUCE_SUM;
> >>   sr->invecs[sr->numopsbegin]     = (void*)x;
> >>   if (!x->ops->dot_local) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_SUP,"Vector does not suppport local dots");
> >>   ierr = PetscLogEventBegin(VEC_ReduceArithmetic,0,0,0,0);CHKERRQ(ierr);
> >>   ierr = (*x->ops->dot_local)(x,y,sr->lvalues+sr->numopsbegin++);CHKERRQ(ierr);
> >>   ierr = PetscLogEventEnd(VEC_ReduceArithmetic,0,0,0,0);CHKERRQ(ierr);
> >>   PetscFunctionReturn(0);
> >> }
> >> 
> >> 
> >> 
> >> 
> >> On Thu, Dec 20, 2018 at 11:26 AM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> >> 
> >>    The code src/ksp/ksp/examples/tutorials/ex61f.F90 demonstrates working with multiple threads each managing their own collection of PETSc objects. Hope this helps.
> >> 
> >>     Barry
> >> 
> >> 
> >> > On Dec 20, 2018, at 9:28 AM, Krzysztof Kamieniecki via petsc-users <petsc-users at mcs.anl.gov> wrote:
> >> > 
> >> > Hello All,
> >> > 
> >> > I have an embarrassingly parallel problem that I would like to use TAO on, is there some way to do this with threads as opposed to multiple processes?
> >> > 
> >> >  I compiled PETSc with the following flags
> >> > ./configure \
> >> > --prefix=${DEP_INSTALL_DIR} \
> >> > --with-threadsafety --with-log=0 --download-concurrencykit \
> >> > --with-openblas=1 \
> >> > --with-openblas-dir=${DEP_INSTALL_DIR} \
> >> > --with-mpi=0 \
> >> > --with-shared=0 \
> >> > --with-debugging=0 COPTFLAGS='-O3' CXXOPTFLAGS='-O3' FOPTFLAGS='-O3' 
> >> > 
> >> > When I run TAO in multiple threads I get the error "Called VecxxxEnd() in a different order or with a different vector than VecxxxBegin()"
> >> > 
> >> > Thanks,
> >> > Krys
> >> > 
> >> 
> > 
> > —
> > Alp
> > 
>