<div dir="ltr"><div dir="ltr"><div>Hi Alp,</div><div><br></div><div>Thanks! This worked: I reverted to v3.9.4, and after removing the monitors (which caused an error in PetscViewerASCIIPopTab) it seems to be passing tests for now.</div><div><br></div><div>(For the future peanut gallery) I misread what PetscCommDuplicate does: it does not duplicate PETSc communicators that already "wrap" MPI communicators, so I may look into MPI and creating a completely independent MPI_Comm for each thread.</div><div><br></div><div>Best Regards,</div><div>Krys</div><div><br></div><div class="gmail_quote"><div dir="ltr">On Thu, Dec 20, 2018 at 12:16 PM Dener, Alp <<a href="mailto:adener@anl.gov">adener@anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div style="overflow-wrap: break-word;">
Hi Krys,<br>
<div><br>
<blockquote type="cite">
<div>On Dec 20, 2018, at 10:59 AM, Krzysztof Kamieniecki via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>> wrote:</div>
<br class="gmail-m_-8218347454224286929Apple-interchange-newline">
<div>
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">That example seems to have critical sections around certain Vec calls, and it looks like my problem occurs in VecDotBegin/VecDotEnd which is called by TAO/BLMVM.</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>The quasi-Newton matrix objects in BLMVM have asynchronous dot products in the matrix-free forward and inverse product formulations. This is a relatively recent performance optimization. If avoiding this split-phase communication would solve the problem,
and you don’t need other recent PETSc features, you could revert to 3.9 and use the old version of BLMVM, which uses straight VecDot operations instead.</div>
<div><br>
</div>
<div>Unfortunately I don’t know enough about multithreading to definitively say whether that will actually solve the problem or not. Other members of the community can probably provide a more complete answer on that.</div>
<br>
<blockquote type="cite">
<div>
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<div dir="ltr">I assume PetscSplitReductionGet is pulling the PetscSplitReduction for PETSC_COMM_SELF, which is shared across the whole process?</div>
<div dir="ltr"><br>
</div>
<div dir="ltr">I tried PetscCommDuplicate/PetscCommDestroy but that does not seem to help.<br>
</div>
<div dir="ltr">
<div><br>
</div>
<div>
<pre>PetscErrorCode VecDotBegin(Vec x,Vec y,PetscScalar *result)
{
  PetscErrorCode      ierr;
  PetscSplitReduction *sr;
  MPI_Comm            comm;

  PetscFunctionBegin;
  PetscValidHeaderSpecific(x,VEC_CLASSID,1);
  PetscValidHeaderSpecific(y,VEC_CLASSID,1);
  ierr = PetscObjectGetComm((PetscObject)x,&comm);CHKERRQ(ierr);
  ierr = PetscSplitReductionGet(comm,&sr);CHKERRQ(ierr);
  if (sr->state != STATE_BEGIN) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ORDER,"Called before all VecxxxEnd() called");
  if (sr->numopsbegin >= sr->maxops) {
    ierr = PetscSplitReductionExtend(sr);CHKERRQ(ierr);
  }
  sr->reducetype[sr->numopsbegin] = PETSC_SR_REDUCE_SUM;
  sr->invecs[sr->numopsbegin]     = (void*)x;
  if (!x->ops->dot_local) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_SUP,"Vector does not suppport local dots");
  ierr = PetscLogEventBegin(VEC_ReduceArithmetic,0,0,0,0);CHKERRQ(ierr);
  ierr = (*x->ops->dot_local)(x,y,sr->lvalues+sr->numopsbegin++);CHKERRQ(ierr);
  ierr = PetscLogEventEnd(VEC_ReduceArithmetic,0,0,0,0);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}</pre>
</div>
<div><br>
</div>
</div>
</div>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr">On Thu, Dec 20, 2018 at 11:26 AM Smith, Barry F. <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
The code src/ksp/ksp/examples/tutorials/ex61f.F90 demonstrates working with multiple threads each managing their own collection of PETSc objects. Hope this helps.<br>
<br>
Barry<br>
<br>
<br>
> On Dec 20, 2018, at 9:28 AM, Krzysztof Kamieniecki via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>> wrote:<br>
> <br>
> Hello All,<br>
> <br>
> I have an embarrassingly parallel problem that I would like to use TAO on, is there some way to do this with threads as opposed to multiple processes?<br>
> <br>
> I compiled PETSc with the following flags<br>
> ./configure \<br>
> --prefix=${DEP_INSTALL_DIR} \<br>
> --with-threadsafety --with-log=0 --download-concurrencykit \<br>
> --with-openblas=1 \<br>
> --with-openblas-dir=${DEP_INSTALL_DIR} \<br>
> --with-mpi=0 \<br>
> --with-shared=0 \<br>
> --with-debugging=0 COPTFLAGS='-O3' CXXOPTFLAGS='-O3' FOPTFLAGS='-O3' <br>
> <br>
> When I run TAO in multiple threads I get the error "Called VecxxxEnd() in a different order or with a different vector than VecxxxBegin()"<br>
> <br>
> Thanks,<br>
> Krys<br>
> <br>
<br>
</blockquote>
</div>
</div>
</blockquote>
<br>
</div>
<div>—</div>
<div>Alp</div>
<br>
</div>
</blockquote></div></div></div>