[petsc-dev] new P^1.5 algorithm in VecAssembleBegin?

Barry Smith bsmith at mcs.anl.gov
Fri May 29 13:16:35 CDT 2015


   
> On May 29, 2015, at 7:55 AM, Mark Adams <mfadams at lbl.gov> wrote:
> 
> I am suspecting that it is catching load imbalance and just not reporting it correctly.


    The code is trivial and exactly the same as in many other places where a load balance other than 1.0 is reported, so something is funky:

PetscErrorCode  VecAssemblyBegin(Vec vec)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  PetscValidHeaderSpecific(vec,VEC_CLASSID,1);
  PetscValidType(vec,1);
  ierr = VecStashViewFromOptions(vec,NULL,"-vec_view_stash");CHKERRQ(ierr);
  ierr = PetscLogEventBegin(VEC_AssemblyBegin,vec,0,0,0);CHKERRQ(ierr);
  if (vec->ops->assemblybegin) {
    ierr = (*vec->ops->assemblybegin)(vec);CHKERRQ(ierr);
  }
  ierr = PetscLogEventEnd(VEC_AssemblyBegin,vec,0,0,0);CHKERRQ(ierr);
  ierr = PetscObjectStateIncrease((PetscObject)vec);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

  I cannot explain why the load balance would be 1.0 unless, by unlikely coincidence, different processes are the ones waiting on each of the 248 calls to the function, so that the summed wait times on the different processes match over the 248 calls. Possible but
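
  To make the coincidence concrete, here is a minimal sketch in plain C (no MPI or PETSc). It assumes the logged ratio is the maximum over the minimum of each process's time summed over all calls. The function name balance_ratio and the array layout are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Ratio of the largest to the smallest accumulated time across processes.
   times[p * ncalls + c] is the time process p spent in call c
   (hypothetical layout, for illustration only). */
double balance_ratio(const double *times, size_t nprocs, size_t ncalls)
{
  double tmax = 0.0, tmin = 0.0;

  assert(times && nprocs > 0 && ncalls > 0);
  for (size_t p = 0; p < nprocs; p++) {
    double total = 0.0;

    /* Sum this process's time over every call to the event. */
    for (size_t c = 0; c < ncalls; c++) total += times[p * ncalls + c];
    if (p == 0 || total > tmax) tmax = total;
    if (p == 0 || total < tmin) tmin = total;
  }
  /* Guard against division by zero when some process logged no time. */
  return tmin > 0.0 ? tmax / tmin : 0.0;
}
```

  With two processes and two calls, times of {3, 1} on one process and {1, 3} on the other give a summed ratio of 1.0, even though each individual call has a ratio of 3.0. The waits would have to cancel like this across all the processes for the summed ratio to come out at 1.0.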


> I've added a barrier in the code.

   You don't need a barrier.  If you do not have a barrier you should see all the "wait time" now accumulate somewhere later in the code at the next reduction after the VecAssemblyBegin/End.
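
   A rough model of that statement, again in plain C with no MPI: every process leaves a synchronization point when the last one arrives, so the wait is charged to whichever event synchronizes first. The names wait_at_sync, split_wait and WaitSplit are hypothetical, and the model assumes no extra work happens between the assembly and the next reduction.

```c
#include <assert.h>
#include <stddef.h>

/* Where a process's idle time gets charged in this toy model. */
typedef struct {
  double in_assembly;
  double in_next_reduction;
} WaitSplit;

/* Time process p idles at a synchronization point: all processes
   leave together when the last one arrives. */
double wait_at_sync(const double *arrival, size_t nprocs, size_t p)
{
  double last;

  assert(arrival && nprocs > 0 && p < nprocs);
  last = arrival[0];
  for (size_t q = 1; q < nprocs; q++) {
    if (arrival[q] > last) last = arrival[q];
  }
  return last - arrival[p];
}

/* If the assembly synchronizes, the wait is logged there; otherwise
   the same wait shows up at the next reduction (assumption: nothing
   else happens in between). */
WaitSplit split_wait(const double *arrival, size_t nprocs, size_t p,
                     int assembly_synchronizes)
{
  WaitSplit s = {0.0, 0.0};
  double w = wait_at_sync(arrival, nprocs, p);

  if (assembly_synchronizes) s.in_assembly = w;
  else s.in_next_reduction = w;
  return s;
}
```

   In this model the total wait per process is the same either way; only the event it is attributed to changes.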

  Barry



> 
> Here are the two log files.
> 
> On Thu, May 28, 2015 at 7:48 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
>    VecAssemblyBegin() serves as a barrier unless you set the vector option VEC_IGNORE_OFF_PROC_ENTRIES, so I am not surprised that it "appears" to take a lot of time. BUT the balance between the fastest and slowest listed in your table below is 1.0, which is very surprising; it indicates that every process supposedly spent the same amount of time within VecAssemblyBegin(). Note that for VecAssemblyEnd() the balance is 2.3, which is what I would commonly expect. Please send me ALL the output for -log_summary for these cases.  Version of PETSc shouldn't matter for this issue.
> 
> > On May 28, 2015, at 4:59 PM, Mark Adams <mfadams at lbl.gov> wrote:
> >
> > We are seeing some large times spent in VecAssemblyBegin:
> >
> > VecAssemblyBegin     242 1.0 7.9796e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 7.3e+02 12  0  0  0  5  76  0  0  0 10     0
> > VecAssemblyEnd       242 1.0 5.6624e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> >
> > This is with 64K cores on Edison.  On 128K cores (weak speedup) we see:
> >
> > VecAssemblyBegin     248 1.0 2.3615e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 7.4e+02 17  0  0  0  4  87  0  0  0 10     0
> > VecAssemblyEnd       248 1.0 6.8855e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> >
> > We are working on using older versions of PETSc to make sure this is a PETSc issue but does anyone have any thoughts on this?
> >
> > Mark
> 
> 
> <log_64K><log_128K>



