[petsc-users] Enquiry regarding log summary results

Thu Oct 4 14:16:25 CDT 2012

 On 4/10/2012 5:11 PM, Matthew Knepley wrote:

On Thu, Oct 4, 2012 at 11:01 AM, TAY wee-beng <zonexo at gmail.com> wrote:

>  On 4/10/2012 3:40 AM, Matthew Knepley wrote:
>
> On Wed, Oct 3, 2012 at 4:05 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>
>>  Hi Jed,
>>
>> I believe they are real cores. Anyway, I have attached the log summary
>> for the 12/24/48 cores. I re-ran a smaller case because the large problem
>> can't run with 12 cores.
>>
>
>  Okay, look at VecScatterBegin/End for 24 and 48 cores (I am guessing you
> have 4 16-core chips, but please figure this out).
> The messages are logged in ScatterBegin, and the time is logged in
> ScatterEnd. From 24 to 48 cores the time is cut in half.
> If you were only communicating the boundary, this is completely backwards,
> so you are communicating a fair fraction of ALL
> the values in a subdomain. Figure out why your partition is so screwed up
> and this will go away.
>
>
> What do you mean by "If you were only communicating the boundary, this is
> completely backwards, so you are communicating a fair fraction of ALL the
> values in a subdomain"?
>

 If you have 48 partitions instead of 24, you have a larger interface, so
AssemblyEnd() should take slightly longer. However, your AssemblyEnd() takes
HALF the time, which means it is communicating far fewer values, which means
you are not sending interface values, you are sending interior values,
since the interior shrinks when you have more partitions.

 What this probably means is that your assembly routines are screwed up,
and sending data all over the place.
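
To make the scaling argument concrete, here is a rough sketch in Python. It assumes a hypothetical 100 x 100 x 480 Cartesian grid cut into equal z-slabs with a 7-point star stencil; the grid size and the function name are illustrative and not taken from this thread. It counts, for one interior rank, how many points it owns and how many values it would need from its neighbours if only the boundary were communicated.

```python
# Rough model of a z-slab partition of an nx x ny x nz Cartesian grid.
# Grid dimensions are hypothetical; they are not taken from the thread.

def slab_counts(nx, ny, nz, nparts):
    """Return (owned, interface) point counts for one interior slab.

    owned:     points owned by the rank (the slab volume)
    interface: values a 7-point star stencil needs from neighbours,
               i.e. one ghost plane on each of the two z-faces
    """
    if nz % nparts != 0:
        raise ValueError("nz must divide evenly for this simple model")
    planes = nz // nparts          # z-planes owned by each rank
    owned = nx * ny * planes
    interface = 2 * nx * ny        # independent of the number of ranks
    return owned, interface


if __name__ == "__main__":
    nx, ny, nz = 100, 100, 480     # assumed sizes, for illustration only
    for nparts in (12, 24, 48):
        owned, interface = slab_counts(nx, ny, nz, nparts)
        print(f"{nparts:3d} ranks: owned/rank = {owned:8d}, "
              f"interface/rank = {interface:6d}")
```

In this model the per-rank interface stays fixed while the owned (interior) count halves each time the number of ranks doubles, so a communication time that halves tracks the interior, not the boundary.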

  Ok I got it now. Looking at the AssemblyEnd time,

12 procs

MatAssemblyEnd       145 1.0 1.6342e+01 1.8 0.00e+00 0.0 4.4e+01 6.0e+04 8.0e+00  0  0  0  0  0   0  0  0  0  0     0

VecAssemblyEnd       388 1.0 1.4472e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

24 procs

MatAssemblyEnd       145 1.0 1.1618e+01 2.4 0.00e+00 0.0 9.2e+01 6.0e+04 8.0e+00  0  0  0  0  0   0  0  0  0  0     0

VecAssemblyEnd       388 1.0 2.3527e-03 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

48 procs

MatAssemblyEnd       145 1.0 7.4327e+00 2.4 0.00e+00 0.0 1.9e+02 6.0e+04 8.0e+00  0  0  0  0  0   0  0  0  0  0

VecAssemblyEnd       388 1.0 2.8818e-03 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

VecAssemblyEnd time increases with procs, does it mean that there is
nothing wrong with it?

On the other hand, MatAssemblyEnd time decreases with procs. So that's
where the problem lies, is that so?
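
As a quick check on these trends, the timings can be compared directly. This is only a sketch: the numbers are the max-time column (seconds) copied from the log lines quoted in this message, and the helper name is made up for illustration.

```python
# Max times in seconds, copied from the MatAssemblyEnd / VecAssemblyEnd
# log summary lines quoted in this message.
times = {
    "MatAssemblyEnd": {12: 1.6342e+01, 24: 1.1618e+01, 48: 7.4327e+00},
    "VecAssemblyEnd": {12: 1.4472e-03, 24: 2.3527e-03, 48: 2.8818e-03},
}


def speedups(per_procs):
    """Return t(p) / t(next p) for consecutive process counts.

    A value above 1 means the event got faster with more processes;
    a value below 1 means it got slower.
    """
    procs = sorted(per_procs)
    return {(a, b): per_procs[a] / per_procs[b]
            for a, b in zip(procs, procs[1:])}


if __name__ == "__main__":
    for event, per_procs in times.items():
        for (a, b), ratio in speedups(per_procs).items():
            print(f"{event}: {a} -> {b} procs, time ratio {ratio:.2f}")
```

Ratios above 1 for MatAssemblyEnd and below 1 for VecAssemblyEnd match the observation that the first gets cheaper and the second slightly more expensive as the process count grows. Note that the VecAssemblyEnd times are of order milliseconds, so they are negligible next to the seconds spent in MatAssemblyEnd.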

I'm still scanning my code but haven't found the error yet. It seems
strange because I inserted the matrix and vector exactly the same way for
x,y,z. The u,v,w are also allocated with the same indices. Shouldn't the
error be the same for x, y and z too?

Trying to get some hints as to where else I need to look in my code...

Tks

    Matt

>  I partition my domain in the z direction, as shown in the attached pic.
> The circled region is where the airfoils are. I'm using an immersed
> boundary method (IBM) code so the grid is all Cartesian.
>
> I created my Z matrix using:
>
> call
> MatCreateAIJ(MPI_COMM_WORLD,ijk_end-ijk_sta,ijk_end-ijk_sta,PETSC_DECIDE,PETSC_DECIDE,7,PETSC_NULL_INTEGER,7,PETSC_NULL_INTEGER,A_semi_z,ierr)
>
> where ijk_sta / ijk_end are the starting/ending global indices of the row.
>
> 7 is because the star-stencil is used in 3D.
>
> I create my RHS vector using:
>
> call
> VecCreateMPI(MPI_COMM_WORLD,ijk_end-ijk_sta,PETSC_DECIDE,b_rhs_semi_z,ierr)
>
> The values for the matrix and vector were calculated before PETSc logging
> so they don't come into play.
>
> They are also done in a similar fashion for matrix x and y. I still don't
> understand why solving the z momentum eqn takes so much time. Which portion
> should I focus on?
>
> Tks!
>
>
>     Matt
>
>
>>  Yours sincerely,
>>
>> TAY wee-beng
>>
>>  On 3/10/2012 5:59 PM, Jed Brown wrote:
>>
>> There is an inordinate amount of time being spent in VecScatterEnd().
>> That sometimes indicates a very bad partition. Also, are your "48 cores"
>> real physical cores or just "logical cores" (look like cores to the
>> operating system, usually advertised as "threads" by the vendor, nothing
>> like cores in reality)? That can cause a huge load imbalance and very
>> confusing results as over-subscribed threads compete for shared resources.
>> Step it back to 24 threads and 12 threads, send log_summary for each.
>>
>> On Wed, Oct 3, 2012 at 8:08 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>
>>>  On 2/10/2012 2:43 PM, Jed Brown wrote:
>>>
>>> On Tue, Oct 2, 2012 at 8:35 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>
>>>>  Hi,
>>>>
>>>> I have combined the momentum linear eqns involving x,y,z into 1 large
>>>> matrix. The Poisson eqn is solved using HYPRE struct format so it's not
>>>> included. I ran the code for 50 timesteps (hence 50 KSPSolve calls) using
>>>> 96 procs. The log_summary is given below. I have some questions:
>>>>
>>>> 1. After combining the matrix, I should have only 1 PETSc matrix. Why
>>>> does it say there are 4 matrices, 12 vectors, etc.?
>>>>
>>>
>>>  They are part of preconditioning. Are you sure you're using Hypre for
>>> this? It looks like you are using bjacobi/ilu.
>>>
>>>
>>>>
>>>> 2. I'm looking at the stages which take the longest time. It seems that
>>>> MatAssemblyBegin, VecNorm, VecAssemblyBegin, VecScatterEnd have very high
>>>> ratios. The ratios of some others are also not too good (~ 1.6 - 2). So are
>>>> these stages the reason why my code is not scaling well? What can I do to
>>>> improve it?
>>>>
>>>
>>>  3/4 of the solve time is evenly balanced between MatMult, MatSolve,
>>> MatLUFactorNumeric, and VecNorm+VecDot.
>>>
>>>  The high VecAssembly time might be due to generating a lot of entries
>>> off-process?
>>>
>>>  In any case, this looks like an _extremely_ slow network, perhaps it's
>>> misconfigured?
>>>
>>>
>>>  My cluster is configured with 48 procs per node. I re-ran the case
>>> using only 48 procs, so there's no need to pass over a 'slow'
>>> interconnect. I'm now also using GAMG and BCGS for the Poisson and momentum
>>> eqns respectively. I have also separated the x,y,z components of the
>>> momentum eqn into 3 separate linear eqns to debug the problem.
>>>
>>> Results show that stage "momentum_z" is taking a lot of time. I wonder
>>> if it has to do with the fact that I am partitioning my grids in the z
>>> direction. VecScatterEnd and MatMult are taking a lot of time. The ratios
>>> for VecNormalize, VecScatterEnd, VecNorm and VecAssemblyBegin are also not
>>> good.
>>>
>>> I wonder why a lot of entries are generated off-process.
>>>
>>> I create my RHS vector using:
>>>
>>> call
>>> VecCreateMPI(MPI_COMM_WORLD,ijk_xyz_end-ijk_xyz_sta,PETSC_DECIDE,b_rhs_semi_z,ierr)
>>>
>>> where ijk_xyz_sta and ijk_xyz_end are obtained from
>>>
>>> call MatGetOwnershipRange(A_semi_z,ijk_xyz_sta,ijk_xyz_end,ierr)
>>>
>>> I then insert the values into the vector using:
>>>
>>> call VecSetValues(b_rhs_semi_z , ijk_xyz_end - ijk_xyz_sta ,
>>> (/ijk_xyz_sta : ijk_xyz_end - 1/) , q_semi_vect_z(ijk_xyz_sta + 1 :
>>> ijk_xyz_end) , INSERT_VALUES , ierr)
>>>
>>> What should I do to correct the problem?
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>>
>>>> Btw, I insert matrix using:
>>>>
>>>> do ijk=ijk_xyz_sta+1,ijk_xyz_end
>>>>
>>>>     II = ijk - 1    !Fortran shift to 0-based
>>>>
>>>>     call MatSetValues(A_semi_xyz,1,II,7,int_semi_xyz(ijk,1:7),semi_mat_xyz(ijk,1:7),INSERT_VALUES,ierr)
>>>>
>>>> end do
>>>>
>>>> where ijk_xyz_sta/ijk_xyz_end are the starting/end index
>>>>
>>>> int_semi_xyz(ijk,1:7) stores the 7 column global indices
>>>>
>>>> semi_mat_xyz has the corresponding values.
>>>>
>>>> and I insert vectors using:
>>>>
>>>> call
>>>> VecSetValues(b_rhs_semi_xyz,ijk_xyz_end_mz-ijk_xyz_sta_mz,(/ijk_xyz_sta_mz:ijk_xyz_end_mz-1/),q_semi_vect_xyz(ijk_xyz_sta_mz+1:ijk_xyz_end_mz),INSERT_VALUES,ierr)
>>>>
>>>> Thanks!
>>>>
>>>>
>>>> Yours sincerely,
>>>>
>>>> TAY wee-beng
>>>>
>>>>  On 30/9/2012 11:30 PM, Jed Brown wrote:
>>>>
>>>> You can measure the time spent in Hypre via PCApply and PCSetUp, but
>>>> you can't get finer grained integrated profiling because it was not set up
>>>> that way.
>>>> On Sep 30, 2012 3:26 PM, "TAY wee-beng" <zonexo at gmail.com> wrote:
>>>>
>>>>>  On 27/9/2012 1:44 PM, Matthew Knepley wrote:
>>>>>
>>>>> On Thu, Sep 27, 2012 at 3:49 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm doing a log summary for my 3d cfd code. I have some questions:
>>>>>>
>>>>>> 1. if I'm solving 3 linear equations using ksp, is the result given
>>>>>> in the log summary the total of the 3 linear eqns' performance? How can I
>>>>>> get the performance for each individual eqn?
>>>>>>
>>>>>
>>>>>  Use logging stages:
>>>>> http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html
>>>>>
>>>>>
>>>>>> 2. If I run my code for 10 time steps, does the log summary give the
>>>>>> total or avg performance/ratio?
>>>>>>
>>>>>
>>>>>  Total.
>>>>>
>>>>>
>>>>>> 3. Besides PETSc, I'm also using HYPRE's native geometric MG (Struct)
>>>>>> to solve my Cartesian-grid CFD Poisson eqn. Is there any way I can use
>>>>>> PETSc's log summary to get HYPRE's performance? If I use BoomerAMG thru
>>>>>> PETSc, can I get its performance?
>>>>>
>>>>>
>>>>>  If you mean flops, only if you count them yourself and tell PETSc
>>>>> using
>>>>> http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html
>>>>>
>>>>>  This is the disadvantage of using packages that do not properly
>>>>> monitor things :)
>>>>>
>>>>>      Matt
>>>>>
>>>>>
>>>>> So you mean if I use BoomerAMG thru PETSc, there is no proper way of
>>>>> evaluating its performance, besides using PetscLogFlops?
>>>>>
>>>>>
>>>>>> --
>>>>>> Yours sincerely,
>>>>>>
>>>>>> TAY wee-beng
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>  --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>
>
>
>
>
