[petsc-users] Enquiry regarding log summary results
TAY wee-beng
zonexo at gmail.com
Thu Oct 4 10:01:52 CDT 2012
On 4/10/2012 3:40 AM, Matthew Knepley wrote:
> On Wed, Oct 3, 2012 at 4:05 PM, TAY wee-beng <zonexo at gmail.com
> <mailto:zonexo at gmail.com>> wrote:
>
> Hi Jed,
>
> I believe they are real cores. Anyway, I have attached the log
> summary for the 12/24/48 cores. I re-ran a smaller case because
> the large problem can't run with 12 cores.
>
>
> Okay, look at VecScatterBegin/End for 24 and 48 cores (I am guessing
> you have 4 16-core chips, but please figure this out).
> The messages are logged in ScatterBegin, and the time is logged in
> ScatterEnd. From 24 to 48 cores the time is cut in half.
> If you were only communicating the boundary, this is completely
> backwards, so you are communicating a fair fraction of ALL
> the values in a subdomain. Figure out why your partition is so screwed
> up and this will go away.
What do you mean by "If you were only communicating the boundary, this
is completely backwards, so you are communicating a fair fraction of ALL
the values in a subdomain"?
I partition my domain in the z direction, as shown in the attached pic.
The circled region is where the airfoils are. I'm using an immersed
boundary method (IBM) code so the grid is all Cartesian.
I created my Z matrix using:

call MatCreateAIJ(MPI_COMM_WORLD, ijk_end-ijk_sta, ijk_end-ijk_sta, PETSC_DECIDE, &
     PETSC_DECIDE, 7, PETSC_NULL_INTEGER, 7, PETSC_NULL_INTEGER, A_semi_z, ierr)

where ijk_sta / ijk_end are the starting/ending global row indices, and
the 7 is because a 7-point star stencil is used in 3D.
I create my RHS vector using:

call VecCreateMPI(MPI_COMM_WORLD, ijk_end-ijk_sta, PETSC_DECIDE, b_rhs_semi_z, ierr)
The matrix and vector values are calculated before the PETSc logging
starts, so they don't come into play in the timings.
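As a quick sanity check (a minimal sketch, assuming the usual PETSc
Fortran includes; ista/iend/vsta/vend are placeholder PetscInt locals
that are not in my code), the hand-computed ijk_sta / ijk_end can be
compared against the ownership ranges PETSc actually assigned:

PetscInt ista, iend, vsta, vend

call MatGetOwnershipRange(A_semi_z, ista, iend, ierr)
call VecGetOwnershipRange(b_rhs_semi_z, vsta, vend, ierr)
if (ista /= ijk_sta .or. iend /= ijk_end) print *, 'matrix range mismatch:', ista, iend, ijk_sta, ijk_end
if (vsta /= ista .or. vend /= iend) print *, 'vector/matrix range mismatch:', vsta, vend

If both checks pass, every row and RHS entry I set is locally owned, so
the assembly itself should not need to move data between processes.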
The x and y matrices and vectors are set up in a similar fashion. I
still can't figure out why solving the z momentum eqn takes so much
time. Which portion should I focus on?

Thanks!
>
> Matt
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 3/10/2012 5:59 PM, Jed Brown wrote:
>> There is an inordinate amount of time being spent in
>> VecScatterEnd(). That sometimes indicates a very bad partition.
>> Also, are your "48 cores" real physical cores or just "logical
>> cores" (look like cores to the operating system, usually
>> advertised as "threads" by the vendor, nothing like cores in
>> reality)? That can cause a huge load imbalance and very confusing
>> results as over-subscribed threads compete for shared resources.
>> Step it back to 24 threads and 12 threads, send log_summary for each.
>>
>> On Wed, Oct 3, 2012 at 8:08 AM, TAY wee-beng <zonexo at gmail.com
>> <mailto:zonexo at gmail.com>> wrote:
>>
>> On 2/10/2012 2:43 PM, Jed Brown wrote:
>>> On Tue, Oct 2, 2012 at 8:35 AM, TAY wee-beng
>>> <zonexo at gmail.com <mailto:zonexo at gmail.com>> wrote:
>>>
>>> Hi,
>>>
>>> I have combined the momentum linear eqns involving x,y,z
>>> into 1 large matrix. The Poisson eqn is solved using
>>> HYPRE's Struct format so it's not included. I run the code
>>> for 50 timesteps (hence 50 KSPSolve calls) using 96 procs.
>>> The log_summary is given below. I have some questions:
>>>
>>> 1. After combining the matrix, I should have only 1
>>> PETSc matrix. Why does it say there are 4 matrices, 12
>>> vectors, etc.?
>>>
>>>
>>> They are part of preconditioning. Are you sure you're using
>>> Hypre for this? It looks like you are using bjacobi/ilu.
>>>
>>>
>>> 2. I'm looking at the stages which take the longest
>>> time. It seems that MatAssemblyBegin, VecNorm,
>>> VecAssemblyBegin, VecScatterEnd have very high ratios.
>>> The ratios of some others are also not too good (~ 1.6 -
>>> 2). So are these stages the reason why my code is not
>>> scaling well? What can I do to improve it?
>>>
>>>
>>> 3/4 of the solve time is evenly balanced between MatMult,
>>> MatSolve, MatLUFactorNumeric, and VecNorm+VecDot.
>>>
>>> The high VecAssembly time might be due to generating a lot
>>> of entries off-process?
>>>
>>> In any case, this looks like an _extremely_ slow network,
>>> perhaps it's misconfigured?
>>
>> My cluster is configured with 48 procs per node. I re-ran the
>> case using only 48 procs, so there's no need to go over a
>> 'slow' interconnect. I'm now also using GAMG and BCGS for the
>> Poisson and momentum eqns respectively. I have also separated
>> the x,y,z components of the momentum eqn into 3 separate
>> linear eqns to debug the problem.
>>
>> Results show that the "momentum_z" stage is taking a lot of
>> time. I wonder if it has to do with the fact that I am
>> partitioning my grid in the z direction. VecScatterEnd and
>> MatMult are taking a lot of time, and the ratios of
>> VecNormalize, VecScatterEnd, VecNorm, and VecAssemblyBegin
>> are also not good.
>>
>> I wonder why a lot of entries are generated off-process.
>>
>> I create my RHS vector using:
>>
>> call VecCreateMPI(MPI_COMM_WORLD, ijk_xyz_end-ijk_xyz_sta, &
>>      PETSC_DECIDE, b_rhs_semi_z, ierr)
>>
>> where ijk_xyz_sta and ijk_xyz_end are obtained from
>>
>> call MatGetOwnershipRange(A_semi_z, ijk_xyz_sta, ijk_xyz_end, ierr)
>>
>> I then insert the values into the vector using:
>>
>> call VecSetValues(b_rhs_semi_z, ijk_xyz_end - ijk_xyz_sta, &
>>      (/ ijk_xyz_sta : ijk_xyz_end - 1 /), &
>>      q_semi_vect_z(ijk_xyz_sta + 1 : ijk_xyz_end), INSERT_VALUES, ierr)
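>> Since those indices come straight from MatGetOwnershipRange, every
>> entry should already be locally owned; the call is then followed by
>> the usual assembly pair (shown only for completeness):
>>
>> call VecAssemblyBegin(b_rhs_semi_z, ierr)
>> call VecAssemblyEnd(b_rhs_semi_z, ierr)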
>>
>> What should I do to correct the problem?
>>
>> Thanks
>>
>>
>>>
>>> Btw, I insert matrix using:
>>>
>>> do ijk = ijk_xyz_sta + 1, ijk_xyz_end
>>>    II = ijk - 1   ! Fortran shift to 0-based
>>>    call MatSetValues(A_semi_xyz, 1, II, 7, int_semi_xyz(ijk,1:7), &
>>>         semi_mat_xyz(ijk,1:7), INSERT_VALUES, ierr)
>>> end do
>>>
>>> where ijk_xyz_sta / ijk_xyz_end are the starting/ending row indices,
>>>
>>> int_semi_xyz(ijk,1:7) stores the 7 column global indices
>>>
>>> semi_mat_xyz has the corresponding values.
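>>>
>>> A small diagnostic that could go after this loop (a sketch only;
>>> noff/kk/col are extra PetscInt locals, not part of the solver)
>>> counts how many stencil entries couple to rows owned by another
>>> process, i.e. the entries that later drive the VecScatter in MatMult:
>>>
>>> noff = 0
>>> do ijk = ijk_xyz_sta + 1, ijk_xyz_end
>>>    do kk = 1, 7
>>>       col = int_semi_xyz(ijk, kk)
>>>       if (col < ijk_xyz_sta .or. col >= ijk_xyz_end) noff = noff + 1
>>>    end do
>>> end do
>>> print *, 'off-process couplings on this rank:', noff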
>>>
>>> and I insert vectors using:
>>>
>>> call VecSetValues(b_rhs_semi_xyz, ijk_xyz_end_mz - ijk_xyz_sta_mz, &
>>>      (/ijk_xyz_sta_mz:ijk_xyz_end_mz-1/), &
>>>      q_semi_vect_xyz(ijk_xyz_sta_mz+1:ijk_xyz_end_mz), INSERT_VALUES, ierr)
>>>
>>> Thanks!
>>>
>>>
>>> Yours sincerely,
>>>
>>> TAY wee-beng
>>>
>>> On 30/9/2012 11:30 PM, Jed Brown wrote:
>>>>
>>>> You can measure the time spent in Hypre via PCApply and
>>>> PCSetUp, but you can't get finer grained integrated
>>>> profiling because it was not set up that way.
>>>>
>>>> On Sep 30, 2012 3:26 PM, "TAY wee-beng"
>>>> <zonexo at gmail.com <mailto:zonexo at gmail.com>> wrote:
>>>>
>>>> On 27/9/2012 1:44 PM, Matthew Knepley wrote:
>>>>> On Thu, Sep 27, 2012 at 3:49 AM, TAY wee-beng
>>>>> <zonexo at gmail.com <mailto:zonexo at gmail.com>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm doing a log summary for my 3D CFD code. I
>>>>> have some questions:
>>>>>
>>>>> 1. If I'm solving 3 linear equations using
>>>>> KSP, is the result given in the log summary
>>>>> the total of the 3 linear eqns' performance?
>>>>> How can I get the performance for each
>>>>> individual eqn?
>>>>>
>>>>>
>>>>> Use logging stages:
>>>>> http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html
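>>>>>
>>>>> A minimal Fortran sketch of what that looks like (the stage name
>>>>> and the ksp/vector names here are placeholders):
>>>>>
>>>>> PetscLogStage stage_z
>>>>> call PetscLogStageRegister('momentum_z', stage_z, ierr)
>>>>> call PetscLogStagePush(stage_z, ierr)
>>>>> call KSPSolve(ksp_z, b_rhs_z, x_z, ierr)
>>>>> call PetscLogStagePop(ierr)
>>>>>
>>>>> Each stage then gets its own section in -log_summary.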
>>>>>
>>>>> 2. If I run my code for 10 time steps, does
>>>>> the log summary give the total or average
>>>>> performance/ratio?
>>>>>
>>>>>
>>>>> Total.
>>>>>
>>>>> 3. Besides PETSc, I'm also using HYPRE's
>>>>> native geometric MG (Struct) to solve the
>>>>> Poisson eqn on my Cartesian CFD grid. Is there
>>>>> any way I can use PETSc's log summary to get
>>>>> HYPRE's performance? If I use BoomerAMG through
>>>>> PETSc, can I get its performance?
>>>>>
>>>>>
>>>>> If you mean flops, only if you count them yourself
>>>>> and tell PETSc using
>>>>> http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html
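>>>>>
>>>>> Roughly (a sketch; the number itself is whatever you can estimate
>>>>> for one Hypre solve, here only a placeholder):
>>>>>
>>>>> PetscLogDouble hypre_flops
>>>>> hypre_flops = 0.0d0   ! replace with your own flop estimate
>>>>> call PetscLogFlops(hypre_flops, ierr)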
>>>>>
>>>>> This is the disadvantage of using packages that do
>>>>> not properly monitor things :)
>>>>>
>>>>> Matt
>>>> So you mean if I use BoomerAMG through PETSc, there is
>>>> no proper way of evaluating its performance, besides
>>>> using PetscLogFlops?
>>>>>
>>>>>
>>>>> --
>>>>> Yours sincerely,
>>>>>
>>>>> TAY wee-beng
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before
>>>>> they begin their experiments is infinitely more
>>>>> interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener
>>>>
>>>
>>>
>>
>>
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 3d_grid.jpg
Type: image/jpeg
Size: 179125 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20121004/1ab24144/attachment-0001.jpg>