[petsc-users] Enquiry regarding log summary results
TAY wee-beng
zonexo at gmail.com
Thu Oct 4 10:01:52 CDT 2012
On 4/10/2012 3:40 AM, Matthew Knepley wrote:
> On Wed, Oct 3, 2012 at 4:05 PM, TAY wee-beng <zonexo at gmail.com
> <mailto:zonexo at gmail.com>> wrote:
>
> Hi Jed,
>
> I believe they are real cores. Anyway, I have attached the log
> summary for the 12/24/48 cores. I re-ran a smaller case because
> the large problem can't run with 12 cores.
>
>
> Okay, look at VecScatterBegin/End for 24 and 48 cores (I am guessing
> you have 4 16-core chips, but please figure this out).
> The messages are logged in ScatterBegin, and the time is logged in
> ScatterEnd. From 24 to 48 cores the time is cut in half.
> If you were only communicating the boundary, this is completely
> backwards, so you are communicating a fair fraction of ALL
> the values in a subdomain. Figure out why your partition is so screwed
> up and this will go away.
What do you mean by "If you were only communicating the boundary, this
is completely backwards, so you are communicating a fair fraction of ALL
the values in a subdomain"?
I partition my domain in the z direction, as shown in the attached pic.
The circled region is where the airfoils are. I'm using an immersed
boundary method (IBM) code so the grid is all Cartesian.
I created my Z matrix using:

call MatCreateAIJ(MPI_COMM_WORLD, ijk_end-ijk_sta, ijk_end-ijk_sta, PETSC_DECIDE, &
     PETSC_DECIDE, 7, PETSC_NULL_INTEGER, 7, PETSC_NULL_INTEGER, A_semi_z, ierr)

where ijk_sta / ijk_end are the starting/ending global row indices, and
the 7 is because a 7-point star stencil is used in 3D.
I create my RHS vector using:

call VecCreateMPI(MPI_COMM_WORLD, ijk_end-ijk_sta, PETSC_DECIDE, b_rhs_semi_z, ierr)
The matrix and vector values are calculated before the PETSc logging
starts, so they don't come into play in the timings.
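As a quick sanity check (a minimal sketch, assuming the usual PETSc
Fortran includes; ista/iend/vsta/vend are placeholder PetscInt locals
that are not in my code), the hand-computed ijk_sta / ijk_end can be
compared against the ownership ranges PETSc actually assigned:

PetscInt ista, iend, vsta, vend

call MatGetOwnershipRange(A_semi_z, ista, iend, ierr)
call VecGetOwnershipRange(b_rhs_semi_z, vsta, vend, ierr)
if (ista /= ijk_sta .or. iend /= ijk_end) print *, 'matrix range mismatch:', ista, iend, ijk_sta, ijk_end
if (vsta /= ista .or. vend /= iend) print *, 'vector/matrix range mismatch:', vsta, vend

If both checks pass, every row and RHS entry I set is locally owned, so
the assembly itself should not need to move data between processes.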
The x and y matrices and vectors are set up in a similar fashion. I
still can't figure out why solving the z momentum eqn takes so much
time. Which portion should I focus on?

Thanks!
>
> Matt
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 3/10/2012 5:59 PM, Jed Brown wrote:
>> There is an inordinate amount of time being spent in
>> VecScatterEnd(). That sometimes indicates a very bad partition.
>> Also, are your "48 cores" real physical cores or just "logical
>> cores" (look like cores to the operating system, usually
>> advertised as "threads" by the vendor, nothing like cores in
>> reality)? That can cause a huge load imbalance and very confusing
>> results as over-subscribed threads compete for shared resources.
>> Step it back to 24 threads and 12 threads, send log_summary for each.
>>
>> On Wed, Oct 3, 2012 at 8:08 AM, TAY wee-beng <zonexo at gmail.com
>> <mailto:zonexo at gmail.com>> wrote:
>>
>> On 2/10/2012 2:43 PM, Jed Brown wrote:
>>> On Tue, Oct 2, 2012 at 8:35 AM, TAY wee-beng
>>> <zonexo at gmail.com <mailto:zonexo at gmail.com>> wrote:
>>>
>>> Hi,
>>>
>>> I have combined the momentum linear eqns involving x,y,z
>>> into 1 large matrix. The Poisson eqn is solved using
>>> HYPRE's Struct format so it's not included. I run the code
>>> for 50 timesteps (hence 50 KSPSolve calls) using 96 procs.
>>> The log_summary is given below. I have some questions:
>>>
>>> 1. After combining the matrix, I should have only 1
>>> PETSc matrix. Why does it say there are 4 matrices, 12
>>> vectors, etc.?
>>>
>>>
>>> They are part of preconditioning. Are you sure you're using
>>> Hypre for this? It looks like you are using bjacobi/ilu.
>>>
>>>
>>> 2. I'm looking at the stages which take the longest
>>> time. It seems that MatAssemblyBegin, VecNorm,
>>> VecAssemblyBegin, VecScatterEnd have very high ratios.
>>> The ratios of some others are also not too good (~ 1.6 -
>>> 2). So are these stages the reason why my code is not
>>> scaling well? What can I do to improve it?
>>>
>>>
>>> 3/4 of the solve time is evenly balanced between MatMult,
>>> MatSolve, MatLUFactorNumeric, and VecNorm+VecDot.
>>>
>>> The high VecAssembly time might be due to generating a lot
>>> of entries off-process?
>>>
>>> In any case, this looks like an _extremely_ slow network,
>>> perhaps it's misconfigured?
>>
>> My cluster is configured with 48 procs per node. I re-ran the
>> case using only 48 procs, so there's no need to go over a
>> 'slow' interconnect. I'm now also using GAMG and BCGS for the
>> Poisson and momentum eqns respectively. I have also separated
>> the x,y,z components of the momentum eqn into 3 separate
>> linear eqns to debug the problem.
>>
>> Results show that the "momentum_z" stage is taking a lot of
>> time. I wonder if it has to do with the fact that I am
>> partitioning my grid in the z direction. VecScatterEnd and
>> MatMult are taking a lot of time, and the ratios of
>> VecNormalize, VecScatterEnd, VecNorm, and VecAssemblyBegin
>> are also not good.
>>
>> I wonder why a lot of entries are generated off-process.
>>
>> I create my RHS vector using:
>>
>> call VecCreateMPI(MPI_COMM_WORLD, ijk_xyz_end-ijk_xyz_sta, &
>>      PETSC_DECIDE, b_rhs_semi_z, ierr)
>>
>> where ijk_xyz_sta and ijk_xyz_end are obtained from
>>
>> call MatGetOwnershipRange(A_semi_z, ijk_xyz_sta, ijk_xyz_end, ierr)
>>
>> I then insert the values into the vector using:
>>
>> call VecSetValues(b_rhs_semi_z, ijk_xyz_end - ijk_xyz_sta, &
>>      (/ ijk_xyz_sta : ijk_xyz_end - 1 /), &
>>      q_semi_vect_z(ijk_xyz_sta + 1 : ijk_xyz_end), INSERT_VALUES, ierr)
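>> Since those indices come straight from MatGetOwnershipRange, every
>> entry should already be locally owned; the call is then followed by
>> the usual assembly pair (shown only for completeness):
>>
>> call VecAssemblyBegin(b_rhs_semi_z, ierr)
>> call VecAssemblyEnd(b_rhs_semi_z, ierr)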
>>
>> What should I do to correct the problem?
>>
>> Thanks
>>
>>
>>>
>>> Btw, I insert matrix using:
>>>
>>> do ijk = ijk_xyz_sta + 1, ijk_xyz_end
>>>    II = ijk - 1   ! Fortran shift to 0-based
>>>    call MatSetValues(A_semi_xyz, 1, II, 7, int_semi_xyz(ijk,1:7), &
>>>         semi_mat_xyz(ijk,1:7), INSERT_VALUES, ierr)
>>> end do
>>>
>>> where ijk_xyz_sta / ijk_xyz_end are the starting/ending row indices,
>>>
>>> int_semi_xyz(ijk,1:7) stores the 7 column global indices
>>>
>>> semi_mat_xyz has the corresponding values.
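>>>
>>> A small diagnostic that could go after this loop (a sketch only;
>>> noff/kk/col are extra PetscInt locals, not part of the solver)
>>> counts how many stencil entries couple to rows owned by another
>>> process, i.e. the entries that later drive the VecScatter in MatMult:
>>>
>>> noff = 0
>>> do ijk = ijk_xyz_sta + 1, ijk_xyz_end
>>>    do kk = 1, 7
>>>       col = int_semi_xyz(ijk, kk)
>>>       if (col < ijk_xyz_sta .or. col >= ijk_xyz_end) noff = noff + 1
>>>    end do
>>> end do
>>> print *, 'off-process couplings on this rank:', noff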
>>>
>>> and I insert vectors using:
>>>
>>> call VecSetValues(b_rhs_semi_xyz, ijk_xyz_end_mz - ijk_xyz_sta_mz, &
>>>      (/ijk_xyz_sta_mz:ijk_xyz_end_mz-1/), &
>>>      q_semi_vect_xyz(ijk_xyz_sta_mz+1:ijk_xyz_end_mz), INSERT_VALUES, ierr)
>>>
>>> Thanks!
>>>
>>>
>>> Yours sincerely,
>>>
>>> TAY wee-beng
>>>
>>> On 30/9/2012 11:30 PM, Jed Brown wrote:
>>>>
>>>> You can measure the time spent in Hypre via PCApply and
>>>> PCSetUp, but you can't get finer grained integrated
>>>> profiling because it was not set up that way.
>>>>
>>>> On Sep 30, 2012 3:26 PM, "TAY wee-beng"
>>>> <zonexo at gmail.com <mailto:zonexo at gmail.com>> wrote:
>>>>
>>>> On 27/9/2012 1:44 PM, Matthew Knepley wrote:
>>>>> On Thu, Sep 27, 2012 at 3:49 AM, TAY wee-beng
>>>>> <zonexo at gmail.com <mailto:zonexo at gmail.com>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm doing a log summary for my 3D CFD code. I
>>>>> have some questions:
>>>>>
>>>>> 1. If I'm solving 3 linear equations using
>>>>> KSP, is the result given in the log summary
>>>>> the total of the 3 linear eqns' performance?
>>>>> How can I get the performance for each
>>>>> individual eqn?
>>>>>
>>>>>
>>>>> Use logging stages:
>>>>> http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html
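>>>>>
>>>>> A minimal Fortran sketch of what that looks like (the stage name
>>>>> and the ksp/vector names here are placeholders):
>>>>>
>>>>> PetscLogStage stage_z
>>>>> call PetscLogStageRegister('momentum_z', stage_z, ierr)
>>>>> call PetscLogStagePush(stage_z, ierr)
>>>>> call KSPSolve(ksp_z, b_rhs_z, x_z, ierr)
>>>>> call PetscLogStagePop(ierr)
>>>>>
>>>>> Each stage then gets its own section in -log_summary.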
>>>>>
>>>>> 2. If I run my code for 10 time steps, does
>>>>> the log summary give the total or average
>>>>> performance/ratio?
>>>>>
>>>>>
>>>>> Total.
>>>>>
>>>>> 3. Besides PETSc, I'm also using HYPRE's
>>>>> native geometric MG (Struct) to solve the
>>>>> Poisson eqn on my Cartesian CFD grid. Is there
>>>>> any way I can use PETSc's log summary to get
>>>>> HYPRE's performance? If I use BoomerAMG through
>>>>> PETSc, can I get its performance?
>>>>>
>>>>>
>>>>> If you mean flops, only if you count them yourself
>>>>> and tell PETSc using
>>>>> http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html
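>>>>>
>>>>> Roughly (a sketch; the number itself is whatever you can estimate
>>>>> for one Hypre solve, here only a placeholder):
>>>>>
>>>>> PetscLogDouble hypre_flops
>>>>> hypre_flops = 0.0d0   ! replace with your own flop estimate
>>>>> call PetscLogFlops(hypre_flops, ierr)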
>>>>>
>>>>> This is the disadvantage of using packages that do
>>>>> not properly monitor things :)
>>>>>
>>>>> Matt
>>>> So you mean if I use BoomerAMG through PETSc, there is
>>>> no proper way of evaluating its performance, besides
>>>> using PetscLogFlops?
>>>>>
>>>>>
>>>>> --
>>>>> Yours sincerely,
>>>>>
>>>>> TAY wee-beng
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before
>>>>> they begin their experiments is infinitely more
>>>>> interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener
>>>>
>>>
>>>
>>
>>
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 3d_grid.jpg
Type: image/jpeg
Size: 179125 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20121004/1ab24144/attachment-0001.jpg>