[petsc-users] [Ext] Re: Very slow VecDot operations

Junchao Zhang junchao.zhang at gmail.com
Fri May 20 15:03:44 CDT 2022


You can also use -log_view -log_sync to sync before timing so that you can
clearly see which operations are really imbalanced.
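
As a rough illustration of why syncing before timing helps, here is a toy Python model of Barry's guess quoted below. It involves no MPI or PETSc. The function name timed_dot, the per-row cost of 1e-6 s, and the 5 ms reduction cost are all made-up assumptions, and "sync" is modeled simply as a barrier that runs before the timer starts. Only the three row counts come from the thread.

```python
# Toy model: each "rank" finishes the operation before the dot product at a
# different time, then enters a collective dot product.

def timed_dot(arrival, reduce_cost, sync_first):
    """Return the time each rank would charge to the dot product.

    arrival     -- time at which each rank finishes the preceding operation
    reduce_cost -- assumed true cost of the reduction (same on every rank)
    sync_first  -- if True, a barrier runs before the timer starts
    """
    # No rank can leave the collective before the last rank has arrived.
    finish = max(arrival) + reduce_cost
    if sync_first:
        start = [max(arrival)] * len(arrival)  # timer starts after the barrier
    else:
        start = list(arrival)                  # timer starts on arrival
    return [finish - s for s in start]

# Hypothetical arrival times, scaled like the min/avg/max rows per rank.
rows = [838_529, 1_344_940, 2_437_206]
arrival = [r * 1e-6 for r in rows]   # assumed cost of 1e-6 s per row
reduce_cost = 0.005                  # assumed

for sync in (False, True):
    t = timed_dot(arrival, reduce_cost, sync)
    print(f"sync={sync}: min={min(t):.3f}s max={max(t):.3f}s "
          f"ratio={max(t) / min(t):.1f}")
```

In this model the unsynced timing charges the waiting time to the dot product, so the max/min ratio is huge even though the reduction itself is cheap. With the barrier first, every rank reports only the reduction cost, and the imbalance has to show up in whatever ran before.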

--Junchao Zhang


On Fri, May 20, 2022 at 12:37 PM Ernesto Prudencio via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> Thank you, Barry. I will dig further into the issue using your suggestions.
>
>
>
>
>
> Schlumberger-Private
>
> *From:* Barry Smith <bsmith at petsc.dev>
> *Sent:* Friday, May 20, 2022 12:33 PM
> *To:* Ernesto Prudencio <EPrudencio at slb.com>
> *Cc:* PETSc users list <petsc-users at mcs.anl.gov>
> *Subject:* [Ext] Re: [petsc-users] Very slow VecDot operations
>
>
>
>
>
>   Ernesto,
>
>
>
>     If you ran (or can run) with -log_view you could see the time "ratio"
> in the output that tells how much time the "fastest" rank spent on the dot
> product versus the "slowest". Based on the different row counts per rank you
> report, that ratio might be around 3. But based on the times you report, it
> is around 200!
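
The two figures can be rechecked from the numbers in the original post further down. This is only one reading of where "around 3" and "around 200" come from: the first as the max/min rows per rank, the second as the per-call time of VecDotRhs relative to VecDotSol. The variable names are made up for the sketch.

```python
# Figures copied from the original post quoted in this thread.
rows_min, rows_max = 838_529, 2_437_206
n_sol, t_sol = 236, 1.523      # VecDotSol: call count, total seconds
n_rhs, t_rhs = 226, 326.008    # VecDotRhs: call count, total seconds

work_ratio = rows_max / rows_min     # imbalance in local rhs-vector length
per_call_sol = t_sol / n_sol         # seconds per VecDotSol
per_call_rhs = t_rhs / n_rhs         # seconds per VecDotRhs
time_ratio = per_call_rhs / per_call_sol

print(f"rows per rank, max/min:        {work_ratio:.2f}")
print(f"seconds per VecDotSol:         {per_call_sol:.4f}")
print(f"seconds per VecDotRhs:         {per_call_rhs:.4f}")
print(f"per-call time ratio, Rhs/Sol:  {time_ratio:.0f}")
```

The gap between the two ratios is the point: the difference in local vector length alone cannot account for the difference in time.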
>
>
>
>     My guess is that for the VecDotRhs operations some ranks are arriving
> at the vec dot long before other ranks and have to wait there for an
> extremely long time, making it appear that the dot product is very slow.
> In reality, the large time credited to the VecDot is due to an imbalance
> in the time taken by the operation before the VecDot.
>
>
>
>    Barry
>
>
>
>
>
> On May 20, 2022, at 1:23 PM, Ernesto Prudencio via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
>
>
> I am using LSQR to minimize || L x – b ||_2, where L is a sparse
> rectangular matrix with 145,253,395 rows, 209,423,775 columns, and around
> 54 billion nonzeros.
>
>
>
> The numbers reported below are for a run with 27 compute nodes, each
> compute node with 4 MPI ranks, so a total of 108 ranks.
>
>
>
> Throughout the run, I assess the runtime taken by all dot products during
> the LSQR iterations, and I differentiate between dot products involving
> vectors of the size of the solution vector “x”, and dot products involving
> vectors of the size of the rhs “b”. Here are the numbers I get (we have an
> implementation of LSQR that performs some extra vector dot products for our
> needs):
>
>
>
> 236 VecDotSol take 1.523 seconds
>
> 226 VecDotRhs take 326.008 seconds
>
>
>
> Regarding the partition of rows and columns among the 108 MPI ranks:
>
>
>
> Rows: min = 838,529 ; avg ≈ 1,344,940 ; max = 2,437,206
>
> Columns: min = 1,903,500 ; avg ≈ 1,939,110 ; max = 1,946,270
>
>
>
> Regarding the partition of rows and columns among the 27 compute nodes:
>
>
>
> Rows: min = 3,575,584 ; avg ≈ 5,379,760 ; max = 8,788,062
>
> Columns: min = 7,637,500 ; avg ≈ 7,756,440 ; max = 7,785,080
>
>
>
> Questions:
>
>    1. Why are the average run times so different between VecDotSol and
>    VecDotRhs?
>    2. Could the much larger imbalance in the number of rows per rank
>    (compared to the very well balanced distribution of columns per rank) be
>    the cause?
>    3. Have you ever observed such a situation?
>    4. Could it be because of a bad MPI configuration / parametrization
>    with respect to the underlying network?
>    5. But, if yes, why are the VecDotSol dot products so much faster than
>    VecDotRhs?
>
>
>
> Thank you in advance,
>
>
>
> Ernesto.