[petsc-users] log_summary time ratio and flops ratio

Xiangdong epscodes at gmail.com
Mon Feb 8 21:30:45 CST 2016


On Mon, Feb 8, 2016 at 6:45 PM, Jed Brown <jed at jedbrown.org> wrote:

> Xiangdong <epscodes at gmail.com> writes:
>
> > iii) Since the time ratios of VecDot (2.5) and MatMult (1.5) are still
> > high, I reran the program with the IPM module. The IPM summary is here:
> > https://drive.google.com/file/d/0BxEfb1tasJxhYXI0VkV0cjlLWUU/view?usp=sharing
> > From these IPM results, MPI_Allreduce takes 74% of the MPI time. The
> > communication-by-task figure (1st figure on p. 4) in the above link shows
> > that the communication is not well balanced. Is this related to the
> > hardware and network (which the users cannot control), or can I do
> > something in my code to improve it?
>
> Here are a few functions that don't have any communication, but still
> have significant load imbalance.
>
>   VecAXPY          1021815 1.0 2.2148e+01 2.1 1.89e+10 1.1 0.0e+00 0.0e+00 0.0e+00  2  4  0  0  0   2  4  0  0  0 207057
>   VecMAXPY          613089 1.0 1.3276e+01 2.2 2.27e+10 1.1 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0   1  4  0  0  0 414499
>   MatSOR            818390 1.0 1.9608e+02 1.5 2.00e+11 1.1 0.0e+00 0.0e+00 0.0e+00 22 40  0  0  0  22 40  0  0  0 247472
>
>
For these functions, the flop ratios are all 1.1, while the time ratios are
1.5-2.2, so the amount of work is roughly balanced across the processes. Runs
on both Stampede and my group's cluster show similar behavior. Given that I
only use 256 cores, do you think it is likely that my job was assigned cores
with different speeds? How can I test or measure this, since the job is
assigned to different nodes each time?
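
For example, I am thinking of running a small test like the sketch below on
the same allocation (my own test code, not part of PETSc): every rank times
an identical, purely local floating-point loop, and the max/min ratio should
be close to 1 if all cores run at the same speed.

/* sketch: compare per-rank speed on identical, communication-free work */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int             rank;
  double          t, tmin, tmax;
  volatile double x = 1.0;   /* volatile so the loop is not optimized away */

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  MPI_Barrier(MPI_COMM_WORLD);                 /* start everyone together */
  t = MPI_Wtime();
  for (long i = 0; i < 200000000L; i++) x *= 1.000000001;  /* fixed work */
  t = MPI_Wtime() - t;

  MPI_Reduce(&t, &tmin, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
  MPI_Reduce(&t, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
  if (!rank) printf("local work time: min %g s, max %g s, ratio %g\n",
                    tmin, tmax, tmax/tmin);
  MPI_Finalize();
  return 0;
}

If the ratio from this test is already around 1.5-2 on the nodes I am given,
the imbalance would not be coming from my code.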

Are there any other factors I should also look into that could explain a flop
ratio of 1.1 but a time ratio of 1.5-2.2 for these non-communicating
functions?
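
For instance, could process binding or NUMA placement also cause this? To see
where each rank actually ends up, I could print the node name and core id,
roughly like the sketch below (sched_getcpu() is a Linux/glibc extension):

#define _GNU_SOURCE   /* for sched_getcpu() */
#include <mpi.h>
#include <sched.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int  rank, len;
  char name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Get_processor_name(name, &len);
  printf("rank %4d runs on node %s, core %d\n", rank, name, sched_getcpu());
  MPI_Finalize();
  return 0;
}

If two ranks report the same core on a node, oversubscription or missing
binding could explain part of the time spread.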


> You can and should improve load balance before stressing about network
> costs.  This could be that the nodes aren't clean (running at different
> speeds) or that the partition is not balancing data.
>
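
To double-check the "partition is not balancing data" part, I could also
print the smallest and largest local sizes directly inside my code, something
like this sketch (x is a placeholder for my solution Vec):

  PetscErrorCode ierr;
  PetscInt       nlocal, nmin, nmax;

  ierr = VecGetLocalSize(x, &nlocal);CHKERRQ(ierr);
  ierr = MPI_Allreduce(&nlocal, &nmin, 1, MPIU_INT, MPI_MIN, PETSC_COMM_WORLD);CHKERRQ(ierr);
  ierr = MPI_Allreduce(&nlocal, &nmax, 1, MPIU_INT, MPI_MAX, PETSC_COMM_WORLD);CHKERRQ(ierr);
  /* if min and max are close, the data distribution itself is balanced */
  ierr = PetscPrintf(PETSC_COMM_WORLD, "local Vec size: min %D max %D\n", nmin, nmax);CHKERRQ(ierr);

Since the flop ratio is only 1.1, I expect these to come out nearly equal,
which would point back at the nodes not being clean.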