[petsc-users] log_summary time ratio and flops ratio
Xiangdong
epscodes at gmail.com
Mon Feb 8 16:21:22 CST 2016
Based on what you suggested, I have done the following:
i) Reran the same problem without output. The ratios are still roughly the
same, so IO is not the problem.
ii) Reran the program on a supercomputer (Stampede) instead of our group
cluster. The MPI_Barrier time improved:
Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 1.27792e-05
Average time for zero size MPI_Send(): 3.94508e-06
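For reference, these averages can be reproduced outside of PETSc with a
small MPI test along the following lines (a rough sketch of the same idea,
not PETSc's own measurement code):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int    rank, size, i, n = 100;
  char   dummy = 0;
  double t0, t1;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /* average time per MPI_Barrier() over n calls */
  MPI_Barrier(MPI_COMM_WORLD);
  t0 = MPI_Wtime();
  for (i = 0; i < n; i++) MPI_Barrier(MPI_COMM_WORLD);
  t1 = MPI_Wtime();
  if (!rank) printf("Average time for MPI_Barrier(): %g\n", (t1 - t0) / n);

  /* average time for a zero size MPI_Send() from rank 0 to rank 1 */
  if (size > 1) {
    if (rank == 0) {
      t0 = MPI_Wtime();
      for (i = 0; i < n; i++) MPI_Send(&dummy, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
      t1 = MPI_Wtime();
      printf("Average time for zero size MPI_Send(): %g\n", (t1 - t0) / n);
    } else if (rank == 1) {
      for (i = 0; i < n; i++)
        MPI_Recv(&dummy, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
  }
  MPI_Finalize();
  return 0;
}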
The full PETSc log_summary output is here:
https://googledrive.com/host/0BxEfb1tasJxhTjNTVXh4bmJmWlk
iii) Since the time ratios of VecDot (2.5) and MatMult (1.5) are still
high, I reran the program with the IPM module. The IPM summary is here:
https://drive.google.com/file/d/0BxEfb1tasJxhYXI0VkV0cjlLWUU/view?usp=sharing.
From these IPM results, MPI_Allreduce takes 74% of the MPI time. The
communication-by-task figure (the first figure on page 4 in the link above)
shows that it is not well balanced. Is this related to the hardware and
network (which users cannot control), or can I do something in my code to
improve it?
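One way I could check this in my own code is to instrument a single dot
product as in the sketch below, which splits the time into local work, the
wait for the slowest rank, and the reduction itself; if the local work and
the wait vary a lot across ranks while the reduction is small, the
MPI_Allreduce time is mostly load imbalance rather than the network. (The
function timed_dot and the array sizes are placeholders for illustration;
this is not PETSc's VecDot.)

#include <mpi.h>
#include <stdio.h>

void timed_dot(const double *x, const double *y, int n, MPI_Comm comm)
{
  double local = 0.0, global;
  double t0, t1, t2, t3, comp, wait, red;
  double cmin, cmax, wmax, rmax;
  int    i, rank;

  MPI_Comm_rank(comm, &rank);

  t0 = MPI_Wtime();
  for (i = 0; i < n; i++) local += x[i] * y[i];     /* (a) local work */
  t1 = MPI_Wtime();
  MPI_Barrier(comm);                                /* (b) wait for slowest rank */
  t2 = MPI_Wtime();
  MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, comm);  /* (c) reduction */
  t3 = MPI_Wtime();

  comp = t1 - t0; wait = t2 - t1; red = t3 - t2;
  MPI_Reduce(&comp, &cmin, 1, MPI_DOUBLE, MPI_MIN, 0, comm);
  MPI_Reduce(&comp, &cmax, 1, MPI_DOUBLE, MPI_MAX, 0, comm);
  MPI_Reduce(&wait, &wmax, 1, MPI_DOUBLE, MPI_MAX, 0, comm);
  MPI_Reduce(&red,  &rmax, 1, MPI_DOUBLE, MPI_MAX, 0, comm);
  if (!rank)
    printf("local dot min/max: %g / %g  max wait: %g  max Allreduce: %g\n",
           cmin, cmax, wmax, rmax);
}

int main(int argc, char **argv)
{
  double x[1000], y[1000];
  int    i;

  MPI_Init(&argc, &argv);
  for (i = 0; i < 1000; i++) { x[i] = 1.0; y[i] = 2.0; }
  timed_dot(x, y, 1000, MPI_COMM_WORLD);
  MPI_Finalize();
  return 0;
}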
Thank you.
Best,
Xiangdong
On Fri, Feb 5, 2016 at 10:34 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> Make the same run with no IO and see if the numbers are much better and
> if the load balance is better.
>
> > On Feb 5, 2016, at 8:59 PM, Xiangdong <epscodes at gmail.com> wrote:
> >
> > If I want to know whether only rank 0 is slow (since it may have more
> > IO) or whether a portion of the cores are slow, what tools can I start
> > with?
> >
> > Thanks.
> >
> > Xiangdong
> >
> > On Fri, Feb 5, 2016 at 5:27 PM, Jed Brown <jed at jedbrown.org> wrote:
> > Matthew Knepley <knepley at gmail.com> writes:
> > >> I attached the full summary. At the end, it has
> > >>
> > >> Average time to get PetscTime(): 0
> > >> Average time for MPI_Barrier(): 8.3971e-05
> > >> Average time for zero size MPI_Send(): 7.16746e-06
> > >>
> > >> Is it an indication of a slow network?
> > >>
> > >
> > > I think so. It takes nearly 100 microseconds to synchronize processes.
> >
> > Edison with 65536 processes:
> > Average time for MPI_Barrier(): 4.23908e-05
> > Average time for zero size MPI_Send(): 2.46466e-06
> >
> > Mira with 16384 processes:
> > Average time for MPI_Barrier(): 5.7075e-06
> > Average time for zero size MPI_Send(): 1.33179e-05
> >
> > Titan with 131072 processes:
> > Average time for MPI_Barrier(): 0.000368595
> > Average time for zero size MPI_Send(): 1.71567e-05
> >
>
>