[petsc-users] Variable wall times

Phanisri Pradeep Pratapa ppratapa at gatech.edu
Mon Feb 22 15:03:13 CST 2016


@Barry:
I am running the job on a single node with 64 processors and using all of
them. Do I still need to do the binding? (From what I understand from the
link you provided, processes switching between cores could be the reason.
But would that still be a factor when all processes are doing an equal
amount of computation?)
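
To make sure I understand: assuming an Open MPI or MPICH launcher, would
the binding you suggest look something like the commands below? The
executable name and solver options are only placeholders for my actual run.

    # Open MPI
    mpirun -n 64 --bind-to core ./my_app -ksp_type cg

    # MPICH (Hydra)
    mpiexec -n 64 -bind-to core ./my_app -ksp_type cg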

@Jed:
There are no other jobs running on that node (and no other jobs in the
queue). Could the noise then be due to a load-balance issue or something
else in computing the dot products and MatMults? Is this to be expected?
If so, is the only workaround to report statistics of the wall time over
repeated runs? (When I compare two different algorithms, the measured
speed-up of one over the other varies significantly because of this issue.)
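
If reporting statistics is the way to go, I was thinking of a harness
along the lines of the Python 3 sketch below, which simply repeats the
same run and reports the minimum, median, and maximum wall time instead of
a single number. The command line and executable name are placeholders for
my actual job.

    #!/usr/bin/env python3
    # Hypothetical timing harness: repeat the same run several times and
    # report min / median / max wall time rather than a single sample.
    import subprocess
    import time
    import statistics

    cmd = ["mpiexec", "-n", "64", "./my_app", "-ksp_type", "cg"]  # placeholder command
    times = []
    for _ in range(10):  # number of repetitions chosen arbitrarily here
        t0 = time.time()
        subprocess.check_call(cmd)  # run the solver; raises if the job fails
        times.append(time.time() - t0)

    print("min    %6.2f s" % min(times))
    print("median %6.2f s" % statistics.median(times))
    print("max    %6.2f s" % max(times))

Would that be a reasonable way to present the comparison between the two
algorithms?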

Thank you for the replies.

Regards,

Pradeep

On Mon, Feb 22, 2016 at 2:14 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
>    Also make sure you are doing consistent binding of MPI processes to
> cores:  see http://www.mcs.anl.gov/petsc/documentation/faq.html#computers
> and run the streams benchmarks several times to see if you get consistent
> results.
>
>    With no other user or system processes consuming large amounts of
> cycles and suitable binding you should get pretty consistent results.
>
>   Barry
>
>
>
> > On Feb 22, 2016, at 1:08 PM, Jed Brown <jed at jedbrown.org> wrote:
> >
> > Phanisri Pradeep Pratapa <ppratapa at gatech.edu> writes:
> >
> >> Hi,
> >>
> >> I am trying to compare run times for a function in PETSc using KSP
> >> solve. When I run the code in parallel on a single node (64
> >> processors), it seems to take a different wall time for each run,
> >> although it is run on exactly the same processors. For example, one
> >> time it could take 10 sec whereas the same run the next time could
> >> take 20 sec. (This happens for other choices of processor counts
> >> too, not just 64.)
> >
> > You probably just have a noisy system.  It could be due to the
> > architecture and system software or it could be caused by contention
> > with other jobs and/or misconfiguration.
> >
> > See this paper, especially Figure 1.
> >
> > http://unixer.de/publications/img/hoefler-scientific-benchmarking.pdf
>
>
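
P.S. Regarding the streams runs: as I understand it, the benchmark can be
run from the PETSc source tree with something like

    cd $PETSC_DIR
    make streams NPMAX=64

(the exact target and variable name may differ between PETSc versions),
repeating it a few times to check that the reported bandwidth is
consistent. Please correct me if that is not what you meant.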