[petsc-users] Variable wall times

Barry Smith bsmith at mcs.anl.gov
Mon Feb 22 15:17:45 CST 2016


> On Feb 22, 2016, at 3:03 PM, Phanisri Pradeep Pratapa <ppratapa at gatech.edu> wrote:
> 
> @Barry:
> I am running the job on a single node with 64 processors and using all of them. So do I still have to do the binding? (From what I understand from the link you provided, processes switching between cores could be the reason. But would that still matter when all processors are doing an equal amount of computation?)

   Absolutely! If a process is moved to a different core, the data it had in the old core's cache is no longer available and has to be reloaded, so having the processes migrate around (even if you are the only one using the machine) can change the timing a lot.
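
   As a rough illustration of the idea (not PETSc code), here is a minimal Python sketch that pins the current process to one core and then times the same work several times, so you can look at the spread rather than a single number. It assumes Linux for the pinning call; the names pin_to_one_core, time_runs, summarize and toy_kernel are made up for this example, and toy_kernel is only a stand-in for the real solve. For an actual MPI job the binding would normally be requested from the MPI launcher, as the FAQ describes, not from inside the program.

```python
import os
import statistics
import time


def pin_to_one_core():
    """Pin this process to a single allowed core (Linux only).

    Returns the chosen core, or None if the platform has no affinity call.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # affinity control is not exposed on this platform
    core = min(os.sched_getaffinity(0))  # lowest core we are allowed to run on
    os.sched_setaffinity(0, {core})
    return core


def time_runs(work, repeats=5):
    """Return the wall time in seconds of each of `repeats` calls to work()."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        work()
        times.append(time.perf_counter() - start)
    return times


def summarize(times):
    """Summarize repeated wall times so the spread is visible."""
    return {
        "min": min(times),
        "median": statistics.median(times),
        "max": max(times),
    }


def toy_kernel(n=200_000):
    # Hypothetical stand-in for the real computation: a plain dot product.
    x = [1.0] * n
    return sum(a * a for a in x)


if __name__ == "__main__":
    print("pinned to core:", pin_to_one_core())
    print(summarize(time_runs(toy_kernel)))
```

   If the min and max stay far apart even with the process pinned, the noise is probably coming from somewhere other than migration.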

  Barry

> 
> @Jed:
> There are no other jobs running on that node (and no other jobs in the queue). So could the noise be due to a load-balance issue or something else in computing dot products and MatMults? Is this to be expected? If so, is the only workaround to report wall-time statistics for the runs? (When I compare two different algorithms, the speed-up of one over the other varies significantly because of this issue.)
> 
> Thank you for the replies.
> 
> Regards,
> 
> Pradeep 
> 
> On Mon, Feb 22, 2016 at 2:14 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
>    Also make sure you are doing consistent binding of MPI processes to cores:  see http://www.mcs.anl.gov/petsc/documentation/faq.html#computers  and run the streams benchmarks several times to see if you get consistent results.
> 
>    With no other user or system processes consuming large amounts of cycles and suitable binding you should get pretty consistent results.
> 
>   Barry
> 
> 
> 
> > On Feb 22, 2016, at 1:08 PM, Jed Brown <jed at jedbrown.org> wrote:
> >
> > Phanisri Pradeep Pratapa <ppratapa at gatech.edu> writes:
> >
> >> Hi,
> >>
> >> I am trying to compare run times for a function in PETSc using ksp solve.
> >> When I run the code in parallel on a single node (64 processors), it seems
> >> to take a different wall time for each run, although it runs on exactly the
> >> same processors. For example, one run could take 10 sec whereas the same run
> >> the next time could take 20 sec. (This happens for other choices of processors
> >> too, not just 64.)
> >
> > You probably just have a noisy system.  It could be due to the
> > architecture and system software or it could be caused by contention
> > with other jobs and/or misconfiguration.
> >
> > See this paper, especially Figure 1.
> >
> > http://unixer.de/publications/img/hoefler-scientific-benchmarking.pdf
> 
> 


