A 3D example of KSPSolve?

Satish Balay balay at mcs.anl.gov
Mon Feb 12 10:08:59 CST 2007


Well, somehow the imbalance comes up in your application run - but not
in the test example. It is possible that the application stresses your
machine/memory subsystem a lot more than the test code.

Your machine has NUMA [Non-uniform memory access] - so some messages
are local [if the memory is local] and others can take at least 3
hops through the AMD memory/HyperTransport network. I was assuming
the delays due to multiple hops might show up in these test runs I
requested [but they do not].

So perhaps these multiple hops cause delays only when the memory
network gets stressed - as with your application?

http://www.thg.ru/cpu/20040929/images/opteron_8way.gif
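
A quick way to check the actual layout on your box [just a generic
Linux suggestion, assuming numactl is installed - not something we
have verified here]:

  numactl --hardware    # lists the NUMA nodes, their CPUs/memory, and the node distance table

That should show how many hops separate the cores you pin with taskset.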

I guess we'll just have to use your app to benchmark. Earlier I
suggested using the latest MPICH with '--with-device=ch3:sshm'.
Another option to try is '--with-device=ch3:nemesis'.

To do these experiments, you can build different versions of PETSc
[so that you can switch between them all], i.e. use a different value
of PETSC_ARCH for each build:
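Something along these lines [a sketch only - the arch names are
arbitrary, and I'm assuming the --download-mpich-device option here;
check 'config/configure.py --help' for the exact flags in your PETSc
version]:

  cd $PETSC_DIR

  # one build with the sshm device
  ./config/configure.py PETSC_ARCH=linux-mpich-sshm \
      --download-mpich=1 --download-mpich-device=ch3:sshm
  make PETSC_ARCH=linux-mpich-sshm all

  # another build with the nemesis device
  ./config/configure.py PETSC_ARCH=linux-mpich-nemesis \
      --download-mpich=1 --download-mpich-device=ch3:nemesis
  make PETSC_ARCH=linux-mpich-nemesis all

Then you can switch between them by setting PETSC_ARCH [and
rebuilding/relinking your app] before running.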

It is possible that some of the load imbalance happens before the
communication stages - but it is visible only in the scatter timings
[in -log_summary]. So to get a better idea on this, we would need a
barrier in VecScatterBegin() - I'm not sure how to do this.

Barry: does -log_sync add a barrier in vecscatter?
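
If it does, that would be an easy experiment to try - e.g. with the
same taskset/ex2 setup as below [substitute your application for the
real test]:

  petscmpirun -n 8 taskset -c 0,2,4,6,8,10,12,14 ./ex2 -ksp_type cg -log_summary -log_sync

and then compare the VecScatter numbers in -log_summary with and
without -log_sync.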

Also - can you confirm that no one else / no other application is
using this machine when you perform these measurement runs?

Satish

On Sat, 10 Feb 2007, Shi Jin wrote:

> Furthermore, I did a multi-process test on the SMP.
> petscmpirun -n 3 taskset -c 0,2,4 ./ex2 -ksp_type cg -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> Average time for MPI_Barrier(): 4.19617e-06
> Average time for zero size MPI_Send(): 3.65575e-06
> 
> petscmpirun -n 4 taskset -c 0,2,4,6 ./ex2 -ksp_type cg -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> Average time for MPI_Barrier(): 1.75953e-05
> Average time for zero size MPI_Send(): 2.44975e-05
> 
> petscmpirun -n 5 taskset -c 0,2,4,6,8 ./ex2 -ksp_type cg -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> Average time for MPI_Barrier(): 4.22001e-05
> Average time for zero size MPI_Send(): 2.54154e-05
> 
> petscmpirun -n 6 taskset -c 0,2,4,6,8,10 ./ex2 -ksp_type cg -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> Average time for MPI_Barrier(): 4.87804e-05
> Average time for zero size MPI_Send(): 1.83185e-05
> 
> petscmpirun -n 7 taskset -c 0,2,4,6,8,10,12 ./ex2 -ksp_type cg -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> Average time for MPI_Barrier(): 2.37942e-05
> Average time for zero size MPI_Send(): 5.00679e-06
> 
> petscmpirun -n 8 taskset -c 0,2,4,6,8,10,12,14 ./ex2 -ksp_type cg -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> Average time for MPI_Barrier(): 1.35899e-05
> Average time for zero size MPI_Send(): 6.73532e-06
> 
> They all seem quite fast.
> Shi
> 
> --- Shi Jin <jinzishuai at yahoo.com> wrote:
> 
> > Yes. The results follow.
> > --- Satish Balay <balay at mcs.anl.gov> wrote:
> > 
> > > Can you send the output from the following runs. You can do this
> > > with src/ksp/ksp/examples/tutorials/ex2.c - to keep things simple.
> > > 
> > > petscmpirun -n 2 taskset -c 0,2 ./ex2 -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> > Average time for MPI_Barrier(): 1.81198e-06
> > Average time for zero size MPI_Send(): 5.00679e-06
> > > petscmpirun -n 2 taskset -c 0,4 ./ex2 -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> > Average time for MPI_Barrier(): 2.00272e-06
> > Average time for zero size MPI_Send(): 4.05312e-06
> > > petscmpirun -n 2 taskset -c 0,6 ./ex2 -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> > Average time for MPI_Barrier(): 1.7643e-06
> > Average time for zero size MPI_Send(): 4.05312e-06
> > > petscmpirun -n 2 taskset -c 0,8 ./ex2 -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> > Average time for MPI_Barrier(): 2.00272e-06
> > Average time for zero size MPI_Send(): 4.05312e-06
> > > petscmpirun -n 2 taskset -c 0,12 ./ex2 -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> > Average time for MPI_Barrier(): 1.57356e-06
> > Average time for zero size MPI_Send(): 5.48363e-06
> > > petscmpirun -n 2 taskset -c 0,14 ./ex2 -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> > Average time for MPI_Barrier(): 2.00272e-06
> > Average time for zero size MPI_Send(): 4.52995e-06
> > I also did 
> > petscmpirun -n 2 taskset -c 0,10 ./ex2 -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> > Average time for MPI_Barrier(): 5.00679e-06
> > Average time for zero size MPI_Send(): 3.93391e-06
> > 
> > 
> > The results are not so different from each other. Also please note,
> > the timing is not exact; sometimes I got O(1e-5) timings for all
> > cases. I assume these numbers are pretty good, right? Does it
> > indicate that the MPI communication on an SMP machine is very fast?
> > I will do a similar test on a cluster and report it back to the list.
> > 
> > Shi
> > 
> 



