A 3D example of KSPSolve?

Shi Jin jinzishuai at yahoo.com
Mon Feb 12 22:22:14 CST 2007


Thank you Satish.
I cannot say that no one else is using that machine when I run my
tests, but I made sure that when I use taskset, the processors I pin
to are exclusively mine. The total number of running jobs is always
smaller than the number of available processors.
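
For reference, the pinning can be double-checked while a job is
running, e.g. (the PID below is just a placeholder):

  # print the CPU-affinity list of an already-running process
  taskset -cp 12345

  # or list which processor each process is currently sitting on
  ps -eo pid,psr,comm | grep ex2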

I think I will stop benchmarking the SMP machine for the time being
and focus my attention on getting the code to run on a
distributed-memory cluster. I think it is very likely I can make some
improvement to the existing code by tuning the linear solver and
preconditioner. I am starting another thread on how to use incomplete
Cholesky factorization (ICC) as a preconditioner for my conjugate
gradient method.
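
(For example, I expect the standard runtime options to be enough to
select this combination - ex2 here is just the tutorial example used
elsewhere in this thread, and the block-Jacobi/ICC split for the
parallel case is only my guess at a reasonable starting point:

  # serial: CG preconditioned with incomplete Cholesky
  ./ex2 -ksp_type cg -pc_type icc -ksp_monitor -log_summary

  # parallel: block Jacobi with ICC applied on each local block
  petscmpirun -n 4 ./ex2 -ksp_type cg -pc_type bjacobi -sub_pc_type icc -ksp_monitor -log_summary

I will ask about the details in that thread.)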

When I am satisfied with the code and its performance on a cluster, I
will revisit the SMP issue so that we might achieve better performance
when the number of processes is not too large (<= 8).

Thank you very much.

Shi

--- Satish Balay <balay at mcs.anl.gov> wrote:

> Well, somehow the imbalance comes up in your application run - but
> not in the test example. It is possible that the application stresses
> your machine/memory-subsystem a lot more than the test code.
> 
> Your machine has NUMA [Non-uniform memory access] - so some messages
> are local [if the memory is local], and others can take at least 3
> hops through the AMD memory/HyperTransport network. I was assuming
> the delays due to multiple hops might show up in the test runs I
> requested [but they do not].
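> 
> If numactl is installed on that box, you could also check the layout
> directly, e.g.:
> 
>   # print NUMA nodes, their CPUs, and the inter-node distance matrix
>   numactl --hardware
> 
> That should show how many hops separate the cores you pin to with
> taskset [just a suggestion - I don't know what it reports on your
> machine].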
> 
> So perhaps these multiple hops cause delays only when the memory
> network gets stressed - as with your application?
> 
> http://www.thg.ru/cpu/20040929/images/opteron_8way.gif
> 
> I guess we'll just have to use your app to benchmark. Earlier I
> suggested using the latest MPICH with '--device=ch3:sshm'. Another
> option to try is '--with-device=ch3:nemesis'.
> 
> To do these experiments you can build different versions of PETSc [so
> that you can switch between them all], i.e. use a different value of
> PETSC_ARCH for each build:
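> 
> Something along these lines should work [the paths, version numbers,
> and arch names below are only placeholders]:
> 
>   # build one MPICH per device you want to compare
>   cd mpich2-1.0.5
>   ./configure --prefix=$HOME/mpich2-nemesis --with-device=ch3:nemesis
>   make && make install
> 
>   # build a matching PETSc tree; PETSC_ARCH just labels this build
>   cd $PETSC_DIR
>   ./config/configure.py PETSC_ARCH=linux-nemesis \
>       --with-mpi-dir=$HOME/mpich2-nemesis
>   make PETSC_ARCH=linux-nemesis all
> 
>   # repeat with another device and another PETSC_ARCH, then switch
>   # builds by setting PETSC_ARCH before compiling/running the app
>   export PETSC_ARCH=linux-nemesis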
> 
> It is possible that some of the load imbalance happens before the
> communication stages - but it's visible only in the scatter stage [in
> log_summary]. So to get a better idea on this, we'll need a Barrier
> in VecScatterBegin(). Not sure how to do this.
> 
> Barry: does -log_sync add a barrier in vecscatter?
> 
> Also - can you confirm that no-one-else/no-other-application is using
> this machine when you perform these measurement runs?
> 
> Satish
> 
> On Sat, 10 Feb 2007, Shi Jin wrote:
> 
> > Furthermore, I did a multi-process test on the SMP.
> > 
> > petscmpirun -n 3 taskset -c 0,2,4 ./ex2 -ksp_type cg -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> > Average time for MPI_Barrier(): 4.19617e-06
> > Average time for zero size MPI_Send(): 3.65575e-06
> > 
> > petscmpirun -n 4 taskset -c 0,2,4,6 ./ex2 -ksp_type cg -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> > Average time for MPI_Barrier(): 1.75953e-05
> > Average time for zero size MPI_Send(): 2.44975e-05
> > 
> > petscmpirun -n 5 taskset -c 0,2,4,6,8 ./ex2 -ksp_type cg -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> > Average time for MPI_Barrier(): 4.22001e-05
> > Average time for zero size MPI_Send(): 2.54154e-05
> > 
> > petscmpirun -n 6 taskset -c 0,2,4,6,8,10 ./ex2 -ksp_type cg -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> > Average time for MPI_Barrier(): 4.87804e-05
> > Average time for zero size MPI_Send(): 1.83185e-05
> > 
> > petscmpirun -n 7 taskset -c 0,2,4,6,8,10,12 ./ex2 -ksp_type cg -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> > Average time for MPI_Barrier(): 2.37942e-05
> > Average time for zero size MPI_Send(): 5.00679e-06
> > 
> > petscmpirun -n 8 taskset -c 0,2,4,6,8,10,12,14 ./ex2 -ksp_type cg -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> > Average time for MPI_Barrier(): 1.35899e-05
> > Average time for zero size MPI_Send(): 6.73532e-06
> > 
> > They all seem quite fast.
> > Shi
> > 
> > --- Shi Jin <jinzishuai at yahoo.com> wrote:
> > 
> > > Yes. The results follow.
> > > --- Satish Balay <balay at mcs.anl.gov> wrote:
> > > 
> > > > Can you send the output from the following runs. You can do this
> > > > with src/ksp/ksp/examples/tutorials/ex2.c - to keep things simple.
> > > > 
> > > > petscmpirun -n 2 taskset -c 0,2 ./ex2 -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> > > Average time for MPI_Barrier(): 1.81198e-06
> > > Average time for zero size MPI_Send(): 5.00679e-06
> > > > petscmpirun -n 2 taskset -c 0,4 ./ex2 -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> > > Average time for MPI_Barrier(): 2.00272e-06
> > > Average time for zero size MPI_Send(): 4.05312e-06
> > > > petscmpirun -n 2 taskset -c 0,6 ./ex2 -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> > > Average time for MPI_Barrier(): 1.7643e-06
> > > Average time for zero size MPI_Send(): 4.05312e-06
> > > > petscmpirun -n 2 taskset -c 0,8 ./ex2 -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> > > Average time for MPI_Barrier(): 2.00272e-06
> > > Average time for zero size MPI_Send(): 4.05312e-06
> > > > petscmpirun -n 2 taskset -c 0,12 ./ex2 -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> > > Average time for MPI_Barrier(): 1.57356e-06
> > > Average time for zero size MPI_Send(): 5.48363e-06
> > > > petscmpirun -n 2 taskset -c 0,14 ./ex2 -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> > > Average time for MPI_Barrier(): 2.00272e-06
> > > Average time for zero size MPI_Send(): 4.52995e-06
> > > I also did
> > > petscmpirun -n 2 taskset -c 0,10 ./ex2 -log_summary | egrep \(MPI_Send\|MPI_Barrier\)
> > > Average time for MPI_Barrier(): 5.00679e-06
> > > Average time for zero size MPI_Send(): 3.93391e-06
> > > 
> > > 
> > > The results are not so different from each other. Also please
> > > note, the timing is not exact; sometimes I got O(1e-5) timings for
> > > all cases. I assume these numbers are pretty good, right? Does it
> > > indicate that the MPI communication on an SMP machine is very
> > > fast? I will do a similar test on a cluster and report it back to
> > > the list.
> > > 
> > > Shi
> > > 
=== message truncated ===



 



