On 2/2/07, Shi Jin <jinzishuai at> wrote:
> > There is a point which is not clear for me.
> >
> > When you run in your shared-memory machine...
> >
> > - Are you running your code as a 'sequential' program
> > with a global, shared
> > memory space?
> >
> > - Or are you running it through MPI, as a
> > distributed memory
> > application using MPI message passing (where shared
> > mem is the
> > underlying communication 'channel') ?
> Thank you for replying.
> I run the code on a shared memory machine through MPI,
> just like what I do on a cluster. I simply did:
> petscmpirun -np 18 ./code
> I am not 100% sure whether MPICH-2 will automatically
> use shared memory as the underlying communication
> channel instead of the network but I know most MPI
> implementations are smart enough to do so (like
> LAM-MPI I used before). Could anyone confirm this?
> Thank you.

I think this is missing the point. It is just as Satish pointed out:
sparse matrix multiply is completely dominated by memory bandwidth,
and on the shared memory machine the processes contend for that
bandwidth. I guarantee you that the performance problem is in the
effective memory bandwidth per process.
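
As a rough illustration of the bandwidth argument, the sketch below
estimates an upper bound on the per-process flop rate of a CSR sparse
matrix-vector product when all processes share one memory system. The
function name, the 12 bytes of traffic per nonzero (an 8-byte value
plus a 4-byte column index, ignoring vector traffic), and the 12 GB/s
aggregate bandwidth are assumptions chosen for illustration, not
measurements from this machine.

```python
def spmv_flops_bound(total_bandwidth, nprocs,
                     bytes_per_nnz=12.0, flops_per_nnz=2.0):
    """Bandwidth-limited upper bound on flop/s per process for CSR SpMV.

    total_bandwidth: aggregate memory bandwidth in bytes/s (assumed value)
    nprocs:          number of processes sharing that bandwidth
    bytes_per_nnz:   bytes streamed per nonzero (assumed: 8 value + 4 index)
    flops_per_nnz:   one multiply and one add per nonzero
    """
    # Assume the processes split the shared bandwidth evenly.
    per_process_bandwidth = total_bandwidth / nprocs
    # Each nonzero costs bytes_per_nnz of traffic and yields flops_per_nnz.
    return per_process_bandwidth * flops_per_nnz / bytes_per_nnz


if __name__ == "__main__":
    total = 12e9  # hypothetical aggregate bandwidth in bytes/s
    for p in (1, 2, 4, 18):
        bound = spmv_flops_bound(total, p)
        print(f"{p:2d} processes: at most {bound / 1e6:8.1f} Mflop/s each")
```

Under these assumptions the aggregate bound stays fixed no matter how
many processes are used, so the per-process rate falls as 1/p. Real
machines will differ in the details, but the scaling is the point.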


