Can PETSc detect the number of CPUs on each computer node?

Satish Balay balay at
Tue Jun 16 13:30:42 CDT 2009

On Tue, 16 Jun 2009, Alex Peyser wrote:

> I had a question on what is the best approach for this. Most of the time is 
> spent inside of BLAS, correct?

Not really. PETSc uses a few BLAS1 operations, which should
perhaps account for around 10-20% of runtime [depending upon the
application. Check for Vec operations in -log_summary; they are
usually BLAS calls].
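
To put the 10-20% figure in perspective, here is a rough sketch using
Amdahl's law. It assumes that only the BLAS share of the runtime gets
faster with more threads and that it scales perfectly, which is
optimistic. The function name and the thread counts are illustrative
only, not anything from PETSc.

```python
def amdahl_speedup(fraction, threads):
    """Upper bound on overall speedup when only `fraction` of the
    runtime is sped up by a factor of `threads` (Amdahl's law)."""
    return 1.0 / ((1.0 - fraction) + fraction / threads)


# BLAS shares taken from the 10-20% estimate above; thread counts are
# arbitrary examples.
for fraction in (0.10, 0.20):
    for threads in (2, 4, 8):
        bound = amdahl_speedup(fraction, threads)
        print(f"BLAS share {fraction:.0%}, {threads} threads: "
              f"at most {bound:.2f}x overall")
```

Even with an unlimited number of threads, a 20% BLAS share bounds the
overall gain at 1/(1 - 0.2) = 1.25x under this model.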

> So wouldn't you maximize your operations by 
> running one MPI/PETSC job per board (per shared memory), and use a 
> multi-threaded BLAS that matches your board? You should cut down 
> communications by some factor proportional to the number of threads per 
> board, and the BLAS itself should better optimize most of your operations 
> across the board, rather than relying on higher order parallelisms.

If the issue is memory bandwidth, then it affects threads and processes
[MPI] equally.
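
A toy model of that point, as a sketch only: the bandwidth numbers
below are made up, and the model simply caps the total at what the
node's memory system can deliver. Note that nothing in it depends on
whether a "worker" is a thread or an MPI process.

```python
def aggregate_bandwidth(workers, per_worker_gbs, node_gbs):
    """Sustained bandwidth (GB/s) delivered to `workers` concurrent
    streams, capped by the node's total memory bandwidth."""
    return min(workers * per_worker_gbs, node_gbs)


# Hypothetical node: one core can pull 5 GB/s on its own, and the
# memory system tops out at 12 GB/s. Both numbers are assumptions.
PER_CORE_GBS = 5.0
NODE_GBS = 12.0

for workers in (1, 2, 4, 8):
    total = aggregate_bandwidth(workers, PER_CORE_GBS, NODE_GBS)
    print(f"{workers} workers: {total:.1f} GB/s total, "
          f"{total / workers:.2f} GB/s each")
```

Once the total hits the node limit, adding workers of either kind only
shrinks the share each one gets.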

And if the algorithm needs some data sharing, there is a cost
associated with explicit communication [MPI] vs implicit data-sharing
[shared memory] due to cache conflicts and other synchronization that's

There could be implementation inefficiencies between threads vs procs,
MPI vs OpenMP, that might tilt things in favor of one approach or the
other, but I don't think it should be a big margin.

