Can PETSc detect the number of CPUs on each computer node?
balay at mcs.anl.gov
Tue Jun 16 13:30:42 CDT 2009
On Tue, 16 Jun 2009, Alex Peyser wrote:
> I had a question on what is the best approach for this. Most of the time is
> spent inside of BLAS, correct?
Not really. PETSc uses some BLAS-1 operations, which should account for
perhaps around 10-20% of runtime, depending on the application. Check the
Vec operations in -log_summary; they are usually BLAS calls.
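To put the 10-20% figure in perspective, here is a minimal Python sketch of Amdahl's law. It assumes that only the BLAS share of the runtime gets faster with a threaded BLAS. The function name `amdahl_speedup` and the choice of 8 threads are illustrative assumptions, not anything PETSc provides.

```python
def amdahl_speedup(blas_fraction, blas_speedup):
    """Overall speedup when only the BLAS share of the runtime gets faster."""
    return 1.0 / ((1.0 - blas_fraction) + blas_fraction / blas_speedup)


# The 10% and 20% shares come from the estimate above; the 8x BLAS speedup
# is a hypothetical best case for an 8-thread BLAS.
for fraction in (0.10, 0.20):
    with_threads = amdahl_speedup(fraction, 8.0)
    upper_bound = 1.0 / (1.0 - fraction)  # limit for an infinitely fast BLAS
    print(f"BLAS share {fraction:.0%}: "
          f"8x faster BLAS -> {with_threads:.2f}x overall, "
          f"limit {upper_bound:.2f}x")
```

Under this assumption, even an infinitely fast BLAS gives at most about 1.11x (10% share) to 1.25x (20% share) overall, since 1/(1 - 0.1) = 1.11 and 1/(1 - 0.2) = 1.25.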
> So wouldn't you maximize your operations by
> running one MPI/PETSC job per board (per shared memory), and use a
> multi-threaded BLAS that matches your board? You should cut down
> communications by some factor proportional to the number of threads per
> board, and the BLAS itself should better optimize most of your operations
> across the board, rather than relying on higher order parallelisms.
If the issue is memory bandwidth, then it affects threads and processes
alike. And if the algorithm needs some data sharing, there is a cost
either way: explicit communication [MPI], or implicit data sharing
[shared memory] with its cache conflicts and other synchronization
that is required.
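The memory-bandwidth point can be sketched with a rough roofline-style estimate in Python. It assumes that all workers on a node, whether threads or processes, share one memory bus. The function `axpy_time` and all hardware numbers below are hypothetical, chosen only for illustration.

```python
def axpy_time(n, workers, flops_per_worker, node_bandwidth):
    """Estimated seconds for y = a*x + y on n doubles (roofline style)."""
    flops = 2.0 * n          # one multiply and one add per element
    bytes_moved = 24.0 * n   # read x, read y, write y: 3 * 8 bytes each
    compute_time = flops / (workers * flops_per_worker)
    memory_time = bytes_moved / node_bandwidth  # bus is shared by all workers
    return max(compute_time, memory_time)


N = 10_000_000
PEAK = 2.0e9        # hypothetical flop/s per core
BANDWIDTH = 10.0e9  # hypothetical bytes/s per node

# "workers" may be threads or MPI processes; this model does not distinguish.
for workers in (1, 2, 4, 8):
    t = axpy_time(N, workers, PEAK, BANDWIDTH)
    print(f"{workers} workers: {t * 1e3:.1f} ms")
```

With these made-up numbers the estimate is already bandwidth-bound at one worker, so it does not improve as workers are added, regardless of whether they are threads or processes.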
There could be implementation inefficiencies (threads vs. processes,
MPI vs. OpenMP) that might tilt things in favor of one approach or the
other, but I don't think the margin should be big.