[petsc-users] Advice on OpenMP/PETSc mix
Mohammad Mirzadeh
mirzadeh at gmail.com
Fri Apr 20 13:31:52 CDT 2012
Hi guys,
I have seen multiple emails regarding this on the mailing list, and I'm
afraid you might have already answered this question, but I'm not quite sure!
I have objects in my code that are hard(er) to parallelize using MPI, and so
far my strategy has been to just handle them in serial, such that each
process has a copy of the whole thing. These objects are related to my grid
generation/information etc., so they only need to be built once at
the beginning (no moving mesh for NOW). As a result I do not care much
about the speed, since it is nothing compared to the overall solution time.
However, I do care about the memory this object consumes, which can limit
my problem size.
So I had the following idea the other day. Is it possible/a good idea to
parallelize the grid generation using OpenMP, so that each node (as opposed
to each core) would share the data structure? This could save me a lot, since
memory on a node is shared among its cores (e.g. 32 GB/node vs 2 GB/core on
Ranger). What I'm not quite sure about is how the job is scheduled when
running the code via mpirun -n Np. Should Np be the total number of cores
or the number of nodes?
If I use, say, Np = 16 processes on one node, MPI runs 16 instances of
the code on a single node (which has 16 cores). How does OpenMP figure out
how to fork? Does it fork 16 threads per MPI process = 256 threads in total,
or is it smart enough to fork just 16 threads per node = 1 thread/core = 16
threads? I'm a bit confused about how the job is scheduled when MPI and
OpenMP are mixed.
Do I make any sense at all?!
Thanks