[petsc-dev] Fwd: Poisson step in GTS

Barry Smith bsmith at mcs.anl.gov
Sun Jun 19 18:11:09 CDT 2011


On Jun 19, 2011, at 3:34 PM, Jed Brown wrote:

> On Sun, Jun 19, 2011 at 21:39, Barry Smith <bsmith at mcs.anl.gov> wrote:
> Huh? VecDot() { if n is very large use 2 threads, else use 1 }. I don't see why that is hard.
> 
> VecMAXPY() when some vectors were faulted with different affinity. Almost any use of VecPlaceArray(). Any bubbling of threads to a higher level (e.g., if thread dispatch is not done strictly at the finest level of granularity). Client code that uses a different affinity during residual evaluation. Matrix preallocation with variation in row length. Index sets whose sizes differ from those of the vectors.

   Since Vecs can now track whether the CPU memory or the GPU memory is valid, can we not add information to the Vec (and Mat) indicating the memory "affinity", etc., and then dispatch different versions based on that? For example, VecPlaceArray() would mark the affinity as "unknown" or something similar. 
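A minimal C sketch of the affinity tag proposed above, under stated assumptions: the `ToyVec` type, the `Affinity` and `DotKernel` enums, `toy_vec_place_array()`, `toy_choose_dot_kernel()`, and the `THREADED_MIN_N` threshold are all hypothetical illustrations, not PETSc API. The only idea shown is that a place-array operation downgrades the tag to "unknown", and the dispatcher falls back to a serial kernel whenever it cannot trust where the pages live.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical affinity tag; not part of PETSc. */
typedef enum {
  AFFINITY_UNKNOWN,        /* e.g. after a user-supplied array has been placed */
  AFFINITY_THREAD_BLOCKED  /* pages faulted by the same thread partition the kernels use */
} Affinity;

typedef enum { KERNEL_SERIAL, KERNEL_THREADED } DotKernel;

typedef struct {
  size_t   n;
  double  *array;
  Affinity affinity;
} ToyVec;

/* Hypothetical cutoff; a real value would come from measurement. */
#define THREADED_MIN_N 100000

/* Analogue of VecPlaceArray(): we no longer know where the pages live. */
static void toy_vec_place_array(ToyVec *v, double *array) {
  assert(v != NULL);
  v->array    = array;
  v->affinity = AFFINITY_UNKNOWN;
}

/* Pick the threaded kernel only when the vector is long enough AND both
   operands are known to be laid out the way the threaded kernel expects. */
static DotKernel toy_choose_dot_kernel(const ToyVec *x, const ToyVec *y) {
  if (x->n >= THREADED_MIN_N &&
      x->affinity == AFFINITY_THREAD_BLOCKED &&
      y->affinity == AFFINITY_THREAD_BLOCKED)
    return KERNEL_THREADED;
  return KERNEL_SERIAL;
}
```

The design choice here is conservative: an "unknown" tag never breaks correctness, it only forfeits the threaded path, which matches the suggestion that VecPlaceArray() simply mark affinity as unknown.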

   Barry


> 
> > A related matter that I keep harping on is that the memory hierarchy is very non-uniform. In the old days it was reasonably uniform within a socket, but some of the latest hardware has multiple dies within a socket, each with more or less independent memory buses.
> 
>  So what is the numa.h you've been using? If we allocate vector and matrix arrays with it, does that give you the locality?
> 
> That lets you specify explicitly, at allocation time, how you want the memory mapped. This can be achieved, more or less, by spawning a suitable number of OpenMP (or other paradigm) threads, making sure the OS/environment is configured so that they have the affinity you desire, partitioning their workload as you want, and faulting the memory.
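The first-touch approach described above can be sketched in C with OpenMP. This assumes Linux's default local-allocation policy (a page is placed on the NUMA node of the thread that first writes it) and that threads are pinned, e.g. with `OMP_PROC_BIND=true`; `alloc_first_touch()` and `dot_static()` are hypothetical names. Compiled without `-fopenmp` the pragmas are ignored and the code runs serially but remains correct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocate n doubles and fault them with a static thread partition.
   For large requests malloc() typically only reserves virtual memory,
   so the physical page placement is decided by the first write below. */
static double *alloc_first_touch(size_t n) {
  double *x = malloc(n * sizeof *x);
  if (!x) return NULL;
  #pragma omp parallel for schedule(static)
  for (long i = 0; i < (long)n; i++) x[i] = 0.0;  /* each thread faults its own chunk */
  return x;
}

/* Dot product using the same schedule(static) partition as the
   initialization: with the same thread count and loop bounds, each
   thread mostly reads pages it faulted itself. */
static double dot_static(const double *x, const double *y, size_t n) {
  assert(x != NULL && y != NULL);
  double sum = 0.0;
  #pragma omp parallel for schedule(static) reduction(+:sum)
  for (long i = 0; i < (long)n; i++) sum += x[i] * y[i];
  return sum;
}
```

Locality only holds if every later kernel reuses the same partition; a loop with a different schedule or thread count silently loses it, which is exactly the fragility described earlier in the thread.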
> 
> But numa.h also has primitives to move the physical pages backing memory you have already allocated, e.g. numa_move_pages(), as well as to query the node placement of other memory. If every platform supported libnuma (it's Linux-only), I think we would be a lot better off. We could build a slightly higher-level abstraction on libnuma and have predictable, debuggable mapping of memory.
> 
> One option is to experiment: build this higher-level abstraction on libnuma, with a default implementation that does something less reliable on platforms without libnuma (non-Linux). Some primitives, like numa_move_pages(), are not available at all there, so they would just do nothing and we would suffer the performance consequences.
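One way such a wrapper layer might look, as a C sketch under stated assumptions: the `nm_*` names are hypothetical, and `PETSC_HAVE_NUMA_H` is the configure macro proposed in this thread, not an existing one. With libnuma present (and `-lnuma` linked) it calls `numa_alloc_onnode()`, `numa_free()`, and `numa_move_pages()`; otherwise allocation falls back to `malloc()`, leaving placement to first touch, and page migration is a no-op that reports failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#if defined(PETSC_HAVE_NUMA_H)   /* proposed configure macro (assumption) */
#include <numa.h>
#endif

/* Allocate n bytes, preferably on NUMA node `node`. */
static void *nm_alloc_on_node(size_t n, int node) {
#if defined(PETSC_HAVE_NUMA_H)
  if (numa_available() != -1) return numa_alloc_onnode(n, node);
#endif
  (void)node;          /* fallback: placement decided by first touch */
  return malloc(n);
}

/* Must pair with nm_alloc_on_node(); numa_free() needs the size. */
static void nm_free(void *p, size_t n) {
#if defined(PETSC_HAVE_NUMA_H)
  if (numa_available() != -1) { numa_free(p, n); return; }
#endif
  (void)n;
  free(p);
}

/* Try to migrate `count` pages to the requested nodes.
   Returns 0 on success, -1 if migration failed or is unsupported;
   the caller keeps the current placement and eats the cost. */
static int nm_move_pages(void **pages, unsigned long count,
                         const int *nodes, int *status) {
  assert(status != NULL || count == 0);
#if defined(PETSC_HAVE_NUMA_H)
  if (numa_available() != -1)
    return numa_move_pages(0, count, pages, nodes, status, 0) == 0 ? 0 : -1;
#endif
  /* No libnuma: do nothing, mirroring move_pages' negative-errno status. */
  for (unsigned long i = 0; i < count; i++) status[i] = -ENOSYS;
  (void)pages;
  (void)nodes;
  return -1;
}
```

Checking `numa_available()` in both the allocator and `nm_free()` keeps the two paths paired even on a Linux machine where libnuma is installed but the kernel lacks NUMA support.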
> 
>   BTW: If it doesn't already, ./configure needs to check for numa.h and define PETSC_HAVE_NUMA_H.
> 
> It doesn't, but I agree.



