-malign-double

Satish Balay balay at mcs.anl.gov
Mon Nov 16 14:07:22 CST 2009


The whole reason for having PetscMalloc2(), PetscMalloc3(), etc. is to
reduce the number of calls to malloc(), in the hope of a performance
boost.

For one - this usage was triggered by old SunOS boxes [where malloc
took a long time]. We don't know if this is still true for any of the
current OSes.

With the increased complexity of managing alignment here - is it still
worth keeping these merged mallocs? [do we still get any payoff from
this complexity? - instead of relying directly on malloc() for the
necessary alignment?]

I'm not sure..

Satish

On Mon, 16 Nov 2009, Jed Brown wrote:

> Barry Smith wrote:
> > 
> >    Agreed this would be a good option to have. The question is how to do
> > it without having a morass of nasty nested if-defs.  Note that portably
> > getting alignment out of malloc()  alone is already ugly and not as
> > simple a code as I would like.
> > 
> >     To simplify things we could always require 16 byte alignment
> > everywhere? But is that desirable?
> 
> I doubt it's harmful, but I don't understand where the nested ifdefs
> come in.  The required alignment only needs to be stated in one place; I
> have something like the following in one of my projects.
> 
>   /* current */
> #define PetscMalloc3(m1,t1,r1,m2,t2,r2,m3,t3,r3)                        \
>     (PetscMalloc((m1)*sizeof(t1)+(m2)*sizeof(t2)+(m3)*sizeof(t3),r1)    \
>      || (*(r2) = (t2*)(*(r1)+(m1)),                                     \
>          *(r3) = (t3*)(*(r2)+(m2)),0))
> 
>   /* aligned */
> #define PetscMalloc3(m1,t1,r1,m2,t2,r2,m3,t3,r3)                        \
>   (PetscMalloc((m1)*sizeof(t1)+(m2)*sizeof(t2)+(m3)*sizeof(t3)+2*(PETSC_MEMALIGN-1),r1) \
>    || (*(r2) = (t2*)PETSC_ALIGN(*(r1)+(m1)),                            \
>        *(r3) = (t3*)PETSC_ALIGN(*(r2)+(m2)),0))
> 
> #define PETSC_ALIGN(p) PetscNextAligned((uintptr_t)(p),PETSC_MEMALIGN-1)
> 
> static inline void *PetscNextAligned(uintptr_t base,uintptr_t mask)
> {return (void*)((base + mask) & ~mask);}
> 
> 
> Note that this is a no-op if PETSC_MEMALIGN=1 and thus compiles to
> exactly what we have now.
> 
> Jed
> 
> 
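
For anyone who wants to try the round-up trick from the quoted macros
outside PETSc, here is a minimal standalone sketch. The names align_up()
and alloc2_aligned() are hypothetical helpers made up for illustration,
not PETSc API. It assumes the alignment is a power of two and that the
first array only needs whatever alignment malloc() itself provides.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Round addr up to the next multiple of align.
 * Assumption: align is a nonzero power of two. With align == 1 the
 * mask is 0 and the address comes back unchanged. */
static inline uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
  uintptr_t mask = align - 1;
  assert(align != 0 && (align & mask) == 0);
  return (addr + mask) & ~mask;
}

/* Hypothetical helper (not PETSc API): carve an int array and a double
 * array out of a single malloc() call. The double array starts at the
 * first align-byte boundary after the ints.
 * Returns the pointer to hand to free(), or NULL on failure. */
static void *alloc2_aligned(size_t n1, int **idx, size_t n2, double **val,
                            size_t align)
{
  /* Worst case, the second array needs align-1 bytes of padding. */
  size_t bytes = n1 * sizeof(int) + n2 * sizeof(double) + (align - 1);
  char *base = malloc(bytes);
  if (!base) return NULL;
  *idx = (int *)base;
  *val = (double *)align_up((uintptr_t)(base + n1 * sizeof(int)), align);
  return base;
}
```

Calling alloc2_aligned(3, &idx, 5, &val, 16) should give one allocation
in which val sits on a 16-byte boundary even though the three ints
before it occupy only 12 bytes. The cost is at most align-1 bytes of
padding per extra array, which is where the 2*(PETSC_MEMALIGN-1) term in
the three-array macro above comes from.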



