-malign-double

Mon Nov 16 14:45:24 CST 2009

On Nov 16, 2009, at 2:07 PM, Satish Balay wrote:

> The whole reason for having PetscMalloc2() PetscMalloc3() etc is to
> reduce the number of calls to malloc() - thus hoping for a performance
> boost.
>
> For one - this usage was triggered by old SunOS boxes [where malloc
> took a long time]. We don't know if this is still true for any of the
> current OSes.
>
> With the increased complexity of managing alignment here - is it still
> worth keeping these merged mallocs? [do we still get any payoff from
> this complexity? - instead of relying directly on malloc() for the
> necessary alignment?]

    I would argue yes. Beyond the improved performance from making
fewer malloc() calls, it is also a nice way to make clear in the code
which allocated arrays are associated with each other.

    Barry
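
    As a rough illustration of the "associated arrays" point, here is a
minimal sketch in plain C. It is not the PETSc implementation; alloc_pair
and free_pair are hypothetical names used only for this example. It
assumes the double array is placed first, so the int array that follows
starts at a multiple of sizeof(double) and needs no padding.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not PETSc API: one malloc() backs two related arrays.
   The double array comes first, so the int array that follows it starts at
   a multiple of sizeof(double) and needs no extra padding. */
static int alloc_pair(size_t n, double **vals, int **idx)
{
  void *block;
  assert(vals && idx);
  block = malloc(n * sizeof(double) + n * sizeof(int));
  if (!block) return 1;            /* nonzero means failure */
  *vals = (double *)block;         /* first array starts at the block */
  *idx  = (int *)(*vals + n);      /* second array starts right after it */
  return 0;
}

/* Both arrays are released together through the first pointer. */
static void free_pair(double *vals) { free(vals); }
```

    The single allocation and single free make it visible at the call
site that the two arrays live and die together.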

>
> I'm not sure..
>
> Satish
>
> On Mon, 16 Nov 2009, Jed Brown wrote:
>
>> Barry Smith wrote:
>>>
>>>   Agreed this would be a good option to have. The question is how to do
>>> it without having a morass of nasty nested if-defs.  Note that portably
>>> getting alignment out of malloc() alone is already ugly and not as
>>> simple a code as I would like.
>>>
>>>    To simplify things we could always require 16 byte alignment
>>> everywhere? But is that desirable?
>>
>> I doubt it's harmful. I don't think I understand where the nested ifdefs
>> come in.  The required alignment only needs to be stated in one place; I
>> have something like the following in one of my projects.
>>
>>  /* current */
>> #define PetscMalloc3(m1,t1,r1,m2,t2,r2,m3,t3,r3)                        \
>>    (PetscMalloc((m1)*sizeof(t1)+(m2)*sizeof(t2)+(m3)*sizeof(t3),r1)    \
>>     || (*(r2) = (t2*)(*(r1)+m1),                                       \
>>         *(r3) = (t3*)(*(r2)+m2),0))
>>
>>  /* aligned */
>> #define PetscMalloc3(m1,t1,r1,m2,t2,r2,m3,t3,r3)                        \
>>  (PetscMalloc((m1)*sizeof(t1)+(m2)*sizeof(t2)+(m3)*sizeof(t3)+2*(PETSC_MEMALIGN-1),r1) \
>>   || (*(r2) = (t2*)PETSC_ALIGN(*(r1)+m1),                              \
>>       *(r3) = (t3*)PETSC_ALIGN(*(r2)+m2),0))
>>
>> #define PETSC_ALIGN(p) PetscNextAligned((uintptr_t)(p),PETSC_MEMALIGN-1)
>>
>> static inline void *PetscNextAligned(uintptr_t base,uintptr_t mask)
>> {return (void*)((base + mask) & ~mask);}
>>
>>
>> Note that this is a no-op if PETSC_MEMALIGN=1 and thus compiles to
>> exactly what we have now.
>>
>> Jed
>>
>>
>
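
The round-up trick quoted above, (base + mask) & ~mask, can be tried
outside PETSc with a small standalone sketch. The names MEMALIGN, align_up
and alloc_aligned_pair are hypothetical stand-ins chosen for this example,
not PETSc identifiers, and the sketch assumes the alignment is a power of
two. Note that the first array is only as aligned as malloc() makes it;
only the second array is explicitly rounded up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MEMALIGN 16 /* hypothetical stand-in for PETSC_MEMALIGN; power of two */

/* Round addr up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
  assert(align != 0 && (align & (align - 1)) == 0);
  return (addr + (align - 1)) & ~(align - 1);
}

/* One malloc() for a double array followed by an aligned int array.
   MEMALIGN-1 spare bytes cover the worst-case padding before the second array. */
static int alloc_aligned_pair(size_t m1, double **r1, size_t m2, int **r2)
{
  void *block = malloc(m1 * sizeof(double) + m2 * sizeof(int) + (MEMALIGN - 1));
  if (!block) return 1;            /* nonzero means failure */
  *r1 = (double *)block;
  *r2 = (int *)align_up((uintptr_t)(*r1 + m1), MEMALIGN);
  return 0;                        /* release both with free(*r1) */
}
```

With an alignment of 1 the mask is 0, so align_up() returns its argument
unchanged, which matches the no-op behaviour described in the quoted
message.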