-malign-double
Barry Smith
bsmith at mcs.anl.gov
Mon Nov 16 14:45:24 CST 2009
On Nov 16, 2009, at 2:07 PM, Satish Balay wrote:
> The whole reason for having PetscMalloc2() PetscMalloc3() etc is to
> reduce the number of calls to malloc() - thus hoping for a performance
> boost.
>
> For one - this usage was triggered by old SunOS boxes [where malloc
> took a long time]. We don't know if this is still true for any of the
> current OSes.
>
> With the increased complexity of managing alignment here - is it still
> worth keeping these merged mallocs? [do we still get any payoff from
> this complexity? - instead of relying directly on malloc() for the
> necessary alignment?]
I would argue yes. Besides the improved performance from making fewer
malloc() calls, it is also a nice way to make clear in the code which
allocated arrays are associated with each other.
Barry
>
> I'm not sure..
>
> Satish
>
> On Mon, 16 Nov 2009, Jed Brown wrote:
>
>> Barry Smith wrote:
>>>
>>> Agreed this would be a good option to have. The question is how to
>>> do it without having a morass of nasty nested if-defs. Note that
>>> portably getting alignment out of malloc() alone is already ugly and
>>> not as simple as I would like.
>>>
>>> To simplify things we could always require 16 byte alignment
>>> everywhere? But is that desirable?
>>
>> I doubt it's harmful. I don't think I understand where the nested
>> ifdefs come in. The required alignment only needs to be stated in
>> one place; I have something like the following in one of my projects.
>>
>> /* current */
>> #define PetscMalloc3(m1,t1,r1,m2,t2,r2,m3,t3,r3) \
>>   (PetscMalloc((m1)*sizeof(t1)+(m2)*sizeof(t2)+(m3)*sizeof(t3),r1) \
>>    || (*(r2) = (t2*)(*(r1)+m1), \
>>        *(r3) = (t3*)(*(r2)+m2),0))
>>
>> /* aligned */
>> #define PetscMalloc3(m1,t1,r1,m2,t2,r2,m3,t3,r3) \
>>   (PetscMalloc((m1)*sizeof(t1)+(m2)*sizeof(t2)+(m3)*sizeof(t3)+2*(PETSC_MEMALIGN-1),r1) \
>>    || (*(r2) = (t2*)PETSC_ALIGN(*(r1)+m1), \
>>        *(r3) = (t3*)PETSC_ALIGN(*(r2)+m2),0))
>>
>> #define PETSC_ALIGN(p) PetscNextAligned((uintptr_t)(p),PETSC_MEMALIGN-1)
>>
>> static inline void *PetscNextAligned(uintptr_t base,uintptr_t mask)
>> {return (void*)((base + mask) & ~mask);}
>>
>>
>> Note that this is a no-op if PETSC_MEMALIGN=1 and thus compiles to
>> exactly what we have now.
>>
>> Jed
>>
>>
>
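For readers who want to try the rounding trick from the quoted macros outside of PETSc, here is a minimal standalone sketch. It is not PETSc code: MEMALIGN, next_aligned() and malloc3_aligned() are hypothetical names standing in for PETSC_MEMALIGN, PetscNextAligned() and the aligned PetscMalloc3(), and plain malloc() is used where PETSc would call PetscMalloc(). It assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for PETSC_MEMALIGN; must be a power of two. */
#define MEMALIGN 16

/* Round base up to the next multiple of (mask + 1), where mask is 2^k - 1.
   With mask == 0 this returns base unchanged. */
static inline void *next_aligned(uintptr_t base, uintptr_t mask)
{
  return (void *)((base + mask) & ~mask);
}

/* Hypothetical helper: carve three arrays out of one malloc().
   m1..m3 are element counts and s1..s3 are element sizes in bytes.
   The second and third arrays are rounded up to MEMALIGN; the first
   gets whatever alignment malloc() itself provides.
   Returns 0 on success and nonzero on failure. Free with free(*r1). */
static int malloc3_aligned(size_t m1, size_t s1, void **r1,
                           size_t m2, size_t s2, void **r2,
                           size_t m3, size_t s3, void **r3)
{
  /* Each realigned array can waste at most MEMALIGN-1 bytes of padding. */
  size_t total = m1 * s1 + m2 * s2 + m3 * s3 + 2 * (MEMALIGN - 1);
  char  *base  = malloc(total);

  if (!base) return 1;
  *r1 = base;
  *r2 = next_aligned((uintptr_t)(base + m1 * s1), MEMALIGN - 1);
  *r3 = next_aligned((uintptr_t)((char *)*r2 + m2 * s2), MEMALIGN - 1);
  return 0;
}
```

The 2*(MEMALIGN-1) bytes of padding cover the worst case in which both the second and third arrays have to be pushed forward by almost a full alignment unit, so the last array always ends inside the single allocation.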