PETSc on Blue Gene

Barry Smith bsmith at mcs.anl.gov
Wed Jul 11 22:06:09 CDT 2007


  PetscMalloc() is designed so that every allocation is
double aligned (8-byte aligned). PetscTrMallocDefault() calls
PetscMallocAlign() in src/sys/memory/mal.c.

  How could something go wrong? I can see three ways:
(1) PETSC_HAVE_DOUBLE_ALIGN_MALLOC is set but malloc does not always
return double-aligned memory; (2) the system call memalign() is broken;
(3) there is a bug in our code that tries to manually align to 8 bytes.
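
For readers unfamiliar with the third case, manual alignment usually means over-allocating and rounding the address up. The following is only a minimal sketch of that general technique, assuming a power-of-two alignment of 8; the names aligned_malloc8, aligned_free8 and is_aligned8 are hypothetical and this is not the code in mal.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT 8u /* bytes; assumed to be a power of two */

/* Hypothetical helper (not PETSc's implementation): over-allocate,
   round the address up to a multiple of ALIGNMENT, and stash the raw
   pointer just below the returned block so it can be freed later. */
static void *aligned_malloc8(size_t n)
{
  unsigned char *raw;
  uintptr_t start, aligned;

  assert((ALIGNMENT & (ALIGNMENT - 1u)) == 0u);
  raw = malloc(n + sizeof(void *) + ALIGNMENT - 1u);
  if (!raw) return NULL;

  /* Leave room for the stashed pointer, then round up. */
  start   = (uintptr_t)raw + sizeof(void *);
  aligned = (start + ALIGNMENT - 1u) & ~(uintptr_t)(ALIGNMENT - 1u);
  ((void **)aligned)[-1] = raw;
  return (void *)aligned;
}

/* Release a block obtained from aligned_malloc8(). */
static void aligned_free8(void *p)
{
  if (p) free(((void **)p)[-1]);
}

/* Returns 1 if p sits on an 8-byte boundary, 0 otherwise. */
static int is_aligned8(const void *p)
{
  return ((uintptr_t)p % ALIGNMENT) == 0;
}
```

Calling is_aligned8() on every pointer handed out by an allocator is a cheap way to test whether a rounding bug of this kind is present.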

   Barry


On Wed, 11 Jul 2007, Brian Biskeborn wrote:

> > How do you know the location of these exceptions? Can you narrow down
> > further to the correct function name/source line?
> 
> I found the locations of the exceptions by forcing an abort at various
> places in the code and counting the exceptions.
> 
> Line 62 (counting from 1) of ex2.c generates 10 errors:
> ierr = MatView(mat,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
> 
> Line 71 generates 12 errors:
> ierr = MatTranspose(mat,&tmat);CHKERRQ(ierr);
> 
> Line 82 generates 10 errors:
> ierr = MatView(tmat,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
> 
> Line 91 generates 10 errors:
> ierr = MatView(tmat,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
> 
> So the exceptions are occurring in MatView and MatTranspose here.
> 
> > Also do you use --with-debugging=0 for this build? Do you get the same
> > errors with a '--with-debugging=1' build?
> 
> I've been running with debugging=0, but the same errors occur with
> debugging=1.
> 
> I have also improved my understanding of Blue Gene's alignment
> requirements: experimentally, it looks like double values must be 4-byte
> aligned, but they cannot cross a 16-byte boundary. That is, the address of
> a double must be 0, 4, or 8 modulo 16. So if everything is indeed 8-byte
> aligned, there should be no problem.
> 
> Lisandro:
> The compiler guarantees proper alignment of stack-allocated and
> statically-allocated data. Also, I think the Blue Gene implementation of
> malloc always returns 16-byte aligned addresses. That means the only way to
> get floating point exceptions is to use malloc'ed memory in such a way that
> alignment is disrupted.
> 
> Brian
> 
> 
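
The alignment rule Brian describes in the quoted message (4-byte aligned, and not crossing a 16-byte boundary) can be written down as a small predicate. This is only a sketch of his experimentally observed rule, not a statement of the hardware specification; the function names and constants are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DOUBLE_SIZE 8u  /* sizeof(double) assumed to be 8 bytes */
#define BOUNDARY   16u  /* boundary a double must not cross     */
#define MIN_ALIGN   4u  /* minimum alignment reported by Brian  */

/* Returns 1 if a double stored at addr satisfies the observed rule:
   4-byte aligned and entirely inside one 16-byte block. */
static int double_address_ok(uintptr_t addr)
{
  uintptr_t offset = addr % BOUNDARY;

  if (addr % MIN_ALIGN != 0) return 0;
  return offset + DOUBLE_SIZE <= BOUNDARY;
}

/* Self-check: the allowed offsets are exactly 0, 4 and 8 modulo 16. */
static void check_rule(void)
{
  uintptr_t off;

  for (off = 0; off < BOUNDARY; off++) {
    int expected = (off == 0 || off == 4 || off == 8);
    assert(double_address_ok(off) == expected);
  }
}
```

Under this rule any 8-byte aligned address passes, which matches Brian's conclusion that 8-byte aligned allocations should not trigger the exceptions.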



