PETSc on Blue Gene

Satish Balay balay at mcs.anl.gov
Wed Jul 11 11:34:33 CDT 2007


On Wed, 11 Jul 2007, Brian Biskeborn wrote:

> > Can you send a log of these messages? Is this on BGL or BGP? Does the
> > program abort? [on encountering these messages]
> 
> The program does not abort on exceptions - the only evidence of the problem
> is messages in the event log reading "Kernel detected X floating point
> alignment exceptions" (where X is a number usually on the order of 10^5)
> followed by what looks like a series of register values. I'm running on
> BGL.

Is this event log in some system log that users have no access to?
Where is this logfile? [I'm guessing it's neither JOBID.output nor
JOBID.error]

> 
> > With the minimal runs I've done on BGL - I don't remember seeing any
> > such messages.
> 
> > [Barry can confirm this] the code in mal.c attempts to make sure the
> > memory allocated by PETSc is aligned properly. [8 byte boundary for
> > doubles]
> 
> > One possibility is that the data passed in to MatAssemblyBegin() is
> > not aligned?
> 
> This says to me that the unaligned data is probably being generated outside
> of Petsc. Thanks for the info, I now have a much better idea about where to
> look for the problem!
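
To narrow down where the unaligned data comes from, one option is to
check the address of each buffer in the application code before it is
handed to PETSc. A minimal sketch follows; is_aligned8() is a
hypothetical helper written for illustration, not a PETSc routine, and
it assumes the 8-byte boundary for doubles mentioned above:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of PETSc: returns 1 if p lies on an
   8-byte boundary (the alignment assumed here for doubles), else 0. */
static int is_aligned8(const void *p)
{
  /* uintptr_t lets us inspect the numeric value of the address. */
  return ((uintptr_t)p % 8u) == 0;
}
```

Calling this on the values array just before MatSetValues() [and
printing the address when it returns 0] should show whether the
misaligned buffer is created in the application.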

If the problem exists in PETSc, it should be reproducible with a
PETSc example [perhaps mat/examples/tests/ex2.c - which does
MatSetValues()]

cqsub -n 2 -t 2 ex9

Satish



