[petsc-users] [Supercomputing Lab #847] VecView() error in BlueGene

Satish Balay balay at mcs.anl.gov
Wed Apr 28 07:39:20 CDT 2010


This indicates that the errors you got earlier were alignment
errors. Setting BG_MAXALIGNEXP=-1 gets the code running, but a bit
inefficiently.

We still have to track the misaligned accesses down in a debugger to
determine where they occur.

If BG_MAXALIGNEXP=0 is used, the code will give a BUS error at
the first misaligned access. Then we have to use the debugger
with the core file to determine the location of the crash, and
try to figure out which memory is misaligned.
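
To make "misaligned" concrete, here is a minimal C sketch. It is not
taken from PETSc, and the helper names is_aligned and load_double are
made up for illustration. The first helper tests whether an address is
a multiple of a given power-of-two alignment, which is the kind of
check one would apply to a suspect pointer in the debugger. The second
shows one common way to read a double from an address that may be
misaligned, by copying the bytes instead of dereferencing the pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if ptr is a multiple of `alignment`
 * bytes, 0 otherwise. `alignment` must be a nonzero power of two. */
static int is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Hypothetical helper: reads a double from an address that may not be
 * suitably aligned. The bytes are copied into an aligned local, so the
 * source address is never dereferenced as a double. */
static double load_double(const void *src)
{
    double value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

For example, given an unsigned char buffer buf, the address buf + 1 is
generally not aligned for a double. Casting it to double * and
dereferencing it is the kind of access that may raise a bus error on
hardware that traps misaligned loads, whereas load_double(buf + 1)
copies the bytes and avoids the misaligned load.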

Satish

On Wed, 28 Apr 2010, (Rebecca) Xuefei YUAN wrote:

> Dear Satish,
> 
> After I set this env variable, the code runs correctly and no error comes out.
> 
> Thanks very much!
> 
> Rebecca
> 
> 
> Quoting Satish Balay <balay at mcs.anl.gov>:
> 
> > It should be set at runtime. For example, on our bgl we use qsub as:
> > 
> > qsub --env BG_MAXALIGNEXP=-1 -t 00:15:00 -n 4 ./ex2
> > 
> > Satish
> > 
> > 
> > On Tue, 27 Apr 2010, (Rebecca) Xuefei YUAN wrote:
> > 
> > > Dear Satish,
> > > 
> > > I think it is the latest petsc-dev, but I can check.
> > > 
> > > How could I use the env variable?
> > > 
> > > Do I need to put this
> > > 
> > > 'BG_MAXALIGNEXP=-1'
> > > 
> > > in the configure.py?
> > > 
> > > Thanks a lot!
> > > 
> > > Rebecca
> > > 
> > > 
> > > Quoting Satish Balay <balay at mcs.anl.gov>:
> > > 
> > > > Is this latest petsc-dev?
> > > > 
> > > > Can you try using the following env variable and see if the error
> > > > message goes away?
> > > > 
> > > > 'BG_MAXALIGNEXP=-1'
> > > > 
> > > > 
> > > > We still have to find the source of the problem.
> > > > 
> > > > Satish
> > > > 
> > > > On Tue, 27 Apr 2010, Aron Ahmadia wrote:
> > > > 
> > > > > I'll take a look at this and report back, Rebecca.  I was seeing
> > > > > similar bus errors on some PETSc example code calling VecView and we
> > > > > haven't tracked it down yet.
> > > > >
> > > > > A
> > > > >
> > > > > On Tue, Apr 27, 2010 at 5:20 AM, Xuefei YUAN via RT
> > > > > <shaheen-help at kaust.edu.sa> wrote:
> > > > > >
> > > > > > Tue Apr 27 12:20:11 2010: Request 847 was acted upon.
> > > > > >  Transaction: Ticket created by xy2102 at columbia.edu
> > > > > >       Queue: Shaheen
> > > > > >     Subject: VecView() error in BlueGene
> > > > > >       Owner: Nobody
> > > > > >  Requestors: xy2102 at columbia.edu
> > > > > >      Status: new
> > > > > > >  Ticket <URL: http://www.hpc.kaust.edu.sa/rt/Ticket/Display.html?id=847 >
> > > > > >
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > > I tried the example code /petsc-dev/src/snes/examples/tutorials/ex5.c
> > > > > > > on bluegene, but got the following message about VecView().
> > > > > >
> > > > > > > [0]PETSC ERROR: ------------------------------------------------------------------------
> > > > > > > [0]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal memory access
> > > > > > > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> > > > > > > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> > > > > > > [0]PETSC ERROR: likely location of problem given in stack below
> > > > > > > [0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
> > > > > > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> > > > > > > [0]PETSC ERROR:       INSTEAD the line number of the start of the function
> > > > > > > [0]PETSC ERROR:       is given.
> > > > > > > [0]PETSC ERROR: [0] VecView_MPI line 801 src/vec/vec/impls/mpi/pdvec.c
> > > > > > > [0]PETSC ERROR: [0] VecView line 690 src/vec/vec/interface/vector.c
> > > > > > > [0]PETSC ERROR: [0] DAView_VTK line 129 src/dm/da/src/daview.c
> > > > > > > [0]PETSC ERROR: [0] DAView line 227 src/dm/da/src/daview.c
> > > > > > > [0]PETSC ERROR: --------------------- Error Message ------------------------------------
> > > > > > > [0]PETSC ERROR: Signal received!
> > > > > > > [0]PETSC ERROR: ------------------------------------------------------------------------
> > > > > > > [0]PETSC ERROR: Petsc Development HG revision: unknown HG  Date: unknown
> > > > > > > [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> > > > > > > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> > > > > > > [0]PETSC ERROR: See docs/index.html for manual pages.
> > > > > > > [0]PETSC ERROR: ------------------------------------------------------------------------
> > > > > > > [0]PETSC ERROR: ./ex5.exe on a arch-bgp- named ionode37 by Unknown Thu Jan  1 03:03:12 1970
> > > > > > > [0]PETSC ERROR: Libraries linked from /scratch/rebeccaxyf/petsc-dev/arch-bgp-ibm-dbg/lib
> > > > > > > [0]PETSC ERROR: Configure run at Mon Apr 26 15:37:29 2010
> > > > > > > [0]PETSC ERROR: Configure options --with-cc=mpixlc_r --with-cxx=mpixlcxx_r --with-fc=mpixlf90_r --with-clanguage=cxx --with-blas-lapack-lib="-L/opt/share/math_libraries/lapack/ppc64/IBM/ -llapack -lblas" --with-x=0 --with-is-color-value-type=short --with-shared=0 -CFLAGS="-qmaxmem=-1 -g" -CXXFLAGS="-qmaxmem=-1 -g" -FFLAGS="-qmaxmem=-1 -g" --with-debugging=1 --with-fortran-kernels=1 --with-batch=1 --known-mpi-shared=0 --known-memcmp-ok --known-sizeof-char=1 --known-sizeof-void-p=4 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=4 --known-sizeof-size_t=4 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-level1-dcache-assoc=0 --known-level1-dcache-linesize=32 --known-level1-dcache-size=32768 --petsc-arch=bgp-dbg PETSC_ARCH=arch-bgp-ibm-dbg
> > > > > > > [0]PETSC ERROR: ------------------------------------------------------------------------
> > > > > > > [0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file
> > > > > > > Abort(59) on node 0 (rank 0 in comm 1140850688): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
> > > > > >
> > > > > >
> > > > > > > Also, my own code fails when calling VecView(), with this error message:
> > > > > >
> > > > > > > [0]PETSC ERROR: ------------------------------------------------------------------------
> > > > > > > [0]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal memory access
> > > > > > > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> > > > > > > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> > > > > > > [0]PETSC ERROR: likely location of problem given in stack below
> > > > > > > [0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
> > > > > > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> > > > > > > [0]PETSC ERROR:       INSTEAD the line number of the start of the function
> > > > > > > [0]PETSC ERROR:       is given.
> > > > > > > [0]PETSC ERROR: [0] VecView_MPI_DA line 508 src/dm/da/src/gr2.c
> > > > > > > [0]PETSC ERROR: [0] VecView line 690 src/vec/vec/interface/vector.c
> > > > > > > [0]PETSC ERROR: [0] DumpSolutionToMatlab line 1052 twqt2ff.c
> > > > > > > [0]PETSC ERROR: [0] Solve line 235 twqt2ff.c
> > > > > > > [0]PETSC ERROR: --------------------- Error Message ------------------------------------
> > > > > > > [0]PETSC ERROR: Signal received!
> > > > > > > [0]PETSC ERROR: ------------------------------------------------------------------------
> > > > > > > [0]PETSC ERROR: Petsc Development HG revision: unknown HG  Date: unknown
> > > > > > > [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> > > > > > > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> > > > > > > [0]PETSC ERROR: See docs/index.html for manual pages.
> > > > > > > [0]PETSC ERROR: ------------------------------------------------------------------------
> > > > > > > [0]PETSC ERROR: ./twqt2ff.exe on a arch-bgp- named ionode132 by Unknown Fri Jan  2 00:44:26 1970
> > > > > > > [0]PETSC ERROR: Libraries linked from /scratch/rebeccaxyf/petsc-dev/arch-bgp-ibm-dbg/lib
> > > > > > > [0]PETSC ERROR: Configure run at Mon Apr 26 15:37:29 2010
> > > > > > > [0]PETSC ERROR: Configure options --with-cc=mpixlc_r --with-cxx=mpixlcxx_r --with-fc=mpixlf90_r --with-clanguage=cxx --with-blas-lapack-lib="-L/opt/share/math_libraries/lapack/ppc64/IBM/ -llapack -lblas" --with-x=0 --with-is-color-value-type=short --with-shared=0 -CFLAGS="-qmaxmem=-1 -g" -CXXFLAGS="-qmaxmem=-1 -g" -FFLAGS="-qmaxmem=-1 -g" --with-debugging=1 --with-fortran-kernels=1 --with-batch=1 --known-mpi-shared=0 --known-memcmp-ok --known-sizeof-char=1 --known-sizeof-void-p=4 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=4 --known-sizeof-size_t=4 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-level1-dcache-assoc=0 --known-level1-dcache-linesize=32 --known-level1-dcache-size=32768 --petsc-arch=bgp-dbg PETSC_ARCH=arch-bgp-ibm-dbg
> > > > > > > [0]PETSC ERROR: ------------------------------------------------------------------------
> > > > > > > [0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file
> > > > > > > Abort(59) on node 0 (rank 0 in comm 1140850688): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
> > > > > >
> > > > > >
> > > > > > > Did I configure something incorrectly?
> > > > > >
> > > > > > Thanks a lot!
> > > > > >
> > > > > > Rebecca
> > > > > >
> > > > > >
> > > > > > --
> > > > > > (Rebecca) Xuefei YUAN
> > > > > > Department of Applied Physics and Applied Mathematics
> > > > > > Columbia University
> > > > > > Tel:917-399-8032
> > > > > > www.columbia.edu/~xy2102
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > 
> > > 
> > > 
> > > 
> > > 
> > 
> 
> 
> 
> 

