NetBSD port

Satish Balay balay at mcs.anl.gov
Wed Dec 16 09:11:01 CST 2009


Let's ignore the 'Sun Grid Engine environment' initially and just
figure out your PETSc install.

- What MPI is it built with? Send us the output from the compile of
  ex19.
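
A minimal sketch of what I mean (run from your PETSC_DIR; the echoed
compile/link lines are what we want to see):

  cd src/snes/examples/tutorials
  make ex19 2>&1 | tee ex19-compile.log
  # the compiler-wrapper and MPI library lines in ex19-compile.log
  # show which MPI PETSc was built with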

- You claim 'make test' worked fine, i.e. this example ran fine in
  parallel. Can you confirm this with a manual run? (see the sketch
  after the note below)

[If that's the case, then PETSc is working correctly with the MPI it
was built with.]
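
For the manual run, something like the following (assuming the
mpiexec that matches the MPI PETSc was built with is first in your
PATH; adjust -n as needed):

  cd src/snes/examples/tutorials
  make ex19
  mpiexec -n 4 ./ex19
  # a clean 4-process run confirms PETSc and its MPI work outside SGE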



From the info below, the example crashes happen only in the 'Sun Grid
Engine environment'. What is that? And why should binaries compiled
with this default 'MPI' work in that grid environment without
recompiling with a different 'sun-grid-mpi'?
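
One quick sanity check (just a sketch; the exact commands depend on
your MPI install) is to compare what an interactive shell and an SGE
job actually pick up:

  which mpiexec              # which launcher is found first in PATH
  mpiexec --version          # which MPI distribution/version it is
  ldd ./ex19 | grep -i mpi   # MPI libraries the binary loads, if dynamic
  # run the same commands from inside an SGE job script and compare
  # with the interactive run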


Satish


On Wed, 16 Dec 2009, Kevin.Buckley at ecs.vuw.ac.nz wrote:

> Hi again,
> 
> I thought I had got things working, but maybe not, not completely,
> anyway.
> 
> I did this and stuff worked:
> 
> PETSC_DIR=$PWD; export PETSC_DIR
> ./configure  --with-c++-support --with-hdf5=/usr/pkg
> --prefix=/vol/grid/pkg/petsc-3.0.0-p7
> PETSC_ARCH=netbsdelf5.0.-c-debug; export PETSC_ARCH
> make all
> make install
> make test
> cd src/snes/examples/tutorials/
> make ex19
> ./ex19 -contours
> 
> Nice pictures!
> 
> I then moved the example ex19 source and the makefile out of the
> distribution tree to somewhere else, built it against the
> installed stuff, and ran it: that worked too.
> 
> export PETSC_DIR=/vol/grid/pkg/petsc-3.0.0-p7
> make ex19
> ./ex19 -dmmg_nlevels 4 -snes_monitor_draw
> ./ex19 -contours
> 
> 
> I then built the package that needs PETSc, PISM, from the University
> of Alaska Fairbanks, and ran that.
> 
> What I then found is that the PISM stuff would fail if we launched it
> into a Sun Grid Engine environment with more than TWO processors.
> 
> It also ran if simply mpiexec'd onto a four-processor machine, but
> not onto a four-machine grid.
> 
> I saw this block of error messages from a 4-node submission:
> 
> [2]PETSC ERROR:
> ------------------------------------------------------------------------
> [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [2]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[2]PETSC
> ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to
> find memory corruption errors
> [2]PETSC ERROR: likely location of problem given in stack below
> [2]PETSC ERROR: ---------------------  Stack Frames
> ------------------------------------
> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [2]PETSC ERROR:       INSTEAD the line number of the start of the function
> [2]PETSC ERROR:       is given.
> [2]PETSC ERROR: [2] VecScatterCreateCommon_PtoS line 1699
> src/vec/vec/utils/vpscat.c
> [2]PETSC ERROR: [2] VecScatterCreate_PtoS line 1508
> src/vec/vec/utils/vpscat.c
> [2]PETSC ERROR: [2] VecScatterCreate line 833 src/vec/vec/utils/vscat.c
> [2]PETSC ERROR: [2] DACreate2d line 338 src/dm/da/src/da2.c
> [2]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> [2]PETSC ERROR: Signal received!
> [2]PETSC ERROR:
> ------------------------------------------------------------------------
> [2]PETSC ERROR: Petsc Release Version 3.0.0, Patch 7, Mon Jul  6 11:33:34
> CDT 2009
> [2]PETSC ERROR: See docs/changes/index.html for recent updates.
> [2]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [2]PETSC ERROR: See docs/index.html for manual pages.
> [2]PETSC ERROR:
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
> with errorcode 59.
> 
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> ------------------------------------------------------------------------
> [2]PETSC ERROR: /vol/grid/pkg/pism-0.2.1/bin/pismv on a netbsdelf named
> citron.ecs.vuw.ac.nz by golledni Wed Dec 16 15:49:09 2009
> [2]PETSC ERROR: Libraries linked from /vol/grid/pkg/petsc-3.0.0-p7/lib
> [2]PETSC ERROR: Configure run at Mon Dec 14 17:02:49 2009
> [2]PETSC ERROR: Configure options --with-c++-support --with-hdf5=/usr/pkg
> --prefix=/vol/grid/pkg/petsc-3.0.0-p7 --with-shared=0
> [2]PETSC ERROR:
> ------------------------------------------------------------------------
> [2]PETSC ERROR: User provided function() line 0 in unknown directory
> unknown file
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 2 with PID 4365 on
> node citron.ecs.vuw.ac.nz exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
> 
> 
> and this block of messages from an 8-node submission:
> 
> 
> [3]PETSC ERROR:
> ------------------------------------------------------------------------
> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [3]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[3]PETSC
> ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to
> find memory corruption errors
> [3]PETSC ERROR: likely location of problem given in stack below
> [3]PETSC ERROR: ---------------------  Stack Frames
> ------------------------------------
> [2]PETSC ERROR:
> ------------------------------------------------------------------------
> [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [2]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[2]PETSC
> ERROR: or try http://valgrind.org on linux or man
>  libgmalloc on Apple to find memory corruption errors
> [2]PETSC ERROR: likely location of problem given in stack below
> [2]PETSC ERROR: ---------------------  Stack Frames
> ------------------------------------
> 
> 
> 
> I then went back and tried to run the PETSc example and found similar
> behaviour: things run when submitted to a two-node "grid" but not a
> four-node one, the error message block being:
> 
> [0]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> [0]PETSC ERROR: Out of memory. This could be due to allocating
> [0]PETSC ERROR: too large an object or bleeding by not properly
> [0]PETSC ERROR: destroying unneeded objects.
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 1.
> 
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> [0]PETSC ERROR: Memory allocated 90628 Memory used by process 0
> [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
> [0]PETSC ERROR: Memory requested 320!
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Petsc Release Version 3.0.0, Patch 7, Mon Jul  6 11:33:34
> CDT 2009
> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [0]PETSC ERROR: See docs/index.html for manual pages.
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: /home/rialto1/kingstlind/kevin/PETSc/ex19 on a netbsdelf
> named petit-lyon.ecs.vuw.ac.nz by kingstlind Wed Dec 16 16:45:39 2009
> [0]PETSC ERROR: Libraries linked from /vol/grid/pkg/petsc-3.0.0-p7/lib
> [0]PETSC ERROR: Configure run at Mon Dec 14 17:02:49 2009
> [0]PETSC ERROR: Configure options --with-c++-support --with-hdf5=/usr/pkg
> --prefix=/vol/grid/pkg/petsc-3.0.0-p7 --with-shared=0
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: PetscMallocAlign() line 61 in src/sys/memory/mal.c
> [0]PETSC ERROR: PetscTrMallocDefault() line 194 in src/sys/memory/mtr.c
> [0]PETSC ERROR: PetscFListAdd() line 235 in src/sys/dll/reg.c
> [0]PETSC ERROR: MatRegister() line 140 in src/mat/interface/matreg.c
> [0]PETSC ERROR: MatRegisterAll() line 106 in src/mat/interface/matregis.c
> [0]PETSC ERROR: MatInitializePackage() line 54 in
> src/mat/interface/dlregismat.c
> [0]PETSC ERROR: MatCreate() line 74 in src/mat/utils/gcreate.c
> [0]PETSC ERROR: DAGetInterpolation_2D_Q1() line 308 in
> src/dm/da/src/dainterp.c
> [0]PETSC ERROR: DAGetInterpolation() line 879 in src/dm/da/src/dainterp.c
> [0]PETSC ERROR: DMGetInterpolation() line 144 in src/dm/da/utils/dm.c
> [0]PETSC ERROR: DMMGSetDM() line 309 in src/snes/utils/damg.c
> [0]PETSC ERROR: main() line 108 in src/snes/examples/tutorials/ex19.c
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 0 with PID 9757 on
> node petit-lyon.ecs.vuw.ac.nz exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
> [1]PETSC ERROR:
> ------------------------------------------------------------------------
> [1]PETSC ERROR: Caught signal number 15 Terminate: Somet process (or the
> batch system) has told this process to end
> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [1]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[2]PETSC
> ERROR:
> ------------------------------------------------------------------------
> [pulcinella.ecs.vuw.ac.nz:24936] opal_sockaddr2str failed:Unknown error
> (return code 4)
> [3]PETSC ERROR:
> ------------------------------------------------------------------------
> [3]PETSC ERROR: Caught signal number 15 Terminate: Somet process (or the
> batch system) has told this process to end
> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [3]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[3]PETSC
> ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to
> find memory corruption errors
> [3]PETSC ERROR:
> 
> 
> Do the PETSc error messages suggest anything wrong with my PETSc, or
> do they point to underlying problems with the Open MPI?
> 
> Any suggestions/insight welcome,
> Kevin
> 
> 



