NetBSD port

Kevin.Buckley at ecs.vuw.ac.nz
Tue Dec 15 22:26:50 CST 2009


Hi again,

I thought I had got things working, but maybe not; not completely,
anyway.

I did this and stuff worked:

PETSC_DIR=$PWD; export PETSC_DIR
./configure  --with-c++-support --with-hdf5=/usr/pkg
--prefix=/vol/grid/pkg/petsc-3.0.0-p7
PETSC_ARCH=netbsdelf5.0.-c-debug; export PETSC_ARCH
make all
make install
make test
cd src/snes/examples/tutorials/
make ex19
./ex19 -contours

Nice pictures!

I then moved the ex19 example source and its makefile out of the
distribution tree to somewhere else, built it against the installed
stuff, and ran it: that worked too.

export PETSC_DIR=/vol/grid/pkg/petsc-3.0.0-p7
make ex19
./ex19 -dmmg_nlevels 4 -snes_monitor_draw
./ex19 -contours
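
(For completeness: to make sure the out-of-tree ex19 really was picking
up the installed tree and the intended Open MPI, a check along these
lines seems enough; it assumes Open MPI was built as shared libraries.)

# confirm which PETSc install and which MPI the out-of-tree ex19 is using
echo $PETSC_DIR
grep PETSC_VERSION_PATCH /vol/grid/pkg/petsc-3.0.0-p7/include/petscversion.h
ldd ./ex19 | grep -i mpi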


I then built the package that needs PETSc, PISM (from the University
of Alaska Fairbanks), and ran that.

What I then found is that the PISM runs would fail if we launched them
into a Sun Grid Engine environment with more than two processors.

They also ran fine when simply mpiexec-ed onto a single four-processor
machine, but not onto a four-machine grid.
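
To make the difference concrete, the two launch paths look roughly like
this ("openmpi" is just our local parallel-environment name, and I have
left the actual pismv arguments off):

# direct launch on a single four-core machine: this works
mpiexec -np 4 /vol/grid/pkg/pism-0.2.1/bin/pismv

# SGE submission across the grid: this is what falls over once more
# than two slots are involved
cat > pismv.job <<'EOF'
#!/bin/sh
#$ -cwd
#$ -pe openmpi 4
mpiexec -np $NSLOTS /vol/grid/pkg/pism-0.2.1/bin/pismv
EOF
qsub pismv.job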

I saw this block of error messages from a 4-node submission:

[2]PETSC ERROR:
------------------------------------------------------------------------
[2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
probably memory access out of range
[2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[2]PETSC ERROR: or see
http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
[2]PETSC ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to
find memory corruption errors
[2]PETSC ERROR: likely location of problem given in stack below
[2]PETSC ERROR: ---------------------  Stack Frames
------------------------------------
[2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[2]PETSC ERROR:       INSTEAD the line number of the start of the function
[2]PETSC ERROR:       is given.
[2]PETSC ERROR: [2] VecScatterCreateCommon_PtoS line 1699
src/vec/vec/utils/vpscat.c
[2]PETSC ERROR: [2] VecScatterCreate_PtoS line 1508
src/vec/vec/utils/vpscat.c
[2]PETSC ERROR: [2] VecScatterCreate line 833 src/vec/vec/utils/vscat.c
[2]PETSC ERROR: [2] DACreate2d line 338 src/dm/da/src/da2.c
[2]PETSC ERROR: --------------------- Error Message
------------------------------------
[2]PETSC ERROR: Signal received!
[2]PETSC ERROR:
------------------------------------------------------------------------
[2]PETSC ERROR: Petsc Release Version 3.0.0, Patch 7, Mon Jul  6 11:33:34
CDT 2009
[2]PETSC ERROR: See docs/changes/index.html for recent updates.
[2]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[2]PETSC ERROR: See docs/index.html for manual pages.
[2]PETSC ERROR:
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 59.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
------------------------------------------------------------------------
[2]PETSC ERROR: /vol/grid/pkg/pism-0.2.1/bin/pismv on a netbsdelf named
citron.ecs.vuw.ac.nz by golledni Wed Dec 16 15:49:09 2009
[2]PETSC ERROR: Libraries linked from /vol/grid/pkg/petsc-3.0.0-p7/lib
[2]PETSC ERROR: Configure run at Mon Dec 14 17:02:49 2009
[2]PETSC ERROR: Configure options --with-c++-support --with-hdf5=/usr/pkg
--prefix=/vol/grid/pkg/petsc-3.0.0-p7 --with-shared=0
[2]PETSC ERROR:
------------------------------------------------------------------------
[2]PETSC ERROR: User provided function() line 0 in unknown directory
unknown file
--------------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 4365 on
node citron.ecs.vuw.ac.nz exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------


and this block of messages from an 8-node submission:


[3]PETSC ERROR:
------------------------------------------------------------------------
[3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
probably memory access out of range
[3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[3]PETSC ERROR: or see
http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
[3]PETSC ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to
find memory corruption errors
[3]PETSC ERROR: likely location of problem given in stack below
[3]PETSC ERROR: ---------------------  Stack Frames
------------------------------------
[2]PETSC ERROR:
------------------------------------------------------------------------
[2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
probably memory access out of range
[2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[2]PETSC ERROR: or see
http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
[2]PETSC ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to
find memory corruption errors
[2]PETSC ERROR: likely location of problem given in stack below
[2]PETSC ERROR: ---------------------  Stack Frames
------------------------------------



I then went back and tried to run the PETSc example again and found
similar behaviour: things run when submitted to a two-node "grid" but
not to a four-node one, the error message block being:

[0]PETSC ERROR: --------------------- Error Message
------------------------------------
[0]PETSC ERROR: Out of memory. This could be due to allocating
[0]PETSC ERROR: too large an object or bleeding by not properly
[0]PETSC ERROR: destroying unneeded objects.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[0]PETSC ERROR: Memory allocated 90628 Memory used by process 0
[0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
[0]PETSC ERROR: Memory requested 320!
[0]PETSC ERROR:
------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Release Version 3.0.0, Patch 7, Mon Jul  6 11:33:34
CDT 2009
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR:
------------------------------------------------------------------------
[0]PETSC ERROR: /home/rialto1/kingstlind/kevin/PETSc/ex19 on a netbsdelf
named petit-lyon.ecs.vuw.ac.nz by kingstlind Wed Dec 16 16:45:39 2009
[0]PETSC ERROR: Libraries linked from /vol/grid/pkg/petsc-3.0.0-p7/lib
[0]PETSC ERROR: Configure run at Mon Dec 14 17:02:49 2009
[0]PETSC ERROR: Configure options --with-c++-support --with-hdf5=/usr/pkg
--prefix=/vol/grid/pkg/petsc-3.0.0-p7 --with-shared=0
[0]PETSC ERROR:
------------------------------------------------------------------------
[0]PETSC ERROR: PetscMallocAlign() line 61 in src/sys/memory/mal.c
[0]PETSC ERROR: PetscTrMallocDefault() line 194 in src/sys/memory/mtr.c
[0]PETSC ERROR: PetscFListAdd() line 235 in src/sys/dll/reg.c
[0]PETSC ERROR: MatRegister() line 140 in src/mat/interface/matreg.c
[0]PETSC ERROR: MatRegisterAll() line 106 in src/mat/interface/matregis.c
[0]PETSC ERROR: MatInitializePackage() line 54 in
src/mat/interface/dlregismat.c
[0]PETSC ERROR: MatCreate() line 74 in src/mat/utils/gcreate.c
[0]PETSC ERROR: DAGetInterpolation_2D_Q1() line 308 in
src/dm/da/src/dainterp.c
[0]PETSC ERROR: DAGetInterpolation() line 879 in src/dm/da/src/dainterp.c
[0]PETSC ERROR: DMGetInterpolation() line 144 in src/dm/da/utils/dm.c
[0]PETSC ERROR: DMMGSetDM() line 309 in src/snes/utils/damg.c
[0]PETSC ERROR: main() line 108 in src/snes/examples/tutorials/ex19.c
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 9757 on
node petit-lyon.ecs.vuw.ac.nz exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[1]PETSC ERROR:
------------------------------------------------------------------------
[1]PETSC ERROR: Caught signal number 15 Terminate: Somet process (or the
batch system) has told this process to end
[1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[1]PETSC ERROR: or see
http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
[2]PETSC ERROR:
------------------------------------------------------------------------
[pulcinella.ecs.vuw.ac.nz:24936] opal_sockaddr2str failed:Unknown error
(return code 4)
[3]PETSC ERROR:
------------------------------------------------------------------------
[3]PETSC ERROR: Caught signal number 15 Terminate: Somet process (or the
batch system) has told this process to end
[3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[3]PETSC ERROR: or see
http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
[3]PETSC ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to
find memory corruption errors
[3]PETSC ERROR:


Do the PETSc error messages suggest anything wrong with my PETSc build,
or do they point to problems with the underlying Open MPI?
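
In case it helps to narrow things down, the other test I plan to try is
taking PETSc out of the picture entirely and running one of Open MPI's
own trivial examples over the same four machines, roughly:

# hello_c.c ships in the examples/ directory of the Open MPI source tree
mpicc -o hello_c hello_c.c

# works on one four-core box...
mpiexec -np 4 ./hello_c

# ...does it also work across the four-machine grid?
mpiexec -np 4 -hostfile grid-hosts ./hello_c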

Any suggestions/insight welcome,
Kevin

-- 
Kevin M. Buckley                                  Room:  CO327
School of Engineering and                         Phone: +64 4 463 5971
 Computer Science
Victoria University of Wellington
New Zealand



