[petsc-users] Newbie question: Strange failure when calling PetscIntView from slepc application
dazza simplythebest
sayosale at hotmail.com
Fri Apr 9 04:11:18 CDT 2021
Dear Stefano,
Many thanks for your response. I have just installed and run valgrind, and I get the output
pasted in below. I can see that it is reporting some kind of error, but I don't know valgrind well enough
to tell exactly where it says the problem is. Does this suggest the cause of the problem to you?
Many thanks once again,
Dan.
dan at super01 /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp3 $ valgrind --track-origins=yes --leak-check=full mpiexec.hydra -n 1 ./trashy.exe
==839704== Memcheck, a memory error detector
==839704== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==839704== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==839704== Command: mpiexec.hydra -n 1 ./trashy.exe
==839704==
==839704== Conditional jump or move depends on uninitialised value(s)
==839704== at 0x437D9C: HYD_find_base_path (hydra_fs.c:203)
==839704== by 0x41B2D8: mpiexec_get_parameters (mpiexec_params.c:1226)
==839704== by 0x4049F4: main (mpiexec.c:1743)
==839704==
==839704== Conditional jump or move depends on uninitialised value(s)
==839704== at 0x437DAB: HYD_find_base_path (hydra_fs.c:203)
==839704== by 0x41B2D8: mpiexec_get_parameters (mpiexec_params.c:1226)
==839704== by 0x4049F4: main (mpiexec.c:1743)
==839704==
==839704== Conditional jump or move depends on uninitialised value(s)
==839704== at 0x437DB6: HYD_find_base_path (hydra_fs.c:204)
==839704== by 0x41B2D8: mpiexec_get_parameters (mpiexec_params.c:1226)
==839704== by 0x4049F4: main (mpiexec.c:1743)
==839704==
==839704== Syscall param write(buf) points to uninitialised byte(s)
==839704== at 0x488E297: write (write.c:26)
==839704== by 0x43A54D: HYD_sock_write (hydra_sock_intel.c:347)
==839704== by 0x4520AA: send_info_downstream (i_hydra_bstrap.c:618)
==839704== by 0x4520AA: HYD_bstrap_setup (i_hydra_bstrap.c:763)
==839704== by 0x4060A6: main (mpiexec.c:1913)
==839704== Address 0x4d93940 is 0 bytes inside a block of size 300 alloc'd
==839704== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==839704== by 0x42F856: create_pg_node_list (i_mpiexec.c:677)
==839704== by 0x4056BD: main (mpiexec.c:1817)
==839704== Uninitialised value was created by a heap allocation
==839704== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==839704== by 0x438066: HYD_node_list_append (hydra_node.c:19)
==839704== by 0x404D2F: get_node_list (mpiexec.c:276)
==839704== by 0x404D2F: main (mpiexec.c:1773)
==839704==
==839704== Syscall param write(buf) points to uninitialised byte(s)
==839704== at 0x488E297: write (write.c:26)
==839704== by 0x43A54D: HYD_sock_write (hydra_sock_intel.c:347)
==839704== by 0x4098E9: cmd_bcast_root (mpiexec.c:169)
==839704== by 0x40A60B: push_env_downstream (mpiexec.c:642)
==839704== by 0x407E65: main (mpiexec.c:1944)
==839704== Address 0x1ffeffb6a0 is on thread 1's stack
==839704== in frame #3, created by push_env_downstream (mpiexec.c:562)
==839704== Uninitialised value was created by a stack allocation
==839704== at 0x409CF0: push_env_downstream (mpiexec.c:562)
==839704==
==839704== Syscall param write(buf) points to uninitialised byte(s)
==839704== at 0x488E297: write (write.c:26)
==839704== by 0x43A54D: HYD_sock_write (hydra_sock_intel.c:347)
==839704== by 0x4098E9: cmd_bcast_root (mpiexec.c:169)
==839704== by 0x40A6DD: push_env_downstream (mpiexec.c:648)
==839704== by 0x407E65: main (mpiexec.c:1944)
==839704== Address 0x1ffeffb6a0 is on thread 1's stack
==839704== in frame #3, created by push_env_downstream (mpiexec.c:562)
==839704== Uninitialised value was created by a stack allocation
==839704== at 0x409CF0: push_env_downstream (mpiexec.c:562)
==839704==
==839704== Syscall param write(buf) points to uninitialised byte(s)
==839704== at 0x488E297: write (write.c:26)
==839704== by 0x43A54D: HYD_sock_write (hydra_sock_intel.c:347)
==839704== by 0x4098E9: cmd_bcast_root (mpiexec.c:169)
==839704== by 0x409C3F: push_cwd_downstream (mpiexec.c:682)
==839704== by 0x407F07: main (mpiexec.c:1948)
==839704== Address 0x1ffeffb738 is on thread 1's stack
==839704== in frame #3, created by push_cwd_downstream (mpiexec.c:668)
==839704== Uninitialised value was created by a stack allocation
==839704== at 0x409BC0: push_cwd_downstream (mpiexec.c:668)
==839704==
==839704== Syscall param write(buf) points to uninitialised byte(s)
==839704== at 0x488E297: write (write.c:26)
==839704== by 0x43A54D: HYD_sock_write (hydra_sock_intel.c:347)
==839704== by 0x4098E9: cmd_bcast_root (mpiexec.c:169)
==839704== by 0x4092FF: push_mapping_info_downstream (mpiexec.c:790)
==839704== by 0x40803D: main (mpiexec.c:1957)
==839704== Address 0x1ffeffb738 is on thread 1's stack
==839704== in frame #3, created by push_mapping_info_downstream (mpiexec.c:782)
==839704== Uninitialised value was created by a stack allocation
==839704== at 0x409290: push_mapping_info_downstream (mpiexec.c:782)
==839704==
==839704== Syscall param write(buf) points to uninitialised byte(s)
==839704== at 0x488E297: write (write.c:26)
==839704== by 0x43A54D: HYD_sock_write (hydra_sock_intel.c:347)
==839704== by 0x4098E9: cmd_bcast_root (mpiexec.c:169)
==839704== by 0x413670: initiate_process_launch (mpiexec.c:813)
==839704== by 0x4080D8: main (mpiexec.c:1960)
==839704== Address 0x1ffeffb728 is on thread 1's stack
==839704== in frame #3, created by initiate_process_launch (mpiexec.c:801)
==839704== Uninitialised value was created by a stack allocation
==839704== at 0x4135B0: initiate_process_launch (mpiexec.c:801)
==839704==
==839704== Syscall param write(buf) points to uninitialised byte(s)
==839704== at 0x488E297: write (write.c:26)
==839704== by 0x43A54D: HYD_sock_write (hydra_sock_intel.c:347)
==839704== by 0x4098E9: cmd_bcast_root (mpiexec.c:169)
==839704== by 0x413740: initiate_process_launch (mpiexec.c:819)
==839704== by 0x4080D8: main (mpiexec.c:1960)
==839704== Address 0x1ffeffb728 is on thread 1's stack
==839704== in frame #3, created by initiate_process_launch (mpiexec.c:801)
==839704== Uninitialised value was created by a stack allocation
==839704== at 0x4135B0: initiate_process_launch (mpiexec.c:801)
==839704==
==839704== Syscall param write(buf) points to uninitialised byte(s)
==839704== at 0x488E297: write (write.c:26)
==839704== by 0x43A54D: HYD_sock_write (hydra_sock_intel.c:347)
==839704== by 0x420A36: mpiexec_pmi_barrier (mpiexec_pmi.c:81)
==839704== by 0x40E6AA: control_cb (mpiexec.c:1280)
==839704== by 0x4339CF: HYDI_dmx_poll_wait_for_event (hydra_demux_poll.c:75)
==839704== by 0x4066B9: main (mpiexec.c:2006)
==839704== Address 0x1ffeffb68c is on thread 1's stack
==839704== in frame #2, created by mpiexec_pmi_barrier (mpiexec_pmi.c:13)
==839704== Uninitialised value was created by a stack allocation
==839704== at 0x420850: mpiexec_pmi_barrier (mpiexec_pmi.c:13)
==839704==
==839704== Syscall param write(buf) points to uninitialised byte(s)
==839704== at 0x488E297: write (write.c:26)
==839704== by 0x43A54D: HYD_sock_write (hydra_sock_intel.c:347)
==839704== by 0x420A97: mpiexec_pmi_barrier (mpiexec_pmi.c:48)
==839704== by 0x40E6AA: control_cb (mpiexec.c:1280)
==839704== by 0x4339CF: HYDI_dmx_poll_wait_for_event (hydra_demux_poll.c:75)
==839704== by 0x4066B9: main (mpiexec.c:2006)
==839704== Address 0x1ffeffb698 is on thread 1's stack
==839704== in frame #2, created by mpiexec_pmi_barrier (mpiexec_pmi.c:13)
==839704== Uninitialised value was created by a stack allocation
==839704== at 0x420850: mpiexec_pmi_barrier (mpiexec_pmi.c:13)
==839704==
check 01: 10 0 9
check 02: 10 0 9
4 4 4
4 4 4
4 4 4
4
now for PetscIntView ...
0: 4 4 4 4 4 4 4 4 4 4
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: likely location of problem given in stack below
[0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[0]PETSC ERROR: INSTEAD the line number of the start of the function
[0]PETSC ERROR: is given.
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Signal received
[0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.14.5, Mar 03, 2021
[0]PETSC ERROR: ./trashy.exe on a named super01 by darren Fri Apr 9 18:02:16 2021
[0]PETSC ERROR: Configure options --package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=1 --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-ci-linux-intel-mkl-cmplx-ilp64-dbg-ftn-with-external
[0]PETSC ERROR: #1 User provided function() line 0 in unknown file
[0]PETSC ERROR: Checking the memory for corruption.
Abort(50176059) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 50176059) - process 0
==839704==
==839704== HEAP SUMMARY:
==839704== in use at exit: 84 bytes in 8 blocks
==839704== total heap usage: 4,061 allocs, 4,053 frees, 1,651,554 bytes allocated
==839704==
==839704== 2 bytes in 1 blocks are definitely lost in loss record 1 of 8
==839704== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==839704== by 0x44EFAE: HYD_str_from_int_pad (hydra_str.c:129)
==839704== by 0x4632F2: i_gtool_proxy_args (i_gtool.c:307)
==839704== by 0x405D5F: main (mpiexec.c:1901)
==839704==
==839704== 30 bytes in 1 blocks are definitely lost in loss record 8 of 8
==839704== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==839704== by 0x4632B4: MPL_strdup (mpl_trmem.h:295)
==839704== by 0x4632B4: i_gtool_proxy_args (i_gtool.c:306)
==839704== by 0x405D5F: main (mpiexec.c:1901)
==839704==
==839704== LEAK SUMMARY:
==839704== definitely lost: 32 bytes in 2 blocks
==839704== indirectly lost: 0 bytes in 0 blocks
==839704== possibly lost: 0 bytes in 0 blocks
==839704== still reachable: 52 bytes in 6 blocks
==839704== suppressed: 0 bytes in 0 blocks
==839704== Reachable blocks (those to which a pointer was found) are not shown.
==839704== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==839704==
==839704== For lists of detected and suppressed errors, rerun with: -s
==839704== ERROR SUMMARY: 15 errors from 14 contexts (suppressed: 0 from 0)
________________________________
From: Stefano Zampini <stefano.zampini at gmail.com>
Sent: Friday, April 9, 2021 7:46 AM
To: dazza simplythebest <sayosale at hotmail.com>
Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] Newbie question: Strange failure when calling PetscIntView from slepc application
As the error message says, use valgrind (https://www.valgrind.org/) to catch these kinds of issues.
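For MPI programs, valgrind is usually wrapped around the application itself rather than around the launcher; the run pasted above instruments mpiexec.hydra, which is why all of the reported frames sit inside the hydra launcher rather than in ./trashy.exe. A sketch of the per-rank invocation, reusing the executable and options from that run:

    mpiexec.hydra -n 1 valgrind --track-origins=yes --leak-check=full ./trashy.exe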
On Apr 9, 2021, at 10:43 AM, dazza simplythebest <sayosale at hotmail.com> wrote:
Dear All,
I am getting a puzzling 'Segmentation Violation' error when I try to
write out an integer array using PetscIntView in a Fortran code. I have written the small
code below, which reproduces the problem. All this code does is create
a PetscInt array, initialise the array, and then try to write it out to the screen.
Interestingly, PetscIntView does seem to write all the values out to the screen correctly
(they agree with a direct write), but it then fails before it can return to
the main program (see the output pasted in below).
I think I must be doing something quite silly, but I just
can't quite see what it is! Any suggestions will be very welcome.
Many thanks,
Dan
Code:
MODULE ALL_STAB_ROUTINES
  IMPLICIT NONE
CONTAINS

  ! print a PetscInt array with a plain write and then with PetscIntView
  SUBROUTINE WRITE_ROWS_TO_PETSC_MATRIX(ISIZE, JALOC)
#include <slepc/finclude/slepceps.h>
    use slepceps
    IMPLICIT NONE
    PetscInt, INTENT(IN) :: ISIZE
    PetscInt, INTENT(INOUT), DIMENSION(0:ISIZE-1) :: JALOC
    PetscErrorCode :: ierr

    write(*,*) 'check 02: ', shape(jaloc), lbound(jaloc), ubound(jaloc)
    write(*,*) jaloc
    write(*,*) 'now for PetscIntView ...'
    call PetscIntView(ISIZE, JALOC, PETSC_VIEWER_STDOUT_WORLD)
    CHKERRA(ierr)
  END SUBROUTINE WRITE_ROWS_TO_PETSC_MATRIX

END MODULE ALL_STAB_ROUTINES


program stabbo
  USE MPI
#include <slepc/finclude/slepceps.h>
  use slepceps
  USE ALL_STAB_ROUTINES
  IMPLICIT NONE
  PetscInt, ALLOCATABLE, DIMENSION(:) :: JALOC
  PetscInt, PARAMETER :: ISIZE = 10
  PetscInt, parameter :: FOUR = 4
  PetscErrorCode :: ierr_pets

  ! allocate and fill the array, then hand it to the writing routine
  call SlepcInitialize(PETSC_NULL_CHARACTER, ierr_pets)
  ALLOCATE(JALOC(0:ISIZE-1))
  JALOC = FOUR
  write(*,*) 'check 01: ', shape(jaloc), lbound(jaloc), ubound(jaloc)
  CALL WRITE_ROWS_TO_PETSC_MATRIX(ISIZE, JALOC)
  CALL SlepcFinalize(ierr_pets)
END PROGRAM STABBO
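As an aside on the PetscIntView call above: PETSc's Fortran interfaces take a trailing PetscErrorCode argument that the C interface does not have, so the Fortran call is usually written with ierr as the final argument (which is also what the CHKERRA(ierr) on the next line expects to inspect). A minimal sketch of that convention, using the names from the code above:

    PetscErrorCode :: ierr
    call PetscIntView(ISIZE, JALOC, PETSC_VIEWER_STDOUT_WORLD, ierr)
    CHKERRA(ierr)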
Output:
dan at super01 /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp3 $ mpiexec.hydra -n 1 ./trashy.exe
check 01: 10 0 9
check 02: 10 0 9
4 4 4
4 4 4
4 4 4
4
now for PetscIntView ...
0: 4 4 4 4 4 4 4 4 4 4
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: likely location of problem given in stack below
[0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[0]PETSC ERROR: INSTEAD the line number of the start of the function
[0]PETSC ERROR: is given.
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Signal received
[0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.14.5, Mar 03, 2021
[0]PETSC ERROR: ./trashy.exe on a named super01 by darren Fri Apr 9 16:28:25 2021
[0]PETSC ERROR: Configure options --package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=1 --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-ci-linux-intel-mkl-cmplx-ilp64-dbg-ftn-with-external
[0]PETSC ERROR: #1 User provided function() line 0 in unknown file
[0]PETSC ERROR: Checking the memory for corruption.
Abort(50176059) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 50176059) - process 0