[petsc-users] MPI-FFTW example crashes
Sajid Ali
sajidsyed2021 at u.northwestern.edu
Sun Jun 2 23:14:20 CDT 2019
Hi PETSc-developers,
I'm trying to run ex143 on a cluster (alcf-theta). I compiled PETSc on a
login node with cray-fftw-3.3.8.1, and neither configure nor make reported
any errors.
When I run ex143 with 1 MPI rank on a compute node, everything works fine,
but with 2 MPI ranks it crashes with signal 4 (illegal instruction), which
PETSc reports as likely due to memory corruption. I tried running it under
valgrind, but the valgrind module available on theta fails with the error
`valgrind: failed to start tool 'memcheck' for platform 'amd64-linux': No
such file or directory`.
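
One way to narrow down the valgrind failure is sketched below. It assumes
the usual valgrind layout, in which each tool is a separate executable named
<tool>-<platform> (here memcheck-amd64-linux) inside valgrind's tool
directory, and that the VALGRIND_LIB environment variable overrides the
compiled-in location of that directory. The function name
valgrind_tool_present is hypothetical, and whether the theta module sets
VALGRIND_LIB at all is not known from this report.

```shell
# Hypothetical helper: report whether a valgrind tool executable exists in
# a given tool directory. Assumed layout: <dir>/<tool>-<platform>.
valgrind_tool_present() {
    tool_dir=$1
    tool=${2:-memcheck}          # default tool, as in the error message
    platform=${3:-amd64-linux}   # default platform, as in the error message
    candidate="${tool_dir}/${tool}-${platform}"
    if [ -x "$candidate" ]; then
        echo "found: $candidate"
        return 0
    fi
    echo "missing: $candidate"
    return 1
}

# If VALGRIND_LIB is set, check the directory it points to. An unset or
# empty value means valgrind falls back to its compiled-in default, which
# this sketch does not try to guess.
if [ -n "${VALGRIND_LIB:-}" ]; then
    valgrind_tool_present "$VALGRIND_LIB" || true
fi
```

A "missing" result would suggest that the module points valgrind at a
directory without the memcheck executable, which would be a packaging
problem independent of the ex143 crash itself.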
To get around this, I ran it under gdb4hpc. The attached backtrace shows
SIGILL being raised in a function called from MatMult_MPIFFTW, i.e. in the
MPI-FFTW code path. I have also attached the output obtained with the
-start_in_debugger command-line option.
What could cause this error, and how do I fix it?
Thank You,
Sajid Ali
Applied Physics
Northwestern University
-------------- next part --------------
sajid at thetamom1:/gpfs/mira-home/sajid/sajid_proj/test_fftw> aprun -n 2 --cc depth -d 1 -j 1 -r 1 ./ex143 -start_in_debugger -log_view &> out
sajid at thetamom1:/gpfs/mira-home/sajid/sajid_proj/test_fftw> cat out
PETSC: Attaching gdb to ./ex143 of pid 62260 on display :0.0 on machine nid03832
PETSC: Attaching gdb to ./ex143 of pid 62259 on display :0.0 on machine nid03832
xterm: xterm: Xt error: Can't open display: :0.0
Xt error: Can't open display: :0.0
xterm: xterm: DISPLAY is not set
DISPLAY is not set
Use PETSc-FFTW interface...1-DIM: 30
[1]PETSC ERROR: [0]PETSC ERROR: ------------------------------------------------------------------------
------------------------------------------------------------------------
[0]PETSC ERROR: [1]PETSC ERROR: Caught signal number 4 Illegal instruction: Likely due to memory corruption
Caught signal number 4 Illegal instruction: Likely due to memory corruption
[0]PETSC ERROR: [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[1]PETSC ERROR: [0]PETSC ERROR: likely location of problem given in stack below
likely location of problem given in stack below
[1]PETSC ERROR: [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
--------------------- Stack Frames ------------------------------------
[0]PETSC ERROR: [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
Note: The EXACT line numbers in the stack are not available,
[0]PETSC ERROR: [1]PETSC ERROR: INSTEAD the line number of the start of the function
INSTEAD the line number of the start of the function
[0]PETSC ERROR: [1]PETSC ERROR: is given.
is given.
[0]PETSC ERROR: [1]PETSC ERROR: [0] MatMult_MPIFFTW line 236 /gpfs/mira-home/sajid/packages/petsc/src/mat/impls/fft/fftw/fftw.c
[1] MatMult_MPIFFTW line 236 /gpfs/mira-home/sajid/packages/petsc/src/mat/impls/fft/fftw/fftw.c
[0]PETSC ERROR: [1]PETSC ERROR: [1] MatMult line 2402 /gpfs/mira-home/sajid/packages/petsc/src/mat/interface/matrix.c
[0] MatMult line 2402 /gpfs/mira-home/sajid/packages/petsc/src/mat/interface/matrix.c
[1]PETSC ERROR: [0]PETSC ERROR: User provided function() line 0 in unknown file (null)
User provided function() line 0 in unknown file (null)
_pmiu_daemon(SIGCHLD): [NID 03832] [c7-1c2s14n0] [Mon Jun 3 04:10:53 2019] PE RANK 0 exit signal Aborted
[NID 03832] 2019-06-03 04:10:53 Apid 13751865: initiated application termination
Application 13751865 exit codes: 134
Application 13751865 resources: utime ~0s, stime ~2s, Rss ~27708, inblocks ~9678, outblocks ~0
sajid at thetamom1:/gpfs/mira-home/sajid/sajid_proj/test_fftw>
-------------- next part --------------
sajid at thetamom1:/gpfs/mira-home/sajid/sajid_proj/test_fftw> gdb4hpc
gdb4hpc 3.0 - Cray Line Mode Parallel Debugger
With Cray Comparative Debugging Technology.
Copyright 2007-2018 Cray Inc. All Rights Reserved.
Copyright 1996-2016 University of Queensland. All Rights Reserved.
Type "help" for a list of commands.
Type "help <cmd>" for detailed help about a command.
dbg all> maint set unsafe on
dbg all> launch $a{2} ./ex143
Starting application, please wait...
Creating MRNet communication network...
Waiting for debug servers to attach to MRNet communications network...
Timeout in 400 seconds. Please wait for the attach to complete.
Number of dbgsrvs connected: [0]; Timeout Counter: [1]
Number of dbgsrvs connected: [1]; Timeout Counter: [0]
Number of dbgsrvs connected: [1]; Timeout Counter: [1]
Number of dbgsrvs connected: [2]; Timeout Counter: [0]
Finalizing setup...
Launch complete.
a{0..1}: Initial breakpoint, main at /lus/theta-fs0/projects/large3dxrayADSP/sajid_proj/test_fftw/ex143.c:27
dbg all> step 20
a{0..1}: main at /lus/theta-fs0/projects/large3dxrayADSP/sajid_proj/test_fftw/ex143.c:100
dbg all> step 20
<$a>: Use PETSc-FFTW interface...1-DIM: 30
a{0..1}: main at /lus/theta-fs0/projects/large3dxrayADSP/sajid_proj/test_fftw/ex143.c:128
dbg all> step 20
a{0..1}: Program received signal SIGILL.
a{0..1}: In sadt at :0
dbg all> backtrace
a{0..1}: #0 0x00002aaab5c769c2 in sadt
a{0..1}: #1 0x00002aaab26399f6 in MatMult_MPIFFTW
a{0..1}: #2 0x00002aaab2579c2a in MatMult
a{0..1}: #3 0x0000000000404ded in main at /lus/theta-fs0/projects/large3dxrayADSP/sajid_proj/test_fftw/ex143.c:128
dbg all>