[petsc-users] Problem running geoclaw example with petsc 3.22
Barry Smith
bsmith at petsc.dev
Thu Oct 31 09:18:40 CDT 2024
Thanks, this is progress :-)
You can add to petscMPIOptions the lines
-start_in_debugger
-debugger_nodes 0
then run as usual; a window will pop up with the debugger
type c (for continue)
then when it crashes type bt (for backtrace)
and it should print out the stack frames; cut and paste them all and send them back
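
For reference, the petscMPIoptions file would then contain something along these lines (a sketch; keep whatever solver options you already have and just append the two debugger lines at the end):

-mpi_linear_solver_server
-mpi_linear_solver_server_view
-mpi_linear_solver_server_use_shared_memory false
-start_in_debugger
-debugger_nodes 0
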
Barry
> On Oct 31, 2024, at 1:24 AM, Praveen C <cpraveen at gmail.com> wrote:
>
> Hello Barry
>
> With the extra option
>
>> -mpi_linear_solver_server_use_shared_memory false
>
>
> I get a different error
>
> Thanks
> praveen
>
> $ make .output
> /Library/Developer/CommandLineTools/usr/bin/make output -f Makefile /Users/praveen/Applications/clawpack/geoclaw/src/2d/bouss/Makefile.bouss /Users/praveen/Applications/clawpack/clawutil/src/Makefile.common
> rm -f .output
> python /Users/praveen/Applications/clawpack/clawutil/src/python/clawutil/runclaw.py /Users/praveen/Work/bouss/radial_flat/xgeoclaw _output \
> True None . False False None "/opt/homebrew/Caskroom/miniforge/base/envs/claw/./bin/mpiexec -n 6"
> ==> runclaw: Will take data from /Volumes/Samsung_T5/Work/bouss/radial_flat
> ==> runclaw: Will write output to /Volumes/Samsung_T5/Work/bouss/radial_flat/_output
> ==> runclaw: Removing all old fort/gauge files in /Volumes/Samsung_T5/Work/bouss/radial_flat/_output
>
> ==> Running with command:
> /opt/homebrew/Caskroom/miniforge/base/envs/claw/./bin/mpiexec -n 6 /Users/praveen/Work/bouss/radial_flat/xgeoclaw
> Reading data file: claw.data
> first 5 lines are comments and will be skipped
> Reading data file: amr.data
> first 5 lines are comments and will be skipped
>
> Running amrclaw ...
>
> Reading data file: geoclaw.data
> first 5 lines are comments and will be skipped
> Reading data file: refinement.data
> first 5 lines are comments and will be skipped
> Reading data file: dtopo.data
> first 5 lines are comments and will be skipped
> Reading data file: topo.data
> first 5 lines are comments and will be skipped
> converting to topotype > 1 might reduce file size
> python tools for converting files are provided
>
> Reading topography file /Volumes/Samsung_T5/Work/bouss/radial_flat/flat100.tt1
> Reading data file: qinit.data
> first 5 lines are comments and will be skipped
> qinit_type = 0, no perturbation
> Reading data file: fgout_grids.data
> first 5 lines are comments and will be skipped
> Reading data file: friction.data
> first 5 lines are comments and will be skipped
> Reading data file: multilayer.data
> first 5 lines are comments and will be skipped
> Reading data file: surge.data
> first 5 lines are comments and will be skipped
> Reading data file: regions.data
> first 5 lines are comments and will be skipped
> Reading data file: flagregions.data
> first 5 lines are comments and will be skipped
> +++ rregion bounding box:
> 0.0000000000000000 5000.0000000000000 0.0000000000000000 5000.0000000000000
> +++ i, rr%s(1), rr%ds: 1 0.0000000000000000 5000.0000000000000
> +++ Ruled region name: Region_diagonal
> +++ Ruled region file_name: /Volumes/Samsung_T5/Work/bouss/radial_flat/RuledRectangle_Diagonal.data
> +++ rregion bounding box:
> 0.0000000000000000 5000.0000000000000 -1000.0000000000000 6000.0000000000000
> +++ i, rr%s(1), rr%ds: 2 0.0000000000000000 5000.0000000000000
> +++ rregion bounding box:
> 0.0000000000000000 1000.0000000000000 0.0000000000000000 1000.0000000000000
> +++ i, rr%s(1), rr%ds: 3 0.0000000000000000 1000.0000000000000
> Reading data file: gauges.data
> first 5 lines are comments and will be skipped
> Reading data file: fgmax_grids.data
> first 5 lines are comments and will be skipped
> Reading data file: adjoint.data
> first 5 lines are comments and will be skipped
> Reading data file: bouss.data
> first 5 lines are comments and will be skipped
> Using SGN equations
> ==> Applying Bouss equations to selected grids between levels 1 and 10
> ==> Use Bouss. in water deeper than 1.0000000000000000
> Using a PETSc solver
> Using Bouss equations from the start
> rnode allocated...
> node allocated...
> listOfGrids allocated...
> Storage allocated...
> bndList allocated...
> Gridding level 1 at t = 0.000000E+00: 4 grids with 10000 cells
> Setting initial dt to 2.9999999999999999E-002
> max threads set to 1
>
> Done reading data, starting computation ...
>
> Total zeta at initial time: 39269.907650665169
> GEOCLAW: Frame 0 output files done at time t = 0.000000D+00
>
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
> [0]PETSC ERROR: to get more information on the crash.
> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
> Proc: [[5779,1],0]
> Errorcode: 59
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> prterun has exited due to process rank 0 with PID 0 on node chandra calling
> "abort". This may have caused other processes in the application to be
> terminated by signals sent by prterun (as reported here).
> --------------------------------------------------------------------------
> Traceback (most recent call last):
> File "/Users/praveen/Applications/clawpack/clawutil/src/python/clawutil/runclaw.py", line 242, in runclaw
> proc = subprocess.check_call(cmd_split,
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> File "/opt/homebrew/Caskroom/miniforge/base/envs/claw/lib/python3.12/subprocess.py", line 413, in check_call
> raise CalledProcessError(retcode, cmd)
> subprocess.CalledProcessError: Command '['/opt/homebrew/Caskroom/miniforge/base/envs/claw/./bin/mpiexec', '-n', '6', '/Users/praveen/Work/bouss/radial_flat/xgeoclaw']' returned non-zero exit status 59.
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
> File "/Users/praveen/Applications/clawpack/clawutil/src/python/clawutil/runclaw.py", line 341, in <module>
> runclaw(*args)
> File "/Users/praveen/Applications/clawpack/clawutil/src/python/clawutil/runclaw.py", line 249, in runclaw
> raise ClawExeError(exe_error_str, cpe.returncode, cpe.cmd,
> ClawExeError:
>
> *** FORTRAN EXE FAILED ***
>
> make[1]: *** [output] Error 1
> make: *** [.output] Error 2
>
>> On 30 Oct 2024, at 9:33 PM, Barry Smith <bsmith at petsc.dev> wrote:
>>
>>
>> Please put
>>
>> -mpi_linear_solver_server_use_shared_memory false
>>
>> into the petscMPIOptions file and see if that changes anything.
>>
>> Barry
>>
>>
>>> On Oct 30, 2024, at 9:05 AM, Praveen C <cpraveen at gmail.com> wrote:
>>>
>>> I have attached some files for this
>>>
>>> cd bouss
>>> . setenv.sh # some settings in this file may need to be changed
>>>
>>> cd radial_flat/1d_radial
>>> make .output
>>> cd ..
>>> make .output
>>>
>>> Thanks
>>> praveen
>>>
>>> <petscMPIoptions>
>>> <setenv.sh>
>>>
>>>> On 30 Oct 2024, at 6:28 PM, Barry Smith <bsmith at petsc.dev> wrote:
>>>>
>>>>
>>>> Please send me exact instructions on how you are running the geoclaw example; when I run
>>>>
>>>> geoclaw/examples/bouss/radial_flat
>>>>
>>>>
>>>> it only runs with one level but never errors; I changed setrun.py to amrdata.amr_levels_max = 5, but it made no difference.
>>>>
>>>> Thanks
>>>>
>>>> Barry
>>>>
>>>>
>>>>
>>>>> On Oct 24, 2024, at 12:17 PM, Praveen C <cpraveen at gmail.com> wrote:
>>>>>
>>>>> Hello Barry
>>>>>
>>>>> I use this script to install clawpack and required dependencies
>>>>>
>>>>> https://github.com/cpraveen/cfdlab/blob/master/bin/clawpack.sh
>>>>>
>>>>> See lines 125-132
>>>>>
>>>>> You can use it like this
>>>>>
>>>>> export CLAW=/path/to/where/you/want/clawpack
>>>>> bash clawpack.sh v5.11.0
>>>>>
>>>>> This will git pull clawpack, create a conda env called “claw”, and install everything inside that.
>>>>>
>>>>> I have not specified a petsc version in this script, but the latest miniforge should install petsc@3.22
>>>>>
>>>>> Thank you
>>>>> praveen
>>>>>
>>>>>> On 24 Oct 2024, at 9:37 PM, Barry Smith <bsmith at petsc.dev> wrote:
>>>>>>
>>>>>>
>>>>>> Good, some helpful information with the runs you made.
>>>>>>
>>>>>> In the crash below it made more progress: it was able to allocate multiple regions of shared memory and access them. I don't know why it would crash later.
>>>>>>
>>>>>> Can you tell me all the steps you use with miniforge (which seems to be related to the failure)? I've never used miniforge.
>>>>>>
>>>>>> If I can get to an environment that reproduces the problem I can debug it and fix it.
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>>
>>>>>>> On Oct 24, 2024, at 11:49 AM, Praveen C <cpraveen at gmail.com> wrote:
>>>>>>>
>>>>>>> I get this
>>>>>>>
>>>>>>> $ mpiexec -n 3 ./ex89f -n 20 -mpi_linear_solver_server -mpi_linear_solver_server -mpi_linear_solver_server_ksp_view -ksp_monitor -ksp_converged_reason -ksp_view -mpi_linear_solver_server_minimum_count_per_rank 5
>>>>>>> [0]PETSC ERROR: ------------------------------------------------------------------------
>>>>>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>>>>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>>>>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
>>>>>>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>>>>>>> [0]PETSC ERROR: to get more information on the crash.
>>>>>>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
>>>>>>> --------------------------------------------------------------------------
>>>>>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
>>>>>>> Proc: [[47380,1],0]
>>>>>>> Errorcode: 59
>>>>>>>
>>>>>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>>>>>> You may or may not see output from other processes, depending on
>>>>>>> exactly when Open MPI kills them.
>>>>>>> --------------------------------------------------------------------------
>>>>>>> --------------------------------------------------------------------------
>>>>>>> prterun has exited due to process rank 0 with PID 0 on node MacMiniHome calling
>>>>>>> "abort". This may have caused other processes in the application to be
>>>>>>> terminated by signals sent by prterun (as reported here).
>>>>>>> --------------------------------------------------------------------------
>>>>>>>
>>>>>>> and I have this after the code exits
>>>>>>>
>>>>>>> $ ipcs -m
>>>>>>> IPC status from <running system> as of Thu Oct 24 21:17:39 IST 2024
>>>>>>> T ID KEY MODE OWNER GROUP
>>>>>>> Shared Memory:
>>>>>>> m 1572864 0x0000000b --rw-rw-rw- praveen staff
>>>>>>> m 524289 0x0000000c --rw-rw-rw- praveen staff
>>>>>>> m 655362 0x0000000d --rw-rw-rw- praveen staff
>>>>>>> m 262147 0x0000000e --rw-rw-rw- praveen staff
>>>>>>> m 262148 0x0000000f --rw-rw-rw- praveen staff
>>>>>>> m 393221 0x0000000a --rw-rw-rw- praveen staff
>>>>>>>
>>>>>>> This is with petsc installed with miniforge, which I also use with clawpack. With spack-installed petsc, I can run the ex89f example.
>>>>>>>
>>>>>>> Thanks
>>>>>>> praveen
>>>>>>>
>>>>>>>> On 24 Oct 2024, at 7:55 PM, Barry Smith <bsmith at petsc.dev> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Ok, super strange; I've run this many times on my Mac
>>>>>>>>
>>>>>>>> Can you please try to remove that allocated memory with ipcrm and then
>>>>>>>>
>>>>>>>> cd $PETSC_DIR/src/ksp/ksp/tutorials
>>>>>>>> make ex89f
>>>>>>>> mpiexec -n 3 ./ex89f -n 20 -mpi_linear_solver_server -mpi_linear_solver_server -mpi_linear_solver_server_ksp_view -ksp_monitor -ksp_converged_reason -ksp_view -mpi_linear_solver_server_minimum_count_per_rank 5
>>>>>>>>
>>>>>>>> This does the same thing as the GeoClaw code but is much simpler.
>>>>>>>>
>>>>>>>> Barry
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Oct 23, 2024, at 10:55 PM, Praveen C <cpraveen at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> I get very similar error on my mac with
>>>>>>>>>
>>>>>>>>> $ gfortran -v
>>>>>>>>> Using built-in specs.
>>>>>>>>> COLLECT_GCC=gfortran
>>>>>>>>> COLLECT_LTO_WRAPPER=/opt/homebrew/Caskroom/miniforge/base/envs/claw/libexec/gcc/arm64-apple-darwin20.0.0/13.2.0/lto-wrapper
>>>>>>>>> Target: arm64-apple-darwin20.0.0
>>>>>>>>> Configured with: ../configure --prefix=/opt/homebrew/Caskroom/miniforge/base/envs/claw --build=x86_64-apple-darwin13.4.0 --host=arm64-apple-darwin20.0.0 --target=arm64-apple-darwin20.0.0 --with-libiconv-prefix=/opt/homebrew/Caskroom/miniforge/base/envs/claw --enable-languages=fortran --disable-multilib --enable-checking=release --disable-bootstrap --disable-libssp --with-gmp=/opt/homebrew/Caskroom/miniforge/base/envs/claw --with-mpfr=/opt/homebrew/Caskroom/miniforge/base/envs/claw --with-mpc=/opt/homebrew/Caskroom/miniforge/base/envs/claw --with-isl=/opt/homebrew/Caskroom/miniforge/base/envs/claw --enable-darwin-at-rpath
>>>>>>>>> Thread model: posix
>>>>>>>>> Supported LTO compression algorithms: zlib
>>>>>>>>> gcc version 13.2.0 (GCC)
>>>>>>>>>
>>>>>>>>> Before starting
>>>>>>>>>
>>>>>>>>> $ ipcs -m
>>>>>>>>> IPC status from <running system> as of Thu Oct 24 08:02:11 IST 2024
>>>>>>>>> T ID KEY MODE OWNER GROUP
>>>>>>>>> Shared Memory:
>>>>>>>>>
>>>>>>>>> and when I run the code
>>>>>>>>>
>>>>>>>>> Using a PETSc solver
>>>>>>>>> Using Bouss equations from the start
>>>>>>>>> rnode allocated...
>>>>>>>>> node allocated...
>>>>>>>>> listOfGrids allocated...
>>>>>>>>> Storage allocated...
>>>>>>>>> bndList allocated...
>>>>>>>>> Gridding level 1 at t = 0.000000E+00: 4 grids with 10000 cells
>>>>>>>>> Setting initial dt to 2.9999999999999999E-002
>>>>>>>>> max threads set to 1
>>>>>>>>>
>>>>>>>>> Done reading data, starting computation ...
>>>>>>>>>
>>>>>>>>> Total zeta at initial time: 39269.907650665169
>>>>>>>>> GEOCLAW: Frame 0 output files done at time t = 0.000000D+00
>>>>>>>>>
>>>>>>>>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>>>>>>> [0]PETSC ERROR: Petsc has generated inconsistent data
>>>>>>>>> [0]PETSC ERROR: Unable to locate PCMPI allocated shared address 0x130698000
>>>>>>>>> [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!
>>>>>>>>> [0]PETSC ERROR: Option left: name:-ksp_type value: preonly source: file
>>>>>>>>> [0]PETSC ERROR: Option left: name:-mpi_ksp_max_it value: 200 source: file
>>>>>>>>> [0]PETSC ERROR: Option left: name:-mpi_ksp_reuse_preconditioner (no value) source: file
>>>>>>>>> [0]PETSC ERROR: Option left: name:-mpi_ksp_rtol value: 1.e-9 source: file
>>>>>>>>> [0]PETSC ERROR: Option left: name:-mpi_ksp_type value: gmres source: file
>>>>>>>>> [0]PETSC ERROR: Option left: name:-mpi_linear_solver_server_view (no value) source: file
>>>>>>>>> [0]PETSC ERROR: Option left: name:-mpi_pc_gamg_sym_graph value: true source: file
>>>>>>>>> [0]PETSC ERROR: Option left: name:-mpi_pc_gamg_symmetrize_graph value: true source: file
>>>>>>>>> [0]PETSC ERROR: Option left: name:-mpi_pc_type value: gamg source: file
>>>>>>>>> [0]PETSC ERROR: Option left: name:-pc_mpi_minimum_count_per_rank value: 5000 source: file
>>>>>>>>> [0]PETSC ERROR: Option left: name:-pc_type value: mpi source: file
>>>>>>>>> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>>>>>>>>> [0]PETSC ERROR: Petsc Release Version 3.22.0, Sep 28, 2024
>>>>>>>>> [0]PETSC ERROR: /Users/praveen/work/bouss/radial_flat/xgeoclaw with 6 MPI process(es) and PETSC_ARCH on MacMiniHome.local by praveen Thu Oct 24 08:04:27 2024
>>>>>>>>> [0]PETSC ERROR: Configure options: AR=arm64-apple-darwin20.0.0-ar CC=mpicc CXX=mpicxx FC=mpifort CFLAGS="-ftree-vectorize -fPIC -fstack-protector-strong -O2 -pipe -isystem /opt/homebrew/Caskroom/miniforge/base/envs/claw/include " CPPFLAGS="-D_FORTIFY_SOURCE=2 -isystem /opt/homebrew/Caskroom/miniforge/base/envs/claw/include -mmacosx-version-min=11.0 -mmacosx-version-min=11.0" CXXFLAGS="-ftree-vectorize -fPIC -fstack-protector-strong -O2 -pipe -stdlib=libc++ -fvisibility-inlines-hidden -fmessage-length=0 -isystem /opt/homebrew/Caskroom/miniforge/base/envs/claw/include " FFLAGS="-march=armv8.3-a -ftree-vectorize -fPIC -fno-stack-protector -O2 -pipe -isystem /opt/homebrew/Caskroom/miniforge/base/envs/claw/include " LDFLAGS="-Wl,-headerpad_max_install_names -Wl,-dead_strip_dylibs -Wl,-rpath,/opt/homebrew/Caskroom/miniforge/base/envs/claw/lib -L/opt/homebrew/Caskroom/miniforge/base/envs/claw/lib" LIBS="-Wl,-rpath,/opt/homebrew/Caskroom/miniforge/base/envs/claw/lib -lmpi_mpifh -lgfortran" --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 --with-clib-autodetect=0 --with-cxxlib-autodetect=0 --with-fortranlib-autodetect=0 --with-debugging=0 --with-blas-lib=libblas.dylib --with-lapack-lib=liblapack.dylib --with-yaml=1 --with-hdf5=1 --with-fftw=1 --with-hwloc=0 --with-hypre=1 --with-metis=1 --with-mpi=1 --with-mumps=1 --with-parmetis=1 --with-pthread=1 --with-ptscotch=1 --with-shared-libraries --with-ssl=0 --with-scalapack=1 --with-superlu=1 --with-superlu_dist=1 --with-superlu_dist-include=/opt/homebrew/Caskroom/miniforge/base/envs/claw/include/superlu-dist --with-superlu_dist-lib=-lsuperlu_dist --with-suitesparse=1 --with-suitesparse-dir=/opt/homebrew/Caskroom/miniforge/base/envs/claw --with-x=0 --with-scalar-type=real --with-cuda=0 --with-batch --prefix=/opt/homebrew/Caskroom/miniforge/base/envs/claw
>>>>>>>>> [0]PETSC ERROR: #1 PetscShmgetMapAddresses() at /Users/runner/miniforge3/conda-bld/petsc_1728030427805/work/src/sys/utils/server.c:114
>>>>>>>>> [0]PETSC ERROR: #2 PCMPISetMat() at /Users/runner/miniforge3/conda-bld/petsc_1728030427805/work/src/ksp/pc/impls/mpi/pcmpi.c:269
>>>>>>>>> [0]PETSC ERROR: #3 PCSetUp_MPI() at /Users/runner/miniforge3/conda-bld/petsc_1728030427805/work/src/ksp/pc/impls/mpi/pcmpi.c:853
>>>>>>>>> [0]PETSC ERROR: #4 PCSetUp() at /Users/runner/miniforge3/conda-bld/petsc_1728030427805/work/src/ksp/pc/interface/precon.c:1071
>>>>>>>>> [0]PETSC ERROR: #5 KSPSetUp() at /Users/runner/miniforge3/conda-bld/petsc_1728030427805/work/src/ksp/ksp/interface/itfunc.c:415
>>>>>>>>> [0]PETSC ERROR: #6 KSPSolve_Private() at /Users/runner/miniforge3/conda-bld/petsc_1728030427805/work/src/ksp/ksp/interface/itfunc.c:826
>>>>>>>>> [0]PETSC ERROR: #7 KSPSolve() at /Users/runner/miniforge3/conda-bld/petsc_1728030427805/work/src/ksp/ksp/interface/itfunc.c:1075
>>>>>>>>>
>>>>>>>>> The code does not progress, so I kill it
>>>>>>>>>
>>>>>>>>> ^CTraceback (most recent call last):
>>>>>>>>> File "/Users/praveen/Applications/clawpack/clawutil/src/python/clawutil/runclaw.py", line 341, in <module>
>>>>>>>>> runclaw(*args)
>>>>>>>>> File "/Users/praveen/Applications/clawpack/clawutil/src/python/clawutil/runclaw.py", line 242, in runclaw
>>>>>>>>> proc = subprocess.check_call(cmd_split,
>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>> File "/opt/homebrew/Caskroom/miniforge/base/envs/claw/lib/python3.12/subprocess.py", line 408, in check_call
>>>>>>>>> retcode = call(*popenargs, **kwargs)
>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>> File "/opt/homebrew/Caskroom/miniforge/base/envs/claw/lib/python3.12/subprocess.py", line 391, in call
>>>>>>>>> return p.wait(timeout=timeout)
>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>> File "/opt/homebrew/Caskroom/miniforge/base/envs/claw/lib/python3.12/subprocess.py", line 1264, in wait
>>>>>>>>> return self._wait(timeout=timeout)
>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>> File "/opt/homebrew/Caskroom/miniforge/base/envs/claw/lib/python3.12/subprocess.py", line 2053, in _wait
>>>>>>>>> (pid, sts) = self._try_wait(0)
>>>>>>>>> ^^^^^^^^^^^^^^^^^
>>>>>>>>> File "/opt/homebrew/Caskroom/miniforge/base/envs/claw/lib/python3.12/subprocess.py", line 2011, in _try_wait
>>>>>>>>> (pid, sts) = os.waitpid(self.pid, wait_flags)
>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>> KeyboardInterrupt
>>>>>>>>> make[1]: *** [output] Interrupt: 2
>>>>>>>>> make: *** [.output] Interrupt: 2
>>>>>>>>>
>>>>>>>>> Now it says
>>>>>>>>>
>>>>>>>>> $ ipcs -m
>>>>>>>>> IPC status from <running system> as of Thu Oct 24 08:05:06 IST 2024
>>>>>>>>> T ID KEY MODE OWNER GROUP
>>>>>>>>> Shared Memory:
>>>>>>>>> m 720896 0x0000000a --rw-rw-rw- praveen staff
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> praveen
>>>>>>>>>
>>>>>>>>>> On 23 Oct 2024, at 8:26 PM, Barry Smith <bsmith at petsc.dev> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hmm, so it is creating the first shared memory region with an ID of 10 (0xa in hex) and putting it in a linked list in PETSc, but then when it tries to find it in that linked list it cannot.
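>>>>>>>>>>
>>>>>>>>>> (To illustrate the idea only; this is not PETSc's actual code, just a rough C sketch of the attach-and-look-up pattern involved: each process attaches the System V segment with shmat() and records the address range in a list, and translating an address later means finding the entry whose range contains it. The error corresponds to that lookup coming back empty.)
>>>>>>>>>>
>>>>>>>>>> #include <stddef.h>
>>>>>>>>>> #include <sys/shm.h>
>>>>>>>>>>
>>>>>>>>>> /* illustrative only: one record per attached shared-memory segment */
>>>>>>>>>> typedef struct { void *base; size_t len; int shmid; } SegEntry;
>>>>>>>>>>
>>>>>>>>>> static SegEntry segs[64];
>>>>>>>>>> static int      nsegs = 0;
>>>>>>>>>>
>>>>>>>>>> /* attach an existing segment by id and remember where it landed */
>>>>>>>>>> static void *attach_and_record(int shmid, size_t len)
>>>>>>>>>> {
>>>>>>>>>>   void *base = shmat(shmid, NULL, 0);   /* returns (void *)-1 on failure */
>>>>>>>>>>   if (base == (void *)-1) return NULL;
>>>>>>>>>>   segs[nsegs].base = base; segs[nsegs].len = len; segs[nsegs].shmid = shmid; nsegs++;
>>>>>>>>>>   return base;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> /* find the recorded segment containing addr; returning NULL here is the
>>>>>>>>>>    "Unable to locate ... shared address" situation */
>>>>>>>>>> static SegEntry *lookup(const void *addr)
>>>>>>>>>> {
>>>>>>>>>>   for (int i = 0; i < nsegs; i++) {
>>>>>>>>>>     const char *b = (const char *)segs[i].base;
>>>>>>>>>>     if ((const char *)addr >= b && (const char *)addr < b + segs[i].len) return &segs[i];
>>>>>>>>>>   }
>>>>>>>>>>   return NULL;
>>>>>>>>>> }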
>>>>>>>>>>
>>>>>>>>>> I don't know how to reproduce this or debug it remotely.
>>>>>>>>>>
>>>>>>>>>> Can you build on a completely different machine or with completely different compilers?
>>>>>>>>>>
>>>>>>>>>> Barry
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On Oct 23, 2024, at 10:31 AM, Praveen C <cpraveen at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I get the same error, and now it shows
>>>>>>>>>>>
>>>>>>>>>>> $ ipcs -m
>>>>>>>>>>>
>>>>>>>>>>> ------ Shared Memory Segments --------
>>>>>>>>>>> key shmid owner perms bytes nattch status
>>>>>>>>>>> 0x0000000a 32788 praveen 666 240 6
>>>>>>>>>>>
>>>>>>>>>>> Note that the code seems to still be running after printing those error messages, but it is not printing any progress, which it should.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> praveen
>>>>>>>>>>>
>>>>>>>>>>>> On 23 Oct 2024, at 7:56 PM, Barry Smith <bsmith at petsc.dev> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Try
>>>>>>>>>>>>
>>>>>>>>>>>> ipcrm -m 11
>>>>>>>>>>>>
>>>>>>>>>>>> ipcs -m
>>>>>>>>>>>>
>>>>>>>>>>>> Try running the program again
>>>>>>>>>>>>
>>>>>>>>>>>> If it fails, check
>>>>>>>>>>>>
>>>>>>>>>>>> ipcs -m
>>>>>>>>>>>>
>>>>>>>>>>>> again
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> On Oct 23, 2024, at 10:20 AM, Praveen C <cpraveen at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hello Barry
>>>>>>>>>>>>>
>>>>>>>>>>>>> I see this
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ ipcs -m
>>>>>>>>>>>>>
>>>>>>>>>>>>> ------ Shared Memory Segments --------
>>>>>>>>>>>>> key shmid owner perms bytes nattch status
>>>>>>>>>>>>> 0x0000000a 11 praveen 666 240 6
>>>>>>>>>>>>>
>>>>>>>>>>>>> and I am observing same error as below.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> praveen
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 23 Oct 2024, at 7:08 PM, Barry Smith <bsmith at petsc.dev> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please take a look at the notes in https://petsc.org/release/manualpages/Sys/PetscShmgetAllocateArray/. For some reason your program is not able to access/use the Unix shared memory; check whether you are already using the shared memory (so it is not available for a new run) or whether the limits are too low to access enough memory.
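>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (For reference, a rough sketch of commands for inspecting the System V shared-memory state and limits; the exact sysctl names depend on the OS:)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ipcs -m                                   # list the segments currently allocated
>>>>>>>>>>>>>> ipcrm -m <shmid>                          # remove a leftover segment by its id
>>>>>>>>>>>>>> sysctl kernel.shmmax kernel.shmall kernel.shmmni               # limits on Linux
>>>>>>>>>>>>>> sysctl kern.sysv.shmmax kern.sysv.shmall kern.sysv.shmmni      # limits on macOS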
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Barry
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Oct 23, 2024, at 8:23 AM, Praveen C <cpraveen at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dear all
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am not able to run the boussinesq example from geoclaw using petsc@3.22.0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://github.com/clawpack/geoclaw/tree/3303883f46572c58130d161986b8a87a57ca7816/examples/bouss
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It runs with petsc@3.21.6
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The error I get is given below. After printing this, the code does not progress.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I use the following petsc options
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # set min numbers of matrix rows per MPI rank (default is 10000)
>>>>>>>>>>>>>>> -mpi_linear_solve_minimum_count_per_rank 5000
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # Krylov linear solver:
>>>>>>>>>>>>>>> -mpi_linear_solver_server
>>>>>>>>>>>>>>> -mpi_linear_solver_server_view
>>>>>>>>>>>>>>> -ksp_type gmres
>>>>>>>>>>>>>>> -ksp_max_it 200
>>>>>>>>>>>>>>> -ksp_reuse_preconditioner
>>>>>>>>>>>>>>> -ksp_rtol 1.e-9
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # preconditioner:
>>>>>>>>>>>>>>> -pc_type gamg
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I installed petsc and other dependencies for clawpack using miniforge.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> pc
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ==> Use Bouss. in water deeper than 1.0000000000000000
>>>>>>>>>>>>>>> Using a PETSc solver
>>>>>>>>>>>>>>> Using Bouss equations from the start
>>>>>>>>>>>>>>> rnode allocated...
>>>>>>>>>>>>>>> node allocated...
>>>>>>>>>>>>>>> listOfGrids allocated...
>>>>>>>>>>>>>>> Storage allocated...
>>>>>>>>>>>>>>> bndList allocated...
>>>>>>>>>>>>>>> Gridding level 1 at t = 0.000000E+00: 4 grids with 10000 cells
>>>>>>>>>>>>>>> Setting initial dt to 2.9999999999999999E-002
>>>>>>>>>>>>>>> max threads set to 6
>>>>>>>>>>>>>>> Done reading data, starting computation ...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Total zeta at initial time: 39269.907650665169
>>>>>>>>>>>>>>> GEOCLAW: Frame 0 output files done at time t = 0.000000D+00
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>>>>>>>>>>>>> [0]PETSC ERROR: Petsc has generated inconsistent data
>>>>>>>>>>>>>>> [0]PETSC ERROR: Unable to locate PCMPI allocated shared address 0x55e6d750ae20
>>>>>>>>>>>>>>> [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!
>>>>>>>>>>>>>>> [0]PETSC ERROR: Option left: name:-ksp_max_it value: 200 source: file
>>>>>>>>>>>>>>> [0]PETSC ERROR: Option left: name:-ksp_reuse_preconditioner (no value) source: file
>>>>>>>>>>>>>>> [0]PETSC ERROR: Option left: name:-ksp_rtol value: 1.e-9 source: file
>>>>>>>>>>>>>>> [0]PETSC ERROR: Option left: name:-ksp_type value: gmres source: file
>>>>>>>>>>>>>>> [0]PETSC ERROR: Option left: name:-mpi_linear_solve_minimum_count_per_rank value: 5000 source: file
>>>>>>>>>>>>>>> [0]PETSC ERROR: Option left: name:-mpi_linear_solver_server_view (no value) source: file
>>>>>>>>>>>>>>> [0]PETSC ERROR: Option left: name:-pc_type value: gamg source: file
>>>>>>>>>>>>>>> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>>>>>>>>>>>>>>> [0]PETSC ERROR: Petsc Release Version 3.22.0, Sep 28, 2024
>>>>>>>>>>>>>>> [0]PETSC ERROR: /home/praveen/bouss/radial_flat/xgeoclaw with 6 MPI process(es) and PETSC_ARCH on euler by praveen Thu Oct 17 21:49:54 2024
>>>>>>>>>>>>>>> [0]PETSC ERROR: Configure options: AR=${PREFIX}/bin/x86_64-conda-linux-gnu-ar CC=mpicc CXX=mpicxx FC=mpifort CFLAGS="-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /opt/miniforge/envs/claw/include " CPPFLAGS="-DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /opt/miniforge/envs/claw/include" CXXFLAGS="-fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /opt/miniforge/envs/claw/include " FFLAGS="-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /opt/miniforge/envs/claw/include -Wl,--no-as-needed" LDFLAGS="-pthread -fopenmp -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/opt/miniforge/envs/claw/lib -Wl,-rpath-link,/opt/miniforge/envs/claw/lib -L/opt/miniforge/envs/claw/lib -Wl,-rpath-link,/opt/miniforge/envs/claw/lib" LIBS="-Wl,-rpath,/opt/miniforge/envs/claw/lib -lmpi_mpifh -lgfortran" --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 --with-clib-autodetect=0 --with-cxxlib-autodetect=0 --with-fortranlib-autodetect=0 --with-debugging=0 --with-blas-lib=libblas.so --with-lapack-lib=liblapack.so --with-yaml=1 --with-hdf5=1 --with-fftw=1 --with-hwloc=0 --with-hypre=1 --with-metis=1 --with-mpi=1 --with-mumps=1 --with-parmetis=1 --with-pthread=1 --with-ptscotch=1 --with-shared-libraries --with-ssl=0 --with-scalapack=1 --with-superlu=1 --with-superlu_dist=1 --with-superlu_dist-include=/opt/miniforge/envs/claw/include/superlu-dist --with-superlu_dist-lib=-lsuperlu_dist --with-suitesparse=1 --with-suitesparse-dir=/opt/miniforge/envs/claw --with-x=0 --with-scalar-type=real --with-cuda=0 --prefix=/opt/miniforge/envs/claw
>>>>>>>>>>>>>>> [0]PETSC ERROR: #1 PetscShmgetMapAddresses() at /home/conda/feedstock_root/build_artifacts/petsc_1728030599661/work/src/sys/utils/server.c:114
>>>>>>>>>>>>>>> [0]PETSC ERROR: #2 PCMPISetMat() at /home/conda/feedstock_root/build_artifacts/petsc_1728030599661/work/src/ksp/pc/impls/mpi/pcmpi.c:269
>>>>>>>>>>>>>>> [0]PETSC ERROR: #3 PCSetUp_MPI() at /home/conda/feedstock_root/build_artifacts/petsc_1728030599661/work/src/ksp/pc/impls/mpi/pcmpi.c:853
>>>>>>>>>>>>>>> [0]PETSC ERROR: #4 PCSetUp() at /home/conda/feedstock_root/build_artifacts/petsc_1728030599661/work/src/ksp/pc/interface/precon.c:1071
>>>>>>>>>>>>>>> [0]PETSC ERROR: #5 KSPSetUp() at /home/conda/feedstock_root/build_artifacts/petsc_1728030599661/work/src/ksp/ksp/interface/itfunc.c:415
>>>>>>>>>>>>>>> [0]PETSC ERROR: #6 KSPSolve_Private() at /home/conda/feedstock_root/build_artifacts/petsc_1728030599661/work/src/ksp/ksp/interface/itfunc.c:826
>>>>>>>>>>>>>>> [0]PETSC ERROR: #7 KSPSolve() at /home/conda/feedstock_root/build_artifacts/petsc_1728030599661/work/src/ksp/ksp/interface/itfunc.c:1075
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>