[petsc-users] AMPERE80

Matthew Knepley knepley at gmail.com
Fri Apr 16 09:24:04 CDT 2021


Can you get a stack trace?

  Matt

On Fri, Apr 16, 2021 at 10:19 AM Mark Adams <mfadams at lbl.gov> wrote:

> That seems to have changed it. No stack trace.
>
> srun  -G 1 -c 2 -n 1  ./ex2 -petscspace_degree 3 -ex2_test_type spitzer
> -dm_landau_Ez 0 -dm_landau_ion_masses .01 -dm_landau_ion_charges 1
> -dm_landau_thermal_temps 2,1 -dm_landau_n 1,1 -ts_type beuler
> -ts_exact_final_time stepover -ts_max_steps 2 -ts_dt 1 -ts_monitor
> -snes_monitor -snes_max_it 25 -snes_rtol 1.e-14 -snes_stol 1.e-14 -pc_type
> lu -ksp_type preonly -dm_landau_type p4est -dm_landau_amr_levels_max 13
> -dm_landau_amr_post_refine 1 -dm_preallocate_only -ex2_plot_dt .0001
> -dm_landau_device_type cuda -dm_mat_type aijcusparse -dm_vec_type cuda
> -display :0.0
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Caught signal number 4 Illegal instruction: Likely due to
> memory corruption
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see
> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS
> X to find memory corruption errors
> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and
> run
> [0]PETSC ERROR: to get more information on the crash.
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Signal received
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.15.0-205-g2283897782
>  GIT Date: 2021-04-15 09:38:17 -0400
> [0]PETSC ERROR:
> /global/u2/m/madams/petsc/src/ts/utils/dmplexlandau/tutorials/./ex2 on a
> arch-cori-gpu80-opt-kokkos-gcc named cgpu19 by madams Fri Apr 16 07:16:28
> 2021
> [0]PETSC ERROR: Configure options
> --with-mpi-dir=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc
> --with-cuda-dir=/usr/common/software/sles15_cgpu/cuda/11.1.1 CFLAGS="   -g
> -DPETSC_HAVE_CUDA_ATOMIC" CXXFLAGS=" -g -DPETSC_HAVE_CUDA_ATOMIC" FFLAGS="
>   -g " COPTFLAGS="   -O" CXXOPTFLAGS=" -O" FOPTFLAGS="   -O"
> --CUDAFLAGS="-arch=sm_80 -Xcompiler -rdynamic -lineinfo
> -DPETSC_HAVE_CUDA_ATOMIC -g" --CUDAOPTFLAGS=-O3 --download-fblaslapack=1
> --with-debugging=0 --download-kokkos --download-kokkos-kernels
> --with-kokkos-cuda-arch=AMPERE80 --with-kokkos-kernels-tpl=0
> --with-make-np=8 --with-ctable=0 --with-mpiexec="srun -G 1 -c 2"
> --with-batch=0 PETSC_ARCH=arch-cori-gpu80-opt-kokkos-gcc --with-cuda=1
> --download-p4est=1 --with-zlib=1
> [0]PETSC ERROR: #1 User provided function() at unknown file:0
>
> On Fri, Apr 16, 2021 at 10:02 AM Matthew Knepley <knepley at gmail.com>
> wrote:
>
>> On Fri, Apr 16, 2021 at 9:58 AM Mark Adams <mfadams at lbl.gov> wrote:
>>
>>> I am running on a new AMPERE80 node at NERSC and get this error message.
>>> I ran this in TotalView but did not get anything useful.
>>>
>>> Any ideas?
>>>
>>
>> Maybe getenv() is failing?
>>
>> You can shut off this behavior using
>>
>>   -display :0.0
>>
>>    Matt
>>
>>
>>> [cgpu19:228520:0:228520] Caught signal 4 (Illegal instruction: illegal
>>> operand)
>>> ==== backtrace (tid: 228520) ====
>>>  0
>>>  /usr/common/software/sles15_cgpu/ucx/1.8.1/lib/libucs.so.0(ucs_handle_error+0x2e4)
>>> [0x2aab2e9a2ac4]
>>>  1  /usr/common/software/sles15_cgpu/ucx/1.8.1/lib/libucs.so.0(+0x21cc4)
>>> [0x2aab2e9a2cc4]
>>>  2  /usr/common/software/sles15_cgpu/ucx/1.8.1/lib/libucs.so.0(+0x21d33)
>>> [0x2aab2e9a2d33]
>>>  3
>>>  /global/homes/m/madams/petsc/arch-cori-gpu80-opt-kokkos-gcc/lib/libpetsc.so.3.015(PetscSetDisplay+0x152)
>>> [0x2aaaaaf19ab1]
>>>  4
>>>  /global/homes/m/madams/petsc/arch-cori-gpu80-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x21c7bc)
>>> [0x2aaaaaeef7bc]
>>>  5
>>>  /global/homes/m/madams/petsc/arch-cori-gpu80-opt-kokkos-gcc/lib/libpetsc.so.3.015(PetscInitialize+0x449)
>>> [0x2aaaaaef5278]
>>>  6
>>>  /global/u2/m/madams/petsc/src/ts/utils/dmplexlandau/tutorials/./ex2()
>>> [0x405b62]
>>>  7  /lib64/libc.so.6(__libc_start_main+0xea) [0x2aab1344df8a]
>>>  8
>>>  /global/u2/m/madams/petsc/src/ts/utils/dmplexlandau/tutorials/./ex2()
>>> [0x4026aa]
>>> =================================
>>> srun: error: cgpu19: task 0: Illegal instruction
>>> srun: Terminating job step 1821681.10
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>
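
To make the getenv() guess above concrete, here is a minimal sketch of a
NULL-safe display lookup. It is not PETSc's implementation of
PetscSetDisplay; resolve_display() is a hypothetical helper, and the ":0.0"
fallback is an assumption taken from the option value used in this thread.
It only illustrates why passing -display explicitly would skip the
environment lookup altogether.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT a PETSc function.
 * Copies the X display name into buf and returns buf (or NULL on bad input).
 * Order of preference:
 *   1. an explicit override (what a -display option would supply),
 *   2. the DISPLAY environment variable,
 *   3. the assumed default ":0.0". */
const char *resolve_display(const char *override, char *buf, size_t len)
{
  const char *src = override;

  if (!buf || len == 0) return NULL;
  /* getenv() returns NULL when the variable is unset, so check before use. */
  if (!src || !*src) src = getenv("DISPLAY");
  if (!src || !*src) src = ":0.0";
  /* snprintf always NUL-terminates when len > 0 and truncates if needed. */
  snprintf(buf, len, "%s", src);
  return buf;
}
```

With an override such as ":0.0", the helper never calls getenv(), which
mirrors the effect the -display option is expected to have here.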
