[petsc-users] SuperLU + GPUs
    Mark Adams 
    mfadams at lbl.gov
       
    Sun Apr 19 10:51:59 CDT 2020
    
    
  
Ahhh, thanks,
OK, now I am able to reproduce the error in the test. I can work on that,
Thanks again,
On Sun, Apr 19, 2020 at 11:45 AM Satish Balay <balay at mcs.anl.gov> wrote:
> > [0]PETSC ERROR: Could not locate solver package superlu for factorization
>
> Here you are requesting 'superlu' (-pc_factor_mat_solver_type superlu)
> instead of 'superlu_dist', hence this error.
>
> Satish
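Satish's diagnosis comes down to which package the run's options name. Below is a minimal Python sketch for checking that, assuming the options are available as a list of command-line tokens. `requested_solver_type` and `has_gpu_code` are hypothetical helpers, and the rule that only superlu_dist has GPU code is taken from Sherry's reply in this thread, not from PETSc itself.

```python
def requested_solver_type(argv):
    """Return the value of -pc_factor_mat_solver_type in argv, or None.

    argv is a list of command-line tokens (e.g. sys.argv[1:]). If the
    option is repeated, this sketch keeps the last value it sees.
    """
    value = None
    for i, tok in enumerate(argv[:-1]):
        if tok == "-pc_factor_mat_solver_type":
            value = argv[i + 1]
    return value


def has_gpu_code(solver_type):
    """Per this thread, serial 'superlu' has no GPU code; 'superlu_dist' does.

    Whether superlu_dist actually runs on the GPU also depends on its build.
    """
    return solver_type == "superlu_dist"


if __name__ == "__main__":
    # Tail of the plex/ex11 command line used later in this thread.
    argv = ("-pc_type lu -ksp_type preonly -pc_factor_mat_solver_type superlu "
            "-mat_superlu_equil -dm_mat_type sell").split()
    solver = requested_solver_type(argv)
    print(solver, has_gpu_code(solver))  # superlu False
```

Switching the value to `superlu_dist` is the change Satish is pointing at; the `-mat_superlu_equil` option in that line belongs to the serial package.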
>
> On Sun, 19 Apr 2020, Mark Adams wrote:
>
> > >
> > >
> > >
> > > > > --download-superlu --download-superlu_dist
> > >
> > > You are installing with both superlu and superlu_dist. To verify, remove
> > > superlu and keep only superlu_dist.
> > >
> >
> > I tried this earlier. Here is the error message:
> >
> >    0 SNES Function norm 1.511918966798e-02
> > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html for possible LU and Cholesky solvers
> > [0]PETSC ERROR: Could not locate solver package superlu for factorization type LU and matrix type seqaij. Perhaps you must ./configure with --download-superlu
> > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > [0]PETSC ERROR: Petsc Development GIT revision: v3.13-163-g4c71feb  GIT Date: 2020-04-18 15:35:50 -0400
> > [0]PETSC ERROR: ./ex112d on a arch-summit-opt-gnu-cuda-omp-2db named h23n05 by adams Sun Apr 19 11:39:05 2020
> > [0]PETSC ERROR: Configure options --with-fc=0 --COPTFLAGS="-g -O2 -fPIC -fopenmp -DFP_DIM=2"
> > --CXXOPTFLAGS="-g -O2 -fPIC -fopenmp" --FOPTFLAGS="-g -O2 -fPIC -fopenmp" --CUDAOPTFLAGS="-O2 -g"
> > --with-ssl=0 --with-batch=0 --with-cxx=mpicxx --with-mpiexec="jsrun -g1" --with-cuda=1
> > --with-cudac=nvcc --download-p4est=1 --download-zlib --download-hdf5=1 --download-metis
> > --download-superlu_dist --with-make-np=16 --download-parmetis --download-triangle
> > --with-blaslapack-lib="-L/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/netlib-lapack-3.8.0-wcabdyqhdi5rooxbkqa6x5d7hxyxwdkm/lib64 -lblas -llapack"
> > --with-cc=mpicc --with-shared-libraries=1 --with-x=0 --with-64-bit-indices=0 --with-debugging=0
> > PETSC_ARCH=arch-summit-opt-gnu-cuda-omp-2db --with-openmp=1 --with-threadsaftey=1 --with-log=1
> > [0]PETSC ERROR: #1 MatGetFactor() line 4490 in /autofs/nccs-svm1_home1/adams/petsc/src/mat/interface/matrix.c
> > [0]PETSC ERROR: #2 PCSetUp_LU() line 88 in /autofs/nccs-svm1_home1/adams/petsc/src/ksp/pc/impls/factor/lu/lu.c
> > [0]PETSC ERROR: #3 PCSetUp() line 894 in /autofs/nccs-svm1_home1/adams/petsc/src/ksp/pc/interface/precon.c
> > [0]PETSC ERROR: #4 KSPSetUp() line 376 in /autofs/nccs-svm1_home1/adams/petsc/src/ksp/ksp/interface/itfunc.c
> > [0]PETSC ERROR: #5 KSPSolve_Private() line 633 in /autofs/nccs-svm1_home1/adams/petsc/src/ksp/ksp/interface/itfunc.c
> > [0]PETSC ERROR: #6 KSPSolve() line 853 in /autofs/nccs-svm1_home1/adams/petsc/src/ksp/ksp/interface/itfunc.c
> > [0]PETSC ERROR: #7 SNESSolve_NEWTONLS() line 225 in /autofs/nccs-svm1_home1/adams/petsc/src/snes/impls/ls/ls.c
> > [0]PETSC ERROR: #8 SNESSolve() line 4520 in /autofs/nccs-svm1_home1/adams/petsc/src/snes/interface/snes.c
> > [0]PETSC ERROR: #9 TSStep_ARKIMEX() line 811 in /autofs/nccs-svm1_home1/adams/petsc/src/ts/impls/arkimex/arkimex.c
> > [0]PETSC ERROR: #10 TSStep() line 3721 in /autofs/nccs-svm1_home1/adams/petsc/src/ts/interface/ts.c
> > [0]PETSC ERROR: #11 TSSolve() line 4127 in /autofs/nccs-svm1_home1/adams/petsc/src/ts/interface/ts.c
> > [0]PETSC ERROR: #12 main() line 955 in ex11.c
> >
> >
> > >
> > > Satish
> > >
> > >
> > > >
> > > >
> > > > >
> > > > > SuperLU:
> > > > >   Version:  5.2.1
> > > > >   Includes: -I/ccs/home/adams/petsc/arch-summit-opt-gnu-cuda-omp/include
> > > > >   Library: -Wl,-rpath,/ccs/home/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib -L/ccs/home/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib -lsuperlu
> > > > >
> > > > > which is serial superlu, not superlu_dist. These are two different codes.
> > > > >
> > > > > Sherry
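Sherry identifies the package from the `-lsuperlu` token in the `Library:` entry. A small Python sketch of that check follows; the function name is hypothetical. It compares whole tokens, because a substring test for `-lsuperlu` would also match `-lsuperlu_dist`.

```python
def linked_superlu_packages(library_flags):
    """Return the SuperLU variants named by -l tokens in a configure.log
    'Library:' entry, e.g. {'superlu'} or {'superlu_dist'}.

    Whole tokens are compared: a substring test for '-lsuperlu' would
    also match '-lsuperlu_dist'.
    """
    names = {"-lsuperlu": "superlu", "-lsuperlu_dist": "superlu_dist"}
    return {names[tok] for tok in library_flags.split() if tok in names}


if __name__ == "__main__":
    # The 'Library:' entry quoted in this message.
    flags = ("-Wl,-rpath,/ccs/home/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib "
             "-L/ccs/home/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib -lsuperlu")
    print(linked_superlu_packages(flags))  # {'superlu'}
```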
> > > > >
> > > > > On Sat, Apr 18, 2020 at 4:54 PM Mark Adams <mfadams at lbl.gov> wrote:
> > > > >
> > > > >>
> > > > >>
> > > > >> On Sat, Apr 18, 2020 at 3:05 PM Xiaoye S. Li <xsli at lbl.gov> wrote:
> > > > >>
> > > > >>> Mark,
> > > > >>>
> > > > >>> It seems you are talking about serial superlu? There is no GPU support
> > > > >>> in it. Only superlu_dist has GPU support.
> > > > >>>
> > > > >>
> > > > >> I am using superlu_dist on one processor. Should that work?
> > > > >>
> > > > >>
> > > > >>>
> > > > >>> But I don't know why there is a crash.
> > > > >>>
> > > > >>> Sherry
> > > > >>>
> > > > >>> On Sat, Apr 18, 2020 at 11:44 AM Mark Adams <mfadams at lbl.gov> wrote:
> > > > >>>
> > > > >>>> Sherry, I did rebase with master this week:
> > > > >>>>
> > > > >>>> SuperLU:
> > > > >>>>   Version:  5.2.1
> > > > >>>>   Includes: -I/ccs/home/adams/petsc/arch-summit-opt-gnu-cuda-omp/include
> > > > >>>>   Library: -Wl,-rpath,/ccs/home/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib -L/ccs/home/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib -lsuperlu
> > > > >>>>
> > > > >>>> I see the same thing with a debug build.
> > > > >>>>
> > > > >>>> If anyone is interested in looking at this, I was also able to see that
> > > > >>>> plex/ex10 in my branch, which is a very simple test, also does not crash
> > > > >>>> and also does not seem to use GPUs in SuperLU.
> > > > >>>>
> > > > >>>>
> > > > >>>> On Sat, Apr 18, 2020 at 11:46 AM Xiaoye S. Li <xsli at lbl.gov> wrote:
> > > > >>>>
> > > > >>>>> When you install with "--download-superlu_dist", is that from the
> > > > >>>>> 'master' branch?
> > > > >>>>>
> > > > >>>>> In the error trace, I recognized this:
> > > > >>>>>
> > > > >>>>> > [h50n09:102287] [ 9] /ccs/home/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libsuperlu_dist.so.6(dDestroy_LU+0xc4)[0x20000195aff4]
> > > > >>>>>
> > > > >>>>> This is to free the L and U data structures at the end of the program.
> > > > >>>>>
> > > > >>>>> Sherry
> > > > >>>>>
> > > > >>>>> On Sat, Apr 18, 2020 at 7:24 AM Mark Adams <mfadams at lbl.gov> wrote:
> > > > >>>>>
> > > > >>>>>> Back to SuperLU + GPUs (adding Sherry)
> > > > >>>>>>
> > > > >>>>>> I get this error (appended) running 'check', as I said before. It looks
> > > > >>>>>> like ex19 is *failing* with CUDA, but it is not clear it has anything to
> > > > >>>>>> do with SuperLU. I cannot find the diagnostics that got printed after the
> > > > >>>>>> error in either PETSc or SuperLU.
> > > > >>>>>>
> > > > >>>>>> So this is a problem, but moving on to my code (plex/ex11 in
> > > > >>>>>> mark/feature-xgc-interface-rebase-v2, configure script appended): it runs.
> > > > >>>>>> I use superlu and GPUs, but GPUs do not seem to be used in SuperLU:
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> ------------------------------------------------------------------------------------------------------------------------
> > > > >>>>>> Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
> > > > >>>>>>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
> > > > >>>>>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
> > > > >>>>>>  ....
> > > > >>>>>> MatLUFactorNum        12 1.0 2.3416e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 31  0  0  0  0  31  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
> > > > >>>>>>
> > > > >>>>>> Below is the version without CUDA. The times are the same, and there is no
> > > > >>>>>> GPU communication in the table above, so SuperLU does not seem to be
> > > > >>>>>> using GPUs.
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> ------------------------------------------------------------------------------------------------------------------------
> > > > >>>>>> Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
> > > > >>>>>>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> > > > >>>>>> ------------------------------------------------------------------------------------------------------------------------
> > > > >>>>>>  ....
> > > > >>>>>> MatLUFactorNum        12 1.0 2.3421e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  5  0  0  0  0   5  0  0  0  0     0
> > > > >>>>>>
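The comparison of the two MatLUFactorNum lines can be made mechanical by reading the GPU columns off an event line. This Python sketch assumes the column layout of the two headers quoted in this message (21 fields per event line in the build without CUDA, 6 extra GPU fields in the CUDA build) and event names without spaces; `gpu_columns` and `touched_gpu` are hypothetical names, not PETSc tools.

```python
# Assumed layout, read off the two -log_view headers above: every event line
# has 21 whitespace-separated fields (name, Count, Time, Flop, Mess, AvgLen,
# Reduct, Global %T..%R, Stage %T..%R, Total Mflop/s); a CUDA build appends 6.
BASE_FIELDS = 21
GPU_FIELDS = ("gpu_mflops", "cpu_to_gpu_count", "cpu_to_gpu_size",
              "gpu_to_cpu_count", "gpu_to_cpu_size", "gpu_percent_flops")


def gpu_columns(event_line):
    """Return the GPU fields of one -log_view event line as floats,
    or None if the line has no GPU fields (build without CUDA)."""
    fields = event_line.split()
    if len(fields) == BASE_FIELDS:
        return None
    if len(fields) != BASE_FIELDS + len(GPU_FIELDS):
        raise ValueError(f"unexpected number of fields: {len(fields)}")
    return {name: float(tok) for name, tok in zip(GPU_FIELDS, fields[BASE_FIELDS:])}


def touched_gpu(event_line):
    """True if the event reports any GPU flops or host/device copies."""
    cols = gpu_columns(event_line)
    return cols is not None and any(value > 0 for value in cols.values())


if __name__ == "__main__":
    cuda = ("MatLUFactorNum        12 1.0 2.3416e+01 1.0 0.00e+00 0.0 0.0e+00 "
            "0.0e+00 0.0e+00 31  0  0  0  0  31  0  0  0  0     0       0      "
            "0 0.00e+00    0 0.00e+00  0")
    print(touched_gpu(cuda))  # False: every GPU field is zero
```

All-zero GPU fields on MatLUFactorNum in the CUDA build are what Mark reads as "SuperLU is not using GPUs"; the check only covers work that PETSc's own logging sees.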
> > > > >>>>>> There are some differences: ex19 uses DMDA and I use DMPlex, and 'check'
> > > > >>>>>> is run in my home directory, where files cannot be written, whereas I run
> > > > >>>>>> my code in the project areas.
> > > > >>>>>>
> > > > >>>>>> The timings are different without superlu, so I think superlu is being
> > > > >>>>>> used. This is how I run it (with and without -mat_superlu_equil
> > > > >>>>>> -dm_mat_type sell):
> > > > >>>>>>
> > > > >>>>>> jsrun -n 1 -a 1 -c 2 -g 1 ./ex113d_no_cuda -dim 3 -dm_view hdf5:re33d.h5
> > > > >>>>>> -vec_view hdf5:re33d.h5::append -test_type spitzer -Ez 0 -petscspace_degree 2
> > > > >>>>>> -mass_petscspace_degree 2 -petscspace_poly_tensor 1 -mass_petscspace_poly_tensor 1
> > > > >>>>>> -dm_type p8est -ion_masses 4 -ion_charges 2 -thermal_temps 4,4 -n 1,.5 -n_0 1e20
> > > > >>>>>> -ts_monitor -ts_adapt_monitor -snes_rtol 1.e-6 -snes_stol 1.e-9 -snes_monitor
> > > > >>>>>> -snes_converged_reason -snes_max_it 15 -ts_type arkimex -ts_exact_final_time stepover
> > > > >>>>>> -ts_arkimex_type 1bee -ts_max_snes_failures -1 -ts_rtol 1e-3 -ts_dt 1e-1
> > > > >>>>>> -ts_adapt_clip .25,1.05 -ts_adapt_dt_max 10 -ts_adapt_dt_min 2e-2 -ts_max_time 3200
> > > > >>>>>> -ts_max_steps 1 -ts_adapt_scale_solve_failed 0.75 -ts_adapt_time_step_increase_delay 5
> > > > >>>>>> -pc_type lu -ksp_type preonly -amr_levels_max 11 -amr_re_levels 0 -amr_z_refine1 0
> > > > >>>>>> -amr_z_refine2 0 -amr_post_refine 0 -domain_radius -.95 -re_radius 4 -z_radius1 8
> > > > >>>>>> -z_radius2 .1 -plot_dt .10 -impurity_source_type pulse -pulse_start_time 2600
> > > > >>>>>> -pulse_width_time 100 -pulse_rate 1e+0 -t_cold .005 -info :dm,tsadapt:
> > > > >>>>>> -sub_thread_block_size 4 -options_left -log_view -pc_factor_mat_solver_type superlu
> > > > >>>>>> -mat_superlu_equil -dm_mat_type sell
> > > > >>>>>>
> > > > >>>>>> So there is a bug in ex19 on SUMMIT, and I am not getting GPUs turned on
> > > > >>>>>> in SuperLU.
> > > > >>>>>> Thoughts?
> > > > >>>>>>
> > > > >>>>>> Thanks,
> > > > >>>>>> Mark
> > > > >>>>>>
> > > > >>>>>> 09:28 mark/feature-xgc-interface-rebase-v2 *= ~/petsc$ make PETSC_DIR=/ccs/home/adams/petsc PETSC_ARCH=arch-summit-opt-gnu-cuda-omp check
> > > > >>>>>> Running check examples to verify correct installation
> > > > >>>>>> Using PETSC_DIR=/ccs/home/adams/petsc and PETSC_ARCH=arch-summit-opt-gnu-cuda-omp
> > > > >>>>>> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> > > > >>>>>> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
> > > > >>>>>> 2c2,39
> > > > >>>>>> < Number of SNES iterations = 2
> > > > >>>>>> ---
> > > > >>>>>> > ex19: cudahook.cc:762: CUresult host_free_callback(void*): Assertion `cacheNode != __null' failed.
> > > > >>>>>> > [h50n09:102287] *** Process received signal ***
> > > > >>>>>> > CUDA version:   v 10010
> > > > >>>>>> > CUDA Devices:
> > > > >>>>>> >
> > > > >>>>>> > 0 : Tesla V100-SXM2-16GB 7 0
> > > > >>>>>> >   Global memory:   16128 mb
> > > > >>>>>> >   Shared memory:   48 kb
> > > > >>>>>> >   Constant memory: 64 kb
> > > > >>>>>> >   Block registers: 65536
> > > > >>>>>> >
> > > > >>>>>> > [h50n09:102287] Signal: Aborted (6)
> > > > >>>>>> > [h50n09:102287] Associated errno: Unknown error 1072693248 (1072693248)
> > > > >>>>>> > [h50n09:102287] Signal code: User function (kill, sigsend, abort, etc.) (0)
> > > > >>>>>> > [h50n09:102287] [ 0] [0x2000000504d8]
> > > > >>>>>> > [h50n09:102287] [ 1] /lib64/libc.so.6(abort+0x2b4)[0x200021bf2094]
> > > > >>>>>> > [h50n09:102287] [ 2] /lib64/libc.so.6(+0x356d4)[0x200021be56d4]
> > > > >>>>>> > [h50n09:102287] [ 3] /lib64/libc.so.6(__assert_fail+0x64)[0x200021be57c4]
> > > > >>>>>> > [h50n09:102287] [ 4] /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/container/../lib/libpami_cudahook.so(_Z18host_free_callbackPv+0x2d8)[0x2000000cd2c8]
> > > > >>>>>> > [h50n09:102287] [ 5] /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/container/../lib/libpami_cudahook.so(cuMemFreeHost+0xb0)[0x2000000c3cc0]
> > > > >>>>>> > [h50n09:102287] [ 6] /sw/summit/cuda/10.1.243/lib64/libcudart.so.10.1(+0x42f50)[0x20000ed02f50]
> > > > >>>>>> > [h50n09:102287] [ 7] /sw/summit/cuda/10.1.243/lib64/libcudart.so.10.1(+0x11db8)[0x20000ecd1db8]
> > > > >>>>>> > [h50n09:102287] [ 8] /sw/summit/cuda/10.1.243/lib64/libcudart.so.10.1(cudaFreeHost+0x74)[0x20000ed12ea4]
> > > > >>>>>> > [h50n09:102287] [ 9] /ccs/home/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libsuperlu_dist.so.6(dDestroy_LU+0xc4)[0x20000195aff4]
> > > > >>>>>> > [h50n09:102287] [10] /ccs/home/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.013(+0x7cdb70)[0x2000008bdb70]
> > > > >>>>>> > [h50n09:102287] [11] /ccs/home/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.013(MatLUFactorNumeric+0x1ec)[0x2000005f1a8c]
> > > > >>>>>> > [h50n09:102287] [12] /ccs/home/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.013(+0xbf8270)[0x200000ce8270]
> > > > >>>>>> > [h50n09:102287] [13] /ccs/home/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.013(PCSetUp+0x1a4)[0x200000d8d5a4]
> > > > >>>>>> > [h50n09:102287] [14] /ccs/home/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.013(KSPSetUp+0x40c)[0x200000dc498c]
> > > > >>>>>> > [h50n09:102287] [15] /ccs/home/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.013(+0xcd56fc)[0x200000dc56fc]
> > > > >>>>>> > [h50n09:102287] [16] /ccs/home/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.013(KSPSolve+0x20)[0x200000dc8260]
> > > > >>>>>> > [h50n09:102287] [17] /ccs/home/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.013(+0xe0a170)[0x200000efa170]
> > > > >>>>>> > [h50n09:102287] [18] /ccs/home/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.013(SNESSolve+0x814)[0x200000ebd394]
> > > > >>>>>> > [h50n09:102287] [19] ./ex19[0x10001a6c]
> > > > >>>>>> > [h50n09:102287] [20] /lib64/libc.so.6(+0x25200)[0x200021bd5200]
> > > > >>>>>> > [h50n09:102287] [21] /lib64/libc.so.6(__libc_start_main+0xc4)[0x200021bd53f4]
> > > > >>>>>> > [h50n09:102287] *** End of error message ***
> > > > >>>>>> > ERROR:  One or more process (first noticed rank 0) terminated with signal 6
> > > > >>>>>> /ccs/home/adams/petsc/src/snes/tutorials
> > > > >>>>>> Possible problem with ex19 running with superlu_dist, diffs above
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> #!/usr/bin/env python
> > > > >>>>>> if __name__ == '__main__':
> > > > >>>>>>   import sys
> > > > >>>>>>   import os
> > > > >>>>>>   sys.path.insert(0, os.path.abspath('config'))
> > > > >>>>>>   import configure
> > > > >>>>>>   configure_options = [
> > > > >>>>>>     '--with-fc=0',
> > > > >>>>>>     '--COPTFLAGS=-g -O2 -fPIC -fopenmp',
> > > > >>>>>>     '--CXXOPTFLAGS=-g -O2 -fPIC -fopenmp',
> > > > >>>>>>     '--FOPTFLAGS=-g -O2 -fPIC -fopenmp',
> > > > >>>>>>     '--CUDAOPTFLAGS=-O2 -g',
> > > > >>>>>>     '--with-ssl=0',
> > > > >>>>>>     '--with-batch=0',
> > > > >>>>>>     '--with-cxx=mpicxx',
> > > > >>>>>>     '--with-mpiexec=jsrun -g1',
> > > > >>>>>>     '--with-cuda=1',
> > > > >>>>>>     '--with-cudac=nvcc',
> > > > >>>>>>     '--download-p4est=1',
> > > > >>>>>>     '--download-zlib',
> > > > >>>>>>     '--download-hdf5=1',
> > > > >>>>>>     '--download-metis',
> > > > >>>>>>     '--download-superlu',
> > > > >>>>>>     '--download-superlu_dist',
> > > > >>>>>>     '--with-make-np=16',
> > > > >>>>>>     #  '--with-hwloc=0',
> > > > >>>>>>     '--download-parmetis',
> > > > >>>>>>     #  '--download-hypre',
> > > > >>>>>>     '--download-triangle',
> > > > >>>>>>     #  '--download-amgx',
> > > > >>>>>>     #  '--download-fblaslapack',
> > > > >>>>>>     '--with-blaslapack-lib=-L' +
> > > > >>>>>>     os.environ['OLCF_NETLIB_LAPACK_ROOT'] + '/lib64 -lblas -llapack',
> > > > >>>>>>     '--with-cc=mpicc',
> > > > >>>>>>     #  '--with-fc=mpif90',
> > > > >>>>>>     '--with-shared-libraries=1',
> > > > >>>>>>     #  '--known-mpi-shared-libraries=1',
> > > > >>>>>>     '--with-x=0',
> > > > >>>>>>     '--with-64-bit-indices=0',
> > > > >>>>>>     '--with-debugging=0',
> > > > >>>>>>     'PETSC_ARCH=arch-summit-opt-gnu-cuda-omp',
> > > > >>>>>>     '--with-openmp=1',
> > > > >>>>>>     '--with-threadsaftey=1',
> > > > >>>>>>     '--with-log=1'
> > > > >>>>>>   ]
> > > > >>>>>>   configure.petsc_configure(configure_options)
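Satish's suggestion in this thread is to build with superlu_dist only. With a script like the one above, the option list could be filtered before `configure.petsc_configure` is called. Below is a sketch with a hypothetical helper; it compares option names exactly, since `opt.startswith('--download-superlu')` would also remove `--download-superlu_dist`.

```python
def drop_serial_superlu(options):
    """Return a copy of a PETSc configure option list without serial SuperLU.

    Only the option name (text before '=') is compared, so both
    '--download-superlu' and '--download-superlu=1' are removed while
    '--download-superlu_dist' is kept.
    """
    return [opt for opt in options if opt.split("=", 1)[0] != "--download-superlu"]


if __name__ == "__main__":
    opts = ["--download-metis", "--download-superlu", "--download-superlu_dist",
            "--with-make-np=16"]
    print(drop_serial_superlu(opts))
    # ['--download-metis', '--download-superlu_dist', '--with-make-np=16']
```

In the script above this would be `configure_options = drop_serial_superlu(configure_options)` just before the `configure.petsc_configure(...)` call; deleting the `'--download-superlu',` entry by hand has the same effect.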
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> On Wed, Apr 15, 2020 at 9:58 PM Satish Balay <balay at mcs.anl.gov> wrote:
> > > > >>>>>>
> > > > >>>>>>> The crash is inside SuperLU_DIST, so I don't know what to suggest.
> > > > >>>>>>>
> > > > >>>>>>> You might have to debug this in a debugger and check with Sherry.
> > > > >>>>>>>
> > > > >>>>>>> Satish
> > > > >>>>>>>
> > > > >>>>>>> On Wed, 15 Apr 2020, Mark Adams wrote:
> > > > >>>>>>>
> > > > >>>>>>> > Ah, OK, 'check' will test SuperLU. It semi-worked:
> > > > >>>>>>> >
> > > > >>>>>>> > s20:13 mark/feature-xgc-interface-rebase *= ~/petsc$ make PETSC_DIR=/ccs/home/adams/petsc PETSC_ARCH=arch-summit-dbg-gnu-cuda-omp check
> > > > >>>>>>> > Running check examples to verify correct installation
> > > > >>>>>>> > Using PETSC_DIR=/ccs/home/adams/petsc and PETSC_ARCH=arch-summit-dbg-gnu-cuda-omp
> > > > >>>>>>> > C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> > > > >>>>>>> > C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
> > > > >>>>>>> > 2c2,38
> > > > >>>>>>> > < Number of SNES iterations = 2
> > > > >>>>>>> > ---
> > > > >>>>>>> > > CUDA version:   v 10010
> > > > >>>>>>> > > CUDA Devices:
> > > > >>>>>>> > >
> > > > >>>>>>> > > 0 : Tesla V100-SXM2-16GB 7 0
> > > > >>>>>>> > >   Global memory:   16128 mb
> > > > >>>>>>> > >   Shared memory:   48 kb
> > > > >>>>>>> > >   Constant memory: 64 kb
> > > > >>>>>>> > >   Block registers: 65536
> > > > >>>>>>> > >
> > > > >>>>>>> > > ex19: cudahook.cc:762: CUresult host_free_callback(void*): Assertion `cacheNode != __null' failed.
> > > > >>>>>>> > > [h16n07:78357] *** Process received signal ***
> > > > >>>>>>> > > [h16n07:78357] Signal: Aborted (6)
> > > > >>>>>>> > > [h16n07:78357] Signal code:  (1704218624)
> > > > >>>>>>> > > [h16n07:78357] [ 0] [0x2000000504d8]
> > > > >>>>>>> > > [h16n07:78357] [ 1] /lib64/libc.so.6(abort+0x2b4)[0x200023992094]
> > > > >>>>>>> > > [h16n07:78357] [ 2] /lib64/libc.so.6(+0x356d4)[0x2000239856d4]
> > > > >>>>>>> > > [h16n07:78357] [ 3] /lib64/libc.so.6(__assert_fail+0x64)[0x2000239857c4]
> > > > >>>>>>> > > [h16n07:78357] [ 4] /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/container/../lib/libpami_cudahook.so(_Z18host_free_callbackPv+0x2d8)[0x2000000cd2c8]
> > > > >>>>>>> > > [h16n07:78357] [ 5] /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/container/../lib/libpami_cudahook.so(cuMemFreeHost+0xb0)[0x2000000c3cc0]
> > > > >>>>>>> > > [h16n07:78357] [ 6] /sw/summit/cuda/10.1.243/lib64/libcudart.so.10.1(+0x42f50)[0x200010aa2f50]
> > > > >>>>>>> > > [h16n07:78357] [ 7] /sw/summit/cuda/10.1.243/lib64/libcudart.so.10.1(+0x11db8)[0x200010a71db8]
> > > > >>>>>>> > > [h16n07:78357] [ 8] /sw/summit/cuda/10.1.243/lib64/libcudart.so.10.1(cudaFreeHost+0x74)[0x200010ab2ea4]
> > > > >>>>>>> > > [h16n07:78357] [ 9] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libsuperlu_dist.so.6(dDestroy_LU+0x150)[0x200003188058]
> > > > >>>>>>> > > [h16n07:78357] [10] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(+0x12ebc6c)[0x2000013dbc6c]
> > > > >>>>>>> > > [h16n07:78357] [11] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(MatLUFactorNumeric+0x934)[0x200000d2fae4]
> > > > >>>>>>> > > [h16n07:78357] [12] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(+0x1cca7a4)[0x200001dba7a4]
> > > > >>>>>>> > > [h16n07:78357] [13] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(PCSetUp+0xde0)[0x200001f3f990]
> > > > >>>>>>> > > [h16n07:78357] [14] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(KSPSetUp+0x1848)[0x200001fc5594]
> > > > >>>>>>> > > [h16n07:78357] [15] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(+0x1ed9908)[0x200001fc9908]
> > > > >>>>>>> > > [h16n07:78357] [16] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(KSPSolve+0x5d0)[0x200001fcc690]
> > > > >>>>>>> > > [h16n07:78357] [17] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(+0x21e16ac)[0x2000022d16ac]
> > > > >>>>>>> > > [h16n07:78357] [18] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(SNESSolve+0x23f4)[0x2000022255c0]
> > > > >>>>>>> > > [h16n07:78357] [19] ./ex19[0x10002ac8]
> > > > >>>>>>> > > [h16n07:78357] [20] /lib64/libc.so.6(+0x25200)[0x200023975200]
> > > > >>>>>>> > > [h16n07:78357] [21] /lib64/libc.so.6(__libc_start_main+0xc4)[0x2000239753f4]
> > > > >>>>>>> > > [h16n07:78357] *** End of error message ***
> > > > >>>>>>> > > ERROR:  One or more process (first noticed rank 0) terminated with signal 6
> > > > >>>>>>> > /ccs/home/adams/petsc/src/snes/tutorials
> > > > >>>>>>> > Possible problem with ex19 running with superlu_dist, diffs above
> > > > >>>>>>> > =========================================
> > > > >>>>>>> >
> > > > >>>>>>> > On Wed, Apr 15, 2020 at 5:58 PM Satish Balay <balay at mcs.anl.gov> wrote:
> > > > >>>>>>> >
> > > > >>>>>>> > > Please send configure.log
> > > > >>>>>>> > >
> > > > >>>>>>> > > This is what I get on my linux build:
> > > > >>>>>>> > >
> > > > >>>>>>> > > [balay at p1 petsc]$ ./configure --with-mpi-dir=/home/petsc/soft/openmpi-4.0.2-cuda --with-cuda=1 --with-openmp=1 --download-superlu-dist=1 && make && make check
> > > > >>>>>>> > > <snip>
> > > > >>>>>>> > > Running check examples to verify correct installation
> > > > >>>>>>> > > Using PETSC_DIR=/home/balay/petsc and PETSC_ARCH=arch-linux-c-debug
> > > > >>>>>>> > > C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> > > > >>>>>>> > > C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
> > > > >>>>>>> > > 1a2,19
> > > > >>>>>>> > > > CUDA version:   v 10020
> > > > >>>>>>> > > > CUDA Devices:
> > > > >>>>>>> > > >
> > > > >>>>>>> > > > 0 : Quadro T2000 7 5
> > > > >>>>>>> > > >   Global memory:   3911 mb
> > > > >>>>>>> > > >   Shared memory:   48 kb
> > > > >>>>>>> > > >   Constant memory: 64 kb
> > > > >>>>>>> > > >   Block registers: 65536
> > > > >>>>>>> > > >
> > > > >>>>>>> > > > CUDA version:   v 10020
> > > > >>>>>>> > > > CUDA Devices:
> > > > >>>>>>> > > >
> > > > >>>>>>> > > > 0 : Quadro T2000 7 5
> > > > >>>>>>> > > >   Global memory:   3911 mb
> > > > >>>>>>> > > >   Shared memory:   48 kb
> > > > >>>>>>> > > >   Constant memory: 64 kb
> > > > >>>>>>> > > >   Block registers: 65536
> > > > >>>>>>> > > >
> > > > >>>>>>> > > /home/balay/petsc/src/snes/tutorials
> > > > >>>>>>> > > Possible problem with ex19 running with superlu_dist, diffs above
> > > > >>>>>>> > > =========================================
> > > > >>>>>>> > > Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process
> > > > >>>>>>> > > Completed test examples
> > > > >>>>>>> > >
> > > > >>>>>>> > >
> > > > >>>>>>> > > On Wed, 15 Apr 2020, Mark Adams wrote:
> > > > >>>>>>> > >
> > > > >>>>>>> > > > On Wed, Apr 15, 2020 at 5:17 PM Satish Balay <balay at mcs.anl.gov> wrote:
> > > > >>>>>>> > > >
> > > > >>>>>>> > > > > The build should work. It should give some verbose info [at runtime]
> > > > >>>>>>> > > > > regarding GPUs, from the following code.
> > > > >>>>>>> > > > >
> > > > >>>>>>> > > > >
> > > > >>>>>>> > > > I don't see that, and I am running GPUs in my code and have gotten
> > > > >>>>>>> > > > cusparse LU to run. Should I use '-info :sys:'?
> > > > >>>>>>> > > >
> > > > >>>>>>> > > >
> > > > >>>>>>> > > > > >>>>> SRC/cublas_utils.c >>>>>>>>>>>
> > > > >>>>>>> > > > > void DisplayHeader()
> > > > >>>>>>> > > > > {
> > > > >>>>>>> > > > >     const int kb = 1024;
> > > > >>>>>>> > > > >     const int mb = kb * kb;
> > > > >>>>>>> > > > >     // cout << "NBody.GPU" << endl << "=========" << endl << endl;
> > > > >>>>>>> > > > >
> > > > >>>>>>> > > > >     printf("CUDA version:   v %d\n",CUDART_VERSION);
> > > > >>>>>>> > > > >     //cout << "Thrust version: v" << THRUST_MAJOR_VERSION << "." << THRUST_MINOR_VERSION << endl << endl;
> > > > >>>>>>> > > > >
> > > > >>>>>>> > > > >     int devCount;
> > > > >>>>>>> > > > >     cudaGetDeviceCount(&devCount);
> > > > >>>>>>> > > > >     printf( "CUDA Devices: \n \n");
> > > > >>>>>>> > > > > <snip>
> > > > >>>>>>> > > > > <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> > > > >>>>>>> > > > >
> > > > >>>>>>> > > > > Satish
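As Satish describes it, the "CUDA version:" banner comes from `DisplayHeader()` in SuperLU_DIST, so its presence in a run's output is one sign that the GPU code was reached. A Python sketch for picking it out of captured output follows; the helper name is hypothetical, and the decoding relies on CUDA's `CUDART_VERSION` encoding of 1000*major + 10*minor.

```python
import re

# Matches the banner line printed by DisplayHeader(), e.g. "CUDA version:   v 10010".
_CUDA_BANNER = re.compile(r"CUDA version:\s*v\s*(\d+)")


def cuda_banner_version(output):
    """Return (major, minor) from the first CUDA banner in output, or None.

    CUDART_VERSION encodes the runtime version as 1000*major + 10*minor,
    so 10010 is CUDA 10.1 and 10020 is CUDA 10.2.
    """
    match = _CUDA_BANNER.search(output)
    if match is None:
        return None
    code = int(match.group(1))
    return code // 1000, (code % 1000) // 10


if __name__ == "__main__":
    print(cuda_banner_version("> CUDA version:   v 10010\n> CUDA Devices:"))  # (10, 1)
    print(cuda_banner_version("Number of SNES iterations = 2"))  # None
```

Read this way, the Summit runs in this thread report CUDA 10.1 and Satish's workstation reports CUDA 10.2.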
> > > > >>>>>>> > > > >
> > > > >>>>>>> > > > > On Wed, 15 Apr 2020, Junchao Zhang wrote:
> > > > >>>>>>> > > > >
> > > > >>>>>>> > > > > > I remember Barry said superlu gpu support is broken.
> > > > >>>>>>> > > > > > --Junchao Zhang
> > > > >>>>>>> > > > > >
> > > > >>>>>>> > > > > >
> > > > >>>>>>> > > > > > On Wed, Apr 15, 2020 at 3:47 PM Mark Adams <mfadams at lbl.gov> wrote:
> > > > >>>>>>> > > > > >
> > > > >>>>>>> > > > > > > How does one use SuperLU with GPUs? I don't seem to get any GPU
> > > > >>>>>>> > > > > > > performance data, so I assume GPUs are not getting turned on. Am I
> > > > >>>>>>> > > > > > > wrong about that?
> > > > >>>>>>> > > > > > >
> > > > >>>>>>> > > > > > > I configure with:
> > > > >>>>>>> > > > > > > configure options: --with-fc=0 --COPTFLAGS="-g -O2 -fPIC -fopenmp"
> > > > >>>>>>> > > > > > > --CXXOPTFLAGS="-g -O2 -fPIC -fopenmp" --FOPTFLAGS="-g -O2 -fPIC -fopenmp"
> > > > >>>>>>> > > > > > > --CUDAOPTFLAGS="-O2 -g" --with-ssl=0 --with-batch=0 --with-cxx=mpicxx
> > > > >>>>>>> > > > > > > --with-mpiexec="jsrun -g1" --with-cuda=1 --with-cudac=nvcc
> > > > >>>>>>> > > > > > > --download-p4est=1 --download-zlib --download-hdf5=1 --download-metis
> > > > >>>>>>> > > > > > > --download-superlu --download-superlu_dist --with-make-np=16
> > > > >>>>>>> > > > > > > --download-parmetis --download-triangle
> > > > >>>>>>> > > > > > > --with-blaslapack-lib="-L/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/netlib-lapack-3.8.0-wcabdyqhdi5rooxbkqa6x5d7hxyxwdkm/lib64 -lblas -llapack"
> > > > >>>>>>> > > > > > > --with-cc=mpicc --with-shared-libraries=1 --with-x=0
> > > > >>>>>>> > > > > > > --with-64-bit-indices=0 --with-debugging=0
> > > > >>>>>>> > > > > > > PETSC_ARCH=arch-summit-opt-gnu-cuda-omp --with-openmp=1
> > > > >>>>>>> > > > > > > --with-threadsaftey=1 --with-log=1
> > > > >>>>>>> > > > > > >
> > > > >>>>>>> > > > > > > Thanks,
> > > > >>>>>>> > > > > > > Mark
> > > > >>>>>>> > > > > > >
> > > > >>>>>>> > > > > >
> > > > >>>>>>> > > > >
> > > > >>>>>>> > > > >
> > > > >>>>>>> > > >
> > > > >>>>>>> > >
> > > > >>>>>>> > >
> > > > >>>>>>> >
> > > > >>>>>>>
> > > > >>>>>>>
> > > >
> > >
> > >
> >
>
>