[petsc-users] MatMatMult causes crash

Barry Smith bsmith at mcs.anl.gov
Thu Jan 19 10:03:30 CST 2017


   Absurd memory requests such as "Memory requested 18446744068029169664" usually mean that 32-bit integers are not large enough for the problem. Try configuring PETSc on the Cray with --with-64-bit-indices.
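
   A minimal sketch of the arithmetic behind such a request, assuming a 64-bit size_t: a count that has gone negative in a signed 32-bit integer, once converted to size_t and multiplied by an element size, wraps to a value just below 2^64. The function names, the count -710047744, and the element size 8 are illustrative assumptions, chosen only because they reproduce the number in the error message. The actual count and element size inside MatGetBrowsOfAoCols_MPIAIJ are not shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reduce a true (large) entry count to what a signed
   32-bit index would hold after wrapping modulo 2^32. */
int32_t wrap_to_int32(uint64_t true_count)
{
    uint64_t low = true_count & 0xFFFFFFFFu;          /* keep the low 32 bits */
    if (low > (uint64_t)INT32_MAX) {
        /* Values above INT32_MAX map to negative numbers (two's complement). */
        return (int32_t)((int64_t)low - 4294967296LL);
    }
    return (int32_t)low;
}

/* Hypothetical helper: byte count a malloc-style call would receive if a
   (possibly negative) 32-bit count is converted to size_t and scaled by
   the element size. Unsigned arithmetic wraps modulo 2^64 here, assuming
   size_t is 64 bits wide. */
size_t bytes_requested(int32_t count, size_t elem_size)
{
    return (size_t)count * elem_size;
}
```

   With these assumed inputs, bytes_requested(-710047744, 8) evaluates to 18446744068029169664, which is 2^64 - 5680381952, and wrap_to_int32(3584919552) gives -710047744. With --with-64-bit-indices, PetscInt is 64 bits wide, so a count of that size would not wrap.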

   Barry


> On Jan 19, 2017, at 7:14 AM, Cyrill Vonplanta <cyrill.von.planta at usi.ch> wrote:
> 
> Dear PETSc Users,
> 
> 
> I have a problem with a solver running on a Cray machine: it crashes at the call to “MatMatMult” (see the error message below). When I run the same solver on my own machine, in serial or in parallel, it runs through, and running with -malloc_debug doesn’t seem to reveal any issues either.
> 
> Does someone have a clue what the cause of this failure could be?
> 
> Best Cyrill
> --
> 
> The line that causes the crash is this:
> 
> ierr = MatMatMult(_O, _interpolations[0], MAT_INITIAL_MATRIX, PETSC_DEFAULT, &mmg->interpolations[mg_levels-2]); CHKERRQ(ierr);
> 
> The error message:
> 
> 
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Out of memory. This could be due to allocating
> [0]PETSC ERROR: too large an object or bleeding by not properly
> [0]PETSC ERROR: destroying unneeded objects.
> [0]PETSC ERROR: Memory allocated 0 Memory used by process 61852
> [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
> [0]PETSC ERROR: Memory requested 18446744068029169664
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016
> [0]PETSC ERROR: /scratch/snx3000/studi/./moose-passo-opt on a haswell named nid01137 by studi Thu Jan 19 14:03:27 2017
> [0]PETSC ERROR: Configure options --known-has-attribute-aligned=1 --known-mpi-int64_t=0 --known-bits-per-byte=8 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --known-level1-dcache-assoc=0 --known-level1-dcache-linesize=32 --known-level1-dcache-size=32768 --known-memcmp-ok=1 --known-mpi-c-double-complex=1 --known-mpi-long-double=1 --known-mpi-shared-libraries=0 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-sizeof-char=1 --known-sizeof-double=8 --known-sizeof-float=4 --known-sizeof-int=4 --known-sizeof-long-long=8 --known-sizeof-long=8 --known-sizeof-short=2 --known-sizeof-size_t=8 --known-sizeof-void-p=8 --with-ar=ar --with-batch=1 --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-dependencies=0 --with-fc=ftn --with-fortran-datatypes=0 --with-fortran-interfaces=0 --with-fortranlib-autodetect=0 --with-ranlib=ranlib --with-scalar-type=real --with-shared-ld=ar --with-etags=0 --with-dependencies=0 --with-x=0 --with-ssl=0 --with-shared-libraries=0 --with-dependencies=0 --with-mpi-lib="[]" --with-mpi-include="[]" --with-blas-lapack-lib="-L/opt/cray/libsci/13.2.0/GNU/5.1/x86_64/lib -lsci_gnu_mp" --with-superlu=1 --with-superlu-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include --with-superlu-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lsuperlu" --with-superlu_dist=1 --with-superlu_dist-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include --with-superlu_dist-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lsuperlu_dist" --with-parmetis=1 --with-parmetis-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include --with-parmetis-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lparmetis" --with-metis=1 --with-metis-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include --with-metis-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lmetis" --with-ptscotch=1 --with-ptscotch-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include 
--with-ptscotch-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lptscotch -lscotch -lptscotcherr -lscotcherr" --with-scalapack=1 --with-scalapack-include=/opt/cray/libsci/13.2.0/GNU/5.1/x86_64/include --with-scalapack-lib="-L/opt/cray/libsci/13.2.0/GNU/5.1/x86_64/lib -lsci_gnu_mpi_mp -lsci_gnu_mp" --with-mumps=1 --with-mumps-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include --with-mumps-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lcmumps -ldmumps -lesmumps -lsmumps -lzmumps -lmumps_common -lptesmumps -lpord" --with-hdf5=1 --with-hdf5-include=/opt/cray/hdf5-parallel/1.8.16/GNU/5.1/include --with-hdf5-lib="-L/opt/cray/hdf5-parallel/1.8.16/GNU/5.1/lib -lhdf5_parallel -lz -ldl" --CFLAGS="-march=haswell -fopenmp -O3 -ffast-math  -fPIC" --CPPFLAGS= --CXXFLAGS="-march=haswell -fopenmp -O3 -ffast-math   -fPIC" --FFLAGS="-march=haswell -fopenmp -O3 -ffast-math   -fPIC" --LIBS= --CXX_LINKER_FLAGS= --PETSC_ARCH=haswell --prefix=/opt/cray/pe/petsc/3.7.2.1/real/GNU/5.1/haswell --with-hypre=1 --with-hypre-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include --with-hypre-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lHYPRE" --with-sundials=1 --with-sundials-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include --with-sundials-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lsundials_cvode -lsundials_cvodes -lsundials_ida -lsundials_idas -lsundials_kinsol -lsundials_nvecparallel -lsundials_nvecserial"
> [0]PETSC ERROR: #1 MatGetBrowsOfAoCols_MPIAIJ() line 4815 in src/mat/impls/aij/mpi/mpiaij.c
> [0]PETSC ERROR: #2 MatGetBrowsOfAoCols_MPIAIJ() line 4815 in src/mat/impls/aij/mpi/mpiaij.c
> [0]PETSC ERROR: #3 MatMatMultSymbolic_MPIAIJ_MPIAIJ_nonscalable() line 198 in src/mat/impls/aij/mpi/mpimatmatmult.c
> [0]PETSC ERROR: #4 MatMatMult_MPIAIJ_MPIAIJ() line 34 in src/mat/impls/aij/mpi/mpimatmatmult.c
> [0]PETSC ERROR:   MMG Setup 30.868420 ms.
> #5 MatMatMult() line 9517 in src/mat/interface/matrix.c
> [0]PETSC ERROR: #6 MMGSetup() line 85 in /users/studi/src/moose-passo/src/passo/monotone_mg.C
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Arguments are incompatible
> [0]PETSC ERROR: Incompatible vector local lengths 666 != 10922
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016
> [0]PETSC ERROR: /scratch/snx3000/studi/./moose-passo-opt on a haswell named nid01137 by studi Thu Jan 19 14:03:27 2017
> [0]PETSC ERROR: Configure options --known-has-attribute-aligned=1 --known-mpi-int64_t=0 --known-bits-per-byte=8 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --known-level1-dcache-assoc=0 --known-level1-dcache-linesize=32 --known-level1-dcache-size=32768 --known-memcmp-ok=1 --known-mpi-c-double-complex=1 --known-mpi-long-double=1 --known-mpi-shared-libraries=0 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-sizeof-char=1 --known-sizeof-double=8 --known-sizeof-float=4 --known-sizeof-int=4 --known-sizeof-long-long=8 --known-sizeof-long=8 --known-sizeof-short=2 --known-sizeof-size_t=8 --known-sizeof-void-p=8 --with-ar=ar --with-batch=1 --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-dependencies=0 --with-fc=ftn --with-fortran-datatypes=0 --with-fortran-interfaces=0 --with-fortranlib-autodetect=0 --with-ranlib=ranlib --with-scalar-type=real --with-shared-ld=ar --with-etags=0 --with-dependencies=0 --with-x=0 --with-ssl=0 --with-shared-libraries=0 --with-dependencies=0 --with-mpi-lib="[]" --with-mpi-include="[]" --with-blas-lapack-lib="-L/opt/cray/libsci/13.2.0/GNU/5.1/x86_64/lib -lsci_gnu_mp" --with-superlu=1 --with-superlu-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include --with-superlu-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lsuperlu" --with-superlu_dist=1 --with-superlu_dist-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include --with-superlu_dist-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lsuperlu_dist" --with-parmetis=1 --with-parmetis-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include --with-parmetis-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lparmetis" --with-metis=1 --with-metis-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include --with-metis-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lmetis" --with-ptscotch=1 --with-ptscotch-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include 
--with-ptscotch-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lptscotch -lscotch -lptscotcherr -lscotcherr" --with-scalapack=1 --with-scalapack-include=/opt/cray/libsci/13.2.0/GNU/5.1/x86_64/include --with-scalapack-lib="-L/opt/cray/libsci/13.2.0/GNU/5.1/x86_64/lib -lsci_gnu_mpi_mp -lsci_gnu_mp" --with-mumps=1 --with-mumps-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include --with-mumps-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lcmumps -ldmumps -lesmumps -lsmumps -lzmumps -lmumps_common -lptesmumps -lpord" --with-hdf5=1 --with-hdf5-include=/opt/cray/hdf5-parallel/1.8.16/GNU/5.1/include --with-hdf5-lib="-L/opt/cray/hdf5-parallel/1.8.16/GNU/5.1/lib -lhdf5_parallel -lz -ldl" --CFLAGS="-march=haswell -fopenmp -O3 -ffast-math  -fPIC" --CPPFLAGS= --CXXFLAGS="-march=haswell -fopenmp -O3 -ffast-math   -fPIC" --FFLAGS="-march=haswell -fopenmp -O3 -ffast-math   -fPIC" --LIBS= --CXX_LINKER_FLAGS= --PETSC_ARCH=haswell --prefix=/opt/cray/pe/petsc/3.7.2.1/real/GNU/5.1/haswell --with-hypre=1 --with-hypre-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include --with-hypre-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lHYPRE" --with-sundials=1 --with-sundials-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include --with-sundials-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lsundials_cvode -lsundials_cvodes -lsundials_ida -lsundials_idas -lsundials_kinsol -lsundials_nvecparallel -lsundials_nvecserial"
> [0]PETSC ERROR: #7 VecCopy() line 1639 in src/vec/vec/interface/vector.c
> Level 1, Presmoothing step 0 ... srun: error: nid01137: task 0: Trace/breakpoint trap
> srun: Terminating job step 349949.1
> slurmstepd: error: *** STEP 349949.1 ON nid01137 CANCELLED AT 2017-01-19T14:03:32 ***
> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
> srun: error: nid01137: task 1: Killed
> 
> 


