[petsc-users] Errors from large matrices

Joon hee Choi choi240 at purdue.edu
Sun May 26 06:19:52 CDT 2013


Hello all,

I need to multiply a large seqaij matrix (X1) by a maij (or baij) matrix (CC). I set up X1 (size: 4273949 x 108965941330383, nonzeros: 143599552) and C (size: 25495389 x 10, nonzeros: 254953890), and created a maij matrix CC from C. However, I got errors such as "Out of memory" and "Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range". Is this a memory problem, and do I have to change seqaij to mpiaij and use multiple processors? Or is there another way to fix it? If you know of one, please let me know. Thank you.

Joon


Code:
  ...
  ierr = MatCreate(PETSC_COMM_SELF, &X1); CHKERRQ(ierr);
  ierr = MatSetSizes(X1, PETSC_DECIDE, PETSC_DECIDE, I, J*K); CHKERRQ(ierr);
  ierr = MatSetBlockSizes(X1, I, J); CHKERRQ(ierr);
  ierr = MatSetType(X1, MATSEQAIJ); CHKERRQ(ierr);
  ierr = MatSeqAIJSetPreallocation(X1, 0, nnz); CHKERRQ(ierr);

  for (int x=0; x<tups.size(); x++) {
       i = std::tr1::get<0>(tups[x]);
       j = std::tr1::get<2>(tups[x]) + std::tr1::get<1>(tups[x])*J;
       val = std::tr1::get<3>(tups[x]);
       ierr = MatSetValues(X1, 1, &i, 1, &j, &val, INSERT_VALUES); CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(X1, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(X1, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = PetscGetTime(&v1); CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "Setup Time: %2.1e \n", v1-v); CHKERRQ(ierr);

  // Create a matrix C (K x R) with all values 1
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, K, R, R, NULL, &C); CHKERRQ(ierr);
  for (k=0; k<K; k++) {
       for (r=0; r<R; r++) {
            ierr = MatSetValues(C, 1, &k, 1, &r, &one, INSERT_VALUES); CHKERRQ(ierr);
       }
  }
  ierr = MatAssemblyBegin(C, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(C, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

  ierr = MatCreateMAIJ(C, J, &MC); CHKERRQ(ierr);
  ierr = MatConvert(MC, MATBAIJ, MAT_INITIAL_MATRIX, &CC); CHKERRQ(ierr);
  ierr = MatMatMult(X1, CC, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &M); CHKERRQ(ierr);
  ...


Results and errors with -info -mat_view_info:
[0] PetscInitialize(): PETSc successfully started: number of processors = 1
[0] PetscInitialize(): Running on machine: rossmann-fe02.rcac.purdue.edu
[0] PetscFOpen(): Opening file /group/ml/data/tensor/nell/sparse.large.txt
[0] PetscCommDuplicate(): Duplicating a communicator 1140850689 -2080374784 max tags = 2147483647
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 4273949 X 108965941330383; storage space: 83847 unneeded,143599552 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 504677
[0] Mat_CheckInode(): Found 3499069 nodes out of 4273949 rows. Not using Inode routines
Matrix Object: 1 MPI processes
  type: seqaij
  rows=4273949, cols=108965941330383, bs=4273949
  total: nonzeros=143599552, allocated nonzeros=143683399
  total number of mallocs used during MatSetValues calls =0
    not using I-node routines
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374784
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 25495389 X 10; storage space: 0 unneeded,254953890 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 10
[0] Mat_CheckInode(): Found 5099078 nodes of 25495389. Limit used: 5. Using Inode routines
Matrix Object: 1 MPI processes
  type: seqaij
  rows=25495389, cols=10
  total: nonzeros=254953890, allocated nonzeros=254953890
  total number of mallocs used during MatSetValues calls =0
    using I-node routines: found 5099078 nodes, limit used is 5
Matrix Object: 1 MPI processes
  type: seqmaij
  rows=108965941330383, cols=42739470, bs=4273947
[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: Out of memory. This could be due to allocating
[0]PETSC ERROR: too large an object or bleeding by not properly
[0]PETSC ERROR: destroying unneeded objects.
[0]PETSC ERROR: Memory allocated 0 Memory used by process 11980140544
[0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
[0]PETSC ERROR: Memory requested 871727530643064!
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 6, Mon Feb 11 12:26:34 CST 2013 
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: ./tensor on a linux-sta named rossmann-fe02.rcac.purdue.edu by choi240 Sun May 26 07:13:32 2013
[0]PETSC ERROR: Libraries linked from /apps/rhel5/petsc-3.3-p6/64/impi-4.1.0.024_intel-13.0.1.117_ind64/linux-static/lib
[0]PETSC ERROR: Configure run at Tue May 21 15:56:45 2013
[0]PETSC ERROR: Configure options --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-scalar-type=real --with-shared-libraries=0 --with-pic=1 --with-clanguage=C++ --with-fortran --with-fortran-kernels=1 --with-64-bit-indices=1 --with-debugging=0 --with-blas-lapack-dir=/opt/intel/composer_xe_2013.1.117/mkl/lib/intel64 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 --download-hdf5=no --download-metis=no --download-parmetis=no --download-superlu_dist=no --download-mumps=no --download-scalapack=yes --download-blacs=yes --download-hypre=no --download-spooles=no
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: PetscMallocAlign() line 49 in /apps/rhel5/petsc-3.3-p6/64/impi-4.1.0.024_intel-13.0.1.117_ind64/src/sys/memory/mal.c
[0]PETSC ERROR: MatConvert_SeqMAIJ_SeqAIJ() line 3232 in /apps/rhel5/petsc-3.3-p6/64/impi-4.1.0.024_intel-13.0.1.117_ind64/src/mat/impls/maij/maij.c
[0]PETSC ERROR: MatConvert() line 3778 in /apps/rhel5/petsc-3.3-p6/64/impi-4.1.0.024_intel-13.0.1.117_ind64/src/mat/interface/matrix.c
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: Signal received!
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 6, Mon Feb 11 12:26:34 CST 2013 
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: ./tensor on a linux-sta named rossmann-fe02.rcac.purdue.edu by choi240 Sun May 26 07:13:32 2013
[0]PETSC ERROR: Libraries linked from /apps/rhel5/petsc-3.3-p6/64/impi-4.1.0.024_intel-13.0.1.117_ind64/linux-static/lib
[0]PETSC ERROR: Configure run at Tue May 21 15:56:45 2013
[0]PETSC ERROR: Configure options --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-scalar-type=real --with-shared-libraries=0 --with-pic=1 --with-clanguage=C++ --with-fortran --with-fortran-kernels=1 --with-64-bit-indices=1 --with-debugging=0 --with-blas-lapack-dir=/opt/intel/composer_xe_2013.1.117/mkl/lib/intel64 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 --download-hdf5=no --download-metis=no --download-parmetis=no --download-superlu_dist=no --download-mumps=no --download-scalapack=yes --download-blacs=yes --download-hypre=no --download-spooles=no
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file
application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0

