[petsc-users] Errors from large matrices

Barry Smith bsmith at mcs.anl.gov
Sun May 26 08:13:47 CDT 2013


On May 26, 2013, at 6:19 AM, Joon hee Choi <choi240 at purdue.edu> wrote:

> Hello all,
> 
> I need to multiply a large seqaij matrix (X1) by a maij (or baij) matrix (CC). I set up X1 (size: 4273949 x 108965941330383, nonzeros: 143599552) and C (size: 25495389 x 10, nonzeros: 254953890) and created a maij matrix CC from C. However, I got errors such as "Out of memory" and "Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range". Is this a memory problem, and do I have to change seqaij into mpiaij and use multiple processes? Or is there another way to fix it? If you know of a method, please let me know. Thank you.

http://www.mcs.anl.gov/petsc/documentation/faq.html#with-64-bit-indices
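For context, a rough sanity check on the numbers in the log below, assuming the 8-byte PetscInt that --with-64-bit-indices gives: MatCreateMAIJ(C, J) builds an operator of size (K*J) x (R*J), which the log reports as 108965941330383 x 42739470. Converting that into an explicit AIJ/BAIJ matrix requires at least one integer of row bookkeeping per row, and indeed

    871727530643064 bytes requested  =  108965941330383 rows  x  8 bytes per row

i.e. roughly 870 TB before a single nonzero value is stored, so more memory on one node will not make the MatConvert() succeed.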


> 
> Joon
> 
> 
> Code:
>  ...
>  ierr = MatCreate(PETSC_COMM_SELF, &X1); CHKERRQ(ierr);
>  ierr = MatSetSizes(X1, PETSC_DECIDE, PETSC_DECIDE, I, J*K); CHKERRQ(ierr);
>  ierr = MatSetBlockSizes(X1, I, J); CHKERRQ(ierr);
>  ierr = MatSetType(X1, MATSEQAIJ); CHKERRQ(ierr);
>  ierr = MatSeqAIJSetPreallocation(X1, 0, nnz); CHKERRQ(ierr);
> 
>  for (int x=0; x<tups.size(); x++) {
>       i = std::tr1::get<0>(tups[x]);
>       j = std::tr1::get<2>(tups[x]) + std::tr1::get<1>(tups[x])*J;
>       val = std::tr1::get<3>(tups[x]);
>       ierr = MatSetValues(X1, 1, &i, 1, &j, &val, INSERT_VALUES); CHKERRQ(ierr);
>  }
>  ierr = MatAssemblyBegin(X1, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>  ierr = MatAssemblyEnd(X1, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>  ierr = PetscGetTime(&v1); CHKERRQ(ierr);
>  ierr = PetscPrintf(PETSC_COMM_WORLD, "Setup Time: %2.1e \n", v1-v); CHKERRQ(ierr);
> 
>  // Create a matrix C (K x R) with all values 1
>  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, K, R, R, NULL, &C); CHKERRQ(ierr);
>  for (k=0; k<K; k++) {
>       for (r=0; r<R; r++) {
>            ierr = MatSetValues(C, 1, &k, 1, &r, &one, INSERT_VALUES); CHKERRQ(ierr);
>       }
>  }
>  ierr = MatAssemblyBegin(C, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>  ierr = MatAssemblyEnd(C, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
> 
>  ierr = MatCreateMAIJ(C, J, &MC); CHKERRQ(ierr);
>  ierr = MatConvert(MC, MATBAIJ, MAT_INITIAL_MATRIX, &CC); CHKERRQ(ierr);
>  ierr = MatMatMult(X1, CC, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &M); CHKERRQ(ierr);
>  ...
> 
> 
> Results and errors with -info -mat_view_info:
> [0] PetscInitialize(): PETSc successfully started: number of processors = 1
> [0] PetscInitialize(): Running on machine: rossmann-fe02.rcac.purdue.edu
> [0] PetscFOpen(): Opening file /group/ml/data/tensor/nell/sparse.large.txt
> [0] PetscCommDuplicate(): Duplicating a communicator 1140850689 -2080374784 max tags = 2147483647
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 4273949 X 108965941330383; storage space: 83847 unneeded,143599552 used
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 504677
> [0] Mat_CheckInode(): Found 3499069 nodes out of 4273949 rows. Not using Inode routines
> Matrix Object: 1 MPI processes
>  type: seqaij
>  rows=4273949, cols=108965941330383, bs=4273949
>  total: nonzeros=143599552, allocated nonzeros=143683399
>  total number of mallocs used during MatSetValues calls =0
>    not using I-node routines
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374784
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 25495389 X 10; storage space: 0 unneeded,254953890 used
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 10
> [0] Mat_CheckInode(): Found 5099078 nodes of 25495389. Limit used: 5. Using Inode routines
> Matrix Object: 1 MPI processes
>  type: seqaij
>  rows=25495389, cols=10
>  total: nonzeros=254953890, allocated nonzeros=254953890
>  total number of mallocs used during MatSetValues calls =0
>    using I-node routines: found 5099078 nodes, limit used is 5
> Matrix Object: 1 MPI processes
>  type: seqmaij
>  rows=108965941330383, cols=42739470, bs=4273947
> [0]PETSC ERROR: --------------------- Error Message ------------------------------------
> [0]PETSC ERROR: Out of memory. This could be due to allocating
> [0]PETSC ERROR: too large an object or bleeding by not properly
> [0]PETSC ERROR: destroying unneeded objects.
> [0]PETSC ERROR: Memory allocated 0 Memory used by process 11980140544
> [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
> [0]PETSC ERROR: Memory requested 871727530643064!
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 6, Mon Feb 11 12:26:34 CST 2013 
> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [0]PETSC ERROR: See docs/index.html for manual pages.
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: ./tensor on a linux-sta named rossmann-fe02.rcac.purdue.edu by choi240 Sun May 26 07:13:32 2013
> [0]PETSC ERROR: Libraries linked from /apps/rhel5/petsc-3.3-p6/64/impi-4.1.0.024_intel-13.0.1.117_ind64/linux-static/lib
> [0]PETSC ERROR: Configure run at Tue May 21 15:56:45 2013
> [0]PETSC ERROR: Configure options --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-scalar-type=real --with-shared-libraries=0 --with-pic=1 --with-clanguage=C++ --with-fortran --with-fortran-kernels=1 --with-64-bit-indices=1 --with-debugging=0 --with-blas-lapack-dir=/opt/intel/composer_xe_2013.1.117/mkl/lib/intel64 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 --download-hdf5=no --download-metis=no --download-parmetis=no --download-superlu_dist=no --download-mumps=no --download-scalapack=yes --download-blacs=yes --download-hypre=no --download-spooles=no
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: PetscMallocAlign() line 49 in /apps/rhel5/petsc-3.3-p6/64/impi-4.1.0.024_intel-13.0.1.117_ind64/src/sys/memory/mal.c
> [0]PETSC ERROR: MatConvert_SeqMAIJ_SeqAIJ() line 3232 in /apps/rhel5/petsc-3.3-p6/64/impi-4.1.0.024_intel-13.0.1.117_ind64/src/mat/impls/maij/maij.c
> [0]PETSC ERROR: MatConvert() line 3778 in /apps/rhel5/petsc-3.3-p6/64/impi-4.1.0.024_intel-13.0.1.117_ind64/src/mat/interface/matrix.c
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
> [0]PETSC ERROR: to get more information on the crash.
> [0]PETSC ERROR: --------------------- Error Message ------------------------------------
> [0]PETSC ERROR: Signal received!
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 6, Mon Feb 11 12:26:34 CST 2013 
> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [0]PETSC ERROR: See docs/index.html for manual pages.
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: ./tensor on a linux-sta named rossmann-fe02.rcac.purdue.edu by choi240 Sun May 26 07:13:32 2013
> [0]PETSC ERROR: Libraries linked from /apps/rhel5/petsc-3.3-p6/64/impi-4.1.0.024_intel-13.0.1.117_ind64/linux-static/lib
> [0]PETSC ERROR: Configure run at Tue May 21 15:56:45 2013
> [0]PETSC ERROR: Configure options --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-scalar-type=real --with-shared-libraries=0 --with-pic=1 --with-clanguage=C++ --with-fortran --with-fortran-kernels=1 --with-64-bit-indices=1 --with-debugging=0 --with-blas-lapack-dir=/opt/intel/composer_xe_2013.1.117/mkl/lib/intel64 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 --download-hdf5=no --download-metis=no --download-parmetis=no --download-superlu_dist=no --download-mumps=no --download-scalapack=yes --download-blacs=yes --download-hypre=no --download-spooles=no
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file
> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
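For reference, a minimal sketch of the "mpiaij on several processes" variant the question asks about. Everything here is illustrative: I and JK stand for the global row and column counts, and d_nnz/o_nnz are per-row preallocation arrays that would have to be computed from the data; none of these names come from the original code.

  Mat            X1;
  PetscErrorCode ierr;
  PetscInt       rstart, rend;

  /* Create X1 as a parallel MPIAIJ matrix; PETSc decides which block of
     rows each rank owns. */
  ierr = MatCreate(PETSC_COMM_WORLD, &X1); CHKERRQ(ierr);
  ierr = MatSetSizes(X1, PETSC_DECIDE, PETSC_DECIDE, I, JK); CHKERRQ(ierr);
  ierr = MatSetType(X1, MATMPIAIJ); CHKERRQ(ierr);
  /* d_nnz/o_nnz: nonzeros per local row in the diagonal and off-diagonal
     blocks owned by this rank (placeholders here). */
  ierr = MatMPIAIJSetPreallocation(X1, 0, d_nnz, 0, o_nnz); CHKERRQ(ierr);

  /* Each rank inserts only the rows it owns; entries set for rows owned by
     other ranks are communicated during assembly. */
  ierr = MatGetOwnershipRange(X1, &rstart, &rend); CHKERRQ(ierr);
  /* ... MatSetValues() for global rows in [rstart, rend) ... */
  ierr = MatAssemblyBegin(X1, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(X1, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

Note that distributing the rows only divides the memory by the number of ranks; given the ~870 TB requested by the MatConvert() in the log above, the MAIJ-to-BAIJ conversion would still have to be avoided or the problem reformulated even in parallel.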


