[petsc-users] Question on writing a large matrix

Barry Smith bsmith at mcs.anl.gov
Sat Apr 23 13:39:35 CDT 2011


http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#64-bit-indices
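
   The numbers in the -info output below suggest why that FAQ entry applies:
   the four processes together store roughly 4 x (202,300,000 + 606,900,000),
   i.e. about 3.24 billion nonzeros, which exceeds the 2,147,483,647 limit of
   the default 32-bit PetscInt. The usual remedy is a PETSc build configured
   with 64-bit indices, along the lines of (exact options depend on your
   installation):

       ./configure --with-64-bit-indices [your other configure options]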


On Apr 23, 2011, at 12:36 PM, S V N Vishwanathan wrote:

> Barry,
> 
>>   It has not assembled the matrix in an hour; it has been working all
>>   night to assemble the matrix. The problem is that you are not
>>   preallocating the nonzeros per row with MatMPIAIJSetPreallocation().
>>   When preallocation is correct it will always print 0 for the number
>>   of mallocs. The actual writing of the parallel matrix to the binary
>>   file will take at most minutes.
> 
> You were absolutely right! I had not set the preallocation properly and
> hence the code was painfully slow. I fixed that issue (see attached
> code) and now it runs much faster. However, I am now running into a
> different problem. When I run the code on smaller matrices (fewer than
> a million rows) everything works well, but with large matrices
> (e.g. 2.8 million rows x 1157 columns) writing the matrix to file
> dies with the following message:
> 
> Fatal error in MPI_Recv: Other MPI error
> 
> Any hints on how to solve this problem are deeply appreciated.
> 
> vishy
> 
> The output of running the code with the -info flag is as follows:
> 
> [0] PetscInitialize(): PETSc successfully started: number of processors = 4
> [0] PetscInitialize(): Running on machine: rossmann-b001.rcac.purdue.edu
> [3] PetscInitialize(): PETSc successfully started: number of processors = 4
> [3] PetscInitialize(): Running on machine: rossmann-b004.rcac.purdue.edu
> No libsvm test file specified!
> 
> Reading libsvm train file at /scratch/lustreA/v/vishy/LibSVM/biclass/ocr/ocr.train.txt
> [2] PetscInitialize(): PETSc successfully started: number of processors = 4
> [2] PetscInitialize(): Running on machine: rossmann-b003.rcac.purdue.edu
> [3] PetscFOpen(): Opening file /scratch/lustreA/v/vishy/LibSVM/biclass/ocr/ocr.train.txt
> [2] PetscFOpen(): Opening file /scratch/lustreA/v/vishy/LibSVM/biclass/ocr/ocr.train.txt
> [0] PetscFOpen(): Opening file /scratch/lustreA/v/vishy/LibSVM/biclass/ocr/ocr.train.txt
> [1] PetscInitialize(): PETSc successfully started: number of processors = 4
> [1] PetscInitialize(): Running on machine: rossmann-b002.rcac.purdue.edu
> [1] PetscFOpen(): Opening file /scratch/lustreA/v/vishy/LibSVM/biclass/ocr/ocr.train.txt
> m=100000
> m=200000
> m=300000
> m=400000
> m=500000
> m=600000
> m=700000
> m=800000
> m=900000
> m=1000000
> m=1100000
> m=1200000
> m=1300000
> m=1400000
> m=1500000
> m=1600000
> m=1700000
> m=1800000
> m=1900000
> m=2000000
> m=2100000
> m=2200000
> m=2300000
> m=2400000
> m=2500000
> m=2600000
> m=2700000
> m=2800000
> user.dim=1157 user.m=2800000 user.maxnnz=1156 user.maxlen=32768 user.flg=1 
> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
> [0] PetscCommDuplicate():   returning tag 2147483647
> user.dim=1157 user.m=2800000 user.maxnnz=1156 user.maxlen=32768 user.flg=1 
> user.dim=1157 user.m=2800000 user.maxnnz=1156 user.maxlen=32768 user.flg=1 
> user.dim=1157 user.m=2800000 user.maxnnz=1156 user.maxlen=32768 user.flg=1 
> [0] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850688
> [0] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374784
> [0] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374784
> [0] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374784
> [2] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
> [2] PetscCommDuplicate():   returning tag 2147483647
> [2] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850688
> [2] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374784
> [2] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374784
> [2] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374784
> [3] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
> [3] PetscCommDuplicate():   returning tag 2147483647
> [3] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850688
> [3] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374784
> [3] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374784
> [3] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374784
> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
> [0] PetscCommDuplicate():   returning tag 2147483647
> [1] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
> [1] PetscCommDuplicate():   returning tag 2147483647
> [1] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850688
> [1] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374784
> [1] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374784
> [1] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374784
> [1] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
> [1] PetscCommDuplicate():   returning tag 2147483647
> [2] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
> [2] PetscCommDuplicate():   returning tag 2147483647
> [3] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
> [3] PetscCommDuplicate():   returning tag 2147483647
> [3] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [3] PetscCommDuplicate():   returning tag 2147483642
> [2] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [2] PetscCommDuplicate():   returning tag 2147483642
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [1] PetscCommDuplicate():   returning tag 2147483642
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate():   returning tag 2147483642
> [0] MatSetUpPreallocation(): Warning not preallocating matrix storage
> [0] PetscCommDuplicate(): Duplicating a communicator 1140850689 -2080374783 max tags = 2147483647
> [0] PetscCommDuplicate():   returning tag 2147483647
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [0] PetscCommDuplicate():   returning tag 2147483646
> [2] PetscCommDuplicate(): Duplicating a communicator 1140850689 -2080374783 max tags = 2147483647
> [2] PetscCommDuplicate():   returning tag 2147483647
> [1] PetscCommDuplicate(): Duplicating a communicator 1140850689 -2080374783 max tags = 2147483647
> [1] PetscCommDuplicate():   returning tag 2147483647
> [2] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [2] PetscCommDuplicate():   returning tag 2147483646
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [1] PetscCommDuplicate():   returning tag 2147483646
> [3] PetscCommDuplicate(): Duplicating a communicator 1140850689 -2080374783 max tags = 2147483647
> [3] PetscCommDuplicate():   returning tag 2147483647
> [3] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [3] PetscCommDuplicate():   returning tag 2147483646
> [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs.
> [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs.
> [0] MatStashScatterBegin_Private(): No of messages: 0 
> [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
> [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
> [3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
> [2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 289; storage space: 0 unneeded,202300000 used
> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 289
> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 289; storage space: 0 unneeded,202300000 used
> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 289
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 290; storage space: 0 unneeded,202300000 used
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 289
> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 289; storage space: 0 unneeded,202300000 used
> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 289
> [1] Mat_CheckInode(): Found 140000 nodes of 700000. Limit used: 5. Using Inode routines
> [3] Mat_CheckInode(): Found 140000 nodes of 700000. Limit used: 5. Using Inode routines
> [2] Mat_CheckInode(): Found 140000 nodes of 700000. Limit used: 5. Using Inode routines
> [0] Mat_CheckInode(): Found 140000 nodes of 700000. Limit used: 5. Using Inode routines
> [3] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [3] PetscCommDuplicate():   returning tag 2147483645
> [3] PetscCommDuplicate():   returning tag 2147483638
> [2] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [2] PetscCommDuplicate():   returning tag 2147483645
> [2] PetscCommDuplicate():   returning tag 2147483638
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [0] PetscCommDuplicate():   returning tag 2147483645
> [0] MatSetUpMultiply_MPIAIJ(): Using block index set to define scatter
> [0] PetscCommDuplicate():   returning tag 2147483638
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [1] PetscCommDuplicate():   returning tag 2147483645
> [1] PetscCommDuplicate():   returning tag 2147483638
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [0] PetscCommDuplicate():   returning tag 2147483644
> [0] PetscCommDuplicate():   returning tag 2147483637
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [1] PetscCommDuplicate():   returning tag 2147483644
> [3] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [3] PetscCommDuplicate():   returning tag 2147483644
> [3] PetscCommDuplicate():   returning tag 2147483637
> [1] PetscCommDuplicate():   returning tag 2147483637
> [0] PetscCommDuplicate():   returning tag 2147483632
> [2] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [2] PetscCommDuplicate():   returning tag 2147483644
> [2] PetscCommDuplicate():   returning tag 2147483637
> [1] PetscCommDuplicate():   returning tag 2147483632
> [2] PetscCommDuplicate():   returning tag 2147483632
> [3] PetscCommDuplicate():   returning tag 2147483632
> [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter
> [0] VecScatterCreate(): General case: MPI to Seq
> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 867; storage space: 0 unneeded,606900000 used
> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 867
> [2] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [2] PetscCommDuplicate():   returning tag 2147483628
> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 867; storage space: 0 unneeded,606900000 used
> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 867
> [3] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [3] PetscCommDuplicate():   returning tag 2147483628
> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 867; storage space: 0 unneeded,606900000 used
> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 867
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [1] PetscCommDuplicate():   returning tag 2147483628
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 867; storage space: 0 unneeded,606900000 used
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 867
> 
> Writing data in binary format to /scratch/lustreA/v/vishy/biclass/ocr.train.x 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate():   returning tag 2147483628
> APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
> 
> <libsvm-to-binary.cpp>
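
As an aside on the preallocation point quoted above, here is a minimal
sketch of the pattern, assuming the local row count mlocal, the global
sizes M and N, and the per-row counts d_nnz[]/o_nnz[] (nonzeros in the
diagonal and off-diagonal blocks of each local row) have already been
computed while scanning the input file (these variable names are
illustrative, not taken from the attached code):

    Mat A;
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, mlocal, PETSC_DECIDE, M, N);        /* local rows, global sizes */
    MatSetType(A, MATMPIAIJ);
    MatMPIAIJSetPreallocation(A, 0, d_nnz, 0, o_nnz);  /* exact per-row counts => 0 mallocs */
    /* ... fill the local rows with MatSetValues() ... */
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

With correct d_nnz/o_nnz the -info output reports "Number of mallocs
during MatSetValues() is 0", as it does in the log above.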


