[petsc-users] Question on writing a large matrix

S V N Vishwanathan vishy at stat.purdue.edu
Sat Apr 23 12:36:35 CDT 2011


Barry,

>    It has not assembled the matrix in an hour; it has been working all
>    night to assemble the matrix. The problem is that you are not
>    preallocating the nonzeros per row with MatMPIAIJSetPreallocation().
>    When the preallocation is correct it will always print 0 for the
>    number of mallocs. The actual writing of the parallel matrix to the
>    binary file will take at most minutes.

You were absolutely right! I had not set the preallocation properly,
which is why the code was painfully slow. I fixed that issue (see
attached code) and it now runs much faster; the fix boils down to the
preallocation call sketched below.
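
This sketch is not lifted verbatim from the attached code: the local row
count and the per-row estimates (289 diagonal and 867 off-diagonal
nonzeros, the maxima reported by -info further down) are placeholders,
and the real code computes the per-row counts while parsing the libsvm
file.

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  PetscInt       mlocal = 700000, N = 1157, i, rstart, rend;
  PetscInt       *d_nnz, *o_nnz;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);

  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, mlocal, PETSC_DECIDE, PETSC_DETERMINE, N);CHKERRQ(ierr);
  ierr = MatSetType(A, MATMPIAIJ);CHKERRQ(ierr);

  /* One count per local row for the diagonal and off-diagonal blocks.
     Overestimates only waste memory; underestimates trigger mallocs
     inside MatSetValues() and make assembly crawl.  Each d_nnz[i] must
     not exceed the diagonal block's local column count. */
  ierr = PetscMalloc(mlocal*sizeof(PetscInt), &d_nnz);CHKERRQ(ierr);
  ierr = PetscMalloc(mlocal*sizeof(PetscInt), &o_nnz);CHKERRQ(ierr);
  for (i = 0; i < mlocal; i++) { d_nnz[i] = 289; o_nnz[i] = 867; }
  ierr = MatMPIAIJSetPreallocation(A, 0, d_nnz, 0, o_nnz);CHKERRQ(ierr);

  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  /* ... MatSetValues() over rows rstart..rend-1 goes here ... */
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = PetscFree(d_nnz);CHKERRQ(ierr);
  ierr = PetscFree(o_nnz);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);   /* MatDestroy(A) on PETSc < 3.2 */
  ierr = PetscFinalize();
  return 0;
}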
However, I am now running into a different problem. For smaller
matrices (less than a million rows) everything works well, but for
large matrices (e.g. 2.8 million rows x 1157 columns) writing the
matrix to file dies with the following message:

Fatal error in MPI_Recv: Other MPI error
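
For reference, the write step that dies boils down to opening a binary
viewer and calling MatView on it; the helper below is a simplified
sketch, with a hypothetical function name rather than anything copied
from the attached code. It is called once, on the matrix's
communicator, after MatAssemblyEnd(), with the path shown in the log
below.

#include <petscmat.h>

/* Hypothetical helper: write an assembled parallel matrix to a PETSc
   binary file so it can later be read back with MatLoad(). */
PetscErrorCode DumpMatrix(Mat A, const char path[])
{
  PetscViewer    viewer;
  PetscErrorCode ierr;

  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, path, FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = MatView(A, viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);  /* PetscViewerDestroy(viewer) on PETSc < 3.2 */
  return 0;
}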

Any hints on how to solve this problem are deeply appreciated.

vishy

The output of running the code with the -info flag is as follows:

[0] PetscInitialize(): PETSc successfully started: number of processors = 4
[0] PetscInitialize(): Running on machine: rossmann-b001.rcac.purdue.edu
[3] PetscInitialize(): PETSc successfully started: number of processors = 4
[3] PetscInitialize(): Running on machine: rossmann-b004.rcac.purdue.edu
No libsvm test file specified!

 Reading libsvm train file at /scratch/lustreA/v/vishy/LibSVM/biclass/ocr/ocr.train.txt
[2] PetscInitialize(): PETSc successfully started: number of processors = 4
[2] PetscInitialize(): Running on machine: rossmann-b003.rcac.purdue.edu
[3] PetscFOpen(): Opening file /scratch/lustreA/v/vishy/LibSVM/biclass/ocr/ocr.train.txt
[2] PetscFOpen(): Opening file /scratch/lustreA/v/vishy/LibSVM/biclass/ocr/ocr.train.txt
[0] PetscFOpen(): Opening file /scratch/lustreA/v/vishy/LibSVM/biclass/ocr/ocr.train.txt
[1] PetscInitialize(): PETSc successfully started: number of processors = 4
[1] PetscInitialize(): Running on machine: rossmann-b002.rcac.purdue.edu
[1] PetscFOpen(): Opening file /scratch/lustreA/v/vishy/LibSVM/biclass/ocr/ocr.train.txt
m=100000
m=200000
m=300000
m=400000
m=500000
m=600000
m=700000
m=800000
m=900000
m=1000000
m=1100000
m=1200000
m=1300000
m=1400000
m=1500000
m=1600000
m=1700000
m=1800000
m=1900000
m=2000000
m=2100000
m=2200000
m=2300000
m=2400000
m=2500000
m=2600000
m=2700000
m=2800000
user.dim=1157 user.m=2800000 user.maxnnz=1156 user.maxlen=32768 user.flg=1 
[0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
[0] PetscCommDuplicate():   returning tag 2147483647
user.dim=1157 user.m=2800000 user.maxnnz=1156 user.maxlen=32768 user.flg=1 
user.dim=1157 user.m=2800000 user.maxnnz=1156 user.maxlen=32768 user.flg=1 
user.dim=1157 user.m=2800000 user.maxnnz=1156 user.maxlen=32768 user.flg=1 
[0] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850688
[0] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374784
[0] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374784
[0] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374784
[2] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
[2] PetscCommDuplicate():   returning tag 2147483647
[2] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850688
[2] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374784
[2] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374784
[2] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374784
[3] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
[3] PetscCommDuplicate():   returning tag 2147483647
[3] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850688
[3] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374784
[3] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374784
[3] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374784
[0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
[0] PetscCommDuplicate():   returning tag 2147483647
[1] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
[1] PetscCommDuplicate():   returning tag 2147483647
[1] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850688
[1] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374784
[1] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374784
[1] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374784
[1] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
[1] PetscCommDuplicate():   returning tag 2147483647
[2] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
[2] PetscCommDuplicate():   returning tag 2147483647
[3] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
[3] PetscCommDuplicate():   returning tag 2147483647
[3] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[3] PetscCommDuplicate():   returning tag 2147483642
[2] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[2] PetscCommDuplicate():   returning tag 2147483642
[1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[1] PetscCommDuplicate():   returning tag 2147483642
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate():   returning tag 2147483642
[0] MatSetUpPreallocation(): Warning not preallocating matrix storage
[0] PetscCommDuplicate(): Duplicating a communicator 1140850689 -2080374783 max tags = 2147483647
[0] PetscCommDuplicate():   returning tag 2147483647
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[0] PetscCommDuplicate():   returning tag 2147483646
[2] PetscCommDuplicate(): Duplicating a communicator 1140850689 -2080374783 max tags = 2147483647
[2] PetscCommDuplicate():   returning tag 2147483647
[1] PetscCommDuplicate(): Duplicating a communicator 1140850689 -2080374783 max tags = 2147483647
[1] PetscCommDuplicate():   returning tag 2147483647
[2] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[2] PetscCommDuplicate():   returning tag 2147483646
[1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[1] PetscCommDuplicate():   returning tag 2147483646
[3] PetscCommDuplicate(): Duplicating a communicator 1140850689 -2080374783 max tags = 2147483647
[3] PetscCommDuplicate():   returning tag 2147483647
[3] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[3] PetscCommDuplicate():   returning tag 2147483646
[0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs.
[0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs.
[0] MatStashScatterBegin_Private(): No of messages: 0 
[0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
[1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
[3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
[2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
[1] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 289; storage space: 0 unneeded,202300000 used
[1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 289
[3] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 289; storage space: 0 unneeded,202300000 used
[3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 289
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 290; storage space: 0 unneeded,202300000 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 289
[2] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 289; storage space: 0 unneeded,202300000 used
[2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 289
[1] Mat_CheckInode(): Found 140000 nodes of 700000. Limit used: 5. Using Inode routines
[3] Mat_CheckInode(): Found 140000 nodes of 700000. Limit used: 5. Using Inode routines
[2] Mat_CheckInode(): Found 140000 nodes of 700000. Limit used: 5. Using Inode routines
[0] Mat_CheckInode(): Found 140000 nodes of 700000. Limit used: 5. Using Inode routines
[3] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[3] PetscCommDuplicate():   returning tag 2147483645
[3] PetscCommDuplicate():   returning tag 2147483638
[2] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[2] PetscCommDuplicate():   returning tag 2147483645
[2] PetscCommDuplicate():   returning tag 2147483638
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[0] PetscCommDuplicate():   returning tag 2147483645
[0] MatSetUpMultiply_MPIAIJ(): Using block index set to define scatter
[0] PetscCommDuplicate():   returning tag 2147483638
[1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[1] PetscCommDuplicate():   returning tag 2147483645
[1] PetscCommDuplicate():   returning tag 2147483638
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[0] PetscCommDuplicate():   returning tag 2147483644
[0] PetscCommDuplicate():   returning tag 2147483637
[1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[1] PetscCommDuplicate():   returning tag 2147483644
[3] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[3] PetscCommDuplicate():   returning tag 2147483644
[3] PetscCommDuplicate():   returning tag 2147483637
[1] PetscCommDuplicate():   returning tag 2147483637
[0] PetscCommDuplicate():   returning tag 2147483632
[2] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[2] PetscCommDuplicate():   returning tag 2147483644
[2] PetscCommDuplicate():   returning tag 2147483637
[1] PetscCommDuplicate():   returning tag 2147483632
[2] PetscCommDuplicate():   returning tag 2147483632
[3] PetscCommDuplicate():   returning tag 2147483632
[0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter
[0] VecScatterCreate(): General case: MPI to Seq
[2] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 867; storage space: 0 unneeded,606900000 used
[2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 867
[2] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[2] PetscCommDuplicate():   returning tag 2147483628
[3] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 867; storage space: 0 unneeded,606900000 used
[3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 867
[3] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[3] PetscCommDuplicate():   returning tag 2147483628
[1] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 867; storage space: 0 unneeded,606900000 used
[1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 867
[1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[1] PetscCommDuplicate():   returning tag 2147483628
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 867; storage space: 0 unneeded,606900000 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 867

 Writing data in binary format to /scratch/lustreA/v/vishy/biclass/ocr.train.x 
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate():   returning tag 2147483628
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: libsvm-to-binary.cpp
Type: application/octet-stream
Size: 15449 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20110423/64fe1a16/attachment.obj>

