[petsc-users] Question on writing a large matrix

S V N Vishwanathan vishy at stat.purdue.edu
Thu Apr 21 11:59:10 CDT 2011


> What is 'painfully slow'.  Do you have a profile or an estimate in
> terms of GB/s?  Have you taken a look at your process's memory
> allocation and checked to see if it is swapping?  My first guess would
> be that you are exceeding RAM and your program is thrashing as parts
> of the page table get swapped to and from disk mid-run.

A single machine does not have enough memory to hold the entire
matrix, which is why I have to assemble it in parallel. When distributed
across 8 machines the assembly seemed to finish in under an hour.
However, my program had been trying to write the matrix to file since
last night and eventually crashed. The log just indicated:

[1]PETSC ERROR: Caught signal number 1 Hang up: Some other process (or the batch system) has told this process to end

Most likely this happened because it tried to allocate a large chunk of
memory and failed.

To investigate, I ran the code on a smaller matrix with the -info flag
(full output below). What worries me are these lines:

Writing data in binary format to adult9.train.x 
....  >>>> I call MatView in my code here 
[1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 16281
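
If those mallocs come from inserting values into a part of the matrix
that was never preallocated (the off-diagonal block of the MPIAIJ
matrix, I suspect), then supplying per-row counts for both blocks should
remove them. A rough sketch only, assuming hypothetical per-row counts
d_nnz/o_nnz gathered in a first pass over the input file, with m local
rows and an M x N global matrix (error checking omitted):

    /* d_nnz[i]/o_nnz[i]: nonzeros of local row i that fall in the
       diagonal/off-diagonal block; both arrays are illustrative here. */
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, m, PETSC_DECIDE, M, N);
    MatSetType(A, MATMPIAIJ);
    MatMPIAIJSetPreallocation(A, 0, d_nnz, 0, o_nnz);
    /* ... MatSetValues() loop, then MatAssemblyBegin/End ... */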

Is MatView reconstructing the entire matrix on the root node? If so,
the program will definitely fail due to lack of memory on the large
matrices.
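
For reference, the write itself is just the standard binary-viewer
path, roughly as in the sketch below (error checking omitted; A is the
assembled MPIAIJ matrix):

    PetscViewer viewer;
    PetscViewerBinaryOpen(PETSC_COMM_WORLD, "adult9.train.x",
                          FILE_MODE_WRITE, &viewer);
    MatView(A, viewer);
    PetscViewerDestroy(viewer);  /* &viewer in newer PETSc releases */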

Please let me know if you need any other information or if I can run
any other tests to help with the investigation.

vishy




mpiexec -n 2 ./libsvm-to-binary -in ../LibSVM/biclass/adult9/adult9.train.txt -data adult9.train.x -labels adult9.train.y -info 

[0] PetscInitialize(): PETSc successfully started: number of processors = 2
[1] PetscInitialize(): PETSc successfully started: number of processors = 2
[1] PetscInitialize(): Running on machine: rossmann-fe03.rcac.purdue.edu
[0] PetscInitialize(): Running on machine: rossmann-fe03.rcac.purdue.edu
No libsvm test file specified!

 Reading libsvm train file at ../LibSVM/biclass/adult9/adult9.train.txt
[0] PetscFOpen(): Opening file ../LibSVM/biclass/adult9/adult9.train.txt
[1] PetscFOpen(): Opening file ../LibSVM/biclass/adult9/adult9.train.txt
[0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374780 max tags = 2147483647
[0] PetscCommDuplicate():   returning tag 2147483647
[1] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374782 max tags = 2147483647
[1] PetscCommDuplicate():   returning tag 2147483647
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374780
[0] PetscCommDuplicate():   returning tag 2147483642
[1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374782
[1] PetscCommDuplicate():   returning tag 2147483642
[0] PetscCommDuplicate(): Duplicating a communicator 1140850689 -2080374777 max tags = 2147483647
[1] PetscCommDuplicate(): Duplicating a communicator 1140850689 -2080374780 max tags = 2147483647
[1] PetscCommDuplicate():   returning tag 2147483647
[0] PetscCommDuplicate():   returning tag 2147483647
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374777
[0] PetscCommDuplicate():   returning tag 2147483646
[1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780
[1] PetscCommDuplicate():   returning tag 2147483646
[0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs.
[0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs.
[0] MatStashScatterBegin_Private(): No of messages: 0 
[0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
[1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
[1] MatAssemblyEnd_SeqAIJ(): Matrix size: 16281 X 124; storage space: 225806 unneeded,0 used
[1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0
[1] Mat_CheckInode(): Found 3257 nodes of 16281. Limit used: 5. Using Inode routines
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 16280 X 124; storage space: 0 unneeded,225786 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 14
[0] Mat_CheckInode(): Found 16280 nodes out of 16280 rows. Not using Inode routines
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374777
[0] PetscCommDuplicate():   returning tag 2147483645
[0] MatSetUpMultiply_MPIAIJ(): Using block index set to define scatter
[0] PetscCommDuplicate():   returning tag 2147483638
[1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780
[1] PetscCommDuplicate():   returning tag 2147483645
[1] PetscCommDuplicate():   returning tag 2147483638
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374777
[1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780
[1] PetscCommDuplicate():   returning tag 2147483644
[1] PetscCommDuplicate():   returning tag 2147483637
[0] PetscCommDuplicate():   returning tag 2147483644
[0] PetscCommDuplicate():   returning tag 2147483637
[1] PetscCommDuplicate():   returning tag 2147483632
[0] PetscCommDuplicate():   returning tag 2147483632
[0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter
[0] VecScatterCreate(): General case: MPI to Seq
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 16280 X 0; storage space: 0 unneeded,0 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0

 Writing data in binary format to adult9.train.x 
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374780
[0] PetscCommDuplicate():   returning tag 2147483628
[1] MatAssemblyEnd_SeqAIJ(): Matrix size: 16281 X 123; storage space: 18409 unneeded,225806 used
[1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 16281
[1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 14
[1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374782
[1] PetscCommDuplicate():   returning tag 2147483628
[1] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850689
[1] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374780
[1] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374780
[1] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374780
[1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374782
[1] PetscCommDuplicate():   returning tag 2147483627
[0] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850689
[0] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374777
[0] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374777
[0] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374777

 Writing labels in binary format to adult9.train.y 
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374780
[0] PetscCommDuplicate():   returning tag 2147483627
[1] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850688
[1] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374782
[1] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374782
[1] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374782
[1] PetscFinalize(): PetscFinalize() called
[0] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850688
[0] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374780
[0] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374780
[0] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374780
[0] PetscFinalize(): PetscFinalize() called


> 
> On Thu, Apr 21, 2011 at 5:39 PM, S V N Vishwanathan <vishy at stat.purdue.edu> wrote:
> 
>     Hi
>    
>     I am using the attached code to convert a matrix from a rather
>     inefficient ascii format (each line is a row and contains a series of
>     idx:val pairs) to the PETSc binary format. Some of the matrices that I
>     am working with are rather huge (50GB ascii file) and cannot be
>     assembled on a single processor. When I use the attached code the matrix
>     assembly across machines seems to be fairly fast. However, dumping the
>     assembled matrix out to disk seems to be painfully slow. Any suggestions
>     on how to speed things up will be deeply appreciated.
>    
>     vishy
> 
> 
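
For completeness, here is a minimal illustration (not the attached
code) of the conversion loop described in the quoted message, assuming
each rank scans the whole file and only inserts the rows it owns, and
that the matrix A has already been created and preallocated:

    /* Illustration only: parse one row per line of "idx:val" pairs and
       insert into a parallel AIJ matrix.  The leading class label on
       each libsvm line does not match "%d:%lf" and is simply skipped. */
    char     line[1048576];
    PetscInt rstart, rend, row = 0;
    MatGetOwnershipRange(A, &rstart, &rend);
    while (fgets(line, sizeof(line), fp)) {
      if (row >= rstart && row < rend) {
        char *tok = strtok(line, " \t\n");
        while (tok) {
          int col; double val;
          if (sscanf(tok, "%d:%lf", &col, &val) == 2)
            MatSetValue(A, row, col, (PetscScalar)val, INSERT_VALUES);
          tok = strtok(NULL, " \t\n");
        }
      }
      row++;
    }
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);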


