[petsc-users] Unequal sparse matrix row distribution for MPI MatMult
Barry Smith
bsmith at mcs.anl.gov
Thu Apr 9 19:06:08 CDT 2015
Run with valgrind or in the debugger to find where the segmentation fault occurs.
> On Apr 9, 2015, at 6:49 PM, Steena M <stm8086 at yahoo.com> wrote:
>
> Thanks, Matt. From *View(), vectors x and y are being created and initialized correctly, and their layout matches that of the matrix. A segmentation fault occurs on one of the ranks after all elements of y have been computed. The error (pasted below) is printed and visible only when using MatView(A,0). Besides changing the block size and local row sizes, and setting up for MatMult(), I have not changed the code in ex190.c.
>
> Process [0]
> 12
> .
> .
> Process [1]
> 20
> .
> .
> srun: error: sierra12: tasks 0-1: Segmentation fault (core dumped)
>
> Thanks in advance,
> Steena
>
>
>
>
> On Thursday, April 9, 2015 12:15 PM, Matthew Knepley <knepley at gmail.com> wrote:
>
>
> On Thu, Apr 9, 2015 at 2:12 PM, Steena M <stm8086 at yahoo.com> wrote:
> Thanks, Barry! I patched the master, modified src/mat/examples/tests/ex190.c to suit my data, and the fix works.
>
> I need to execute MatMult and I am assigning vectors on each rank using MatCreateVecs(A, &x, &y). However, I don't think multiplication is happening at all (I inserted a few printf statements inside MatMult_MPIBAIJ and inside MatMult_SeqBAIJ1 to check). VecSum(y,&ysum) is always zero. Maybe I'm assigning data incorrectly?
>
> Code snippet in ex190.c after partitioning the matrix unequally
>
> ierr = MatLoad(A,fd);CHKERRQ(ierr);
> ierr = MatCreateVecs(A, &x, &y);CHKERRQ(ierr);
>
> ierr = VecSet(x,one);CHKERRQ(ierr); //PetscScalar one = 1.0;
> ierr = VecSet(y,zero); CHKERRQ(ierr); //PetscScalar zero = 0.0;
>
> MatView(A, 0) gives you the matrix, so you can see what Ax should be.
>
> ierr = MatMult(A,x,y); CHKERRQ(ierr);
>
> VecView(y, 0) gives you the output
>
> Matt
>
> ierr = VecSum(y,&ysum);CHKERRQ(ierr); //ysum is always zero.
>
> Data is a 20x20 matrix and block size is 1.
>
> Any thoughts?
>
> Thanks,
> Steena
>
>
> .
>
>
>
>
> On Friday, April 3, 2015 4:17 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>
>
> Steena,
>
> Sorry for all the delays. Our code was just totally wrong for a user-provided decomposition, for two reasons: 1) the logic in MatLoad_MPIXAIJ() was too convoluted to work in all cases and 2) we didn't have a single test case.
>
> I have attached a patch that fixes it for petsc-3.5.3. The fix is also in the branch barry/fix-matload-uneven-rows, and I'll put it into next; then, after testing, it will go into maint and master and the next PETSc patch release.
>
> Please let us know if the patch doesn't work for you.
>
> Thanks for your patience,
>
> Barry
>
> > On Apr 1, 2015, at 3:10 PM, Steena M <stm8086 at yahoo.com> wrote:
> >
> > Thanks Barry. Attached is the driver program, the binary mat file, and the corresponding mtx file.
> >
> > Runtime command used:
> >
> > sierra324 at monteiro:time srun -n 2 -ppdebug ./petsc-mpibaij-unequalrows -fin trefethen.dat -matload_block_size 1
> >
> >
> >
> >
> > On Wednesday, April 1, 2015 12:28 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >
> >
> > Send a data file you generated and your reader program and we'll debug it.
> >
> > Barry
> >
> > > On Apr 1, 2015, at 2:18 PM, Steena M <stm8086 at yahoo.com> wrote:
> > >
> > > Thanks Barry. I removed the preallocation calls. It is still complaining about the malloc and incorrect data in the matrix file. I generate binary matrix files using PETSc's Python script to loop through a set of UFL sparse matrices. For this use case:
> > >
> > > mtx_mat = scipy.io.mmread('trefethen.mtx')
> > > PetscBinaryIO.PetscBinaryIO().writeMatSciPy(open('trefnew.dat','wb'), mtx_mat)
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Tuesday, March 31, 2015 9:15 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > >
> > >
> > >
> > > You should not need to call any preallocation routines when using MatLoad()
> > >
> > >
> > > How did you generate the file? Are you sure it has the correct information for the matrix?
> > >
> > > Barry
> > >
> > >
> > >
> > > > On Mar 31, 2015, at 11:05 PM, Steena M <stm8086 at yahoo.com> wrote:
> > > >
> > > > Thanks Matt. I'm still getting the malloc error
> > > >
> > > > [0]PETSC ERROR: Argument out of range!
> > > > [0]PETSC ERROR: New nonzero at (2,18) caused a malloc!
> > > >
> > > > and
> > > >
> > > > a new incorrect matrix file error:
> > > >
> > > > [0]PETSC ERROR: Unexpected data in file!
> > > > [0]PETSC ERROR: not matrix object!
> > > >
> > > > Maybe the order of calls is mixed up. This is the code snippet:
> > > >
> > > > if (rank ==0)
> > > > {
> > > > PetscPrintf (PETSC_COMM_WORLD,"\n On rank %d ", rank);
> > > >
> > > > CHKERRQ(MatSetSizes(A, 15, PETSC_DETERMINE, 20, 20));
> > > > CHKERRQ(MatSetType(A, MATMPIBAIJ));
> > > > CHKERRQ( MatMPIBAIJSetPreallocation(A,1,1,NULL,1,NULL));
> > > > CHKERRQ( MatLoad(A,fd));
> > > > CHKERRQ(MatSetOption(A,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE));
> > > > }
> > > >
> > > > else
> > > > {
> > > > PetscPrintf (PETSC_COMM_WORLD,"\n On rank %d ", rank);
> > > >
> > > > CHKERRQ( MatSetSizes(A, 5, PETSC_DETERMINE, 20, 20) );
> > > > CHKERRQ(MatSetType(A, MATMPIBAIJ));
> > > > CHKERRQ( MatMPIBAIJSetPreallocation(A,1,1,NULL,1,NULL));
> > > > CHKERRQ(MatLoad(A,fd));
> > > > CHKERRQ(MatSetOption(A,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE));
> > > > }
> > > >
> > > > Is there something I'm missing?
> > > >
> > > > Thanks,
> > > > Steena
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Tuesday, March 31, 2015 6:10 PM, Matthew Knepley <knepley at gmail.com> wrote:
> > > >
> > > >
> > > > On Tue, Mar 31, 2015 at 6:51 PM, Steena M <stm8086 at yahoo.com> wrote:
> > > > Thanks Barry. I'm still getting the malloc error with NULL. Is there a way to distribute the matrix without explicit preallocation? Different matrices will be loaded during runtime and assigning preallocation parameters would mean an additional preprocessing step.
> > > >
> > > > 1) MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE)
> > > >
> > > > 2) Note that this is never ever ever more efficient than making another pass and preallocating.
> > > >
> > > > Thanks,
> > > >
> > > > Matt
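
The extra counting pass mentioned in 2) can be sketched in plain C. This is a minimal illustration under stated assumptions, not PETSc code: the function name `count_diag_offdiag`, the CSR arrays `row_ptr` and `col_idx` describing only the locally owned rows, and the owned column range `[cstart, cend)` are all hypothetical names chosen for the example.

```c
#include <assert.h>

/* For each locally owned row, count the nonzeros whose column lies inside
 * the owned column range [cstart, cend) (the "diagonal" part) and the
 * nonzeros whose column lies outside it (the "off-diagonal" part).
 * row_ptr and col_idx are a CSR description of the local rows only;
 * all names here are assumptions made for this sketch. */
void count_diag_offdiag(int nrows, const int *row_ptr, const int *col_idx,
                        int cstart, int cend, int *d_nnz, int *o_nnz)
{
    assert(cstart <= cend);
    for (int i = 0; i < nrows; ++i) {
        d_nnz[i] = 0;
        o_nnz[i] = 0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            if (col_idx[k] >= cstart && col_idx[k] < cend) {
                d_nnz[i]++;
            } else {
                o_nnz[i]++;
            }
        }
    }
}
```

With block size 1, per-row counts of this kind are what the `d_nnz` and `o_nnz` array arguments of MatMPIBAIJSetPreallocation() describe when a matrix is assembled by hand, in place of passing NULL.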
> > > > --------------------------------------------
> > > > On Sun, 3/29/15, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > > >
> > > > Subject: Re: [petsc-users] Unequal sparse matrix row distribution for MPI MatMult
> > > > To: "Steena M" <stm8086 at yahoo.com>
> > > > Cc: "Matthew Knepley" <knepley at gmail.com>, petsc-users at mcs.anl.gov
> > > > Date: Sunday, March 29, 2015, 9:26 PM
> > > >
> > > >
> > > > > On Mar 29, 2015, at 11:05 PM, Steena M <stm8086 at yahoo.com> wrote:
> > > > > >
> > > > > > Thanks Matt. I used PETSC_DETERMINE but I'm now getting an allocation-based error:
> > > > > >
> > > > > > [0]PETSC ERROR: --------------------- Error Message ------------------------------------
> > > > > > [0]PETSC ERROR: Argument out of range!
> > > > > > [0]PETSC ERROR: New nonzero at (2,18) caused a malloc!
> > > > > > [0]PETSC ERROR: ------------------------------------------------------------------------
> > > > > >
> > > > > > I tried preallocating on each rank for the diagonal and off-diagonal sections of the matrix as the next step. My current approximations for preallocation
> > > > > >
> > > > > > CHKERRQ( MatMPIBAIJSetPreallocation(A,1,5,PETSC_DEFAULT,5,PETSC_DEFAULT));
> > > >
> > > > These arguments where you pass PETSC_DEFAULT are expecting a pointer, not an integer. You can pass NULL in those locations. Though it is better to provide the correct preallocation rather than some defaults.
> > > >
> > > > Barry
> > > >
> > > > > >
> > > > > > are throwing segmentation errors.
> > > > > >
> > > > > > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> > > > > >
> > > > > > Any insights into what I'm doing wrong?
> > > > > >
> > > > > > Thanks,
> > > > > > Steena
> > > > > >
> > > > > > On Sun, 3/29/15, Matthew Knepley <knepley at gmail.com> wrote:
> > > > > >
> > > > > > Subject: Re: [petsc-users] Unequal sparse matrix row distribution for MPI MatMult
> > > > > > To: "Steena M" <stm8086 at yahoo.com>
> > > > > > Cc: "Barry Smith" <bsmith at mcs.anl.gov>, petsc-users at mcs.anl.gov
> > > > > > Date: Sunday, March 29, 2015, 10:02 PM
> > > > > >
> > > > > > On Sun, Mar 29, 2015 at 9:56 PM, Steena M <stm8086 at yahoo.com> wrote:
> > > > > > Hi Barry,
> > > > > >
> > > > > > I am trying to partition a 20 row and 20 col sparse matrix between two procs such that proc 0 has 15 rows and 20 cols and proc 1 has 5 rows and 20 cols. The code snippet:
> > > > > >
> > > > > > CHKERRQ(MatCreate(PETSC_COMM_WORLD,&A)); // at runtime: -matload_block_size 1
> > > > > >
> > > > > > if (rank ==0)
> > > > > > {
> > > > > > CHKERRQ( MatSetSizes(A, 15, 20, 20, 20) ); //rank 0 gets 75% of the rows
> > > > > > CHKERRQ( MatSetType(A, MATMPIBAIJ) );
> > > > > > CHKERRQ( MatLoad(A,fd) );
> > > > > > }
> > > > > >
> > > > > > else
> > > > > > {
> > > > > > CHKERRQ( MatSetSizes(A, 5, 20, 20, 20) ); //rank 1 gets 25% of the rows
> > > > > > CHKERRQ( MatSetType(A, MATMPIBAIJ) );
> > > > > > CHKERRQ( MatLoad(A,fd) );
> > > > > > }
> > > > > >
> > > > > > This throws the following error (probably from psplit.c):
> > > > > >
> > > > > > [1]PETSC ERROR: --------------------- Error Message ------------------------------------
> > > > > > [1]PETSC ERROR: Nonconforming object sizes!
> > > > > > [1]PETSC ERROR: Sum of local lengths 40 does not equal global length 20, my local length 20
> > > > > > likely a call to VecSetSizes() or MatSetSizes() is wrong.
> > > > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html#split!
> > > > > >
> > > > > > This error printout doesn't quite make sense to me. I'm trying to specify a total matrix size of 20x20... I haven't yet figured out where the '40' comes from in the error message.
> > > > > >
> > > > > > Any thoughts on what might be going wrong?
> > > > > >
> > > > > > It's the column specification. Just use PETSC_DETERMINE for the local columns since all our sparse matrix formats are row divisions anyway.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Matt
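
The '40' in the error message is the sum of the local column sizes: each of the two ranks passed 20 as its local column count, and 20 + 20 = 40 does not match the global column count of 20. A minimal plain-C sketch of that consistency check follows; it is an illustration only, not PETSc's implementation, and the function name `layout_conforms` is hypothetical.

```c
#include <assert.h>

/* Return 1 when the per-rank local lengths add up to the global length,
 * 0 otherwise. This mirrors the condition reported in the error message
 * ("Sum of local lengths ... does not equal global length ..."). */
int layout_conforms(const int *local, int nranks, int global)
{
    int sum = 0;
    assert(nranks > 0);
    for (int r = 0; r < nranks; ++r) {
        sum += local[r];
    }
    return sum == global;
}
```

With the values from this thread, local row sizes {15, 5} conform to a global size of 20, while local column sizes {20, 20} do not. Passing PETSC_DETERMINE for the local column size leaves that split to PETSc.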
> > > > > > Thanks in advance,
> > > > > > Steena
> > > > > >
> > > > > > --------------------------------------------
> > > > > > On Sun, 3/22/15, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > > > > >
> > > > > > Subject: Re: [petsc-users] Unequal sparse matrix row distribution for MPI MatMult
> > > > > > To: "Steena M" <stm8086 at yahoo.com>
> > > > > > Cc: petsc-users at mcs.anl.gov
> > > > > > Date: Sunday, March 22, 2015, 3:58 PM
> > > > > >
> > > > > > Steena,
> > > > > >
> > > > > > I am a little unsure of your question.
> > > > > >
> > > > > > 1) you can create a MPIBAIJ matrix with any distribution of block rows per process you want, just set the local row size for each process to be what you like. Use MatCreateVecs() to get correspondingly laid out vectors.
> > > > > >
> > > > > > or 2) if you have a MPIBAIJ matrix with "equal" row layout and you want a new one with uneven row layout you can simply use MatGetSubMatrix() to create that new matrix.
> > > > > >
> > > > > > Barry
> > > > > >
> > > > > > Unless you have another reason to have the matrix with an equal number row layout I would just generate the matrix with the layout you want.
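
Once each process has chosen its local row size, the block of global rows it owns follows from a running sum. A minimal plain-C sketch, independent of PETSc: the function name `ownership_ranges` is hypothetical, and the sketch assumes rows are assigned to ranks in contiguous blocks in rank order.

```c
#include <assert.h>

/* Given the local row count chosen on each rank, fill range[] (length
 * nranks + 1) so that rank r owns global rows range[r] .. range[r+1]-1.
 * Assumes contiguous row blocks in rank order. */
void ownership_ranges(const int *local_rows, int nranks, int *range)
{
    assert(nranks > 0);
    range[0] = 0;
    for (int r = 0; r < nranks; ++r) {
        range[r + 1] = range[r] + local_rows[r];
    }
}
```

For the local sizes used later in this thread, {15, 5}, rank 0 owns rows 0 to 14 and rank 1 owns rows 15 to 19.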
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > On Mar 22, 2015, at 5:50 PM, Steena
> > > > M
> > > > >
> > > > > <stm8086 at yahoo.com>
> > > > >
> > > > > wrote:
> > > > >
> > > > > >
> > > > >
> > > > > > Hello,
> > > > >
> > > > > >
> > > > >
> > > > > > I need to
> > > > distribute
> > > > >
> > > > > a
> > > > sparse matrix such that each proc owns an unequal
> > > > > number
> > > > >
> > > > > of blocked rows before I proceed with
> > > > MPI MatMult. My
> > > > >
> > > > >
> > > > initial thoughts on doing this:
> > > > >
> > > > > >
> > > > >
> > > > > > 1) Use MatGetSubMatrices() on the
> > > > test
> > > > >
> > > > > MATMPIBAIJ
> > > > matrix to produce a new matrix where each
> > > > > proc
> > > > >
> > > > > has an unequal number of rows.
> > > > >
> > > > > >
> > > > >
> > > > > > 2) Provide
> > > > scatter context for vector X
> > > > >
> > > > > (for MatMult )using IS iscol from
> > > > MatGetSubMatrices()
> > > > > while
> > > > >
> > > > > creating the
> > > > vector X.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > 3)
> > > > Call MatMult()
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > Will
> > > > MatMult_MPIBAIJ continue to scatter
> > > > >
> > > > > this matrix and vector such that each
> > > > proc will own an
> > > > > equal
> > > > >
> > > > > number of matrix
> > > > rows and corresponding diagonal vector
> > > > >
> > > >
> > > > > elements? Should I write my own
> > > > MPIMatMult function to
> > > > >
> > > > > retain my redistribution of the matrix
> > > > and vector?
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > Thanks in
> > > > >
> > > > >
> > > > advance,
> > > > >
> > > > > >
> > > > Steena
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > What most
> > > > experimenters
> > > > > take for granted before
> > > > they begin their experiments is
> > > > >
> > > > infinitely more interesting than any results to which
> > > > their
> > > > > experiments lead.
> > > > > -- Norbert
> > > > > Wiener
> > > >
> > > >
> > >
> > >
> >
> >
> > <trefethen.mtx><trefethen.dat><petsc-unequalrows-mpibaij.c>
>
>
>
>
>
>
>