[petsc-users] Unequal sparse matrix row distribution for MPI MatMult

Steena M stm8086 at yahoo.com
Thu Apr 9 18:49:17 CDT 2015


Thanks, Matt. From *View(), vectors x and y are being created and initialized correctly, and their layout matches that of the matrix. There is a segmentation fault on one of the ranks after all elements of y have been computed. The error (pasted below) is printed and visible only when using MatView(A,0). Besides changing the block size and local row sizes, and setting up for MatMult(), I have not changed the code in ex190.c.
Process [0]12..Process [1]20..
srun: error: sierra12: tasks 0-1: Segmentation fault (core dumped)

Thanks in advance,
Steena



     On Thursday, April 9, 2015 12:15 PM, Matthew Knepley <knepley at gmail.com> wrote:
   

 On Thu, Apr 9, 2015 at 2:12 PM, Steena M <stm8086 at yahoo.com> wrote:

Thanks, Barry! I patched the master, modified src/mat/examples/tests/ex190.c to suit my data, and the fix works.
I need to execute MatMult() and I am assigning vectors on each rank using MatCreateVecs(A, &x, &y). However, I don't think the multiplication is happening at all (I inserted a few printf statements inside MatMult_MPIBAIJ and inside MatMult_SeqBAIJ1 to check). VecSum(y,&ysum) always returns ysum = 0. Maybe I'm assigning data incorrectly?
Code snippet in ex190.c after partitioning the matrix unequally:

  ierr = MatLoad(A,fd);CHKERRQ(ierr);
  ierr = MatCreateVecs(A,&x,&y);CHKERRQ(ierr);
  ierr = VecSet(x,one);CHKERRQ(ierr);   // PetscScalar one = 1.0;
  ierr = VecSet(y,zero);CHKERRQ(ierr);  // PetscScalar zero = 0.0;

MatView(A, 0) gives you the matrix, so you can see what Ax should be. 

  ierr = MatMult(A,x,y); CHKERRQ(ierr);

VecView(y, 0) gives you the output.
  Matt 
  ierr = VecSum(y,&ysum);CHKERRQ(ierr); // ysum is always zero.
Data is a 20x20 matrix and block size is 1.
Any thoughts?
Thanks,
Steena
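(For concreteness, a minimal self-contained sketch of the whole load-and-multiply check discussed above, assuming the patched MatLoad(), a 20x20 PETSc binary matrix file, block size 1, and a 15/5 row split across two ranks; the file name is illustrative and error-code checking is omitted for brevity:)

    #include <petscmat.h>

    int main(int argc, char **argv)
    {
      Mat         A;
      Vec         x, y;
      PetscViewer fd;
      PetscScalar one = 1.0, ysum;
      PetscMPIInt rank;

      PetscInitialize(&argc, &argv, NULL, NULL);
      MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
      PetscViewerBinaryOpen(PETSC_COMM_WORLD, "trefnew.dat", FILE_MODE_READ, &fd);
      MatCreate(PETSC_COMM_WORLD, &A);
      MatSetType(A, MATMPIBAIJ);
      /* unequal split: rank 0 gets 15 rows, rank 1 gets 5; columns left to PETSc */
      MatSetSizes(A, rank ? 5 : 15, PETSC_DETERMINE, 20, 20);
      MatLoad(A, fd);
      MatCreateVecs(A, &x, &y);   /* vectors laid out to match the matrix */
      VecSet(x, one);
      MatMult(A, x, y);
      VecSum(y, &ysum);           /* should equal the sum of all matrix entries */
      PetscPrintf(PETSC_COMM_WORLD, "ysum = %g\n", (double)PetscRealPart(ysum));
      VecDestroy(&x); VecDestroy(&y); MatDestroy(&A); PetscViewerDestroy(&fd);
      PetscFinalize();
      return 0;
    }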
  


     On Friday, April 3, 2015 4:17 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
   

 
  Steena,

  Sorry for all the delays. Our code was just totally wrong for a user-provided decomposition, for two reasons: 1) the logic in MatLoad_MPIXAIJ() was too convoluted to work in all cases, and 2) we didn't have a single test case.

  I have attached a patch that fixes it for petsc-3.5.3. The fix is also in the branch barry/fix-matload-uneven-rows; I'll put it into next, and after testing it will go into maint and master and the next PETSc patch release.

  Please let us know if the patch doesn't work for you.

  Thanks for your patience,

  Barry

> On Apr 1, 2015, at 3:10 PM, Steena M <stm8086 at yahoo.com> wrote:
> 
> Thanks Barry. Attached is the driver program, the binary mat file, and the corresponding mtx file.
> 
> Runtime command used:
> 
> sierra324@monteiro: time srun -n 2 -ppdebug ./petsc-mpibaij-unequalrows -fin trefethen.dat -matload_block_size 1
> 
> 
> 
> 
> On Wednesday, April 1, 2015 12:28 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
> 
> 
>  Send a data file you generated and your reader program and we'll debug it.
> 
>  Barry
> 
> > On Apr 1, 2015, at 2:18 PM, Steena M <stm8086 at yahoo.com> wrote:
> > 
> > Thanks Barry. I removed the preallocation calls. It is still complaining about the malloc and about incorrect data in the matrix file. I generate binary matrix files using PETSc's PetscBinaryIO Python module to loop through a set of UFL sparse matrices. For this use case:
> > 
> > import scipy.io, PetscBinaryIO
> > mtx_mat = scipy.io.mmread('trefethen.mtx')
> > PetscBinaryIO.PetscBinaryIO().writeMatSciPy(open('trefnew.dat','w'), mtx_mat)
> > 
> > 
> > 
> > 
> > On Tuesday, March 31, 2015 9:15 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > 
> > 
> > 
> >  You should not need to call any preallocation routines when using MatLoad().
> > 
> > 
> >  How did you generate the file? Are you sure it has the correct information for the matrix? 
> > 
> >  Barry
> > 
> > 
> > 
> > > On Mar 31, 2015, at 11:05 PM, Steena M <stm8086 at yahoo.com> wrote:
> > > 
> > > Thanks Matt. I'm still getting the malloc error 
> > > 
> > > [0]PETSC ERROR: Argument out of range!
> > > [0]PETSC ERROR: New nonzero at (2,18) caused a malloc!
> > > 
> > > and 
> > > 
> > > a new incorrect matrix file error:
> > > 
> > > [0]PETSC ERROR: Unexpected data in file!
> > > [0]PETSC ERROR: not matrix object!
> > > 
> > > Maybe the order of calls is mixed up. This is the code snippet:
> > > 
> > >    if (rank == 0) {
> > >      PetscPrintf(PETSC_COMM_WORLD,"\n On rank %d ", rank);
> > >      CHKERRQ( MatSetSizes(A, 15, PETSC_DETERMINE, 20, 20) );
> > >      CHKERRQ( MatSetType(A, MATMPIBAIJ) );
> > >      CHKERRQ( MatMPIBAIJSetPreallocation(A,1,1,NULL,1,NULL) );
> > >      CHKERRQ( MatLoad(A,fd) );
> > >      CHKERRQ( MatSetOption(A,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE) );
> > >    } else {
> > >      PetscPrintf(PETSC_COMM_WORLD,"\n On rank %d ", rank);
> > >      CHKERRQ( MatSetSizes(A, 5, PETSC_DETERMINE, 20, 20) );
> > >      CHKERRQ( MatSetType(A, MATMPIBAIJ) );
> > >      CHKERRQ( MatMPIBAIJSetPreallocation(A,1,1,NULL,1,NULL) );
> > >      CHKERRQ( MatLoad(A,fd) );
> > >      CHKERRQ( MatSetOption(A,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE) );
> > >    }
> > > 
> > > Is there something I'm missing? 
> > > 
> > > Thanks,
> > > Steena
> > >    
> > > 
> > > 
> > > 
> > > 
> > > On Tuesday, March 31, 2015 6:10 PM, Matthew Knepley <knepley at gmail.com> wrote:
> > > 
> > > 
> > > On Tue, Mar 31, 2015 at 6:51 PM, Steena M <stm8086 at yahoo.com> wrote:
> > > Thanks Barry. I'm still getting the malloc error with NULL. Is there a way to distribute the matrix without explicit preallocation? Different matrices will be loaded during runtime and assigning preallocation parameters would mean an additional preprocessing step.
> > > 
> > > 1) MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE)
> > >  
> > > 2) Note that this is never ever ever more efficient than making another pass and preallocating
> > > 
> > >  Thanks,
> > > 
> > >      Matt
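(A minimal sketch combining point 1 with Barry's earlier note that MatLoad() needs no preallocation calls; since the option only affects insertions made after it is set, it goes before MatLoad(). The 15/5 split is an illustrative assumption and error handling is omitted:)

    CHKERRQ( MatCreate(PETSC_COMM_WORLD, &A) );
    CHKERRQ( MatSetType(A, MATMPIBAIJ) );
    CHKERRQ( MatSetSizes(A, rank ? 5 : 15, PETSC_DETERMINE, 20, 20) );
    /* allow nonzeros beyond any (approximate) preallocation */
    CHKERRQ( MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) );
    CHKERRQ( MatLoad(A, fd) );   /* no MatMPIBAIJSetPreallocation() needed */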
> > > --------------------------------------------
> > > On Sun, 3/29/15, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > > 
> > >  Subject: Re: [petsc-users] Unequal sparse matrix row distribution for MPI MatMult
> > >  To: "Steena M" <stm8086 at yahoo.com>
> > >  Cc: "Matthew Knepley" <knepley at gmail.com>, petsc-users at mcs.anl.gov
> > >  Date: Sunday, March 29, 2015, 9:26 PM
> > > 
> > >  > On Mar 29, 2015, at 11:05 PM, Steena M <stm8086 at yahoo.com> wrote:
> > >  >
> > >  > Thanks Matt. I used PETSC_DETERMINE but I'm now getting an allocation-based error:
> > >  >
> > >  > [0]PETSC ERROR: --------------------- Error Message ------------------------------------
> > >  > [0]PETSC ERROR: Argument out of range!
> > >  > [0]PETSC ERROR: New nonzero at (2,18) caused a malloc!
> > >  > [0]PETSC ERROR: ------------------------------------------------------------------------
> > >  >
> > >  > I tried preallocating on each rank for the diagonal and off-diagonal sections of the matrix as the next step. My current approximations for preallocation
> > >  >
> > >  > CHKERRQ( MatMPIBAIJSetPreallocation(A,1,5,PETSC_DEFAULT,5,PETSC_DEFAULT) );
> > > 
> > >    These arguments where you pass PETSC_DEFAULT are expecting a pointer, not an integer. You can pass NULL in those locations, though it is better to provide the correct preallocation rather than some defaults.
> > > 
> > >    Barry
> > > 
> > >  > are throwing segmentation errors:
> > >  >
> > >  > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> > >  >
> > >  > Any insights into what I'm doing wrong?
> > >  >
> > >  > Thanks,
> > >  > Steena
> > > 
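(For illustration, a sketch of the two forms Barry describes, assuming block size 1; d_nnz and o_nnz are hypothetical per-block-row count arrays, not from the original code:)

    /* defaults: pass NULL (a pointer), not PETSC_DEFAULT, for the arrays */
    CHKERRQ( MatMPIBAIJSetPreallocation(A, 1, 5, NULL, 5, NULL) );

    /* better: exact counts per local block row, d_nnz for the diagonal
       block and o_nnz for the off-diagonal block; the scalar counts
       are ignored when the arrays are supplied */
    CHKERRQ( MatMPIBAIJSetPreallocation(A, 1, 0, d_nnz, 0, o_nnz) );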
> > >  > On Sun, 3/29/15, Matthew Knepley <knepley at gmail.com> wrote:
> > >  >
> > >  > Subject: Re: [petsc-users] Unequal sparse matrix row distribution for MPI MatMult
> > >  > To: "Steena M" <stm8086 at yahoo.com>
> > >  > Cc: "Barry Smith" <bsmith at mcs.anl.gov>, petsc-users at mcs.anl.gov
> > >  > Date: Sunday, March 29, 2015, 10:02 PM
> > >  >
> > >  > On Sun, Mar 29, 2015 at 9:56 PM, Steena M <stm8086 at yahoo.com> wrote:
> > >  > Hi Barry,
> > >  >
> > >  > I am trying to partition a 20 row and 20 col sparse matrix between two procs such that proc 0 has 15 rows and 20 cols and proc 1 has 5 rows and 20 cols. The code snippet:
> > >  >
> > >  >     CHKERRQ(MatCreate(PETSC_COMM_WORLD,&A)); // at runtime: -matload_block_size 1
> > >  >
> > >  >     if (rank == 0) {
> > >  >         CHKERRQ( MatSetSizes(A, 15, 20, 20, 20) ); // rank 0 gets 75% of the rows
> > >  >         CHKERRQ( MatSetType(A, MATMPIBAIJ) );
> > >  >         CHKERRQ( MatLoad(A,fd) );
> > >  >     } else {
> > >  >         CHKERRQ( MatSetSizes(A, 5, 20, 20, 20) ); // rank 1 gets 25% of the rows
> > >  >         CHKERRQ( MatSetType(A, MATMPIBAIJ) );
> > >  >         CHKERRQ( MatLoad(A,fd) );
> > >  >     }
> > >  >
> > >  > This throws the following error (probably from psplit.c):
> > >  >
> > >  > [1]PETSC ERROR: --------------------- Error Message ------------------------------------
> > >  > [1]PETSC ERROR: Nonconforming object sizes!
> > >  > [1]PETSC ERROR: Sum of local lengths 40 does not equal global length 20, my local length 20
> > >  >   likely a call to VecSetSizes() or MatSetSizes() is wrong.
> > >  > See http://www.mcs.anl.gov/petsc/documentation/faq.html#split!
> > >  >
> > >  > This error printout doesn't quite make sense to me. I'm trying to specify a total matrix size of 20x20... I haven't yet figured out where the '40' comes from in the error message.
> > >  >
> > >  > Any thoughts on what might be going wrong?
> > >  >
> > >  > It's the column specification. Just use PETSC_DETERMINE for the local columns since all our sparse matrix formats are row divisions anyway.
> > >  >
> > >  > Thanks,
> > >  >
> > >  >   Matt
> > >  >
> > >  > Thanks in advance,
> > >  > Steena
> > >  >
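(Concretely, the fix Matt describes, sketched against the snippet above: with 20 local columns claimed on each of the two ranks, the local column lengths sum to 40 against a global size of 20, which is exactly the '40' in the error message.)

    /* wrong: each of 2 ranks claims 20 local columns -> 20 + 20 = 40 != 20 */
    CHKERRQ( MatSetSizes(A, 15, 20, 20, 20) );

    /* right: specify only the local rows and let PETSc split the columns */
    CHKERRQ( MatSetSizes(A, 15, PETSC_DETERMINE, 20, 20) );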
> > >  --------------------------------------------
> > >  >
> > >  > On Sun, 3/22/15, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > >  >
> > >  >  Subject: Re: [petsc-users] Unequal sparse matrix row distribution for MPI MatMult
> > >  >  To: "Steena M" <stm8086 at yahoo.com>
> > >  >  Cc: petsc-users at mcs.anl.gov
> > >  >  Date: Sunday, March 22, 2015, 3:58 PM
> > >  >
> > >  >  Steena,
> > >  >
> > >  >    I am a little unsure of your question.
> > >  >
> > >  >    1) You can create a MPIBAIJ matrix with any distribution of block rows per process you want; just set the local row size for each process to be what you like. Use MatCreateVecs() to get correspondingly laid out vectors.
> > >  >
> > >  >    or 2) if you have a MPIBAIJ matrix with "equal" row layout and you want a new one with uneven row layout you can simply use MatGetSubMatrix() to create that new matrix.
> > >  >
> > >  >    Barry
> > >  >
> > >  >  Unless you have another reason to have the matrix with an equal number row layout, I would just generate the matrix with the layout you want.
> > >  >
> > >  >  > On Mar 22, 2015, at 5:50 PM, Steena M <stm8086 at yahoo.com> wrote:
> > >  >  >
> > >  >  > Hello,
> > >  >  >
> > >  >  > I need to distribute a sparse matrix such that each proc owns an unequal number of blocked rows before I proceed with MPI MatMult. My initial thoughts on doing this:
> > >  >  >
> > >  >  > 1) Use MatGetSubMatrices() on the test MATMPIBAIJ matrix to produce a new matrix where each proc has an unequal number of rows.
> > >  >  >
> > >  >  > 2) Provide scatter context for vector X (for MatMult) using IS iscol from MatGetSubMatrices() while creating the vector X.
> > >  >  >
> > >  >  > 3) Call MatMult().
> > >  >  >
> > >  >  > Will MatMult_MPIBAIJ continue to scatter this matrix and vector such that each proc will own an equal number of matrix rows and corresponding diagonal vector elements? Should I write my own MPIMatMult function to retain my redistribution of the matrix and vector?
> > >  >  >
> > >  >  > Thanks in advance,
> > >  >  > Steena
> > >  >
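(A sketch of Barry's option 2 above, under these assumptions: a square 20x20 matrix A on two ranks, a desired 15/5 row split, and the petsc-3.5-era MatGetSubMatrix(); all names are illustrative:)

    Mat      Anew;
    IS       is;
    PetscInt nlocal = rank ? 5 : 15, first;

    /* exclusive prefix sum gives this rank's first global row index */
    MPI_Scan(&nlocal, &first, 1, MPIU_INT, MPI_SUM, PETSC_COMM_WORLD);
    first -= nlocal;

    CHKERRQ( ISCreateStride(PETSC_COMM_WORLD, nlocal, first, 1, &is) );
    /* using the same index set for rows and columns keeps the whole
       matrix but redistributes it to the uneven layout */
    CHKERRQ( MatGetSubMatrix(A, is, is, MAT_INITIAL_MATRIX, &Anew) );
    CHKERRQ( ISDestroy(&is) );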
> > >  >
> > >  > -- 
> > >  > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> > >  > -- Norbert Wiener
> > > 
> > > 
> > > -- 
> > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> > > -- Norbert Wiener
> > > 
> > > 
> > 
> > 
> 
> 
> <trefethen.mtx><trefethen.dat><petsc-unequalrows-mpibaij.c>


   



-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

  

