[petsc-users] MatMult() returning different values depending on # of processors?

Barry Smith bsmith at mcs.anl.gov
Thu Jun 26 21:41:53 CDT 2014


  Using the point wise differences in the two computed vectors is not a meaningful way to check if the matrix-vector product is correct. 
I loaded the reduced matrix up in matlab (after doing a MatView(C,PETSC_VIEWER_BINARY_WORLD) in the code) and did the calculation 

>> A(82:162,1:81)*x(1:81) + A(82:162,82:162)*x(82:162) - A(82:162,:)*x

ans =

    -1.333828737741757e-23
     3.970466940254533e-22
     3.970466940254533e-22
     6.606856988583543e-20
     6.606856988583543e-20
    -4.878909776184770e-19
    -4.878909776184770e-19
                         0
                         0
                         0
                         0
                         0
                         0
    -2.628894374088577e-31
    -2.628894374088577e-31
    -1.709836290543913e-26
    -1.709836290543913e-26
    -2.136210087789533e-24
    -2.205680334546916e-24
    -2.584939414228211e-26
    -2.584939414228211e-26
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0

>> A(1:81,1:81)*x(1:81) + A(1:81,82:162)*x(82:162) - A(1:81,:)*x

ans =

                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
    -1.734723475976807e-18
                         0
                         0
                         0
     1.925929944387236e-34
     1.925929944387236e-34
     1.972152263052530e-31
     1.972152263052530e-31
     3.453317498695501e-26
     2.019483917365790e-28
    -1.333828737741757e-23

  Note that the two different ways of computing the same product (which would be identical in “infinite precision”) gives differences on the same order as running the PETSc program on two processes.    When running in parallel the order that the computations are done (and any movement of the partial sums from registers to memory) is different for different number of processes hence different results.

   Note also that the Intel floating point registers are 80 bits wide while double precision numbers in memory are 64 bit. Thus moving a partially computed sum to memory and then back to register wipes out 16 bits of the number.

   Barry



On Jun 26, 2014, at 9:10 PM, Arthur Kurlej <akurlej at gmail.com> wrote:

> Hmm, so I made that change, but my program still exhibits the same problem with it's output.
> 
> 
> On Thu, Jun 26, 2014 at 9:05 PM, Peter Brune <prbrune at gmail.com> wrote:
> Yep!  Looks good.  One other thing is that you can pre/postface all PETSc calls with ierr = ... CHKERRQ(ierr); and it will properly trace other problems if they arise.
> 
> - Peter
> 
> 
> On Thu, Jun 26, 2014 at 8:56 PM, Arthur Kurlej <akurlej at gmail.com> wrote:
> Hello,
> 
> I'm sorry, but this is a bit new for me, and I'm still not quite sure I follow.
> 
> Are you recommending that opposed to doing this:
> if(procs!=1){
> 	for(i=0;i<procs;i++){
> 		if(rank==i){
> 			VecGetLocalSize(*x,&length);
> 			VecGetOwnershipRange(*x,&div,NULL);
> 			ISCreateStride(PETSC_COMM_WORLD,length,div+begin,1,&iscol);
> 			ISCreateStride(PETSC_COMM_WORLD,length,div+begin,1,&isrow);
> 			ierr = MatGetSubMatrix(*A,isrow,iscol,MAT_INITIAL_MATRIX,AA); CHKERRQ(ierr);
> 		}
> 	}
> }
> else{
> ISCreateStride(PETSC_COMM_SELF,final_size,begin,1,&iscol);
> ISCreateStride(PETSC_COMM_SELF,final_size,begin,1,&isrow);
> ierr = MatGetSubMatrix(*A,isrow,iscol,MAT_INITIAL_MATRIX,AA); CHKERRQ(ierr);
> }
> 
> 
> The proper implementation would instead just be the following:
> VecGetLocalSize(*x,&length);
> VecGetOwnershipRange(*x,&div,NULL);
> ISCreateStride(PETSC_COMM_WORLD,length,div+begin,1,&iscol);
> ISCreateStride(PETSC_COMM_WORLD,length,div+begin,1,&isrow);
> ierr = MatGetSubMatrix(*A,isrow,iscol,MAT_INITIAL_MATRIX,AA); CHKERRQ(ierr);
> ?
> 
> 
> 
> 
> 
> On Thu, Jun 26, 2014 at 5:32 PM, Peter Brune <prbrune at gmail.com> wrote:
> MatGetSubMatrix() is collective on Mat and ISCreateXXX is collective on the provided comm, so the logic you have built to call it on one proc at a time is unnecessary at best and most likely incorrect and likely to produce strange results.  You can forgo the if statement and loop over processors, create the ISes on the same comm as x, and then call MatGetSubMatrix() once.
> 
> - Peter
> 
> 
> On Thu, Jun 26, 2014 at 4:26 PM, Arthur Kurlej <akurlej at gmail.com> wrote:
> I cannot send the original code, but I reproduced the problem in another code. I have attached a makefile the code, and the data for the x vector and A matrix.
> 
> I think the problem may be with my ShortenMatrix function, but it's not clear to me what exactly is going wrong and how to fix it. So I would appreciate some assistance there.
> 
> 
> Thanks,
> Arthur
> 
> 
> 
> On Wed, Jun 25, 2014 at 6:24 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
>    Can you send the code that reproduces this behavior?
> 
>    Barry
> 
> On Jun 25, 2014, at 4:37 PM, Arthur Kurlej <akurlej at gmail.com> wrote:
> 
> > Hi Barry,
> >
> > So for the matrix C that I am currently testing (size 162x162), the condition number is roughly 10^4.
> >
> > For reference, I'm porting MATLAB code into PETSc, and for one processor, the PETSc b vector is roughly equivalent to the MATLAB b vector. So I know that for one processor, my program is performing as expected.
> >
> > I've included examples below of values for b (also of size 162), ranging from indices 131 to 141.
> >
> > #processors=1:
> >                          0
> >      1.315217173959314e-20
> >      1.315217173959314e-20
> >      4.843201487740107e-17
> >      4.843201487740107e-17
> >      8.166104700666665e-14
> >      8.166104700666665e-14
> >      6.303834267553249e-11
> >      6.303834267553249e-11
> >      2.227932688485483e-08
> >      2.227932688485483e-08
> >
> > # processors=2:
> >      5.480410831461926e-22
> >      2.892553944350444e-22
> >      2.892553944350444e-22
> >      7.524038923310717e-24
> >      7.524038923214420e-24
> >     -3.340766769043093e-26
> >     -7.558372155761972e-27
> >      5.551561288838557e-25
> >      5.550551546879874e-25
> >     -1.579397982093437e-22
> >      2.655766754178065e-22
> >
> > # processors = 4:
> >      5.480410831461926e-22
> >      2.892553944351728e-22
> >      2.892553944351728e-22
> >      7.524092205125593e-24
> >      7.524092205125593e-24
> >     -2.584939414228212e-26
> >     -2.584939414228212e-26
> >                          0
> >                          0
> >     -1.245940797657998e-23
> >     -1.245940797657998e-23
> >
> > # processors = 8:
> >      5.480410831461926e-22
> >      2.892553944023035e-22
> >      2.892553944023035e-22
> >      7.524065744581494e-24
> >      7.524065744581494e-24
> >     -2.250265175188197e-26
> >     -2.250265175188197e-26
> >     -6.543127892265160e-26
> >     1.544288143499193e-317
> >      8.788794008375919e-25
> >      8.788794008375919e-25
> >
> >
> > Thanks,
> > Arthur
> >
> >
> >
> > On Wed, Jun 25, 2014 at 4:06 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >    How different are the values in b? Can you send back a few examples of the different b’s? Any idea of the condition number of C?
> >
> >    Barry
> >
> > On Jun 25, 2014, at 3:10 PM, Arthur Kurlej <akurlej at gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > While running my code, I have found that MatMult() returns different values depending on the number of processors I use (and there is quite the variance in the values).
> > >
> > > The setup of my code is as follows (I can go into more depth/background if needed):
> > > -Generate parallel AIJ matrix of size NxN, denoted as A
> > > -Retrieve parallel AIJ submatrix from the last N-1 rows&columns from A, denoted as C
> > > -Generate vector of length N-1, denoted as x
> > > -Find C*x=b
> > >
> > > I have already checked that A, C, and x are all equivalent when ran for any number of processors, it is only the values of vector b that varies.
> > >
> > > Does anyone have an idea about what's going on?
> > >
> > >
> > > Thanks,
> > > Arthur
> > >
> >
> >
> 
> 
> 
> 
> 
> 



More information about the petsc-users mailing list