[petsc-users] Fwd: Building the same petsc matrix with different numprocs gives different results!

Analabha Roy hariseldon99 at gmail.com
Tue Sep 24 12:39:10 CDT 2013


Hi,



On Tue, Sep 24, 2013 at 9:33 PM, Matthew Knepley <knepley at gmail.com> wrote:

> On Tue, Sep 24, 2013 at 8:35 AM, Analabha Roy <hariseldon99 at gmail.com> wrote:
>
>> On Tue, Sep 24, 2013 at 8:41 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>
>>> On Tue, Sep 24, 2013 at 8:08 AM, Analabha Roy <hariseldon99 at gmail.com> wrote:
>>>
>>>> On Tue, Sep 24, 2013 at 1:42 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
>>>>
>>>>> Analabha Roy <hariseldon99 at gmail.com> writes:
>>>>>
>>>>> > Hi all,
>>>>> >
>>>>> >
>>>>> > Compiling and running this code
>>>>> > <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c>
>>>>> > that builds a PETSc matrix gives different results when run with
>>>>> > different numbers of processors.
>>>>>
>>>>>
>>>> Thanks for the reply.
>>>>
>>>>
>>>>>  Uh, if you call rand() on different processors, why would you expect it
>>>>> to give the same results?
>>>>>
>>>> Right, I get that. The rand() was a placeholder.
>>>>
>>>> The original, much larger code
>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c>
>>>> replicates the same loop structure and calls the same PETSc routines, but
>>>> running it with
>>>>
>>>> mpirun -np $N ./eth -lattice_size 5 -vector_size 1 -repulsion 0.0
>>>> -draw_out -draw_pause -1
>>>>
>>>> with N=1,2,3,4 gives different results for the matrix dumped out by
>>>> lines 514-519 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#514>.
>>>> The matrix itself is built in parallel: it is created in lines 263-275
>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#263>
>>>> and its entries are evaluated in lines 294-356
>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#294>.
>>>>
>>>> (you can click on the line numbers above to navigate directly to them)
>>>>
>>>> Here is a sample <http://i43.tinypic.com/zyhf2f.jpg> of the output of
>>>> lines 514-519 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#514>
>>>> for N=1,2,3,4 procs, left to right.
>>>>
>>>> They're different for different numbers of procs. They should be the same,
>>>> since none of my input parameters depend on numprocs, and I don't
>>>> explicitly use the size or rank anywhere in the code.
>>>>
>>>
>>> You are likely not dividing the rows you loop over so you are
>>> redundantly computing.
>>>
>>
>> Thanks for the reply.
>>
>> Line 274 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#274>
>> gets the local row indices of the PETSc matrix AVG_BDIBJ.
>>
>> Line 295 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#295>
>> iterates over the local rows, and the lines below it get the column
>> elements. For each row, the column elements are assigned by the lines up
>> to Line 344 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#344>
>> and stored locally in colvalues[]. Dunno if the details are relevant.
>>
>> Line 347 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#347>
>> inserts the sitestride1^th row into the matrix.
>>
>> Line 353+ <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#353>
>> does the mat assembly.
>>
>> Then, after a lot of currently irrelevant code,
>>
>> Line 514+ <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#514>
>> dumps the mat plot to graphics.
>>
>>
>> Different numprocs give different matrices.
>>
>> Can somebody suggest what  I did wrong (or didn't do)?
>>
>
> Different values are being given to MatSetValues() for different numbers
> of processes. So
>
>   1) Reduce this to the smallest problem size possible
>
>   2) Print out all rows/cols/values for each call
>
>   3) Compare 2 procs to the serial case
>
>

Thanks for your excellent suggestion.
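
For anyone following along, step 2 above (print out all rows/cols/values for each call) could be sketched roughly as below. This is plain C with no PETSc dependency, and `format_row` is a hypothetical helper of my own naming, not a PETSc routine. It renders the arguments of one row insertion as "row col value" lines, so that the log of an N-process run can be sorted and diffed against the serial run (step 3).

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of PETSc): format one row insertion as
 * "row col value" lines, one line per column entry.
 * Returns the number of characters written (excluding the terminating
 * NUL), or -1 if the buffer is too small.
 * %.17g prints enough significant digits to round-trip a double, so two
 * logs only compare equal if the values are bitwise the same.
 */
int format_row(char *buf, size_t buflen, int row, int ncols,
               const int *cols, const double *vals)
{
    size_t used = 0;

    if (buflen == 0)
        return -1;
    buf[0] = '\0';

    for (int k = 0; k < ncols; k++) {
        int n = snprintf(buf + used, buflen - used, "%d %d %.17g\n",
                         row, cols[k], vals[k]);
        /* snprintf reports the length it wanted; >= remaining means truncated */
        if (n < 0 || (size_t)n >= buflen - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

In the real program the idea would be to call something like this just before each MatSetValues() call, with the same row index, column index array and value array, and write the buffer to a per-run log. Sorting the logs from the -np 1 and -np 2 runs and diffing them then shows exactly which entries differ.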

I modified my code
<https://code.google.com/p/daneelrepo/source/diff?spec=svn1435&r=1435&format=side&path=/eth_question/eth.c>
to dump the matrix in binary.

Then I used this python script I had
<https://code.google.com/p/daneelrepo/source/browse/eth_question/mat_bin2ascii.py>
to convert it to ascii.


Here are the values of AVG_BDIBJ <http://pastebin.ca/2457842>,
a 9x9 matrix (the smallest possible problem size), run with the exact same
input parameters with 1, 2, 3 and 4 procs.

As you can see, the 1 and 2 procs match up, but the 3 and 4 procs do not.

Serious weirdness.
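
To narrow this down, it may help to know which process built the first row that disagrees with the serial result. The sketch below is plain C and assumes the matrix uses PETSc's default (PETSC_DECIDE) row layout, where each rank gets N/size rows and the first N % size ranks get one extra. For 9 rows that means 9 on one proc, 5+4 on two, 3+3+3 on three and 3+2+2+2 on four. The function names are hypothetical helpers, not PETSc calls; in the actual program MatGetOwnershipRange() reports the real range.

```c
/* Rows owned by `rank` when n_rows rows are split over `size` ranks:
 * n_rows / size each, plus one extra row for the first n_rows % size ranks
 * (assumed default layout). */
int local_row_count(int n_rows, int size, int rank)
{
    return n_rows / size + ((n_rows % size) > rank ? 1 : 0);
}

/* Rank that owns global row `row` under the same layout, or -1 if the
 * row index is out of range. */
int owner_of_row(int n_rows, int size, int row)
{
    int end = 0; /* one past the last row owned by the current rank */

    if (row < 0)
        return -1;
    for (int rank = 0; rank < size; rank++) {
        end += local_row_count(n_rows, size, rank);
        if (row < end)
            return rank;
    }
    return -1;
}

/* First row in which two row-major n x n matrices differ by more than
 * tol in any entry, or -1 if they agree everywhere. */
int first_mismatched_row(const double *a, const double *b, int n, double tol)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double d = a[i * n + j] - b[i * n + j];
            if (d < 0)
                d = -d;
            if (d > tol)
                return i;
        }
    }
    return -1;
}
```

Feeding the serial matrix and the 3-proc matrix to first_mismatched_row() and the result to owner_of_row(9, 3, row) would say whether the bad rows all belong to ranks other than 0, which is one pattern worth looking for.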



>     Matt
>
>
>>
>>>    Matt
>>>
>>>
>>>>
>>>>
>>>>> for (sitestride1 = Istart; sitestride1 < Iend; sitestride1++)
>>>>>     {
>>>>>       for (sitestride2 = 0; sitestride2 < matsize; sitestride2++)
>>>>>         {
>>>>>           for (alpha = 0; alpha < dim; alpha++)
>>>>>             {
>>>>>               for (mu = 0; mu < dim; mu++)
>>>>>                 for (lambda = 0; lambda < dim; lambda++)
>>>>>                   {
>>>>>                     vecval = rand () / rand ();
>>>>>                   }
>>>>>
>>>>>               VecSetValue (BDB_AA, alpha, vecval, INSERT_VALUES);
>>>>>
>>>>>             }
>>>>>           VecAssemblyBegin (BDB_AA);
>>>>>           VecAssemblyEnd (BDB_AA);
>>>>>           VecSum (BDB_AA, &element);
>>>>>           colvalues[sitestride2] = element;
>>>>>
>>>>>         }
>>>>>       //Insert the array of colvalues to the sitestride1^th row of H
>>>>>       MatSetValues (AVG_BDIBJ, 1, &sitestride1, matsize, idx, colvalues,
>>>>>                     INSERT_VALUES);
>>>>>
>>>>>     }
>>>>>
>>>>> > The code is large and complex, so I have created a smaller program
>>>>> > with the same
>>>>> > loop structure here. <http://pastebin.ca/2457643>
>>>>> >
>>>>> > Compiling it and running it with "mpirun -np $N ./test -draw_pause -1"
>>>>> > gives different results for different values of N even though it's
>>>>> > not supposed to.
>>>>>
>>>>> What do you expect to see?
>>>>>
>>>>> > Here is a sample output <http://i42.tinypic.com/2s16ccw.jpg> for N=1,2,3,4
>>>>> > from left to right.
>>>>> >
>>>>> > Can anyone guide me as to what I'm doing wrong? Are any of the PETSc
>>>>> > routines used not parallelizable?
>>>>> >
>>>>> > Thanks in advance,
>>>>> >
>>>>> > Regards.
>>>>> >
>>>>> > --
>>>>> > ---
>>>>> > *Analabha Roy*
>>>>> > C.S.I.R <http://www.csir.res.in>  Senior Research
>>>>> > Associate<http://csirhrdg.res.in/poolsra.htm>
>>>>> > Saha Institute of Nuclear Physics <http://www.saha.ac.in>
>>>>> > Section 1, Block AF
>>>>> > Bidhannagar, Calcutta 700064
>>>>> > India
>>>>> > *Emails*: daneel at physics.utexas.edu, hariseldon99 at gmail.com
>>>>> > *Webpage*: http://www.ph.utexas.edu/~daneel/
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>
>>
>>
>>
>
>
>
>



-- 
---
*Analabha Roy*
C.S.I.R <http://www.csir.res.in>  Senior Research
Associate<http://csirhrdg.res.in/poolsra.htm>
Saha Institute of Nuclear Physics <http://www.saha.ac.in>
Section 1, Block AF
Bidhannagar, Calcutta 700064
India
*Emails*: daneel at physics.utexas.edu, hariseldon99 at gmail.com
*Webpage*: http://www.ph.utexas.edu/~daneel/