[petsc-users] Fwd: Building the same petsc matrix with different numprocs gives different results!

Matthew Knepley knepley at gmail.com
Tue Sep 24 12:42:02 CDT 2013


On Tue, Sep 24, 2013 at 10:39 AM, Analabha Roy <hariseldon99 at gmail.com> wrote:

> Hi,
>
>
>
> On Tue, Sep 24, 2013 at 9:33 PM, Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Tue, Sep 24, 2013 at 8:35 AM, Analabha Roy <hariseldon99 at gmail.com> wrote:
>>
>>>
>>>
>>>
>>> On Tue, Sep 24, 2013 at 8:41 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>>
>>>> On Tue, Sep 24, 2013 at 8:08 AM, Analabha Roy <hariseldon99 at gmail.com> wrote:
>>>>
>>>>>
>>>>> On Tue, Sep 24, 2013 at 1:42 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
>>>>>
>>>>>> Analabha Roy <hariseldon99 at gmail.com> writes:
>>>>>>
>>>>>> > Hi all,
>>>>>> >
>>>>>> >
>>>>>> > Compiling and running this code
>>>>>> > <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c>
>>>>>> > that builds a PETSc matrix gives different results when run with
>>>>>> > different number of processors.
>>>>>>
>>>>>>
>>>>> Thanks for the reply.
>>>>>
>>>>>
>>>>>> Uh, if you call rand() on different processors, why would you expect it
>>>>>> to give the same results?
>>>>>>
>>>>> Right, I get that. The rand() was a placeholder.
>>>>>
>>>>> The original, much larger code
>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c>
>>>>> replicates the same loop structure and runs the same PETSc subroutines,
>>>>> but running it with
>>>>>
>>>>> mpirun -np $N ./eth -lattice_size 5 -vector_size 1 -repulsion 0.0
>>>>> -draw_out -draw_pause -1
>>>>>
>>>>> with N=1,2,3,4 gives different results for the matrix dumped out by
>>>>> lines 514-519 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#514>.
>>>>> The matrix itself is evaluated in parallel, created in lines 263-275
>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#263>
>>>>> and evaluated in lines 294-356
>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#294>.
>>>>>
>>>>> (you can click on the line numbers above to navigate directly to them)
>>>>>
>>>>> Here is a sample <http://i43.tinypic.com/zyhf2f.jpg> of the output
>>>>> of lines 514-519 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#514>
>>>>> for N=1,2,3,4 procs, left to right.
>>>>>
>>>>> They're different for different numbers of procs. They should be the same,
>>>>> since none of my input parameters are numprocs dependent, and I don't
>>>>> explicitly use the size or rank anywhere in the code.
>>>>>
>>>>
>>>> You are likely not dividing the rows you loop over so you are
>>>> redundantly computing.
>>>>
>>>
>>> Thanks for the reply.
>>>
>>> Line 274 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#274>
>>> gets the local row indices of the PETSc matrix AVG_BDIBJ.
>>>
>>> Line 295 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#295>
>>> iterates over the local rows and the lines below get the column
>>> elements. For each row, the column elements are assigned by the lines up
>>> to Line 344 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#344>
>>> and stored locally in colvalues[]. Dunno if the details are relevant.
>>>
>>> Line 347 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#347>
>>> inserts the sitestride1^th row into the matrix.
>>>
>>> Line 353+ <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#353>
>>> does the mat assembly.
>>>
>>> Then, after a lot of currently irrelevant code,
>>>
>>> Line 514+ <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#514>
>>> dumps the mat plot to graphics.
>>>
>>>
>>> Different numprocs give different matrices.
>>>
>>> Can somebody suggest what  I did wrong (or didn't do)?
>>>
>>
>> Different values are being given to MatSetValues() for different numbers
>> of processes. So
>>
>>   1) Reduce this to the smallest problem size possible
>>
>>   2) Print out all rows/cols/values for each call
>>
>>   3) Compare 2 procs to the serial case
>>
>>
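For step 3 in the list above, once the serial run and the 2-process run have
logged their values in the same (row, column) order, the comparison can be
automated. Here is a minimal sketch in plain C; the helper name
first_mismatch is hypothetical and not part of PETSc, and double stands in
for PetscScalar, which assumes a real-valued build.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the first entry where the two logs disagree by more
 * than tol, or -1 if every entry agrees. A NaN on either side counts as a
 * mismatch because the comparison below is written as !(d <= tol). */
long first_mismatch(const double *serial, const double *parallel,
                    size_t n, double tol)
{
    assert(serial != NULL && parallel != NULL);
    for (size_t i = 0; i < n; i++) {
        double d = serial[i] - parallel[i];
        if (d < 0.0) {
            d = -d;               /* absolute difference without libm */
        }
        if (!(d <= tol)) {
            return (long)i;
        }
    }
    return -1;
}
```

If the values are stored row by row for an N x N matrix, a returned index i
corresponds to row i / N and column i % N, which points directly at the
MatSetValues() call that received different input.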
>
> Thanks for your excellent suggestion.
>
> I modified my code
> <https://code.google.com/p/daneelrepo/source/diff?spec=svn1435&r=1435&format=side&path=/eth_question/eth.c>
> to dump the matrix in binary.
>
> Then I used this Python script I had
> <https://code.google.com/p/daneelrepo/source/browse/eth_question/mat_bin2ascii.py>
> to convert it to ASCII.
>

Do not print the matrix; print the data you are passing to MatSetValues().

MatSetValues() is not likely to be broken. Every PETSc code in the world
calls this many times on every simulation.
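
A minimal sketch of that kind of logging, in plain C so it stands alone: it
formats one row of (column, value) pairs, tagged with the rank, into a single
line that can be printed just before the MatSetValues() call. The function
name format_row_insert and the output layout are made up for illustration and
are not PETSc API; int and double stand in for PetscInt and PetscScalar,
which assumes a real-valued build with the default integer size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format the data for one row as "[rank R] row I: (col, val) (col, val) ...".
 * Returns the number of characters written (excluding the terminator),
 * or -1 if buf is too small to hold the whole line. */
int format_row_insert(char *buf, size_t buflen, int rank, int row,
                      int ncols, const int *cols, const double *vals)
{
    assert(buf != NULL && buflen > 0);

    int used = snprintf(buf, buflen, "[rank %d] row %d:", rank, row);
    if (used < 0 || (size_t)used >= buflen) {
        return -1;
    }
    for (int j = 0; j < ncols; j++) {
        size_t left = buflen - (size_t)used;
        /* %.17g keeps enough digits to tell two doubles apart exactly */
        int n = snprintf(buf + used, left, " (%d, %.17g)", cols[j], vals[j]);
        if (n < 0 || (size_t)n >= left) {
            return -1;
        }
        used += n;
    }
    return used;
}
```

In the loop quoted further down, this would be called with the rank obtained
from MPI_Comm_rank(), then sitestride1, matsize, idx and colvalues, and the
buffer printed with printf() right before MatSetValues(). Sorting the
combined output by row makes runs at different process counts directly
comparable.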

   Matt


>
> Here are the values of AVG_BDIBJ <http://pastebin.ca/2457842>,
> a 9x9 matrix (the smallest possible problem size), run with the exact same
> input parameters on 1, 2, 3 and 4 procs.
>
> As you can see, the 1 and 2 proc results match up, but the 3 and 4 proc
> results do not.
>
> Serious weirdness.
>
>
>
>>     Matt
>>
>>
>>>
>>>>    Matt
>>>>
>>>>
>>>>>
>>>>>
>>>>>> for (sitestride1 = Istart; sitestride1 < Iend; sitestride1++)
>>>>>>     {
>>>>>>       for (sitestride2 = 0; sitestride2 < matsize; sitestride2++)
>>>>>>         {
>>>>>>           for (alpha = 0; alpha < dim; alpha++)
>>>>>>             {
>>>>>>               for (mu = 0; mu < dim; mu++)
>>>>>>                 for (lambda = 0; lambda < dim; lambda++)
>>>>>>                   {
>>>>>>                     vecval = rand () / rand ();
>>>>>>                   }
>>>>>>
>>>>>>               VecSetValue (BDB_AA, alpha, vecval, INSERT_VALUES);
>>>>>>
>>>>>>             }
>>>>>>           VecAssemblyBegin (BDB_AA);
>>>>>>           VecAssemblyEnd (BDB_AA);
>>>>>>           VecSum (BDB_AA, &element);
>>>>>>           colvalues[sitestride2] = element;
>>>>>>
>>>>>>         }
>>>>>>       //Insert the array of colvalues to the sitestride1^th row of AVG_BDIBJ
>>>>>>       MatSetValues (AVG_BDIBJ, 1, &sitestride1, matsize, idx,
>>>>>>                     colvalues, INSERT_VALUES);
>>>>>>
>>>>>>     }
>>>>>>
>>>>>> > The code is large and complex, so I have created a smaller program
>>>>>> > with the same loop structure here: <http://pastebin.ca/2457643>
>>>>>> >
>>>>>> > Compiling it and running it with "mpirun -np $N ./test -draw_pause -1"
>>>>>> > gives different results for different values of N even though it's
>>>>>> > not supposed to.
>>>>>>
>>>>>> What do you expect to see?
>>>>>>
>>>>>> > Here is a sample output <http://i42.tinypic.com/2s16ccw.jpg> for
>>>>>> > N=1,2,3,4 from left to right.
>>>>>> >
>>>>>> > Can anyone guide me as to what I'm doing wrong? Are any of the PETSc
>>>>>> > routines used not parallelizable?
>>>>>> >
>>>>>> >
>>>>>> > Thanks in advance,
>>>>>> >
>>>>>> > Regards.
>>>>>> >
>>>>>> > --
>>>>>> > ---
>>>>>> > *Analabha Roy*
>>>>>> > C.S.I.R <http://www.csir.res.in>  Senior Research
>>>>>> > Associate<http://csirhrdg.res.in/poolsra.htm>
>>>>>> > Saha Institute of Nuclear Physics <http://www.saha.ac.in>
>>>>>> > Section 1, Block AF
>>>>>> > Bidhannagar, Calcutta 700064
>>>>>> > India
>>>>>> > *Emails*: daneel at physics.utexas.edu, hariseldon99 at gmail.com
>>>>>> > *Webpage*: http://www.ph.utexas.edu/~daneel/
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which their
>>>> experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>
>>>
>>
>>
>
>



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener