[petsc-users] Fwd: Building the same petsc matrix with different numprocs gives different results!

Analabha Roy hariseldon99 at gmail.com
Tue Sep 24 12:58:33 CDT 2013


Hi,

Sorry for the misunderstanding.

I modified my source <http://pastebin.ca/2457850> so that the
rows/cols/values for each call are printed just before they are passed to
MatSetValues().
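
For reference, the print step amounts to something like the sketch below. It is plain C with no PETSc dependency: format_row_call() is a hypothetical helper (not the code in the pastebin), and int/double stand in for PetscInt/PetscScalar, which assumes a real-valued build.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: describe one single-row MatSetValues() call as a
 * line of text tagged with the calling rank. Returns the number of
 * characters written (excluding the terminating NUL), or -1 if buf is
 * too small to hold the whole line. */
static int format_row_call(char *buf, size_t len, int rank, int row,
                           int ncols, const int *cols, const double *vals)
{
    assert(buf != NULL && cols != NULL && vals != NULL && ncols >= 0);

    int used = snprintf(buf, len, "[rank %d] row %d:", rank, row);
    if (used < 0 || (size_t)used >= len)
        return -1;

    for (int j = 0; j < ncols; j++) {
        size_t room = len - (size_t)used;
        int n = snprintf(buf + used, room, " (%d, %.6e)", cols[j], vals[j]);
        if (n < 0 || (size_t)n >= room)
            return -1;
        used += n;
    }
    return used;
}
```

In the PETSc program the resulting line would be printed right before the MatSetValues() call, for example through PetscSynchronizedPrintf() followed by PetscSynchronizedFlush() so that the output of different ranks does not interleave.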


Then I ran it with 1 and 2 processors.


Here are the outputs <http://pastebin.ca/2457852>


Strange! When run with 2 procs, only half the values show up!
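
One possible explanation, and only a guess without knowing how the output was captured: each rank prints just the rows it owns, so a log taken from one of two ranks holds about half of them. The sketch below computes which rows each rank would own, assuming the matrix uses PETSc's default (PETSC_DECIDE) row split and the 9X9 test size mentioned further down. local_row_count() and ownership_range() are hypothetical names, not PETSc calls; in the real code MatGetOwnershipRange() reports the range.

```c
#include <assert.h>

/* Rows owned by `rank` when n_rows rows are split over n_procs ranks:
 * every rank gets n_rows / n_procs rows and the first
 * (n_rows % n_procs) ranks get one extra. */
static int local_row_count(int n_rows, int n_procs, int rank)
{
    assert(n_rows >= 0 && n_procs > 0 && rank >= 0 && rank < n_procs);
    return n_rows / n_procs + ((n_rows % n_procs) > rank ? 1 : 0);
}

/* Half-open range [*start, *end) of global rows owned by `rank`. */
static void ownership_range(int n_rows, int n_procs, int rank,
                            int *start, int *end)
{
    int first = 0;
    for (int r = 0; r < rank; r++)
        first += local_row_count(n_rows, n_procs, r);
    *start = first;
    *end = first + local_row_count(n_rows, n_procs, rank);
}
```

For 9 rows on 2 procs this gives rows 0-4 to rank 0 and rows 5-8 to rank 1, so a complete 2-proc log should still contain all 9 rows once both ranks' output is collected.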

And even those do not match the 1-proc output!
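
To check the mismatch entry by entry rather than by eye, the two logs can be compared on (row, col) keys with a small tolerance. A sketch, assuming the printed lines have already been parsed into (row, col, value) triples; entry_t and first_mismatch() are made-up names for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One value passed to MatSetValues(): global row, global column, value. */
typedef struct {
    int row;
    int col;
    double val;
} entry_t;

/* Look up every entry of the parallel log in the serial log.
 * Returns the index (into par) of the first entry that is missing from
 * the serial log or differs by more than tol, or -1 if all agree. */
static long first_mismatch(const entry_t *serial, size_t n_serial,
                           const entry_t *par, size_t n_par, double tol)
{
    assert(serial != NULL && par != NULL && tol >= 0.0);

    for (size_t i = 0; i < n_par; i++) {
        int ok = 0;
        for (size_t j = 0; j < n_serial; j++) {
            if (serial[j].row != par[i].row || serial[j].col != par[i].col)
                continue;
            double diff = serial[j].val - par[i].val;
            if (diff < 0.0)
                diff = -diff;
            ok = (diff <= tol);
            break; /* (row, col) found; no need to scan further */
        }
        if (!ok)
            return (long)i;
    }
    return -1;
}
```

The quadratic search is fine for a 9X9 test case; the order in which the ranks printed their rows does not matter because the lookup is by (row, col).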




On Tue, Sep 24, 2013 at 11:12 PM, Matthew Knepley <knepley at gmail.com> wrote:

> On Tue, Sep 24, 2013 at 10:39 AM, Analabha Roy <hariseldon99 at gmail.com> wrote:
>
>> Hi,
>>
>>
>>
>> On Tue, Sep 24, 2013 at 9:33 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>
>>> On Tue, Sep 24, 2013 at 8:35 AM, Analabha Roy <hariseldon99 at gmail.com> wrote:
>>>
>>>>
>>>>
>>>>
>>>> On Tue, Sep 24, 2013 at 8:41 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>>>
>>>>> On Tue, Sep 24, 2013 at 8:08 AM, Analabha Roy <hariseldon99 at gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> On Tue, Sep 24, 2013 at 1:42 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
>>>>>>
>>>>>>> Analabha Roy <hariseldon99 at gmail.com> writes:
>>>>>>>
>>>>>>> > Hi all,
>>>>>>> >
>>>>>>> >
>>>>>>> > Compiling and running this code
>>>>>>> > <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c>
>>>>>>> > that builds a petsc matrix gives different results when run with
>>>>>>> > different numbers of processors.
>>>>>>>
>>>>>>>
>>>>>> Thanks for the reply.
>>>>>>
>>>>>>
>>>>>>>  Uh, if you call rand() on different processors, why would you
>>>>>>> expect it
>>>>>>> to give the same results?
>>>>>>>
>>>>>> Right, I get that. The rand() was a placeholder.
>>>>>>
>>>>>> This original, much larger code <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c> replicates the same loop structure and runs the same PETSc subroutines, but
>>>>>> running it with
>>>>>>
>>>>>> mpirun -np $N ./eth -lattice_size 5 -vector_size 1 -repulsion 0.0
>>>>>> -draw_out -draw_pause -1
>>>>>>
>>>>>> with N=1,2,3,4 gives different results for the matrix dumped out by
>>>>>> lines 514-519 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#514>.
>>>>>> The matrix itself is created in lines 263-275
>>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#263> and
>>>>>> evaluated in parallel in lines 294-356 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#294>.
>>>>>>
>>>>>> (you can click on the line numbers above to navigate directly to them)
>>>>>>
>>>>>> Here is a sample <http://i43.tinypic.com/zyhf2f.jpg> of the output
>>>>>> of lines 514-519 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#514> for N=1,2,3,4 procs, left to right.
>>>>>>
>>>>>> They're different for different numbers of procs. They should be the same, since
>>>>>> none of my input parameters are numprocs dependent, and I don't explicitly
>>>>>> use the size or rank anywhere in the code.
>>>>>>
>>>>>
>>>>> You are likely not dividing the rows you loop over so you are
>>>>> redundantly computing.
>>>>>
>>>>
>>>> Thanks for the reply.
>>>>
>>>> Line 274 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#274>
>>>> gets the local row indices of the PETSc matrix AVG_BDIBJ.
>>>>
>>>> Line 295 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#295>
>>>> iterates over the local rows, and the lines below it get the column
>>>> elements. For each row, the column elements are assigned by the lines
>>>> up to Line 344 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#344>
>>>> and stored locally in colvalues[]. Dunno if the details are relevant.
>>>>
>>>> Line 347 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#347>
>>>> inserts the sitestride1^th row into the matrix.
>>>>
>>>> Line 353+ <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#353>
>>>> does the mat assembly.
>>>>
>>>> Then, after a lot of currently irrelevant code,
>>>>
>>>> Line 514+ <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#514>
>>>> dumps the mat plot to graphics.
>>>>
>>>>
>>>> Different numprocs give different matrices.
>>>>
>>>> Can somebody suggest what  I did wrong (or didn't do)?
>>>>
>>>
>>> Different values are being given to MatSetValues() for different numbers
>>> of processes. So
>>>
>>>   1) Reduce this to the smallest problem size possible
>>>
>>>   2) Print out all rows/cols/values for each call
>>>
>>>   3) Compare 2 procs to the serial case
>>>
>>>
>>
>> Thanks for your excellent suggestion.
>>
>> I modified my code <https://code.google.com/p/daneelrepo/source/diff?spec=svn1435&r=1435&format=side&path=/eth_question/eth.c> to dump the matrix in binary.
>>
>> Then I used this python script I had <https://code.google.com/p/daneelrepo/source/browse/eth_question/mat_bin2ascii.py> to convert it to ascii.
>>
>
> Do not print the matrix, print the data you are passing to MatSetValues().
>
> MatSetValues() is not likely to be broken. Every PETSc code in the world
> calls this many times on every simulation.
>
>    Matt
>
>
>>
>> Here are the values of AVG_BDIBJ <http://pastebin.ca/2457842>,
>> a 9X9 matrix (the smallest possible problem size) run with the exact same
>> input parameters with 1,2,3 and 4 procs
>>
>> As you can see, the 1 and 2 procs match up, but the 3 and 4 procs do not.
>>
>> Serious weirdness.
>>
>>
>>
>>>     Matt
>>>
>>>
>>>>
>>>>>    Matt
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> for (sitestride1 = Istart; sitestride1 < Iend; sitestride1++)
>>>>>>>     {
>>>>>>>       for (sitestride2 = 0; sitestride2 < matsize; sitestride2++)
>>>>>>>         {
>>>>>>>           for (alpha = 0; alpha < dim; alpha++)
>>>>>>>             {
>>>>>>>               for (mu = 0; mu < dim; mu++)
>>>>>>>                 for (lambda = 0; lambda < dim; lambda++)
>>>>>>>                   {
>>>>>>>                     vecval = rand () / rand ();
>>>>>>>                   }
>>>>>>>
>>>>>>>               VecSetValue (BDB_AA, alpha, vecval, INSERT_VALUES);
>>>>>>>
>>>>>>>             }
>>>>>>>           VecAssemblyBegin (BDB_AA);
>>>>>>>           VecAssemblyEnd (BDB_AA);
>>>>>>>           VecSum (BDB_AA, &element);
>>>>>>>           colvalues[sitestride2] = element;
>>>>>>>
>>>>>>>         }
>>>>>>>       //Insert the array of colvalues to the sitestride1^th row of H
>>>>>>>       MatSetValues (AVG_BDIBJ, 1, &sitestride1, matsize, idx,
>>>>>>> colvalues,
>>>>>>>                     INSERT_VALUES);
>>>>>>>
>>>>>>>     }
>>>>>>>
>>>>>>> > The code is large and complex, so I have created a smaller program
>>>>>>> > with the same
>>>>>>> > loop structure here. <http://pastebin.ca/2457643>
>>>>>>> >
>>>>>>> > Compiling it and running it with "mpirun -np $N ./test -draw_pause -1"
>>>>>>> gives
>>>>>>> > different results for different values of N even though it's not
>>>>>>> supposed
>>>>>>> > to.
>>>>>>>
>>>>>>> What do you expect to see?
>>>>>>>
>>>>>>> > Here is a sample output <http://i42.tinypic.com/2s16ccw.jpg> for
>>>>>>> N=1,2,3,4
>>>>>>> > from left to right.
>>>>>>> >
>>>>>>> > Can anyone guide me as to what I'm doing wrong? Are any of the
>>>>>>> PETSc
>>>>>>> > routines used not parallelizable?
>>>>>>> >
>>>>>>> > Thanks in advance,
>>>>>>> >
>>>>>>> > Regards.
>>>>>>> >
>>>>>>> > --
>>>>>>> > ---
>>>>>>> > *Analabha Roy*
>>>>>>> > C.S.I.R <http://www.csir.res.in>  Senior Research
>>>>>>> > Associate<http://csirhrdg.res.in/poolsra.htm>
>>>>>>> > Saha Institute of Nuclear Physics <http://www.saha.ac.in>
>>>>>>> > Section 1, Block AF
>>>>>>> > Bidhannagar, Calcutta 700064
>>>>>>> > India
>>>>>>> > *Emails*: daneel at physics.utexas.edu, hariseldon99 at gmail.com
>>>>>>> > *Webpage*: http://www.ph.utexas.edu/~daneel/
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>



-- 
---
*Analabha Roy*
C.S.I.R <http://www.csir.res.in>  Senior Research
Associate<http://csirhrdg.res.in/poolsra.htm>
Saha Institute of Nuclear Physics <http://www.saha.ac.in>
Section 1, Block AF
Bidhannagar, Calcutta 700064
India
*Emails*: daneel at physics.utexas.edu, hariseldon99 at gmail.com
*Webpage*: http://www.ph.utexas.edu/~daneel/