[petsc-users] Fwd: Building the same petsc matrix with different numprocs gives different results!

Analabha Roy hariseldon99 at gmail.com
Tue Sep 24 13:18:30 CDT 2013


On Tue, Sep 24, 2013 at 11:35 PM, Matthew Knepley <knepley at gmail.com> wrote:

> On Tue, Sep 24, 2013 at 10:58 AM, Analabha Roy <hariseldon99 at gmail.com> wrote:
>
>> Hi,
>>
>> Sorry for the misunderstanding.
>>
>> I modified my source thus <http://pastebin.ca/2457850> so that the
>> rows/cols/values for each call are printed before they are passed to
>> MatSetValues().
>>
>> Then I ran it with 1 and 2 processors.
>>
>> Here are the outputs <http://pastebin.ca/2457852>.
>>
>> Strange! Running it with 2 procs, only half the values show up!
>>
>
> PetscPrintf() only prints from rank 0. Use PETSC_COMM_SELF.
>
>


Sorry. I modified the code accordingly, and here is the new output
<http://pastebin.ca/2457857>. (I manually reordered the output of the
2-proc case, since the order in which it was printed was haphazard.)


All the elements do not match.
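
Reordering the printed output by hand is error-prone, so the comparison can be automated. What follows is only a minimal sketch in plain C, assuming each printed line has already been parsed into a (row, col, value) triple and that each (row, col) pair appears once per run. The Entry struct and the names entry_cmp and count_mismatches are hypothetical; they are not part of PETSc or of eth.c.

```c
#include <stdlib.h>

/* One logged entry: the (row, col, value) that would be handed to
   MatSetValues(). Hypothetical type, for illustration only. */
typedef struct {
  int    row;
  int    col;
  double val;
} Entry;

/* Order entries by row, then by column, so that logs written by
   different ranks in arbitrary order line up position by position. */
int entry_cmp(const void *a, const void *b)
{
  const Entry *x = (const Entry *)a;
  const Entry *y = (const Entry *)b;
  if (x->row != y->row) return (x->row < y->row) ? -1 : 1;
  if (x->col != y->col) return (x->col < y->col) ? -1 : 1;
  return 0;
}

/* Sort both logs in place, then count the positions whose indices differ
   or whose values are further apart than tol. Returns -1 when the two
   logs do not have the same number of entries. */
int count_mismatches(Entry *serial, size_t ns,
                     Entry *parallel, size_t np, double tol)
{
  size_t i;
  int bad = 0;

  if (ns != np) return -1;

  qsort(serial, ns, sizeof(Entry), entry_cmp);
  qsort(parallel, np, sizeof(Entry), entry_cmp);

  for (i = 0; i < ns; i++) {
    double d = serial[i].val - parallel[i].val;
    if (d < 0) d = -d;            /* absolute difference without libm */
    if (serial[i].row != parallel[i].row ||
        serial[i].col != parallel[i].col || d > tol)
      bad++;
  }
  return bad;
}
```

Calling count_mismatches() on the 1-proc log and the merged 2-proc log gives 0 when every triple agrees, a positive count when some differ, and -1 when one run produced more entries than the other, which by itself would suggest that rows are being skipped or computed twice.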



>
>    Matt
>
>
>> And even those do not match!!!!
>>
>>
>>
>>
>> On Tue, Sep 24, 2013 at 11:12 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>
>>> On Tue, Sep 24, 2013 at 10:39 AM, Analabha Roy <hariseldon99 at gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>>
>>>> On Tue, Sep 24, 2013 at 9:33 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>>>
>>>>> On Tue, Sep 24, 2013 at 8:35 AM, Analabha Roy <hariseldon99 at gmail.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 24, 2013 at 8:41 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>>>>>
>>>>>>> On Tue, Sep 24, 2013 at 8:08 AM, Analabha Roy <hariseldon99 at gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Sep 24, 2013 at 1:42 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
>>>>>>>>
>>>>>>>>> Analabha Roy <hariseldon99 at gmail.com> writes:
>>>>>>>>>
>>>>>>>>> > Hi all,
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > Compiling and running this code
>>>>>>>>> > <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c>
>>>>>>>>> > that builds a PETSc matrix gives different results when run with
>>>>>>>>> > different numbers of processors.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Thanks for the reply.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Uh, if you call rand() on different processors, why would you expect
>>>>>>>>> it to give the same results?
>>>>>>>>>
>>>>>>>> Right, I get that. The rand() was a placeholder.
>>>>>>>>
>>>>>>>> This original, much larger code
>>>>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c>
>>>>>>>> replicates the same loop structure and calls the same PETSc routines,
>>>>>>>> but running it with
>>>>>>>>
>>>>>>>> mpirun -np $N ./eth -lattice_size 5 -vector_size 1 -repulsion 0.0 -draw_out -draw_pause -1
>>>>>>>>
>>>>>>>> with N=1,2,3,4 gives different results for the matrix dumped out by
>>>>>>>> lines 514-519
>>>>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#514>.
>>>>>>>> The matrix itself is evaluated in parallel: it is created in lines 263-275
>>>>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#263>
>>>>>>>> and evaluated in lines 294-356
>>>>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#294>.
>>>>>>>>
>>>>>>>> (you can click on the line numbers above to navigate directly to
>>>>>>>> them)
>>>>>>>>
>>>>>>>> Here is a sample <http://i43.tinypic.com/zyhf2f.jpg> of the output
>>>>>>>> of lines 514-519
>>>>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#514>
>>>>>>>> for N=1,2,3,4 procs, left to right.
>>>>>>>>
>>>>>>>> They're different for different numbers of procs. They should be the
>>>>>>>> same, since none of my input parameters depend on the number of procs,
>>>>>>>> and I don't explicitly use the size or rank anywhere in the code.
>>>>>>>>
>>>>>>>
>>>>>>> You are likely not dividing the rows you loop over so you are
>>>>>>> redundantly computing.
>>>>>>>
>>>>>>
>>>>>> Thanks for the reply.
>>>>>>
>>>>>> Line 274
>>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#274>
>>>>>> gets the local row indices of the PETSc matrix AVG_BDIBJ.
>>>>>>
>>>>>> Line 295
>>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#295>
>>>>>> iterates over the local rows, and the lines below it get the column
>>>>>> elements. For each row, the column elements are assigned by the lines
>>>>>> up to Line 344
>>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#344>
>>>>>> and stored locally in colvalues[]. Dunno if the details are relevant.
>>>>>>
>>>>>> Line 347
>>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#347>
>>>>>> inserts the sitestride1^th row into the matrix.
>>>>>>
>>>>>> Line 353+
>>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#353>
>>>>>> does the matrix assembly.
>>>>>>
>>>>>> Then, after a lot of currently irrelevant code,
>>>>>>
>>>>>> Line 514+
>>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#514>
>>>>>> dumps the matrix plot to graphics.
>>>>>>
>>>>>>
>>>>>> Different numprocs give different matrices.
>>>>>>
>>>>>> Can somebody suggest what  I did wrong (or didn't do)?
>>>>>>
>>>>>
>>>>> Different values are being given to MatSetValues() for different
>>>>> numbers of processes. So
>>>>>
>>>>>   1) Reduce this to the smallest problem size possible
>>>>>
>>>>>   2) Print out all rows/cols/values for each call
>>>>>
>>>>>   3) Compare 2 procs to the serial case
>>>>>
>>>>>
>>>>
>>>> Thanks for your excellent suggestion.
>>>>
>>>> I modified my code
>>>> <https://code.google.com/p/daneelrepo/source/diff?spec=svn1435&r=1435&format=side&path=/eth_question/eth.c>
>>>> to dump the matrix in binary.
>>>>
>>>> Then I used this Python script I had
>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/mat_bin2ascii.py>
>>>> to convert it to ASCII.
>>>>
>>>
>>> Do not print the matrix, print the data you are passing to
>>> MatSetValues().
>>>
>>> MatSetValues() is not likely to be broken. Every PETSc code in the world
>>> calls this many times on every simulation.
>>>
>>>    Matt
>>>
>>>
>>>>
>>>> Here are the values of AVG_BDIBJ <http://pastebin.ca/2457842>,
>>>> a 9x9 matrix (the smallest possible problem size), run with the exact same
>>>> input parameters on 1, 2, 3 and 4 procs.
>>>>
>>>> As you can see, the 1 and 2 procs match up, but the 3 and 4 procs do
>>>> not.
>>>>
>>>> Serious weirdness.
>>>>
>>>>
>>>>
>>>>>     Matt
>>>>>
>>>>>
>>>>>>
>>>>>>>    Matt
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> for (sitestride1 = Istart; sitestride1 < Iend; sitestride1++)
>>>>>>>>>     {
>>>>>>>>>       for (sitestride2 = 0; sitestride2 < matsize; sitestride2++)
>>>>>>>>>         {
>>>>>>>>>           for (alpha = 0; alpha < dim; alpha++)
>>>>>>>>>             {
>>>>>>>>>               for (mu = 0; mu < dim; mu++)
>>>>>>>>>                 for (lambda = 0; lambda < dim; lambda++)
>>>>>>>>>                   {
>>>>>>>>>                     vecval = rand () / rand ();
>>>>>>>>>                   }
>>>>>>>>>
>>>>>>>>>               VecSetValue (BDB_AA, alpha, vecval, INSERT_VALUES);
>>>>>>>>>
>>>>>>>>>             }
>>>>>>>>>           VecAssemblyBegin (BDB_AA);
>>>>>>>>>           VecAssemblyEnd (BDB_AA);
>>>>>>>>>           VecSum (BDB_AA, &element);
>>>>>>>>>           colvalues[sitestride2] = element;
>>>>>>>>>
>>>>>>>>>         }
>>>>>>>>>       //Insert the array of colvalues to the sitestride1^th row of H
>>>>>>>>>       MatSetValues (AVG_BDIBJ, 1, &sitestride1, matsize, idx,
>>>>>>>>>                     colvalues, INSERT_VALUES);
>>>>>>>>>
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>> > The code is large and complex, so I have created a smaller program
>>>>>>>>> > with the same loop structure here. <http://pastebin.ca/2457643>
>>>>>>>>> >
>>>>>>>>> > Compiling it and running it with "mpirun -np $N ./test -draw_pause -1"
>>>>>>>>> > gives different results for different values of N even though it's
>>>>>>>>> > not supposed to.
>>>>>>>>>
>>>>>>>>> What do you expect to see?
>>>>>>>>>
>>>>>>>>> > Here is a sample output <http://i42.tinypic.com/2s16ccw.jpg> for
>>>>>>>>> > N=1,2,3,4 from left to right.
>>>>>>>>> >
>>>>>>>>> > Can anyone guide me as to what I'm doing wrong? Are any of the PETSc
>>>>>>>>> > routines used not parallelizable?
>>>>>>>>> >
>>>>>>>>> > Thanks in advance,
>>>>>>>>> >
>>>>>>>>> > Regards.
>>>>>>>>> >
>>>>>>>>> > --
>>>>>>>>> > ---
>>>>>>>>> > *Analabha Roy*
>>>>>>>>> > C.S.I.R <http://www.csir.res.in>  Senior Research
>>>>>>>>> > Associate<http://csirhrdg.res.in/poolsra.htm>
>>>>>>>>> > Saha Institute of Nuclear Physics <http://www.saha.ac.in>
>>>>>>>>> > Section 1, Block AF
>>>>>>>>> > Bidhannagar, Calcutta 700064
>>>>>>>>> > India
>>>>>>>>> > *Emails*: daneel at physics.utexas.edu, hariseldon99 at gmail.com
>>>>>>>>> > *Webpage*: http://www.ph.utexas.edu/~daneel/
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> What most experimenters take for granted before they begin their
>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>> experiments lead.
>>>>>>> -- Norbert Wiener
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>



-- 
---
*Analabha Roy*
C.S.I.R <http://www.csir.res.in>  Senior Research
Associate<http://csirhrdg.res.in/poolsra.htm>
Saha Institute of Nuclear Physics <http://www.saha.ac.in>
Section 1, Block AF
Bidhannagar, Calcutta 700064
India
*Emails*: daneel at physics.utexas.edu, hariseldon99 at gmail.com
*Webpage*: http://www.ph.utexas.edu/~daneel/