[petsc-users] Fwd: Building the same petsc matrix with different numprocs gives different results!

Analabha Roy hariseldon99 at gmail.com
Wed Sep 25 00:01:24 CDT 2013


There is one more thing.

In the code, the evaluation of each element of AVG_BDIBJ requires a
read-only matrix U_parallel that I input from another program, and a
writeable sequential vector BDB_AA that is different for each element.

I convert U_parallel to a sequential matrix U_seq by using MatCopy here in lines
242+<https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#242>
and each process is supposed to update its copy of BDB_AA at every loop
iteration here in line
347+<https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#347>


Is this right? Or are sequential vectors/matrices handled by the root
process only? I know how to scatter a parallel vector to all processes
using PETSc scatter contexts, but I don't see any way to do so for a matrix
other than MatCopy. How do I ensure that each process has its own private
writeable copy of a sequential vector?
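
The expectation behind the question above can be put as a small, PETSc-free
sketch in plain C. It assumes a contiguous row split in which the first
n % size ranks get one extra row (as far as I know this mirrors the split
PETSc picks when local sizes are left to PETSC_DECIDE), and it uses a
hypothetical entry_term() as a stand-in for the real physics. Each simulated
rank fills a private scratch array (playing the role of BDB_AA) completely
before summing it, so every entry depends only on global indices and the
assembled matrix should not depend on the number of ranks. The names
ownership_range, entry_term, assemble and same_matrix are illustrative and
do not appear in eth.c.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DIM 16 /* arbitrary bound for the scratch array in this sketch */

/* Contiguous split of n rows over `size` ranks; the first n % size ranks
   get one extra row. Writes the half-open range [*start, *end). */
void ownership_range(int n, int size, int rank, int *start, int *end)
{
    assert(size > 0 && rank >= 0 && rank < size);
    int base = n / size;
    int extra = n % size;
    *start = rank * base + (rank < extra ? rank : extra);
    *end = *start + base + (rank < extra ? 1 : 0);
}

/* Hypothetical stand-in for the physics: uses only global indices. */
double entry_term(int row, int col, int alpha)
{
    return 0.5 * (row + 1) + 0.25 * (col + 1) + alpha;
}

/* Fill the n x n row-major matrix `mat` the way `size` ranks would,
   each one visiting only the rows it owns. */
void assemble(int n, int dim, int size, double *mat)
{
    assert(dim > 0 && dim <= MAX_DIM);
    for (int rank = 0; rank < size; rank++) {
        int start, end;
        ownership_range(n, size, rank, &start, &end);
        for (int row = start; row < end; row++) {
            for (int col = 0; col < n; col++) {
                double scratch[MAX_DIM]; /* private, rewritten for every element */
                double sum = 0.0;
                for (int a = 0; a < dim; a++)
                    scratch[a] = entry_term(row, col, a);
                for (int a = 0; a < dim; a++)
                    sum += scratch[a];
                mat[(size_t)row * n + col] = sum;
            }
        }
    }
}

/* Return 1 if the two n x n matrices are entry-wise identical, else 0. */
int same_matrix(int n, const double *a, const double *b)
{
    for (size_t i = 0; i < (size_t)n * n; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Calling assemble(9, 3, p, mat) for p = 1 to 4 and comparing the results with
same_matrix() is the plain-C analogue of comparing the parallel runs against
the serial one. If the real code fails that comparison, some input to
MatSetValues() depends on how the rows are partitioned.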




On Tue, Sep 24, 2013 at 11:48 PM, Analabha Roy <hariseldon99 at gmail.com>wrote:

>
>
>
> On Tue, Sep 24, 2013 at 11:35 PM, Matthew Knepley <knepley at gmail.com>wrote:
>
>> On Tue, Sep 24, 2013 at 10:58 AM, Analabha Roy <hariseldon99 at gmail.com>wrote:
>>
>>> Hi,
>>>
>>> Sorry for the misunderstanding.
>>>
>>> I modified my source thus <http://pastebin.ca/2457850> so that the
>>> rows/cols/values for each call are printed before being passed to
>>> MatSetValues().
>>>
>>>
>>> Then I ran it with 1 and 2 processors.
>>>
>>>
>>> Here are the outputs <http://pastebin.ca/2457852>
>>>
>>>
>>> Strange! When running it with 2 procs, only half the values show up!
>>>
>>
>> PetscPrintf() only prints from rank 0. Use PETSC_COMM_SELF.
>>
>>
>
>
> Sorry. I modified it accordingly, and here is the new output
> <http://pastebin.ca/2457857> (I manually reordered the output of the 2 procs
> case since the order in which it was printed was haphazard).
>
>
> All the elements do not match.
>
>
>
>>
>>    Matt
>>
>>
>>> And even those do not match!!!!
>>>
>>>
>>>
>>>
>>> On Tue, Sep 24, 2013 at 11:12 PM, Matthew Knepley <knepley at gmail.com>wrote:
>>>
>>>> On Tue, Sep 24, 2013 at 10:39 AM, Analabha Roy <hariseldon99 at gmail.com>wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Sep 24, 2013 at 9:33 PM, Matthew Knepley <knepley at gmail.com>wrote:
>>>>>
>>>>>> On Tue, Sep 24, 2013 at 8:35 AM, Analabha Roy <hariseldon99 at gmail.com
>>>>>> > wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 24, 2013 at 8:41 PM, Matthew Knepley <knepley at gmail.com>wrote:
>>>>>>>
>>>>>>>> On Tue, Sep 24, 2013 at 8:08 AM, Analabha Roy <
>>>>>>>> hariseldon99 at gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Sep 24, 2013 at 1:42 PM, Jed Brown <jedbrown at mcs.anl.gov>wrote:
>>>>>>>>>
>>>>>>>>>> Analabha Roy <hariseldon99 at gmail.com> writes:
>>>>>>>>>>
>>>>>>>>>> > Hi all,
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > Compiling and running this code
>>>>>>>>>> > <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c>
>>>>>>>>>> > that builds a PETSc matrix gives different results when run with
>>>>>>>>>> > different numbers of processors.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Thanks for the reply.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Uh, if you call rand() on different processors, why would you
>>>>>>>>>> expect it to give the same results?
>>>>>>>>>>
>>>>>>>>> Right, I get that. The rand() was a placeholder.
>>>>>>>>>
>>>>>>>>> This original, much larger code
>>>>>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c>
>>>>>>>>> replicates the same loop structure and runs the same PETSc subroutines, but
>>>>>>>>> running it by
>>>>>>>>>
>>>>>>>>> mpirun -np $N ./eth -lattice_size 5 -vector_size 1 -repulsion 0.0 -draw_out -draw_pause -1
>>>>>>>>>
>>>>>>>>> with N=1,2,3,4 gives different results for the matrix dumped out
>>>>>>>>> by lines 514-519 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#514>.
>>>>>>>>> The matrix itself is evaluated in parallel, created in lines 263-275
>>>>>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#263> and
>>>>>>>>> evaluated in lines 294-356 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#294>
>>>>>>>>>
>>>>>>>>> (you can click on the line numbers above to navigate directly to
>>>>>>>>> them)
>>>>>>>>>
>>>>>>>>> Here is a sample <http://i43.tinypic.com/zyhf2f.jpg> of the
>>>>>>>>> output of lines 514-519 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#514> for N=1,2,3,4 procs, left to right.
>>>>>>>>>
>>>>>>>>> They're different for different numbers of procs. They should be the same,
>>>>>>>>> since none of my input parameters are numprocs dependent, and I don't
>>>>>>>>> explicitly use the size or rank anywhere in the code.
>>>>>>>>>
>>>>>>>>
>>>>>>>> You are likely not dividing the rows you loop over so you are
>>>>>>>> redundantly computing.
>>>>>>>>
>>>>>>>
>>>>>>> Thanks for the reply.
>>>>>>>
>>>>>>> Line 274 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#274> gets the local row indices of the PETSc matrix
>>>>>>> AVG_BDIBJ.
>>>>>>>
>>>>>>> Line 295
>>>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#295> iterates
>>>>>>> over the local rows, and the lines below it get the column
>>>>>>> elements. For each row, the column elements are assigned by the
>>>>>>> lines up to Line 344 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#344> and stored locally in colvalues[]. Dunno if the details are relevant.
>>>>>>>
>>>>>>> Line 347 <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#347> inserts the sitestride1^th row into the matrix.
>>>>>>>
>>>>>>> Line 353+ <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#353> does the mat assembly.
>>>>>>>
>>>>>>> Then, after a lot of currently irrelevant code,
>>>>>>>
>>>>>>> Line 514+ <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#514> dumps the mat plot to graphics.
>>>>>>>
>>>>>>>
>>>>>>> Different numprocs give different matrices.
>>>>>>>
>>>>>>> Can somebody suggest what  I did wrong (or didn't do)?
>>>>>>>
>>>>>>
>>>>>> Different values are being given to MatSetValues() for different
>>>>>> numbers of processes. So
>>>>>>
>>>>>>   1) Reduce this to the smallest problem size possible
>>>>>>
>>>>>>   2) Print out all rows/cols/values for each call
>>>>>>
>>>>>>   3) Compare 2 procs to the serial case
>>>>>>
>>>>>>
>>>>>
>>>>> Thanks for your excellent suggestion.
>>>>>
>>>>> I modified my code <https://code.google.com/p/daneelrepo/source/diff?spec=svn1435&r=1435&format=side&path=/eth_question/eth.c> to dump the matrix in binary.
>>>>>
>>>>> Then I used this Python script I had <https://code.google.com/p/daneelrepo/source/browse/eth_question/mat_bin2ascii.py> to convert it to ASCII.
>>>>>
>>>>
>>>> Do not print the matrix, print the data you are passing to
>>>> MatSetValues().
>>>>
>>>> MatSetValues() is not likely to be broken. Every PETSc code in the
>>>> world calls this many times on every simulation.
>>>>
>>>>    Matt
>>>>
>>>>
>>>>>
>>>>> Here are the values of AVG_BDIBJ <http://pastebin.ca/2457842>,
>>>>> a 9x9 matrix (the smallest possible problem size), run with the exact same
>>>>> input parameters with 1, 2, 3 and 4 procs.
>>>>>
>>>>> As you can see, the 1 and 2 procs match up, but the 3 and 4 procs do
>>>>> not.
>>>>>
>>>>> Serious weirdness.
>>>>>
>>>>>
>>>>>
>>>>>>     Matt
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>    Matt
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> for (sitestride1 = Istart; sitestride1 < Iend; sitestride1++)
>>>>>>>>>>     {
>>>>>>>>>>       for (sitestride2 = 0; sitestride2 < matsize; sitestride2++)
>>>>>>>>>>         {
>>>>>>>>>>           for (alpha = 0; alpha < dim; alpha++)
>>>>>>>>>>             {
>>>>>>>>>>               for (mu = 0; mu < dim; mu++)
>>>>>>>>>>                 for (lambda = 0; lambda < dim; lambda++)
>>>>>>>>>>                   {
>>>>>>>>>>                     vecval = rand () / rand ();
>>>>>>>>>>                   }
>>>>>>>>>>
>>>>>>>>>>               VecSetValue (BDB_AA, alpha, vecval, INSERT_VALUES);
>>>>>>>>>>
>>>>>>>>>>             }
>>>>>>>>>>           VecAssemblyBegin (BDB_AA);
>>>>>>>>>>           VecAssemblyEnd (BDB_AA);
>>>>>>>>>>           VecSum (BDB_AA, &element);
>>>>>>>>>>           colvalues[sitestride2] = element;
>>>>>>>>>>
>>>>>>>>>>         }
>>>>>>>>>>       //Insert the array of colvalues to the sitestride1^th row of H
>>>>>>>>>>       MatSetValues (AVG_BDIBJ, 1, &sitestride1, matsize, idx, colvalues,
>>>>>>>>>>                     INSERT_VALUES);
>>>>>>>>>>
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>> > The code is large and complex, so I have created a smaller program
>>>>>>>>>> > with the same loop structure here. <http://pastebin.ca/2457643>
>>>>>>>>>> >
>>>>>>>>>> > Compiling it and running it with "mpirun -np $N ./test -draw_pause -1"
>>>>>>>>>> > gives different results for different values of N even though it's
>>>>>>>>>> > not supposed to.
>>>>>>>>>>
>>>>>>>>>> What do you expect to see?
>>>>>>>>>>
>>>>>>>>>> > Here is a sample output <http://i42.tinypic.com/2s16ccw.jpg> for N=1,2,3,4
>>>>>>>>>> > from left to right.
>>>>>>>>>> >
>>>>>>>>>> > Can anyone guide me as to what I'm doing wrong? Are any of the PETSc
>>>>>>>>>> > routines used not parallelizable?
>>>>>>>>>> >
>>>>>>>>>> > Thanks in advance,
>>>>>>>>>> >
>>>>>>>>>> > Regards.
>>>>>>>>>> >
>>>>>>>>>> > --
>>>>>>>>>> > ---
>>>>>>>>>> > *Analabha Roy*
>>>>>>>>>> > C.S.I.R <http://www.csir.res.in>  Senior Research
>>>>>>>>>> > Associate<http://csirhrdg.res.in/poolsra.htm>
>>>>>>>>>> > Saha Institute of Nuclear Physics <http://www.saha.ac.in>
>>>>>>>>>> > Section 1, Block AF
>>>>>>>>>> > Bidhannagar, Calcutta 700064
>>>>>>>>>> > India
>>>>>>>>>> > *Emails*: daneel at physics.utexas.edu, hariseldon99 at gmail.com
>>>>>>>>>> > *Webpage*: http://www.ph.utexas.edu/~daneel/
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> What most experimenters take for granted before they begin their
>>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>>> experiments lead.
>>>>>>>> -- Norbert Wiener
>>>>>>>>



-- 
---
*Analabha Roy*
C.S.I.R <http://www.csir.res.in>  Senior Research
Associate<http://csirhrdg.res.in/poolsra.htm>
Saha Institute of Nuclear Physics <http://www.saha.ac.in>
Section 1, Block AF
Bidhannagar, Calcutta 700064
India
*Emails*: daneel at physics.utexas.edu, hariseldon99 at gmail.com
*Webpage*: http://www.ph.utexas.edu/~daneel/