<div class="gmail_quote">On Thu, Sep 1, 2011 at 12:45, Likun Tan <span dir="ltr">&lt;<a href="mailto:likunt@andrew.cmu.edu">likunt@andrew.cmu.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div id=":12l">I still have some confusions. When computing B before MPI_Scan(), could i<br>

compute the values in parallel? After using MPI_Scan(), does that mean the<br>

columns of B will be gathered in one processor?<br></div></blockquote><div><br></div><div>No, the scan is just computing the start column for a given process.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div id=":12l">

<br>

Here are main steps i took based on your suggestion,<br>

<div class="im"><br>

PetscSplitOwnership(PETSC_COMM_WORLD, &amp;n, &amp;N);<br></div></div></blockquote><div><br></div><div>Add these two lines here:</div><div><br></div><div><div><div><div>MPI_Scan(&amp;n, &amp;cstart, 1, MPIU_INT, MPI_SUM, PETSC_COMM_WORLD);</div>

<div>cstart -= n;</div></div></div></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div id=":12l"><div class="im">

MatCreateSeqDense(PETSC_COMM_SELF, M, n, PETSC_NULL, &amp;B);<br>

</div>MatCreateSeqDense(PETSC_COMM_SELF, M, M, PETSC_NULL, &amp;A);<br></div></blockquote><div><br></div><div>Good</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div id=":12l">

MatCreateSeqDense(PETSC_COMM_SELF, M, N, PETSC_NULL, &amp;x);<br></div></blockquote><div><br></div><div>Replace with</div><div><br></div><div>MatDuplicate(B,MAT_DO_NOT_COPY_VALUES,&amp;x);</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div id=":12l">

<div class="im">for(j=0; j&lt;N; j++)<br></div></div></blockquote><div><br></div><div>change the loop bounds here:</div><div><br></div><div>for(j=0; j&lt;n; j++)</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div id=":12l"><div class="im">

{<br>

   for(i=0; i&lt;M; i++)<br>

   {<br></div></div></blockquote><div><br></div><div>Good, now compute value as the value that goes in (i,cstart+j).</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div id=":12l"><div class="im">

</div>      MatSetValues(B, 1, &amp;i, 1, &amp;j, &amp;value, INSERT_VALUES);<br>

   }<br>

}<br>

MatAssemblyBegin(...);<br>

MatAssemblyEnd(...)<br></div></blockquote><div><br></div><div>This part is correct.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div id=":12l">

<div class="im">MPI_Scan(&amp;n, &amp;cstart, 1, MPIU_INT, MPI_SUM, PETSC_COMM_WORLD);<br>

cstart -= n;<br></div></div></blockquote><div><br></div><div>We did this already.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div id=":12l"><div class="im">


<br>

</div>MatConvert(...);<br>

MatCholeskyFactor(...);<br>

MatMatSolve(...);</div></blockquote></div><br><div>Yes.</div><div><br></div><div><br></div><div>You can gather the matrix x onto all processes if you need the whole result everywhere, but for performance reasons if you scale further, you should avoid it if possible.</div>