<div dir="ltr">I will look into the block matrix approach. Thanks, Matt.<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Jul 31, 2014 at 3:35 PM, Matthew Knepley <span dir="ltr">&lt;<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div class="">On Thu, Jul 31, 2014 at 3:24 PM, Brian Yang <span dir="ltr">&lt;<a href="mailto:jyang29@uh.edu" target="_blank">jyang29@uh.edu</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr"><div><div><div><div><div><div><div><div><div><div>Hi all,<br><br></div>Here is a summary of the problem.<br><br></div>I have two 3D images, src and rec, of the same size, say (Z, X, Y).<br></div><br>
</div>We call one (Z, X) slice a panel, so src and rec each have Y panels. Both hold complex numbers.<br><br></div>For example, for the <b>first</b> panel of src and rec (always processing the same panel in both):<br>
<br></div>take the first panel of src as our A (20x20),<br></div>take the first column of the first panel of rec as our b (20x1),<br></div>solve the linear system and get x (20x1),<br></div>move on to the next column of the first panel of rec, until this panel is finished,<br>
</div><div>and assemble all the solutions x column by column into a 20x20 result.</div></div></blockquote><div><br></div></div><div>This is a fine conceptual explanation of the algorithm; however, I do not think you</div><div>want to implement it this way. Since you are solving all these panels independently,</div>
<div>you can just construct the block matrix, with each panel as a block, and solve it all</div><div>at once (they clearly fit into memory). This might not be optimal for multiple right-hand sides (rhs).</div><div><br></div><div>If the matrices really are dense and you have multiple rhs, then you should look at</div>
<div>using Elemental. We have an interface to it, although I am not sure we have hooked up</div><div>the multiple rhs solves.</div><div><div class="h5"><div>&nbsp;</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr"><div>After finishing the first panel of src and rec, go on to the next one and repeat.<br><br><br></div><div>I hope I have explained my problem well. I used a SeqDense matrix for A and a Seq vector for b.<br>
</div><div><br></div><div>Here's the flow:<br><br></div><div>- start<br></div><div>- the Y panels are divided among all the nodes, so each node gets a part of them<br></div><div>- each node reads in its own part of the src and rec images<br>
</div><div>- on each node, take a panel of src and rec<br></div><div><b>- create the Mat and Vec and fill them<br></b></div><div><b>- create the KSP and solve with LSQR<br></b></div><div><b>- get the solution<br></b></div><div><b>- destroy all the PETSc objects A, b, x (destroying the KSP gives me an error here!)</b><br>
</div><div>- repeat for the next panel<br><br><br></div><div>Here is the timing output (in seconds) from node 2 (chosen at random):<br><br></div><div><u><b>time</b></u>: entire time for this panel; <u><i>solver</i></u>: solving time<br>
</div><div><br><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote">processing panel 1 <u><b>time</b></u>= 3.2995000E-02 <u><i>solver</i></u>= 3.0995002E-02<br>
processing panel 2 time= 3.5994001E-02 solver= 3.4995001E-02<br>
processing panel 3 time= 3.9994001E-02 solver= 3.8994007E-02<br>
processing panel 4 time= 4.4993997E-02 solver= 4.3993995E-02<br>
processing panel 5 time= 4.8991993E-02 solver= 4.6992987E-02<br>
processing panel 6 time= 5.4991007E-02 solver= 5.3991005E-02<br>
processing panel 7 time= 5.8990985E-02 solver= 5.7990998E-02<br>
processing panel 8 time= 6.3990027E-02 solver= 6.1990023E-02<br>
processing panel 9 time= 6.8989992E-02 solver= 6.6990018E-02<br>
processing panel 10 time= 7.3989004E-02 solver= 7.1989000E-02<br>
processing panel 11 time= 7.7987969E-02 solver= 7.6987982E-02<br>
processing panel 12 time= 8.1988037E-02 solver= 7.9988003E-02<br>
processing panel 13 time= 8.8985980E-02 solver= 8.6987019E-02<br>
processing panel 14 time= 9.4985008E-02 solver= 9.2984974E-02<br>
processing panel 15 time= 0.1009850 solver= 9.8985016E-02<br>
processing panel 16 time= 0.1119831 solver= 0.1099830<br>
processing panel 17 time= 0.1269809 solver= 0.1239820<br>
processing panel 18 time= 0.1469780 solver= 0.1439790<br>
processing panel 19 time= 0.1709731 solver= 0.1669741<br>
processing panel 20 time= 0.1909720 solver= 0.1869720<br>
processing panel 21 time= 0.2019690 solver= 0.1979700<br>
processing panel 22 time= 0.2239659 solver= 0.2199659<br>
processing panel 23 time= 0.2369640 solver= 0.2319648<br>
processing panel 24 time= 0.2499621 solver= 0.2449629<br>
processing panel 25 time= 0.2709589 solver= 0.2659600<br>
processing panel 26 time= 0.2869561 solver= 0.2829571<br>
processing panel 27 time= 0.3129530 solver= 0.3059540<br>
processing panel 28 time= 0.3389480 solver= 0.3329499<br>
processing panel 29 time= 0.3719430 solver= 0.3649440<br>
processing panel 30 time= 0.3949399 solver= 0.3879409<br>
processing panel 31 time= 0.4249353 solver= 0.4169374<br>
processing panel 32 time= 0.4549308 solver= 0.4469318<br>
processing panel 33 time= 0.4859262 solver= 0.4759283<br>
processing panel 34 time= 0.5119228 solver= 0.5019240<br>
processing panel 35 time= 0.5449171 solver= 0.5349178<br>
processing panel 36 time= 0.5689130 solver= 0.5579152<br>
processing panel 37 time= 0.5959096 solver= 0.5849104<br>
processing panel 38 time= 0.6199055 solver= 0.6079073<br></blockquote><div><br></div><div>You can see that the solve time keeps increasing from panel to panel. The panel number here is the local one. If I start solving from panel 40 (random choice) instead:<br>
</div></div></div></blockquote><div><br></div></div></div><div>It certainly looks like you have a growing memory footprint. It is likely to have happened</div><div>when you extracted/replaced parts of the matrix, which I think is unnecessary as I said above.</div>
<div><br></div><div>&nbsp; Thanks,</div><div><br></div><div>&nbsp; &nbsp; &nbsp;Matt</div><div><div class="h5"><div>&nbsp;</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">
<div><div><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote">
processing panel 40 time= 5.5992007E-02 solver= 5.1991999E-02<br>
processing panel 41 time= 9.1986001E-02 solver= 9.0986013E-02<br>
processing panel 42 time= 0.1309800 solver= 0.1299810<br>
processing panel 43 time= 0.1719730 solver= 0.1709740<br>
processing panel 44 time= 0.2119681 solver= 0.2109680<br>
processing panel 45 time= 0.2529620 solver= 0.2519621<br>
processing panel 46 time= 0.2919550 solver= 0.2909551<br>
processing panel 47 time= 0.3319499 solver= 0.3309500<br>
processing panel 48 time= 0.3719430 solver= 0.3709428<br>
processing panel 49 time= 0.4129372 solver= 0.4109371<br>
processing panel 50 time= 0.4529319 solver= 0.4509320<br>
processing panel 51 time= 0.4929240 solver= 0.4909239<br>
processing panel 52 time= 0.5339203 solver= 0.5319204<br>
processing panel 53 time= 0.5779119 solver= 0.5759130<br>
processing panel 54 time= 0.6199059 solver= 0.6179061<br>
processing panel 55 time= 0.6648979 solver= 0.6628990<br>
processing panel 56 time= 0.7248902 solver= 0.7218900<br>
processing panel 57 time= 0.7938790 solver= 0.7908792<br>
processing panel 58 time= 0.8728676 solver= 0.8698678<br>
processing panel 59 time= 0.9778509 solver= 0.9748516<br>
processing panel 60 time= 1.125830 solver= 1.122829<br>
processing panel 61 time= 1.273806 solver= 1.268806<br>
processing panel 62 time= 1.448780 solver= 1.444779<br>
processing panel 63 time= 1.647749 solver= 1.643749<br>
processing panel 64 time= 1.901712 solver= 1.896712<br>
processing panel 65 time= 2.143673 solver= 2.138674<br>
processing panel 66 time= 2.437630 solver= 2.431629<br>
processing panel 67 time= 2.744583 solver= 2.736586<br>
processing panel 68 time= 3.041536 solver= 3.035538<br>
</blockquote><div><br></div><div>The trend is the same: the time starts out very short and then keeps increasing.<br><br><br></div><div>Since I have thousands of panels for src and rec, the execution time becomes unbearable as the run goes on.<br>
</div><div>So I am wondering whether I am using the right method, or whether there is a memory issue.<br><br></div><div>Thanks.<br></div></div></div></div>
</blockquote></div></div></div><span class="HOEnZb"><font color="#888888"><br><br clear="all"><div><br></div>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
-- Norbert Wiener
</font></span></div></div>
</blockquote></div><br><br clear="all"><br>-- <br>Brian Yang<br>U of Houston<br><br><br><br>
</div>