[petsc-users] Performance question using seq Mat and Vec

Matthew Knepley knepley at gmail.com
Thu Jul 31 15:35:06 CDT 2014


On Thu, Jul 31, 2014 at 3:24 PM, Brian Yang <jyang29 at uh.edu> wrote:

> Hi all,
>
> Here's a summary of the problem.
>
> I have src and rec, two 3D images of the same size, say (Z, X, Y).
>
> We call one (Z, X) slice a panel, so there are Y panels in each of src
> and rec. BTW, they hold complex numbers.
>
> For example, for the *first* panel of src and rec (we always process the
> same panel of both):
>
> Take the first panel of src as our A (20x20),
> take the first column of the first panel of rec as our b (20x1),
> solve the linear system and get x (20x1),
> move on to the next column of the first panel of rec until the panel is
> finished,
> assemble all the solutions x column by column (20x20).
>

This is a fine conceptual explanation of the algorithm, but I do not think
you want to implement it this way. Since you are solving all these panels
independently, you can just construct the block matrix, with each panel as
a block, and solve it all at once (they clearly fit into memory). This
might not be optimal for multiple rhs.
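
To illustrate the block-matrix idea, here is a minimal sketch in plain
Python (standard library only, not PETSc). It assumes two hypothetical 2x2
complex panels, A1 and A2, with made-up right-hand sides b1 and b2, and
uses a small hand-written Gaussian elimination in place of a real solver.
It only checks that solving the panels one by one gives the same answer as
one solve with the block-diagonal matrix.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on an augmented copy so the inputs are left untouched.
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Pick the row with the largest pivot magnitude.
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    x = [0j] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def block_diag(blocks):
    """Place square blocks along the diagonal of a larger zero matrix."""
    n = sum(len(B) for B in blocks)
    out = [[0j] * n for _ in range(n)]
    off = 0
    for B in blocks:
        for i, row in enumerate(B):
            for j, v in enumerate(row):
                out[off + i][off + j] = v
        off += len(B)
    return out


# Hypothetical panels and right-hand sides (illustrative values only).
A1 = [[2 + 1j, 1 + 0j], [0 + 1j, 3 - 1j]]
A2 = [[1 + 0j, 2 - 1j], [4 + 2j, 1 + 1j]]
b1 = [1 + 0j, 2 + 1j]
b2 = [0 + 1j, 3 + 0j]

separate = solve(A1, b1) + solve(A2, b2)         # panel by panel
combined = solve(block_diag([A1, A2]), b1 + b2)  # one block system

# Largest difference between the two approaches (rounding error only).
max_diff = max(abs(s - c) for s, c in zip(separate, combined))
print(max_diff)
```

In a real code the block matrix would be stored in a format that does not
keep the zero off-diagonal blocks; the dense storage here is only for
brevity.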

If the matrices really are dense and you have multiple rhs, then you should
look at using Elemental. We have an interface to it, although I am not sure
we have hooked up the multiple rhs solves.
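
As a rough picture of what a multiple-rhs solve means for one panel, the
sketch below (plain Python, standard library only) eliminates on the
augmented matrix [A | B] once and then back-substitutes each column,
instead of restarting the solve for every column b. This is not
Elemental's or PETSc's API, and it swaps in direct elimination for the
lsqr solver mentioned in the original post; the 3x3 matrix A and the two
columns of B are made-up values.

```python
def solve_multi(A, B):
    """Solve A X = B for all columns of B with one elimination pass."""
    n, k = len(A), len(B[0])
    # Augmented matrix [A | B]; copies keep the inputs untouched.
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        # Partial pivoting: largest magnitude entry in this column.
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + k):
                M[r][c] -= f * M[col][c]
    # Back substitution, one right-hand-side column at a time.
    X = [[0j] * k for _ in range(n)]
    for j in range(k):
        for i in reversed(range(n)):
            s = sum(M[i][c] * X[c][j] for c in range(i + 1, n))
            X[i][j] = (M[i][n + j] - s) / M[i][i]
    return X


# Hypothetical complex panel A (3x3) and two right-hand-side columns.
A = [[2 + 1j, 1 + 0j, 0j],
     [0 + 1j, 3 - 1j, 1 + 0j],
     [1 + 0j, 0j, 2 + 2j]]
B = [[1 + 0j, 0j],
     [2 + 1j, 1 - 1j],
     [0j, 3 + 0j]]

X = solve_multi(A, B)

# Largest entry of A X - B, as a sanity check on the solution.
residual = max(
    abs(sum(A[i][c] * X[c][j] for c in range(3)) - B[i][j])
    for i in range(3) for j in range(2)
)
print(residual)
```

The elimination work on A is done once and shared by every column, which
is the saving a multiple-rhs interface is meant to give.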


> After finishing the first panel of src and rec, go to the next... repeat.
>
>
> I hope I explained my problem well. I used a SeqDense matrix for A and a
> Seq vector for b.
>
> Here's the flow:
>
> - start
> - the Y panels are divided among the nodes; each node gets part of them
> - each node reads in its own part of the src and rec images
> - on each node, take a panel of src and rec
>
> *- create Mat and Vec, fill them*
>
> *- create KSP and solve with lsqr*
>
> *- get the solution*
> *- destroy all the PETSc objects: A, b, x (destroying the KSP gives me an
> error here!)*
> - repeat for the next panel
>
>
> Here's the timing output (in seconds) from node 2 (chosen at random),
> where time= is the entire time for the panel and solver= is the solving
> time:
>
>>  processing panel           1 time=  3.2995000E-02 solver=  3.0995002E-02
>>  processing panel           2 time=  3.5994001E-02 solver=  3.4995001E-02
>>  processing panel           3 time=  3.9994001E-02 solver=  3.8994007E-02
>>  processing panel           4 time=  4.4993997E-02 solver=  4.3993995E-02
>>  processing panel           5 time=  4.8991993E-02 solver=  4.6992987E-02
>>  processing panel           6 time=  5.4991007E-02 solver=  5.3991005E-02
>>  processing panel           7 time=  5.8990985E-02 solver=  5.7990998E-02
>>  processing panel           8 time=  6.3990027E-02 solver=  6.1990023E-02
>>  processing panel           9 time=  6.8989992E-02 solver=  6.6990018E-02
>>  processing panel          10 time=  7.3989004E-02 solver=  7.1989000E-02
>>  processing panel          11 time=  7.7987969E-02 solver=  7.6987982E-02
>>  processing panel          12 time=  8.1988037E-02 solver=  7.9988003E-02
>>  processing panel          13 time=  8.8985980E-02 solver=  8.6987019E-02
>>  processing panel          14 time=  9.4985008E-02 solver=  9.2984974E-02
>>  processing panel          15 time=  0.1009850     solver=  9.8985016E-02
>>  processing panel          16 time=  0.1119831     solver=  0.1099830
>>  processing panel          17 time=  0.1269809     solver=  0.1239820
>>  processing panel          18 time=  0.1469780     solver=  0.1439790
>>  processing panel          19 time=  0.1709731     solver=  0.1669741
>>  processing panel          20 time=  0.1909720     solver=  0.1869720
>>  processing panel          21 time=  0.2019690     solver=  0.1979700
>>  processing panel          22 time=  0.2239659     solver=  0.2199659
>>  processing panel          23 time=  0.2369640     solver=  0.2319648
>>  processing panel          24 time=  0.2499621     solver=  0.2449629
>>  processing panel          25 time=  0.2709589     solver=  0.2659600
>>  processing panel          26 time=  0.2869561     solver=  0.2829571
>>  processing panel          27 time=  0.3129530     solver=  0.3059540
>>  processing panel          28 time=  0.3389480     solver=  0.3329499
>>  processing panel          29 time=  0.3719430     solver=  0.3649440
>>  processing panel          30 time=  0.3949399     solver=  0.3879409
>>  processing panel          31 time=  0.4249353     solver=  0.4169374
>>  processing panel          32 time=  0.4549308     solver=  0.4469318
>>  processing panel          33 time=  0.4859262     solver=  0.4759283
>>  processing panel          34 time=  0.5119228     solver=  0.5019240
>>  processing panel          35 time=  0.5449171     solver=  0.5349178
>>  processing panel          36 time=  0.5689130     solver=  0.5579152
>>  processing panel          37 time=  0.5959096     solver=  0.5849104
>>  processing panel          38 time=  0.6199055     solver=  0.6079073
>>
>
> You can see that the solve time keeps increasing from panel to panel.
> The panel number here is the local one. If I start solving from panel 40
> (chosen at random):
>

It certainly looks like you have a growing memory footprint. It is likely
to have happened when you extracted/replaced parts of the matrix, which I
think is unnecessary, as I said above.
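
One way to see the growth in the quoted numbers is to compare each solve
time with the first one. The sketch below uses a handful of solver= values
copied from the node 2 output in the original post (panels 1, 10, 20, 30,
and 38) and prints each as a multiple of the first. It assumes the panels
are all the same size, as in the description, so the work per solve should
stay roughly constant if nothing accumulates between iterations.

```python
# solver= times in seconds, copied from the quoted output for node 2.
solver_times = {
    1: 3.0995002e-02,
    10: 7.1989000e-02,
    20: 0.1869720,
    30: 0.3879409,
    38: 0.6079073,
}

# Express every solve time as a multiple of the first panel's time.
first = solver_times[1]
growth = {panel: t / first for panel, t in solver_times.items()}

for panel, factor in growth.items():
    print(f"panel {panel:2d}: {factor:5.1f}x the first solve")
```

By panel 38 the solve takes roughly 20 times as long as the first one, and
the second run in the post starts out fast again at panel 40, which points
to state building up during the run rather than to the panel data.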

  Thanks,

     Matt


>>  processing panel          40 time=  5.5992007E-02 solver=  5.1991999E-02
>>  processing panel          41 time=  9.1986001E-02 solver=  9.0986013E-02
>>  processing panel          42 time=  0.1309800     solver=  0.1299810
>>  processing panel          43 time=  0.1719730     solver=  0.1709740
>>  processing panel          44 time=  0.2119681     solver=  0.2109680
>>  processing panel          45 time=  0.2529620     solver=  0.2519621
>>  processing panel          46 time=  0.2919550     solver=  0.2909551
>>  processing panel          47 time=  0.3319499     solver=  0.3309500
>>  processing panel          48 time=  0.3719430     solver=  0.3709428
>>  processing panel          49 time=  0.4129372     solver=  0.4109371
>>  processing panel          50 time=  0.4529319     solver=  0.4509320
>>  processing panel          51 time=  0.4929240     solver=  0.4909239
>>  processing panel          52 time=  0.5339203     solver=  0.5319204
>>  processing panel          53 time=  0.5779119     solver=  0.5759130
>>  processing panel          54 time=  0.6199059     solver=  0.6179061
>>  processing panel          55 time=  0.6648979     solver=  0.6628990
>>  processing panel          56 time=  0.7248902     solver=  0.7218900
>>  processing panel          57 time=  0.7938790     solver=  0.7908792
>>  processing panel          58 time=  0.8728676     solver=  0.8698678
>>  processing panel          59 time=  0.9778509     solver=  0.9748516
>>  processing panel          60 time=   1.125830     solver=   1.122829
>>  processing panel          61 time=   1.273806     solver=   1.268806
>>  processing panel          62 time=   1.448780     solver=   1.444779
>>  processing panel          63 time=   1.647749     solver=   1.643749
>>  processing panel          64 time=   1.901712     solver=   1.896712
>>  processing panel          65 time=   2.143673     solver=   2.138674
>>  processing panel          66 time=   2.437630     solver=   2.431629
>>  processing panel          67 time=   2.744583     solver=   2.736586
>>  processing panel          68 time=   3.041536     solver=   3.035538
>>
>
> The trend is the same: the time starts out very small and keeps
> increasing.
>
>
> Since I have thousands of panels for src and rec, the execution time
> becomes unbearable as the run goes on.
> So I am wondering whether I used the right method, or whether there is a
> memory issue?
>
> Thanks.
>



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener