[petsc-users] Need advice to do blockwise matrix-vector multiplication
Barry Smith
bsmith at mcs.anl.gov
Fri Jan 20 14:51:15 CST 2017
> On Jan 20, 2017, at 2:18 PM, Jed Brown <jed at jedbrown.org> wrote:
>
> Barry Smith <bsmith at mcs.anl.gov> writes:
>
>> Fangbo,
>>
>> Thanks, your explanation is clear
>>
>> Jed,
>>
>> It doesn't look like MatCreateNest() is appropriate, since MATNEST assumes each matrix is parallel?
>
> Well, there's nothing stopping you from making a parallel matrix with no
> entries on most (all but one or two) processes. But doing this manually
> is likely to be a major headache as the number of processes is scaled.
> The description here makes it sound like this particular structure might
> be nicely represented as a tensor contraction, for which there are
> libraries (e.g., https://github.com/solomonik/ctf)
Should we have a matrix class based on this kind of software, rather than a custom MatShell for each person? If we did this, what are the best tensor contraction libraries to use? What are the abstractions, especially when it is sparse matrices nested inside sparse matrices?
Barry
> that could be put
> within a MatShell to use with PETSc. Also, the size here isn't that big
> since the blocks are themselves sparse. Depending on what
> preconditioner is being used, the naive sparse matrix representation
> might be required anyway and optimization of the MatMult to reduce its
> memory (/bandwidth) footprint might be premature.
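
For concreteness, here is a minimal sketch of that MatShell route, assuming the 45 distinct sparse blocks are stored once together with a 45 x 45 table saying which block (and which scalar) sits at each position; the context layout, the table names, and the index sets below are illustrative assumptions, not anything specified in this thread:

#include <petscmat.h>

/* Illustrative context for a shell matrix built from 45 distinct sparse blocks.
   The struct and table names are assumptions for this sketch, not PETSc API. */
typedef struct {
  PetscInt     nb;    /* number of block rows/columns (45 in this problem)        */
  Mat         *B;     /* the distinct sparse blocks, each stored only once        */
  PetscInt    *id;    /* id[i*nb+j] = index into B[], or -1 for an empty block    */
  PetscScalar *coef;  /* coef[i*nb+j] = scalar multiplying block (i,j): 1, 2, ... */
  IS          *is;    /* is[i] selects block i of a global vector                 */
  Vec          work;  /* work vector with the layout of one block                 */
} BlockCtx;

/* y_i = sum_j coef[i][j] * B[id[i][j]] * x_j, applied block row by block row */
static PetscErrorCode MatMult_Block(Mat A, Vec x, Vec y)
{
  BlockCtx      *ctx;
  Vec            xj, yi;
  PetscInt       i, j, k;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatShellGetContext(A, (void**)&ctx);CHKERRQ(ierr);
  ierr = VecZeroEntries(y);CHKERRQ(ierr);
  for (i = 0; i < ctx->nb; i++) {
    ierr = VecGetSubVector(y, ctx->is[i], &yi);CHKERRQ(ierr);
    for (j = 0; j < ctx->nb; j++) {
      k = ctx->id[i*ctx->nb + j];
      if (k < 0) continue;                        /* structurally empty block */
      ierr = VecGetSubVector(x, ctx->is[j], &xj);CHKERRQ(ierr);
      ierr = MatMult(ctx->B[k], xj, ctx->work);CHKERRQ(ierr);                /* work = B*x_j  */
      ierr = VecAXPY(yi, ctx->coef[i*ctx->nb + j], ctx->work);CHKERRQ(ierr); /* y_i += c*work */
      ierr = VecRestoreSubVector(x, ctx->is[j], &xj);CHKERRQ(ierr);
    }
    ierr = VecRestoreSubVector(y, ctx->is[i], &yi);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}

The shell itself would be created with MatCreateShell() and the multiply registered with MatShellSetOperation(A, MATOP_MULT, ...), after which it can be handed to KSP like any assembled Mat; whether a preconditioner can live with a matrix-free operator is the separate question Jed raises above.
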
>
>> What should he do in this case?
>>
>> Barry
>>
>>> On Jan 20, 2017, at 1:18 PM, Fangbo Wang <fangbowa at buffalo.edu> wrote:
>>>
>>> Thank you very much for your reply, Barry.
>>>
>>> The matrix is 1.35 million by 1.35 million and can be divided into 45 blocks along each direction, so there are 45 x 45 = 2025 blocks in the matrix (each block is 30,000 by 30,000). Fortunately, only around 700 of the blocks are non-zero, and those blocks are themselves sparse. Correspondingly, the unknown vector and the right-hand-side vector (both of size 1.35 million by 1) can also be divided into 45 block vectors.
>>>
>>> Most of the blocks are related to one another: if I have a block A, there are dozens of blocks that are scalar multiples of A (2A, 3A, ...). So I only need to store 45 distinct blocks plus a few scalars to represent the whole matrix, which saves a lot of memory (the number 45 comes from the physics of the problem).
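
In PETSc terms this means each non-zero block's contribution to the product never needs more than the one stored block and its scalar. A minimal sketch of that kernel (the function name and arguments are illustrative, not from this thread):

#include <petscmat.h>

/* Add one block entry's contribution to a block row of the product:
   yrow += c * (A * xcol), where A is one of the 45 stored blocks and c is the
   scalar attached to this block position (1, 2, 3, ...).  tmp is a work vector
   with the same layout as yrow. */
static PetscErrorCode BlockEntryMultAdd(Mat A, PetscScalar c, Vec xcol, Vec tmp, Vec yrow)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatMult(A, xcol, tmp);CHKERRQ(ierr);   /* tmp  = A * x_j */
  ierr = VecAXPY(yrow, c, tmp);CHKERRQ(ierr);   /* y_i += c * tmp */
  PetscFunctionReturn(0);
}

Looping this over the roughly 700 non-empty block positions gives the full product while only the 45 distinct blocks live in memory.
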
>>>
>>> <IMG_2888.JPG>
>>>
>>> Is this clear to explain my problem?
>>>
>>> Thank you very much!
>>>
>>> Best regards,
>>>
>>> Fangbo
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jan 19, 2017 at 6:19 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>
>>>> On Jan 19, 2017, at 4:10 PM, Fangbo Wang <fangbowa at buffalo.edu> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Background:
>>>>
>>>> I am using a stochastic finite element method to solve a solid mechanics problem with random material properties. At the end of the day, I get a linear system of equations Ax=b to solve.
>>>>
>>>> The matrix A is very large, 1.3 million by 1.3 million, and storing it explicitly would take more than 100 GB of memory. Fortunately, matrix A has some nice features: it is a block matrix, most of the blocks inside it are similar to each other, and each block is 10,000 by 10,000.
>>>>
>>>>
>>>> Hence, I only need to save some of the blocks (in my case 45). Most of the computation in my iterative solver is matrix-vector multiplication, which is why I want to do it blockwise.
>>>> <FIG-2-Color-online-A-symmetric-block-Toeplitz-matrix-Each-block-is-also-a-symmetric.png>
>>>>
>>>>
>>>>
>>>>
>>>> Current:
>>>> I tried distributing all 45 block matrices, and the corresponding 45 block vectors, across all the processors. However, the computation is very slow and shows no scalability at all.
>>>> I am thinking of using small groups of processors to split up the computation, for example with intra-communicators and inter-communicators. Maybe this will help reduce the communication.
>>>
>>> No, that would just make things excessively complex.
>>>
>>>>
>>>> Any one have some experiences on this? Is there any Petsc function to do these jobs? I am open to any suggestions.
>>>
>>> Based on your picture, it looks like the matrix would be dense if it were formed explicitly? Or are your 45 "small matrices" sparse? Are there any "empty" blocks in your diagram, or is every block one of the 45 small matrices?
>>>
>>> There are two ways to order your unknowns: either all the unknowns for one "block" followed by all the unknowns for the next block, and so on, or interlacing the unknowns between blocks. Depending on the structure of the problem, one or the other ordering can be significantly better.
>>>
>>> The MatNest construct may be the way to go; it behaves like the fully formed matrix, but each block entry is just a pointer to the correct small matrix, so you never store an individual block more than once.
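
A minimal sketch of what that could look like, assuming the 45 distinct blocks are already assembled as parallel Mats and a 45 x 45 table (here called "which", an illustrative name, as is BuildNest) records which of them occupies each block position:

#include <petscmat.h>

/* Sketch: build an nb x nb MATNEST in which many slots point at the same stored
   block.  B[] holds the nb distinct parallel blocks; which[] has length nb*nb,
   row major, and gives the index of the block at each position, or -1 if empty. */
static PetscErrorCode BuildNest(MPI_Comm comm, PetscInt nb, Mat B[], const PetscInt which[], Mat *A)
{
  Mat           *slots;
  PetscInt       i;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PetscMalloc1(nb*nb, &slots);CHKERRQ(ierr);
  for (i = 0; i < nb*nb; i++) {
    /* the same Mat may appear in many slots; MATNEST keeps only a reference,
       so each distinct block is stored once */
    slots[i] = (which[i] >= 0) ? B[which[i]] : NULL;   /* NULL marks an empty block */
  }
  ierr = MatCreateNest(comm, nb, NULL, nb, NULL, slots, A);CHKERRQ(ierr);
  ierr = PetscFree(slots);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Note that MATNEST by itself does not apply the scalar factors (2A, 3A, ...); those would need either scaled duplicates (MatDuplicate() followed by MatScale(), at the cost of extra memory) or keeping the scalars outside the matrix, as in the MatShell sketch earlier in the thread.
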
>>>
>>> Also, if you get no speedup, you need to verify that it is not due to the hardware or badly configured software: run the streams benchmark and make sure you have a good MPI binding; see http://www.mcs.anl.gov/petsc/documentation/faq.html#computers
>>>
>>>
>>>
>>>
>>> Barry
>>>
>>>>
>>>> Thank you very much!
>>>>
>>>>
>>>>
>>>> Fangbo Wang
>>>>
>>>> --
>>>> Fangbo Wang, PhD student
>>>> Stochastic Geomechanics Research Group
>>>> Department of Civil, Structural and Environmental Engineering
>>>> University at Buffalo
>>>> Email: fangbowa at buffalo.edu
>>>
>>>
>>>
>>>
>>> --
>>> Fangbo Wang, PhD student
>>> Stochastic Geomechanics Research Group
>>> Department of Civil, Structural and Environmental Engineering
>>> University at Buffalo
>>> Email: fangbowa at buffalo.edu