<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /></head><body style='font-size: 10pt; font-family: Verdana,Geneva,sans-serif'>

<p>On 18-05-2016 21:23, Barry Smith wrote:</p>

<blockquote type="cite" style="padding: 0 0.4em; border-left: #1010ff 2px solid; margin: 0">

<div class="pre" style="margin: 0; padding: 0; font-family: monospace"><br />   Do the many small matrices you are extracting consist of parts coming from different processes? Or does each submatrix come from the same process (and remain on that process), or do most come from the same process with some shared across multiple processes? <br /> <br />    The parallel code was originally written assuming one wishes to get a small number of large submatrices. Thus it is somewhat communication heavy; for example, it does not share any communication or all-reduce operations across the matrices. It is, of course, possible to write custom code for your case, but to help you do that we need to know more about your particular use case.<br /> <br />    Barry<br /> <br /> <br />

<blockquote type="cite" style="padding: 0 0.4em; border-left: #1010ff 2px solid; margin: 0">On May 18, 2016, at 11:41 AM, Jed Brown <<a href="mailto:jed@jedbrown.org">jed@jedbrown.org</a>> wrote:<br /> <br /> Could you give some more information about your preconditioner?<br /> Sequential extraction of many small submatrices is not high performance,<br /> but perhaps we can restructure.<br /> <br /> <a href="mailto:victor.magri@dicea.unipd.it">victor.magri@dicea.unipd.it</a> writes:<br /> <br />

<blockquote type="cite" style="padding: 0 0.4em; border-left: #1010ff 2px solid; margin: 0"><br /> <br /> Dear PETSc developers, <br /> I have a matrix of type MATMPIAIJ from which I want to extract small<br /> submatrices and subvectors (from 1 to at most 50 rows)<br /> repeatedly. This is a fundamental step of a preconditioner that<br /> I'm writing. After that, I want to solve these subsystems locally. <br /> <br /> In my current implementation I use MatGetSubMatrices with<br /> MAT_INITIAL_MATRIX to extract the submatrices, and MatGetRow plus some<br /> other workarounds to extract the subvectors. However, this<br /> procedure is too costly, since it creates and destroys the output<br /> objects on every call. I thought about using the MAT_REUSE_MATRIX<br /> flag, but I can't, since the nonzero pattern of the local submatrix may<br /> change from call to call. <br /> <br /> Can you give me some hints on how to do this task efficiently? I<br /> appreciate your help! <br /> <br /> <span class="sig">-- <br /> <br /> Victor A. P. Magri - PhD student<br /> Dept. of Civil, Environmental and Architectural Eng.<br /> University of Padova<br /> Via Marzolo, 9 - 35131 Padova, Italy </span></blockquote>

</blockquote>

</div>

</blockquote>

<p>Thank you for the prompt reply, Barry and Jed.</p>

<p>If a communication-reducing partitioning, such as one provided by METIS or Scotch, is applied first, I would say that these submatrices mostly come from the same process and only some of them are shared across multiple processes. The preconditioner I'm implementing is the Factorized Sparse Approximate Inverse (FSAI) with adaptive nonzero pattern generation, described in the journal paper <a href="http://dl.acm.org/citation.cfm?doid=2732672.2629475">http://dl.acm.org/citation.cfm?doid=2732672.2629475</a>. It relies on an explicit approximation (called matrix F) of the inverse of the lower triangular Cholesky factor.</p>
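<p>For reference, here is a sketch of the standard FSAI row computation, in notation introduced only for this note: <code>P_i</code> is the current set of off-diagonal columns (all smaller than <code>i</code>) in row <code>i</code> of F, and <code>f_i</code> and <code>s_i</code> are auxiliary quantities. It assumes A is symmetric positive definite and F is lower triangular. The exact formulation in the cited paper may differ in its details. The per-row subsystem is the small submatrix/subvector solve described in this thread:</p>

```latex
% Row i of F with off-diagonal pattern P_i (columns j < i):
A[P_i,P_i]\,\tilde f_i = -A[P_i,i], \qquad
s_i = a_{ii} + A[i,P_i]\,\tilde f_i, \qquad
F[i,P_i] = s_i^{-1/2}\,\tilde f_i^{T}, \quad F[i,i] = s_i^{-1/2}.

% This scaling gives diag(F A F^T) = I, hence tr(F A F^T) = n, and
\kappa_K(FAF^T)
  = \frac{\bigl(\tfrac{1}{n}\operatorname{tr}(FAF^T)\bigr)^{n}}{\det(FAF^T)}
  = \frac{1}{\det(F)^2\,\det(A)}
  = \frac{\prod_{i=1}^{n} s_i}{\det(A)}.
```

<p>Each <code>s_i</code> is the minimum of <code>[f; 1]^T A[J_i,J_i] [f; 1]</code> over vectors <code>f</code> supported on <code>P_i</code>, with <code>J_i = P_i ∪ {i}</code>. It depends only on row <code>i</code> and cannot increase when <code>P_i</code> grows. This is why the rows can be enlarged independently, and why the size of each subsystem is bounded by the maximum number of nonzeros per row.</p>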

<p>Given a maximum number of nonzeros per row of F, we want to find the column positions, and their values, that give the largest reduction in the Kaporin condition number of the preconditioned matrix. This search can be done concurrently, row by row. For an arbitrary row of F, we gather a submatrix and a subvector whose sizes equal the number of nonzeros already found in that row, and we solve the resulting subsystem. This is repeated until the desired number of nonzeros is reached. Since the maximum size of these subsystems is known in advance, I would like to allocate two local buffers (one for the submatrix and one for the subvector) only once and reuse them throughout the whole process, instead of creating and destroying new objects each time a subsystem is gathered. Because this gather step is executed many times, this should considerably reduce the setup time of the preconditioner.</p>
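<p>The following is a minimal sketch of the buffer-reuse idea in plain C, deliberately without PETSc calls. It makes several assumptions. First, all rows of A needed for a given row of F are already available locally in a simple CSR structure; rows owned by other processes would have to be fetched beforehand. Second, A is symmetric positive definite. Third, the pattern <code>P</code> holds column indices smaller than <code>i</code>. The types <code>CsrMatrix</code> and <code>FsaiWork</code> and the functions <code>fsai_work_init</code>, <code>fsai_work_free</code> and <code>fsai_row</code> are hypothetical names introduced here.</p>

<p>The work buffers are allocated once for the maximum subsystem size. Each call gathers <code>A[P,P]</code> and <code>-A[P,i]</code> into them, solves by Gaussian elimination without pivoting (adequate for SPD matrices), and returns <code>s = a_ii + A[i,P] x</code>. The choice of the next column via the Kaporin number is not shown.</p>

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical CSR storage for the rows of A available on this process
   (not a PETSc type). */
typedef struct {
    int n;               /* number of rows */
    const int *rowptr;   /* row r occupies [rowptr[r], rowptr[r+1]) */
    const int *colidx;   /* column indices */
    const double *val;   /* nonzero values */
} CsrMatrix;

/* Work buffers sized once for the largest subsystem and then reused. */
typedef struct {
    int max_n;
    double *sub;  /* dense A[P,P], row-major with stride np */
    double *rhs;  /* right-hand side -A[P,i] */
} FsaiWork;

void fsai_work_free(FsaiWork *w) {
    free(w->sub);
    free(w->rhs);
    w->sub = NULL;
    w->rhs = NULL;
}

/* Allocate the buffers once; returns 0 on success, -1 on failure. */
int fsai_work_init(FsaiWork *w, int max_n) {
    w->max_n = max_n;
    w->sub = malloc((size_t)max_n * (size_t)max_n * sizeof *w->sub);
    w->rhs = malloc((size_t)max_n * sizeof *w->rhs);
    if (w->sub == NULL || w->rhs == NULL) {
        fsai_work_free(w);
        return -1;
    }
    return 0;
}

/* A(r,c), or 0.0 if the entry is not stored (linear scan of row r). */
static double csr_entry(const CsrMatrix *A, int r, int c) {
    for (int k = A->rowptr[r]; k < A->rowptr[r + 1]; ++k)
        if (A->colidx[k] == c)
            return A->val[k];
    return 0.0;
}

/* Gather A[P,P] and -A[P,i] into the reused buffers, solve
   A[P,P] x = -A[P,i] (x is written to out[0..np-1]) and return
   s = a_ii + A[i,P] x. No pivoting: assumes A is SPD. */
double fsai_row(const CsrMatrix *A, int i, const int *P, int np,
                FsaiWork *w, double *out) {
    assert(np <= w->max_n);               /* buffers are never reallocated */
    double *S = w->sub, *b = w->rhs;

    for (int r = 0; r < np; ++r) {        /* gather step */
        for (int c = 0; c < np; ++c)
            S[r * np + c] = csr_entry(A, P[r], P[c]);
        b[r] = -csr_entry(A, P[r], i);
    }
    for (int k = 0; k < np; ++k)          /* forward elimination */
        for (int r = k + 1; r < np; ++r) {
            double m = S[r * np + k] / S[k * np + k];
            for (int c = k; c < np; ++c)
                S[r * np + c] -= m * S[k * np + c];
            b[r] -= m * b[k];
        }
    for (int k = np - 1; k >= 0; --k) {   /* back substitution */
        double sum = b[k];
        for (int c = k + 1; c < np; ++c)
            sum -= S[k * np + c] * out[c];
        out[k] = sum / S[k * np + k];
    }
    double s = csr_entry(A, i, i);
    for (int r = 0; r < np; ++r)
        s += csr_entry(A, i, P[r]) * out[r];
    /* Row i of F: F[i,P] = x / sqrt(s), F[i,i] = 1 / sqrt(s). */
    return s;
}
```

<p>Because the buffers are sized once with <code>max_n</code>, successive calls for different rows or for a growing pattern allocate nothing. The linear scan in <code>csr_entry</code> is kept for brevity. It could be replaced by a merge over sorted column indices, or the dense solve by a Cholesky factorization such as LAPACK's <code>dpotrf</code>/<code>dpotrs</code>.</p>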

<p>Thank you for your help!</p>

<div>-- <br />

<div class="pre" style="margin: 0; padding: 0; font-family: monospace">Victor A. P. Magri - PhD student<br /> Dept. of Civil, Environmental and Architectural Eng.<br /> University of Padova<br /> Via Marzolo, 9 - 35131 Padova, Italy</div>

</div>

</body></html>