<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, May 24, 2017 at 1:13 PM, Michał Dereziński <span dir="ltr"><<a href="mailto:michal.derezinski@gmail.com" target="_blank">michal.derezinski@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word"><br><div><blockquote type="cite"><div>Message from Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> on 24.05.2017, at 10:44:</div><span class="gmail-"><br class="gmail-m_5277196854229442541Apple-interchange-newline"><div><div dir="ltr" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div class="gmail_extra"><div class="gmail_quote">On Wed, May 24, 2017 at 12:37 PM, Michał Dereziński<span class="gmail-m_5277196854229442541Apple-converted-space"> </span><span dir="ltr"><<a href="mailto:michal.derezinski@gmail.com" target="_blank">michal.derezinski@<wbr>gmail.com</a>></span><span class="gmail-m_5277196854229442541Apple-converted-space"> </span>wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word"><div>Great! Then I have a follow-up question:</div><div><br></div><div>My goal is to load the full matrix X from disk while, at the same time, performing computations in parallel on the submatrices that have already been loaded. 
Essentially, I want to think of X as a block matrix (where the blocks are horizontal, spanning the full width of the matrix), where I’m loading one block at a time, and all the blocks that have already been loaded are combined using MatCreateNest, so that I can make computations on that portion of the matrix.</div></div></blockquote><div><br></div><div>I need to understand better. So</div><div><br></div><div> 1) You want to load a sparse matrix from disk</div><div><br></div></div></div></div></div></span></blockquote><div><br></div><div>Yes, the matrix is sparse, stored on disk in row-wise chunks (one per process), with total size of around 3TB.</div><span class="gmail-"><br><blockquote type="cite"><div><div dir="ltr" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div class="gmail_extra"><div class="gmail_quote"><div> 2) You are imagining that it is loaded row-wise, since you can do a calculation with some rows before others are loaded.</div><div><br></div><div> What calculation, a MatMult?</div><div> How long does that MatMult take compared to loading?</div><div><br></div></div></div></div></div></blockquote><div><br></div></span><div>Yes, a MatMult.</div><div>I already have a more straightforward implementation where the matrix is loaded completely at the beginning, and then all of the multiplications are performed.</div><div>Based on the loading time and computation time with the current implementation, it appears that most of the computation time could be subsumed into the loading time.</div><span class="gmail-"><br><blockquote type="cite"><div><div dir="ltr" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div 
class="gmail_extra"><div class="gmail_quote"><div> 3) If you are talking about a dense matrix, you should be loading in parallel using MPI-I/O. We do this for Vec.</div><div><br></div><div>Before you do complicated programming, I would assure myself that the performance gain is worth it.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word"><div>In this scenario, every process needs to be simultaneously loading the next block of X while performing computations on the previously loaded portion. My strategy is for each MPI process to spawn a thread for data loading (so that the memory between the process and the thread is shared), while the process does computations. My concern is that the data-loading thread may be using up computational resources of the processor, even though it is mainly doing I/O. Will this be an issue? What is the best way to minimize the CPU time of this parallel data-loading scheme?</div></div></blockquote><div><br></div><div>Oh, you want to load each block in parallel, but there are many blocks. I would really caution you against using threads. They are death to clean code. Use non-blocking reads.</div></div></div></div></div></blockquote><div><br></div></span><div>I see. Could you expand on your suggestion regarding non-blocking reads? 
Are you proposing that each process makes an asynchronous read request in between every, say, MatMult operation?</div></div></div></blockquote><div><br></div><div>Check this out: <a href="http://beige.ucs.indiana.edu/I590/node109.html">http://beige.ucs.indiana.edu/I590/node109.html</a></div><div><br></div><div>PETSc does not do this currently, but it sounds like you are handling the load.</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word"><div><div><div class="gmail-h5"><blockquote type="cite"><div><div dir="ltr" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div> <span class="gmail-m_5277196854229442541Apple-converted-space"> </span>Thanks,</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word"><div>Thanks,</div><div>Michal.</div><div><br></div><br><div><blockquote type="cite"><div>Message from Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> on 24.05.2017, at 
04:55:</div><br class="gmail-m_5277196854229442541m_-4470384141015376847Apple-interchange-newline"><div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, May 24, 2017 at 1:09 AM, Michal Derezinski<span class="gmail-m_5277196854229442541Apple-converted-space"> </span><span dir="ltr"><<a href="mailto:mderezin@ucsc.edu" target="_blank">mderezin@ucsc.edu</a>></span><span class="gmail-m_5277196854229442541Apple-converted-space"><wbr> </span>wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi,<div><br></div><div>I want to be able to perform matrix operations on several contiguous submatrices of a full matrix, without allocating the memory redundantly for the submatrices (in addition to the memory that is already allocated for the full matrix).</div><div>I tried using MatGetSubMatrix, but this function appears to allocate the additional memory.</div><div><br></div><div>The other way I found to do this is to create the smallest submatrices I need first, then use MatCreateNest to combine them into bigger ones (including the full matrix).</div><div>The documentation of MatCreateNest seems to indicate that it does not allocate additional memory for storing the new matrix.</div><div>Is this the right approach, or is there a better one?</div></div></blockquote><div><br></div><div>Yes, that is the right approach.</div><div><br></div><div> <span class="gmail-m_5277196854229442541Apple-converted-space"> </span>Thanks,</div><div><br></div><div> <span class="gmail-m_5277196854229442541Apple-converted-space"> </span>Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Thanks,</div><div>Michal Derezinski.</div></div></blockquote></div><br><br clear="all"><span class="gmail-m_5277196854229442541HOEnZb"><font color="#888888"><div><br></div>--<span 
class="gmail-m_5277196854229442541Apple-converted-space"> </span><br><div class="gmail-m_5277196854229442541m_-4470384141015376847gmail_signature"><div dir="ltr"><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div><div><br></div><div><a href="http://www.caam.rice.edu/~mk51/" target="_blank">http://www.caam.rice.edu/~mk51<wbr>/</a><br></div></div></div></font></span></div></div></div></blockquote></div><br></div></blockquote></div><br><br clear="all"><div><br></div>--<span class="gmail-m_5277196854229442541Apple-converted-space"> </span><br><div class="gmail-m_5277196854229442541gmail_signature"><div dir="ltr"><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div><div><br></div><div><a href="http://www.caam.rice.edu/~mk51/" target="_blank">http://www.caam.rice.edu/~<wbr>mk51/</a></div></div></div></div></div></div></blockquote></div></div></div><br></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr"><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div><div><br></div><div><a href="http://www.caam.rice.edu/~mk51/" target="_blank">http://www.caam.rice.edu/~mk51/</a><br></div></div></div>
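For reference, the MatCreateNest construction agreed on in this thread might look roughly like the following. This is an untested outline against the PETSc C API; the names <code>blocks</code>, <code>nloaded</code>, and <code>CombineLoadedBlocks</code> are illustrative, not PETSc symbols.

```c
/* Untested outline (PETSc C API): view the first `nloaded` row blocks of X,
 * already read from disk, as one logical matrix without copying their data. */
#include <petscmat.h>

PetscErrorCode CombineLoadedBlocks(MPI_Comm comm, PetscInt nloaded,
                                   Mat blocks[], Mat *Xpart)
{
  PetscErrorCode ierr;

  /* nloaded nest rows and one nest column, since each block is a horizontal
     slab spanning the full width of X.  Passing NULL for the index sets
     lets PETSc derive the row/column layouts from the blocks themselves. */
  ierr = MatCreateNest(comm, nloaded, NULL, 1, NULL, blocks, Xpart);CHKERRQ(ierr);
  return 0;
}
```

Each time a new block finishes loading, the old nest can be dropped with MatDestroy and rebuilt over nloaded+1 blocks, so a MatMult on the nest only ever touches the portion of X that is actually in memory.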
</div></div>