<div dir="ltr"><div dir="ltr">On Sat, Jul 13, 2019 at 11:20 AM Mohammed Mostafa <<a href="mailto:mo7ammedmostafa@gmail.com">mo7ammedmostafa@gmail.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div dir="auto">I am sorry, but I don’t see what you mean by small times.</div></div><div dir="auto">Although mat assembly is relatively smaller,</div><div dir="auto">the cost of MatSetValues is still significant.</div><div dir="auto">The same can be said for vec assembly.</div><div dir="auto">Combined, vec/mat assembly and MatSetValues constitute about 50% of the total cost of matrix construction.</div></blockquote><div><br></div><div>This is why I asked you about scaling, since it is difficult to disentangle overheads from scalable work at 10^-2 seconds.</div><div><br></div><div>Second, if you look at the times for the PETSc example you ran, MatSetValues() and assembly are clearly not 50% of the</div><div>time for construction. I cannot see your code, so we do not know exactly what the custom events are timing. 
I</div><div>would think about incrementally changing the example until you get what you want.</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto">So is this a problem with my matrix setup/preallocation?</div><div dir="auto"><br></div><div dir="auto">Or is this a hardware issue, where for whatever reason the copy is overly slow?</div><div dir="auto">The code was run on a single node.</div><div dir="auto"><br></div><div dir="auto">Or is this function call overhead, since MatSetValues is being called 1M times inside the for loop (170k times in each process)?</div><div dir="auto"><br></div><div dir="auto">Thanks, Kamra</div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Jul 14, 2019 at 12:41 AM Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Sat, Jul 13, 2019 at 9:56 AM Mohammed Mostafa <<a href="mailto:mo7ammedmostafa@gmail.com" target="_blank">mo7ammedmostafa@gmail.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div><font size="4">Hello Matt,<br></font></div><div><font size="4"><br></font></div><div><font size="4">I revised my code and changed the way I create the RHS vector.</font></div><div><font size="4">Previously I was using VecCreateGhost just in case I needed the ghost values, but for now I changed that to <br></font></div><div><font size="4">VecCreateMPI(.......)</font></div><div><font size="4">So maybe that was the cause of the scatter.</font></div><div><font size="4">I am attaching with this email a new log 
output</font></div></div></div></blockquote><div><br></div><div>Okay, the times are now very small. How does it scale up?</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div></div></div><div dir="ltr"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div><font size="4">Also, regarding how I fill my PETSc matrix:</font></div><div><font size="4">In my code I fill a temporary CSR-format matrix because otherwise I would need "MatSetValue" to fill the PETSc Mat element by element,</font></div><div><font size="4">which is not recommended in the PETSc manual and is probably very expensive due to function call overhead.<br></font></div><div><font size="4"><b>So after I create my matrix in CSR format, I fill the PETSc Mat A as follows:</b><br></font></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><font size="4">for (i = 0; i < nMatRows; i++) {<br> offset = CSR_iptr[i];<br> row_index = row_gIndex[i];<br> nj = Eqn_nj[i];<br> MatSetValues(PhiEqnSolver.A, 1, &row_index, nj, CSR_jptr + offset, CSR_vptr + offset, INSERT_VALUES);<br> }</font></div></blockquote><div><div class="gmail_quote"><div class="gmail_attr"><font size="4"><i><b>After that:</b></i></font></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail_attr"><font size="4"> VecAssemblyBegin(RHS);<br> VecAssemblyEnd(RHS);<br><br> MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);<br> MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);</font></div></blockquote><div class="gmail_attr"><font size="4"> </font></div><div class="gmail_attr"><font size="4"><i>I don't believe I am doing anything special. If possible, I would like to set the whole CSR matrix at once in one command.</i></font></div><div class="gmail_attr"><font size="4"><i>I took a 
look at the code for MatSetValues; if I am understanding it correctly (hopefully), I think I could do it, maybe by modifying it or creating a new routine entirely for this purpose,</i></font></div><div class="gmail_attr"><font size="4"><i>i.e. MatSetValuesFromCSR(.....)<br></i></font></div><div class="gmail_attr"><font size="4"><i>Or is there a particular reason why it has to be this way?</i></font></div><div class="gmail_attr"><br></div><div class="gmail_attr"><font size="4">I also tried KSP ex3, slightly tweaked to add a logging stage around the assembly and MatSetValues, and I am attaching the modified example here as well.</font></div><div class="gmail_attr"><font size="4">In this example the matrix stash is not empty (meaning off-processor values are being set), but the timing values are for roughly the same matrix size. The command I used is:<br></font></div><div class="gmail_attr"><font size="4">mpirun -np 6 ./mod_ksp_ex3 -m 1000 -log_view -info <br></font></div><div class="gmail_attr"><br></div><div class="gmail_attr"><br></div><div class="gmail_attr">Regards,</div><div class="gmail_attr">Kamra<br></div><div class="gmail_attr"><br></div><div dir="ltr" class="gmail_attr">On Sat, Jul 13, 2019 at 1:43 PM Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Fri, Jul 12, 2019 at 10:51 PM Mohammed Mostafa <<a href="mailto:mo7ammedmostafa@gmail.com" target="_blank">mo7ammedmostafa@gmail.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hello Matt,</div><div>Attached is the entire dumped log output using -log_view and -info.</div></div></blockquote><div><br></div><div>In matrix construction, it looks like you have a mixture 
of load imbalance (see the imbalance in the Begin events)</div><div>and lots of Scatter messages in your assembly. We turn off MatSetValues() logging by default since it is usually</div><div>called many times, but you can explicitly turn it back on if you want. I don't think that is the problem here. It's easy</div><div>to see from examples (say SNES ex5) that it is not the major time sink. What is the Scatter doing?</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div></div></div><div dir="ltr"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Thanks,</div><div>Kamra<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Jul 12, 2019 at 9:23 PM Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Fri, Jul 12, 2019 at 5:19 AM Mohammed Mostafa via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hello all,<div>I have a few questions regarding PETSc,</div></div></blockquote><div><br></div><div>Please send the entire output of a run with all the logging turned on, using -log_view and -info.</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Question 1:</div><div>For profiling, is it possible to show only the user-defined log events in the breakdown of each stage in -log_view?</div><div>I tried 
deactivating all class IDs (MAT, VEC, KSP, PC):</div><div> PetscLogEventExcludeClass(MAT_CLASSID);<br> PetscLogEventExcludeClass(VEC_CLASSID);<br> PetscLogEventExcludeClass(KSP_CLASSID);<br> PetscLogEventExcludeClass(PC_CLASSID);<br></div><div><span style="color:rgb(0,0,0);font-family:"Times New Roman";font-size:medium">which, according to the manual, "Deactivates event logging for a PETSc object class in every stage".</span><br></div><div>However, I still see them in the stage breakdown:</div><div>--- Event Stage 1: Matrix Construction<br><br>BuildTwoSidedF 4 1.0 2.7364e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 18 0 0 0 0 0<br>VecSet 1 1.0 4.5300e-06 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>VecAssemblyBegin 2 1.0 2.7344e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 18 0 0 0 0 0<br>VecAssemblyEnd 2 1.0 8.3447e-06 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>VecScatterBegin 2 1.0 7.5102e-05 1.7 0.00e+00 0.0 3.6e+01 2.1e+03 0.0e+00 0 0 3 0 0 0 0 50 80 0 0<br>VecScatterEnd 2 1.0 3.5286e-05 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>MatAssemblyBegin 2 1.0 8.8930e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>MatAssemblyEnd 2 1.0 1.3566e-02 1.1 0.00e+00 0.0 3.6e+01 5.3e+02 8.0e+00 0 0 3 0 6 10 0 50 20100 0<br>AssembleMats 2 1.0 3.9774e-02 1.7 0.00e+00 0.0 7.2e+01 1.3e+03 8.0e+00 0 0 7 0 6 28 0100100100 0 # USER EVENT<br>myMatSetValues 2 1.0 2.6931e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 19 0 0 0 0 0 # USER EVENT<br>setNativeMat 1 1.0 3.5613e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 24 0 0 0 0 0 # USER EVENT<br>setNativeMatII 1 1.0 4.7023e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 28 0 0 0 0 0 # USER EVENT<br>callScheme 1 1.0 2.2333e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 2 0 0 0 0 0 # USER EVENT<br></div><div><br></div><div>Also, is it possible to clear the logs so that I can write a separate profiling 
output file for each timestep (since I am solving a transient problem and I want to know the change in performance as time goes by)?</div><div>----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------<br></div><div>Question 2:</div><div>Regarding MatSetValues</div><div>Right now I am writing a finite volume code. Due to an algorithm requirement, I have to write the matrix into a local native format (array of arrays) and then loop through the rows and use MatSetValues to set the elements in "Mat A":</div><div>MatSetValues(A, 1, &row, nj, j_index, coefvalues, INSERT_VALUES);<br></div><div>but it is very slow and it is killing my performance,</div><div>although the matrix was properly set up using </div><div>MatCreateAIJ(PETSC_COMM_WORLD, this->local_size, this->local_size, PETSC_DETERMINE,<br> PETSC_DETERMINE, -1, d_nnz, -1, o_nnz, &A);<br></div><div>with d_nnz and o_nnz properly assigned, so no mallocs occur during MatSetValues, and all inserted values are local, so there are no off-processor values.</div><div>So my question is: is it possible to set multiple rows at once, hopefully all of them? I checked the manual, and MatSetValues can only set a dense block of values; going row by row seems expensive.<br></div><div>Or perhaps is it possible to copy all rows directly into the underlying matrix data? As I mentioned, all values are local and there are no off-processor values (stash is 0):</div><div>[0] VecAssemblyBegin_MPI_BTS(): Stash has 0 entries, uses 0 mallocs.<br>[0] VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses 0 mallocs.<br>[0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.<br>[1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.<br>[2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.<br>[3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.<br>[4] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, 
uses 0 mallocs.<br>[5] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.<br>[2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 186064; storage space: 0 unneeded,743028 used<br>[1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage space: 0 unneeded,742972 used<br>[1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>[1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4<br>[1] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.<br>[2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>[2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4<br>[2] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186064) < 0.6. Do not use CompressedRow routines.<br>[4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage space: 0 unneeded,743093 used<br>[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage space: 0 unneeded,743036 used<br>[4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>[4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4<br>[4] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.<br>[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>[5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage space: 0 unneeded,742938 used<br>[5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>[5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4<br>[5] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.<br>[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4<br>[0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186062) < 0.6. 
Do not use CompressedRow routines.<br>[3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage space: 0 unneeded,743049 used<br>[3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>[3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4<br>[3] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.<br>[2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 685; storage space: 0 unneeded,685 used<br>[4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 649; storage space: 0 unneeded,649 used<br>[4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>[4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1<br>[4] MatCheckCompressedRow(): Found the ratio (num_zerorows 185414)/(num_localrows 186063) > 0.6. Use CompressedRow routines.<br>[2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>[2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1<br>[2] MatCheckCompressedRow(): Found the ratio (num_zerorows 185379)/(num_localrows 186064) > 0.6. Use CompressedRow routines.<br>[1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1011; storage space: 0 unneeded,1011 used<br>[5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1137; storage space: 0 unneeded,1137 used<br>[5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>[5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1<br>[5] MatCheckCompressedRow(): Found the ratio (num_zerorows 184925)/(num_localrows 186062) > 0.6. 
Use CompressedRow routines.<br>[1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>[1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1<br>[3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 658; storage space: 0 unneeded,658 used<br>[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 648; storage space: 0 unneeded,648 used<br>[1] MatCheckCompressedRow(): Found the ratio (num_zerorows 185051)/(num_localrows 186062) > 0.6. Use CompressedRow routines.<br>[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1<br>[0] MatCheckCompressedRow(): Found the ratio (num_zerorows 185414)/(num_localrows 186062) > 0.6. Use CompressedRow routines.<br>[3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>[3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1<br>[3] MatCheckCompressedRow(): Found the ratio (num_zerorows 185405)/(num_localrows 186063) > 0.6. Use CompressedRow routines.<br></div><div><br></div><div>----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</div><div>Question 3:</div><div>If all inserted matrix and vector data are local, what part of the vec/mat assembly consumes time? MatSetValues and MatAssembly together consume more time than the matrix builder.<br></div><div>Also, this is not just for the first MAT_FINAL_ASSEMBLY.</div><div><br></div><div><br></div><div>For context, the matrix above is nearly 1M x 1M, partitioned over six processes, and it was NOT built using DM.</div><div><br></div><div>Finally, the configure options are:</div><div> </div><div>Configure options:</div><div>PETSC_ARCH=release3 -with-debugging=0 COPTFLAGS="-O3 -march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3 -march=native -mtune=native" --with-cc=mpicc 
--with-cxx=mpicxx --with-fc=mpif90 --download-metis --download-hypre<br></div><div><br></div><div>Sorry for such a long question, and thanks in advance.</div><div>Thanks </div><div>M. Kamra</div></div>
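<div>A minimal sketch of how the d_nnz and o_nnz arrays mentioned above could be counted from a local CSR structure, assuming zero-based global column indices and an owned column range [cstart, cend). The function name count_diag_offdiag and its argument names are hypothetical, not PETSc API, and plain int stands in for PetscInt:</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of PETSc): for each locally owned row, count
   how many column indices fall inside the owned column range [cstart, cend)
   (diagonal block) and how many fall outside it (off-diagonal block).
   row_ptr has nrows + 1 entries; col_idx holds global column indices. */
void count_diag_offdiag(size_t nrows, const int *row_ptr, const int *col_idx,
                        int cstart, int cend, int *d_nnz, int *o_nnz)
{
    assert(cstart <= cend);
    for (size_t i = 0; i < nrows; i++) {
        d_nnz[i] = 0;
        o_nnz[i] = 0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
            if (col_idx[k] >= cstart && col_idx[k] < cend) {
                d_nnz[i]++;   /* column owned by this process */
            } else {
                o_nnz[i]++;   /* column owned by another process */
            }
        }
    }
}
```

<div>The maximum of d_nnz and of o_nnz can then be compared with the "Maximum nonzeros in any row" lines that -info prints for the two blocks.</div>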
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail-m_-4311703343741395781m_9002171830571990204gmail-m_-7647568561877543171m_5481170975163284759m_-4370509231447847667gmail-m_5288362906312427502gmail-m_-2909258851924680987gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div><div><br></div><div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br></div></div></div></div></div></div></div></div>
</blockquote></div>
</blockquote></div></div>
</blockquote></div></div></div>
</div>
</blockquote></div></div>
</blockquote></div></div>
</blockquote></div></div>
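<div>The row-by-row insertion loop discussed in this thread can be sketched without PETSc, assuming the CSR offset array has nrows + 1 entries so that each row length follows from consecutive offsets. The names insert_csr_rows, row_insert_fn, and count_calls are hypothetical; the callback only stands in for a call such as MatSetValues():</div>

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a per-row insertion call such as MatSetValues(); this
   signature is hypothetical and only mirrors the arguments used in the thread. */
typedef void (*row_insert_fn)(int global_row, int ncols, const int *cols,
                              const double *vals, void *ctx);

/* Walk a local CSR matrix and hand each row to `insert`.
   Returns the total number of entries passed on. */
size_t insert_csr_rows(size_t nrows, const int *iptr, const int *jptr,
                       const double *vptr, const int *row_gindex,
                       row_insert_fn insert, void *ctx)
{
    size_t total = 0;
    for (size_t i = 0; i < nrows; i++) {
        int offset = iptr[i];            /* start of row i in jptr/vptr */
        int nj = iptr[i + 1] - offset;   /* number of entries in row i */
        assert(nj >= 0);
        insert(row_gindex[i], nj, jptr + offset, vptr + offset, ctx);
        total += (size_t)nj;
    }
    return total;
}

/* Example callback: only counts invocations through ctx (a size_t pointer);
   a real code would call MatSetValues() here instead. */
void count_calls(int global_row, int ncols, const int *cols,
                 const double *vals, void *ctx)
{
    (void)global_row; (void)ncols; (void)cols; (void)vals;
    (*(size_t *)ctx)++;
}
```

<div>Counting callback invocations this way makes the number of calls per process explicit (one per local row), which is the overhead being asked about.</div>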