<div dir="ltr"><div class="gmail_quote"><div dir="ltr" class="gmail_attr"><br></div><br><div dir="ltr">Hi, <div> Thanks for your reply. I tried using PetscLogEventBegin()/PetscLogEventEnd(), and the results lead to the same conclusion.</div><div> What I did is:</div><div>----------------</div><div> PetscLogEvent Mat_assemble_event, Mat_setvalue_event, Mat_setAsse_event;<br> PetscClassId classid;<br> PetscLogDouble user_event_flops;<br> PetscClassIdRegister("Test assemble and set value", &classid);<br> PetscLogEventRegister("Test only assemble", classid, &Mat_assemble_event);<br> PetscLogEventRegister("Test only set values", classid, &Mat_setvalue_event);<br> PetscLogEventRegister("Test both assemble and set values", classid, &Mat_setAsse_event);<br> PetscLogEventBegin(Mat_setAsse_event, 0, 0, 0, 0);<br> PetscLogEventBegin(Mat_setvalue_event, 0, 0, 0, 0);<br></div><div> ... compute the elements and call MatSetValues(); no assembly calls here</div><div> PetscLogEventEnd(Mat_setvalue_event, 0, 0, 0, 0);<br><br> PetscLogEventBegin(Mat_assemble_event, 0, 0, 0, 0);<br> MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);<br> MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);<br> PetscLogEventEnd(Mat_assemble_event, 0, 0, 0, 0);<br> PetscLogEventEnd(Mat_setAsse_event, 0, 0, 0, 0);<br></div><div>----------------<br></div><div><br></div><div> The output is as follows. By the way, does PETSc record all of the time between PetscLogEventBegin() and PetscLogEventEnd(), or only the time spent in PETSc API calls?</div>
<div>----------------<br></div>Event Count Time (sec) Flop --- Global --- --- Stage ---- Total<br> Max Ratio <b>Max</b> Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s<br>64new 1 1.0 <b>2.3775e+02</b> 1.0 0.00e+00 0.0 6.2e+03 2.3e+04 9.0e+00 52 0 1 1 2 52 0 1 1 2 0<br>128new 1 1.0 <b>6.9945e+01</b> 1.0 0.00e+00 0.0 2.5e+04 1.1e+04 9.0e+00 30 0 1 1 2 30 0 1 1 2 0<br>256new 1 1.0 <b>1.7445e+01</b> 1.0 0.00e+00 0.0 9.9e+04 5.2e+03 9.0e+00 10 0 1 1 2 10 0 1 1 2 0<br><br>64:<br>only assemble 1 1.0 <b>2.6596e+02</b> 1.0 0.00e+00 0.0 7.0e+03 2.8e+05 1.1e+01 55 0 1 8 3 55 0 1 8 3 0<br>only setvalues 1 1.0 <b>1.9987e+02</b> 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 41 0 0 0 0 41 0 0 0 0 0<br>Test both 1 1.0 <b>4.6580e+02</b> 1.0 0.00e+00 0.0 7.0e+03 2.8e+05 1.5e+01 96 0 1 8 4 96 0 1 8 4 0<br><br>128:<br>only assemble 1 1.0 <b>6.9718e+01</b> 1.0 0.00e+00 0.0 2.6e+04 8.1e+04 1.1e+01 30 0 1 4 3 30 0 1 4 3 0<br>only setvalues 1 1.0 <b>1.4438e+02</b> 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 60 0 0 0 0 60 0 0 0 0 0<br>Test both 1 1.0 <b>2.1417e+02</b> 1.0 0.00e+00 0.0 2.6e+04 8.1e+04 1.5e+01 91 0 1 4 4 91 0 1 4 4 0<br><br>256:<br>only assemble 1 1.0 <b>1.7482e+01</b> 1.0 0.00e+00 0.0 1.0e+05 2.3e+04 1.1e+01 10 0 1 3 3 10 0 1 3 3 0<br>only setvalues 1 1.0 <b>1.3717e+02</b> 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 78 0 0 0 0 78 0 0 0 0 0<br><div>Test both 1 1.0 <b>1.5475e+02</b> 1.0 0.00e+00 0.0 1.0e+05 2.3e+04 1.5e+01 91 0 1 3 4 91 0 1 3 4 0 </div><div><br></div><div><br></div><div><br></div><div>Runfeng</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Jun 30, 2023 at 23:35, Barry Smith <<a href="mailto:bsmith@petsc.dev" target="_blank">bsmith@petsc.dev</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
You cannot look just at the VecAssemblyEnd() time; that will very likely give the wrong impression of the total time it takes to put the values in.<br>
<br>
You need to register a new event, put a PetscLogEventBegin() just before you start generating the vector entries and calling VecSetValues(), and put the PetscLogEventEnd() just after the VecAssemblyEnd(). This is the only way to get an accurate accounting of the time.<br>
<br>
Barry<br>
<br>
<br>
> On Jun 30, 2023, at 11:21 AM, Runfeng Jin <<a href="mailto:jsfaraway@gmail.com" target="_blank">jsfaraway@gmail.com</a>> wrote:<br>
> <br>
> Hello!<br>
> <br>
> When I use PETSc to build an SBAIJ matrix, I see something strange: as I increase the number of processors, the assembly time gets smaller. The matrix is exactly the same in every run. The assembly time comes mainly from message passing, because I use a dynamic workload, so which processor computes which elements is random.<br>
> Intuitively, with more processors it should be more likely that a processor computes elements stored on other processors. But from the -log_view output, it seems that with more processors each processor computes more elements that are stored locally (I infer this because, with more processors, the total amount of message passing is smaller).<br>
> <br>
> What could cause this? Thank you!<br>
> <br>
> <br>
> Below is the -log_view output for 64, 128, and 256 processors. Each row is the timing profile of VecAssemblyEnd.<br>
> <br>
> ------------------------------------------------------------------------------------------------------------------------<br>
> processors Count Time (sec) Flop --- Global --- --- Stage ---- Total<br>
> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s<br>
> 64 1 1.0 2.3775e+02 1.0 0.00e+00 0.0 6.2e+03 2.3e+04 9.0e+00 52 0 1 1 2 52 0 1 1 2 0<br>
> 128 1 1.0 6.9945e+01 1.0 0.00e+00 0.0 2.5e+04 1.1e+04 9.0e+00 30 0 1 1 2 30 0 1 1 2 0<br>
> 256 1 1.0 1.7445e+01 1.0 0.00e+00 0.0 9.9e+04 5.2e+03 9.0e+00 10 0 1 1 2 10 0 1 1 2 0<br>
> <br>
> Runfeng Jin<br>
<br>
</blockquote></div>
</div></div>
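<div>As a rough cross-check of the timings reported in the reply above, the sketch below adds the "only setvalues" and "only assemble" times and compares the sum with the enclosing "Test both" event. It is plain C, not PETSc code: the struct and function names are made up for illustration, and the values are copied by hand from the Max time column of the log output.</div>

```c
#include <assert.h>
#include <stdio.h>

/* One row per process count; Max times in seconds, copied by hand
 * from the -log_view output quoted in this thread. */
typedef struct {
    int    nprocs;
    double set_values; /* "only setvalues" event */
    double assemble;   /* "only assemble" event */
    double both;       /* enclosing "Test both" event */
} EventTimes;

static const EventTimes runs[] = {
    { 64, 1.9987e+02, 2.6596e+02, 4.6580e+02},
    {128, 1.4438e+02, 6.9718e+01, 2.1417e+02},
    {256, 1.3717e+02, 1.7482e+01, 1.5475e+02},
};
static const int nruns = (int)(sizeof runs / sizeof runs[0]);

/* Relative gap between the enclosing event and the sum of the two nested events. */
double nested_gap(const EventTimes *r)
{
    double d = r->both - (r->set_values + r->assemble);
    if (d < 0.0) d = -d;
    return d / r->both;
}

/* Fraction of the enclosing event spent generating and setting values. */
double set_values_share(const EventTimes *r)
{
    return r->set_values / r->both;
}

/* Abort if the nested events do not account for the enclosing event to within 1%. */
void check_nesting(void)
{
    for (int i = 0; i < nruns; i++)
        assert(nested_gap(&runs[i]) < 0.01);
}

/* Print one summary line per run. */
void print_report(void)
{
    for (int i = 0; i < nruns; i++)
        printf("%4d procs: total %.2f s, set-values share %.1f%%, gap %.3f%%\n",
               runs[i].nprocs, runs[i].both,
               100.0 * set_values_share(&runs[i]),
               100.0 * nested_gap(&runs[i]));
}
```

<div>Calling check_nesting() and then print_report() from a main() shows that in all three runs the two nested events account for the enclosing event to well within 1%, and that the set-values portion makes up a growing share of the total as the process count increases. This is why the assembly time alone does not describe the total cost.</div>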