[petsc-users] Fwd: Smaller assemble time with increasing processors
Runfeng Jin
jsfaraway at gmail.com
Fri Jun 30 21:25:48 CDT 2023
Hi,
Thanks for your reply. I tried using PetscLogEvent(), and the result leads to
the same conclusion.
What I have done is:
----------------
PetscLogEvent  Mat_assemble_event, Mat_setvalue_event, Mat_setAsse_event;
PetscClassId   classid;
PetscLogDouble user_event_flops;

PetscClassIdRegister("Test assemble and set value", &classid);
PetscLogEventRegister("Test only assemble", classid, &Mat_assemble_event);
PetscLogEventRegister("Test only set values", classid, &Mat_setvalue_event);
PetscLogEventRegister("Test both assemble and set values", classid, &Mat_setAsse_event);

PetscLogEventBegin(Mat_setAsse_event, 0, 0, 0, 0);

PetscLogEventBegin(Mat_setvalue_event, 0, 0, 0, 0);
/* ... compute elements and call MatSetValues(); no assembly here ... */
PetscLogEventEnd(Mat_setvalue_event, 0, 0, 0, 0);

PetscLogEventBegin(Mat_assemble_event, 0, 0, 0, 0);
MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
PetscLogEventEnd(Mat_assemble_event, 0, 0, 0, 0);

PetscLogEventEnd(Mat_setAsse_event, 0, 0, 0, 0);
----------------
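For reference, a minimal self-contained version of this pattern is sketched
below. It is only a sketch: the matrix type, size, and the simple diagonal
fill are illustrative, the event names are arbitrary, and PetscCall() assumes
a reasonably recent PETSc. Running it with -log_view should show the two
registered events in the event summary.
----------------
#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat           A;
  PetscLogEvent set_event, asm_event;
  PetscClassId  classid;
  PetscInt      rstart, rend, N = 1000;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  /* Register a class and two events, as in the snippet above. */
  PetscCall(PetscClassIdRegister("Assembly test", &classid));
  PetscCall(PetscLogEventRegister("Only set values", classid, &set_event));
  PetscCall(PetscLogEventRegister("Only assemble", classid, &asm_event));

  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N));
  PetscCall(MatSetFromOptions(A));   /* e.g. -mat_type sbaij */
  PetscCall(MatSetUp(A));
  PetscCall(MatGetOwnershipRange(A, &rstart, &rend));

  /* Time only the generation and insertion of entries (no assembly yet). */
  PetscCall(PetscLogEventBegin(set_event, 0, 0, 0, 0));
  for (PetscInt i = rstart; i < rend; i++) {
    PetscScalar v = 1.0;
    PetscCall(MatSetValues(A, 1, &i, 1, &i, &v, INSERT_VALUES));
  }
  PetscCall(PetscLogEventEnd(set_event, 0, 0, 0, 0));

  /* Time only the assembly (communication) phase. */
  PetscCall(PetscLogEventBegin(asm_event, 0, 0, 0, 0));
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
  PetscCall(PetscLogEventEnd(asm_event, 0, 0, 0, 0));

  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}
----------------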
And the output of my runs is as follows. By the way, does PETSc record all
time between PetscLogEventBegin() and PetscLogEventEnd(), or only the time
spent inside PETSc API calls?
----------------
Event              Count    Time (sec)       Flop                              --- Global ---   --- Stage ----   Total
                 Max Ratio   Max       Ratio  Max       Ratio  Mess    AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R  Mflop/s
64new               1 1.0  2.3775e+02  1.0  0.00e+00  0.0  6.2e+03  2.3e+04  9.0e+00  52  0  1  1  2  52  0  1  1  2      0
128new              1 1.0  6.9945e+01  1.0  0.00e+00  0.0  2.5e+04  1.1e+04  9.0e+00  30  0  1  1  2  30  0  1  1  2      0
256new              1 1.0  1.7445e+01  1.0  0.00e+00  0.0  9.9e+04  5.2e+03  9.0e+00  10  0  1  1  2  10  0  1  1  2      0

64:
only assemble       1 1.0  2.6596e+02  1.0  0.00e+00  0.0  7.0e+03  2.8e+05  1.1e+01  55  0  1  8  3  55  0  1  8  3      0
only setvalues      1 1.0  1.9987e+02  1.0  0.00e+00  0.0  0.0e+00  0.0e+00  0.0e+00  41  0  0  0  0  41  0  0  0  0      0
Test both           1 1.0  4.6580e+02  1.0  0.00e+00  0.0  7.0e+03  2.8e+05  1.5e+01  96  0  1  8  4  96  0  1  8  4      0

128:
only assemble       1 1.0  6.9718e+01  1.0  0.00e+00  0.0  2.6e+04  8.1e+04  1.1e+01  30  0  1  4  3  30  0  1  4  3      0
only setvalues      1 1.0  1.4438e+02  1.1  0.00e+00  0.0  0.0e+00  0.0e+00  0.0e+00  60  0  0  0  0  60  0  0  0  0      0
Test both           1 1.0  2.1417e+02  1.0  0.00e+00  0.0  2.6e+04  8.1e+04  1.5e+01  91  0  1  4  4  91  0  1  4  4      0

256:
only assemble       1 1.0  1.7482e+01  1.0  0.00e+00  0.0  1.0e+05  2.3e+04  1.1e+01  10  0  1  3  3  10  0  1  3  3      0
only setvalues      1 1.0  1.3717e+02  1.1  0.00e+00  0.0  0.0e+00  0.0e+00  0.0e+00  78  0  0  0  0  78  0  0  0  0      0
Test both           1 1.0  1.5475e+02  1.0  0.00e+00  0.0  1.0e+05  2.3e+04  1.5e+01  91  0  1  3  4  91  0  1  3  4      0
Runfeng
Barry Smith <bsmith at petsc.dev> wrote on Fri, Jun 30, 2023 at 23:35:
>
>    You cannot look just at the VecAssemblyEnd() time; that will very
> likely give the wrong impression of the total time it takes to put the
> values in.
>
>    You need to register a new event, put a PetscLogEventBegin() just before
> you start generating the vector entries and calling VecSetValues(), and put
> the PetscLogEventEnd() just after the VecAssemblyEnd(). This is the only way
> to get an accurate accounting of the time.
>
> Barry
>
>
> > On Jun 30, 2023, at 11:21 AM, Runfeng Jin <jsfaraway at gmail.com> wrote:
> >
> > Hello!
> >
> > When I use PETSc to build an SBAIJ matrix, I find a strange thing: when I
> > increase the number of processors, the assembly time becomes smaller, even
> > though the matrix is exactly the same in every case. The assembly time
> > mainly comes from message passing, because I use a dynamic workload, so it
> > is random which elements are computed by which processor.
> > Intuitively, with more processors it should be more likely that a
> > processor computes elements that are stored on other processors. But from
> > the output of log_view, it seems that with more processors, each processor
> > computes more of the elements stored locally (inferred from the fact that,
> > with more processors, less total message traffic is passed).
> >
> > What could cause this? Thank you!
> >
> >
> > The following is the output of log_view for 64/128/256 processors. Each
> > row is the time profile of VecAssemblyEnd.
> >
> >
> > ------------------------------------------------------------------------------------------------------------------------
> > processors         Count    Time (sec)       Flop                              --- Global ---   --- Stage ----   Total
> >                  Max Ratio   Max       Ratio  Max       Ratio  Mess    AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R  Mflop/s
> > 64                  1 1.0  2.3775e+02  1.0  0.00e+00  0.0  6.2e+03  2.3e+04  9.0e+00  52  0  1  1  2  52  0  1  1  2      0
> > 128                 1 1.0  6.9945e+01  1.0  0.00e+00  0.0  2.5e+04  1.1e+04  9.0e+00  30  0  1  1  2  30  0  1  1  2      0
> > 256                 1 1.0  1.7445e+01  1.0  0.00e+00  0.0  9.9e+04  5.2e+03  9.0e+00  10  0  1  1  2  10  0  1  1  2      0
> >
> > Runfeng Jin
>
>