[petsc-users] Poor weak scaling when solving successive linear systems
Michael Becker
Michael.Becker at physik.uni-giessen.de
Thu May 24 09:49:29 CDT 2018
Yes, the time increment is the problem. Not because of these 8% in
particular, but because it gets worse with more processes.
Performance does improve drastically on one processor (which I had
never tested before); I attached the log_view file. If communication
speed is the problem, then I assume fewer processes per node would
improve performance, and I could in principle investigate that. I assume
there's no way to reduce the data volume.
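
To isolate the raw communication cost from the solver, I could time bare
ghost exchanges on the same kind of DMDA. Below is a minimal sketch (my
own test idea, not code from the production run; the 150^3 global grid
matches the 125-process case with 30^3 points per rank):

#include <petscdmda.h>
#include <petsctime.h>

int main(int argc, char **argv)
{
  DM             da;
  Vec            g, l;
  PetscLogDouble t0, t1;
  PetscInt       i;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, (char*)0, (char*)0);CHKERRQ(ierr);
  /* Same boundaries/stencil as the production code; 150^3 global grid,
     so 125 ranks own 30^3 points each. */
  ierr = DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_GHOSTED,
                      DM_BOUNDARY_GHOSTED, DM_BOUNDARY_GHOSTED,
                      DMDA_STENCIL_STAR, 150, 150, 150,
                      PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
                      1, 1, NULL, NULL, NULL, &da);CHKERRQ(ierr);
  ierr = DMSetFromOptions(da);CHKERRQ(ierr);
  ierr = DMSetUp(da);CHKERRQ(ierr);
  ierr = DMCreateGlobalVector(da, &g);CHKERRQ(ierr);
  ierr = DMCreateLocalVector(da, &l);CHKERRQ(ierr);

  /* Warm-up exchange so the scatter is built before we start timing. */
  ierr = DMGlobalToLocalBegin(da, g, INSERT_VALUES, l);CHKERRQ(ierr);
  ierr = DMGlobalToLocalEnd(da, g, INSERT_VALUES, l);CHKERRQ(ierr);

  ierr = PetscTime(&t0);CHKERRQ(ierr);
  for (i = 0; i < 100; i++) { /* 100 timed ghost exchanges */
    ierr = DMGlobalToLocalBegin(da, g, INSERT_VALUES, l);CHKERRQ(ierr);
    ierr = DMGlobalToLocalEnd(da, g, INSERT_VALUES, l);CHKERRQ(ierr);
  }
  ierr = PetscTime(&t1);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "100 ghost exchanges: %g s\n",
                     (double)(t1 - t0));CHKERRQ(ierr);

  ierr = VecDestroy(&g);CHKERRQ(ierr);
  ierr = VecDestroy(&l);CHKERRQ(ierr);
  ierr = DMDestroy(&da);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

Comparing that number between 125 and 1000 processes, and between fully
and half-populated nodes, should show whether communication is the
bottleneck independently of the solver.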
But thanks either way, this helped a lot.
Michael
On 24.05.2018 at 15:22, Mark Adams wrote:
> The KSPSolve time goes from 128 to 138 seconds in going from 125 to
> 1000 processes. Is this the problem?
>
> And as Lawrence pointed out, there is a lot of "load" imbalance. (This
> could come from a poor network.) VecAXPY has no communication, yet it
> shows significant imbalance. But your actual load appears perfectly
> balanced, so this could come from cache effects....
>
> And you are spending almost half the solve time in VecScatter. If you
> really have this nice regular partitioning of the problem, then your
> communication is slow, even on 125 processors. (So it is not a scaling
> issue here, but a one-processor test should make it visible.)
>
> Note, AMG coarse grids get bigger as the problem gets bigger, so it is
> not perfectly scalable pre-asymptotically. Nothing really is, because
> you don't saturate communication until you have at least a 3^D process
> grid, and various random effects will cause some non-perfect weak speedup.
>
> Mark
>
>
> On Thu, May 24, 2018 at 5:10 AM, Michael Becker
> <Michael.Becker at physik.uni-giessen.de> wrote:
>
> CG/GCR: I accidentally kept gcr in the batch file. That's still
> from when I was experimenting with the different methods. The
> performance is quite similar though.
>
> I use the following setup for the KSP object and the vectors:
>
>     ierr = PetscInitialize(&argc, &argv, (char*)0, (char*)0);CHKERRQ(ierr);
>
>     ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
>
>     ierr = DMDACreate3d(PETSC_COMM_WORLD,
>                         DM_BOUNDARY_GHOSTED, DM_BOUNDARY_GHOSTED, DM_BOUNDARY_GHOSTED,
>                         DMDA_STENCIL_STAR, g_Nx, g_Ny, g_Nz,
>                         dims[0], dims[1], dims[2], 1, 1,
>                         l_Nx, l_Ny, l_Nz, &da);CHKERRQ(ierr);
>     ierr = DMSetFromOptions(da);CHKERRQ(ierr);
>     ierr = DMSetUp(da);CHKERRQ(ierr);
>     ierr = KSPSetDM(ksp, da);CHKERRQ(ierr);
>
>     ierr = DMCreateGlobalVector(da, &b);CHKERRQ(ierr);
>     ierr = VecDuplicate(b, &x);CHKERRQ(ierr);
>     ierr = DMCreateLocalVector(da, &l_x);CHKERRQ(ierr);
>     ierr = VecSet(x, 0);CHKERRQ(ierr);
>     ierr = VecSet(b, 0);CHKERRQ(ierr);
>
> For the 125 case the arrays l_Nx, l_Ny, l_Nz have dimension 5 and
> every element has value 30. VecGetLocalSize() returns 27000 for
> every rank. Is there something I didn't consider?
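>
> To double-check the distribution, I could also query each rank's
> ownership box directly; just a sketch, reusing the da handle from the
> setup above:
>
>     PetscInt xs, ys, zs, xm, ym, zm;
>     ierr = DMDAGetCorners(da, &xs, &ys, &zs, &xm, &ym, &zm);CHKERRQ(ierr);
>     /* Each rank prints its local box; should be 30 x 30 x 30 = 27000. */
>     ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD,
>                "local box %D x %D x %D = %D points\n",
>                xm, ym, zm, xm*ym*zm);CHKERRQ(ierr);
>     ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT);CHKERRQ(ierr);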
>
> Michael
>
>
>
> On 24.05.2018 at 09:39, Lawrence Mitchell wrote:
>>> On 24 May 2018, at 06:24, Michael Becker <Michael.Becker at physik.uni-giessen.de> wrote:
>>>
>>> Could you have a look at the attached log_view files and tell me if something is particularly odd? The system size per processor is 30^3, and the simulation ran over 1000 timesteps, which means KSPSolve() was called equally often. I introduced two new logging stages - one for the first solve and the final setup, and one for the remaining solves.
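>>>
>>> The stages are set up roughly like this (a sketch from memory; stage names and the loop variable are placeholders):
>>>
>>> PetscLogStage stageFirst, stageRest;
>>> PetscInt      step;
>>> ierr = PetscLogStageRegister("first solve + setup", &stageFirst);CHKERRQ(ierr);
>>> ierr = PetscLogStageRegister("remaining solves", &stageRest);CHKERRQ(ierr);
>>>
>>> ierr = PetscLogStagePush(stageFirst);CHKERRQ(ierr);
>>> ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);  /* first solve triggers the PC setup */
>>> ierr = PetscLogStagePop();CHKERRQ(ierr);
>>>
>>> ierr = PetscLogStagePush(stageRest);CHKERRQ(ierr);
>>> for (step = 1; step < 1000; step++) {      /* remaining timesteps */
>>>     /* ... update b ... */
>>>     ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
>>> }
>>> ierr = PetscLogStagePop();CHKERRQ(ierr);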
>> The two attached logs use CG for the 125 proc run, but gcr for the 1000 proc run. Is this deliberate?
>>
>> 125 proc:
>>
>> -gamg_est_ksp_type cg
>> -ksp_norm_type unpreconditioned
>> -ksp_type cg
>> -log_view
>> -mg_levels_esteig_ksp_max_it 10
>> -mg_levels_esteig_ksp_type cg
>> -mg_levels_ksp_max_it 1
>> -mg_levels_ksp_norm_type none
>> -mg_levels_ksp_type richardson
>> -mg_levels_pc_sor_its 1
>> -mg_levels_pc_type sor
>> -pc_gamg_type classical
>> -pc_type gamg
>>
>> 1000 proc:
>>
>> -gamg_est_ksp_type cg
>> -ksp_norm_type unpreconditioned
>> -ksp_type gcr
>> -log_view
>> -mg_levels_esteig_ksp_max_it 10
>> -mg_levels_esteig_ksp_type cg
>> -mg_levels_ksp_max_it 1
>> -mg_levels_ksp_norm_type none
>> -mg_levels_ksp_type richardson
>> -mg_levels_pc_sor_its 1
>> -mg_levels_pc_type sor
>> -pc_gamg_type classical
>> -pc_type gamg
>>
>>
>> That aside, it looks like you have quite a bit of load imbalance, e.g.
>> in the smoother, where you're doing MatSOR, you have:
>>
>> 125 proc:
>>
>>             Calls         Time         Max/Min time
>> MatSOR      47808 1.0     6.8888e+01   1.7
>>
>> 1000 proc:
>>
>> MatSOR      41400 1.0     6.3412e+01   1.6
>>
>> VecScatters show similar behaviour.
>>
>> How is your problem distributed across the processes?
>>
>> Cheers,
>>
>> Lawrence
>>
>
>