[petsc-users] PETSc (3.9.0) GAMG weak scaling test issue

Matthew Knepley knepley at gmail.com
Mon Nov 19 05:26:00 CST 2018


On Mon, Nov 19, 2018 at 5:52 AM "Alberto F. Martín" <amartin at cimne.upc.edu>
wrote:

> Dear Mark, Dear Matthew,
>
> In order to rule out load imbalance as the cause of the reported weak
> scaling issue in the GAMG preconditioner set-up stage (as you said, we
> were feeding GAMG with a suboptimal mesh distribution, with empty
> processors among other problems), we simplified the weak scaling test.
> We now consider the standard body-fitted trilinear (Q1) FE
> discretization of the 3D Poisson problem on a unit cube, discretized
> with a uniform, structured hexahedral mesh, partitioned optimally (by
> hand) among processors, with a fixed load of 30^3 hexahedra/core. Thus,
> all processors have the same load (up to strong Dirichlet boundary
> conditions on the subdomains touching the global boundary), and the
> edge-cut is minimal.
>
> We used the following GAMG preconditioner options:
>
> -pc_type gamg
> -pc_gamg_type agg
> -pc_gamg_est_ksp_type cg
> -mg_levels_esteig_ksp_type cg
> -mg_coarse_sub_pc_type cholesky
> -mg_coarse_sub_pc_factor_mat_ordering_type nd
> -pc_gamg_process_eq_limit 50
> -pc_gamg_square_graph 10
> -pc_gamg_agg_nsmooths 1
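>
> For reference, a minimal C sketch of a solver setup that would pick
> these options up via KSPSetFromOptions() (assuming an already-assembled
> Mat A and Vecs b, x) is:
>
>   #include <petscksp.h>
>
>   /* Minimal sketch: CG preconditioned with smoothed-aggregation AMG.
>      KSPSetFromOptions() reads the -pc_gamg_* options listed above. */
>   static PetscErrorCode solve_poisson(Mat A, Vec b, Vec x)
>   {
>     KSP            ksp;
>     PC             pc;
>     PetscErrorCode ierr;
>
>     PetscFunctionBeginUser;
>     ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
>     ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
>     ierr = KSPSetType(ksp, KSPCG);CHKERRQ(ierr);
>     ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
>     ierr = PCSetType(pc, PCGAMG);CHKERRQ(ierr);
>     ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
>     ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
>     ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
>     PetscFunctionReturn(0);
>   }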
>
> The results that we obtained for 48 (4x4x3 subdomains),  10,368 (24x24x18
> subdomains),  and 16,464
> (28x28x21 subdomains) CPU cores are as follows:
>
> preconditioner set-up time (s):
> [0.9844961860, 7.017674042, 12.10154881]
>
> PCG stage time (s):
> [0.5849160422, 1.515251888, 1.859617710]
>
> number of PCG iterations:
> [9, 14, 15]
>
> As you can observe, there is still a significant time increase when
> scaling the problem from 48 to 10K/16K MPI tasks for the preconditioner
> set-up stage. This time increase is not as significant for the PCG
> stage.
>

Actually, it's almost perfect for PCG: scaling the time by the growth in
the iteration count gives

  14/9  * 0.98 = 1.52  (Eff 100%)
  15/14 * 1.52 = 1.63  (Eff  88%)

Mark would have better comments on the scalability of the setup stage.


> Please find attached the combined
> output of -ksp_view and -log_view for these three points of the weak
> scaling curve.
>
> Given these results, I am starting to suspect that something within the
> underlying software + hardware stack might be
> responsible for this. I am using OpenMPI 1.10.7 + Intel compilers version
> 18.0. The underlying supercomputer is MN-IV at
> BSC (https://www.bsc.es/marenostrum/marenostrum/technical-information).
> Have you ever conducted a weak scaling test
> of GAMG with OpenMPI on a similar computer architecture? Can you share
> your experience with us?
> (versions tested, outcome, etc.)
>
> We also tried an alternative MPI library, Intel(R) MPI Library for Linux*
> OS, Version 2018 Update 4 Build 20180823 (id: 18555), without success.
> For this MPI library, the preconditioner set-up stage crashes (find
> attached stack frames and internal MPI library errors) for the largest
> two core counts (it did not crash for the 48 CPU cores case), while it
> did not crash with OpenMPI 1.10.7. Have you ever experienced errors like
> the ones attached? Is there any way to set up PETSc such that the
> subroutine that crashes is replaced by an alternative implementation of
> the same concept? (This would be just a workaround.)
>

It certainly feels like an MPI (or driver) bug:

libmpi.so.12       00007F460E3AEBC8  PMPIDI_CH3I_Progr     Unknown  Unknown
libmpi.so.12       00007F460E72B90C  MPI_Testall           Unknown  Unknown
libpetsc.so.3.9.0  00007F4607391FFE  PetscCommBuildTwo     Unknown  Unknown


You can try another variant using

  -build_twosided <allreduce|ibarrier|redscatter>

I think ibarrier is currently the default if supported, but -help should
tell you.
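
If it helps, the same choice can also be made programmatically; a sketch
(assuming you want to force the plain allreduce variant, which avoids the
MPI_Ibarrier/MPI_Testall path entirely) is:

  #include <petscsys.h>

  /* Sketch: select the allreduce algorithm for PETSc's two-sided
     communication setup on the given communicator. */
  ierr = PetscCommBuildTwoSidedSetType(PETSC_COMM_WORLD,
                                       PETSC_BUILDTWOSIDED_ALLREDUCE);CHKERRQ(ierr);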

  Thanks,

    Matt


> It might be a BUG in the Intel MPI library, although I cannot confirm it.
> We also got
> these errors with the unfitted FEM+space-filling curves version of our
> code.
>
> Thanks a lot for your help and valuable feedback!
> Best regards,
>  Alberto.
>
>
>
>
>
>
>
>
>
> On 08/11/18 17:29, Mark Adams wrote:
>
>
>> I did not configure PETSc with ParMetis support. Should I?
>>
>> I figured it out when I tried to use "-pc_gamg_repartition". PETSc
>> complained that it was not compiled with ParMetis support.
>>
>
> You need ParMetis, or some parallel mesh partitioner, configured to use
> repartitioning. I would guess that "-pc_gamg_repartition" would not help
> and might hurt, because it just does the coarse grids, not the fine grid.
> But it is worth a try. Just configure with --download-parmetis
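>
> For example, a configure sketch (the compiler wrappers here are
> placeholders for your own environment; --download-parmetis also needs
> --download-metis):
>
>   ./configure --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 \
>               --download-metis --download-parmetis
>
> and then run with -pc_gamg_repartition.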
>
> The problem is that you are using space-filling curves on the background
> grid and are getting empty processors. Right? The mesh setup phase is not
> super-optimized, but your times
>
> And you said in your attachment that you added the near null space, but
> just the constant vector. I trust you mean the three translational rigid
> body modes. That is the default, and so you should not see any
> difference. If you added one vector of all 1s, then that would be bad.
> You also want the rotational rigid body modes. Now, you are converging
> pretty well, and if your solution does not have much rotation in it,
> then the rotational modes are not needed, but they are required for
> optimality in general.
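>
> A minimal sketch of attaching all six rigid body modes (assuming your
> assembled system matrix is a Mat A, and that you have a coordinate Vec
> coords with three dofs per node) could be:
>
>   /* Sketch: build the six rigid body modes from nodal coordinates and
>      attach them as the near null space that GAMG uses for aggregation. */
>   MatNullSpace nullsp;
>   ierr = MatNullSpaceCreateRigidBody(coords, &nullsp);CHKERRQ(ierr);
>   ierr = MatSetNearNullSpace(A, nullsp);CHKERRQ(ierr);
>   ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);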
>
>
>
> --
> Alberto F. Martín-Huertas
> Senior Researcher, PhD. Computational Science
> Centre Internacional de Mètodes Numèrics a l'Enginyeria (CIMNE)
> Parc Mediterrani de la Tecnologia, UPC
> Esteve Terradas 5, Building C3, Office 215,
> 08860 Castelldefels (Barcelona, Spain)
> Tel.: (+34) 9341 34223
> e-mail: amartin at cimne.upc.edu
>
> FEMPAR project co-founder
> web: http://www.fempar.org
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

