<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Dear All,<br>
<br>
We are performing a weak scaling test of the PETSc (v3.9.0) GAMG
preconditioner applied to the linear system arising<br>
from the <b>conforming unfitted FE discretization</b> (using Q1
Lagrangian FEs) of a 3D Poisson problem, where<br>
the boundary of the domain (a popcorn flake) is described as a
zero level set embedded within a uniform background<br>
(Cartesian-like) hexahedral mesh. Details of the FEM
formulation are available on request if you<br>
believe they might be helpful; let me just point out that
the formulation is designed to address the well-known<br>
ill-conditioning issues of unfitted FE discretizations caused by the
small cut-cell problem.<br>
<br>
The weak scaling test is set up as follows. We start from a
single-cube background mesh and refine it uniformly in several<br>
steps until we have approximately 10**3 (load1), 20**3
(load2), or 40**3 (load3) hexahedra per MPI task when<br>
distributing it over 4 MPI tasks. The benchmark is scaled such that
each larger problem is obtained<br>
by uniformly refining the mesh of the previous scale and running
it on 8x the number of MPI tasks used<br>
at the previous scale. As a result, we obtain three weak scaling
curves, one for each of the three fixed loads per MPI task<br>
above, on the following total numbers of MPI tasks: 4, 32, 262, 2097,
16777. The underlying mesh is not partitioned among<br>
MPI tasks using ParMETIS (unstructured multilevel graph
partitioning), nor optimally by hand, but following the Z-order
(Morton) <br>
space-filling curve provided by an underlying octree-like
mesh handler (i.e., the p4est library).<br>
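(For readers unfamiliar with this partitioning scheme, the idea can be
sketched in a few lines of plain Python; this is a toy illustration of
Z-order partitioning, not p4est's actual implementation: cells are ordered
by their Morton code and the resulting 1D sequence is split into contiguous
chunks, one per MPI task.)<br>

```python
def morton3d(x, y, z, bits=10):
    """Interleave the bits of (x, y, z) into a single Z-order (Morton) index."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)       # bit i of x
        code |= ((y >> i) & 1) << (3 * i + 1)   # bit i of y
        code |= ((z >> i) & 1) << (3 * i + 2)   # bit i of z
    return code

# Order the cells of a 4x4x4 mesh along the Z-order curve and split the
# sequence into 4 contiguous chunks, one per "MPI task".
n, ntasks = 4, 4
cells = sorted(((i, j, k) for i in range(n) for j in range(n) for k in range(n)),
               key=lambda c: morton3d(*c))
chunk = len(cells) // ntasks
parts = [cells[r * chunk:(r + 1) * chunk] for r in range(ntasks)]
```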
<br>
I configured the preconditioned linear solver as follows:<br>
<br>
-ksp_type cg<br>
-ksp_monitor<br>
-ksp_rtol 1.0e-6<br>
-ksp_converged_reason<br>
-ksp_max_it 500<br>
-ksp_norm_type unpreconditioned<br>
-ksp_view<br>
-log_view<br>
<br>
-pc_type gamg<br>
-pc_gamg_type agg<br>
-mg_levels_esteig_ksp_type cg<br>
-mg_coarse_sub_pc_type cholesky<br>
-mg_coarse_sub_pc_factor_mat_ordering_type nd<br>
-pc_gamg_process_eq_limit 50<br>
-pc_gamg_square_graph 0<br>
-pc_gamg_agg_nsmooths 1<br>
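(For reproducibility, the same options can also be collected in a file and
loaded via PETSc's -options_file mechanism; just a sketch, with a
hypothetical driver executable name:)<br>

```shell
# Collect the options above in a file and pass it to the (hypothetical)
# driver via PETSc's -options_file mechanism.
cat > gamg.opts <<'EOF'
-ksp_type cg
-ksp_monitor
-ksp_rtol 1.0e-6
-ksp_converged_reason
-ksp_max_it 500
-ksp_norm_type unpreconditioned
-ksp_view
-log_view
-pc_type gamg
-pc_gamg_type agg
-mg_levels_esteig_ksp_type cg
-mg_coarse_sub_pc_type cholesky
-mg_coarse_sub_pc_factor_mat_ordering_type nd
-pc_gamg_process_eq_limit 50
-pc_gamg_square_graph 0
-pc_gamg_agg_nsmooths 1
EOF
mpirun -n 4 ./my_driver -options_file gamg.opts
```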
<br>
Raw timings (in seconds) of the preconditioner setup and of the PCG
iterative solution stage, together with the numbers of iterations, are as follows:<br>
<br>
**preconditioner set up**<br>
(load1): [0.02542160451, 0.05169247743, 0.09266782179, 0.2426272957,
13.64161944]<br>
(load2): [0.1239175797 , 0.1885528499 , 0.2719282564 ,
0.4783878336, 13.37947339]<br>
(load3): [0.6565349903 , 0.9435049873 , 1.299908397 ,
1.916243652 , 16.02904088]<br>
<br>
**PCG stage**<br>
(load1): [0.003287350759, 0.008163803257, 0.03565631993,
0.08343045413, 0.6937994603]<br>
(load2): [0.0205939794 , 0.03594723623 , 0.07593298424,
0.1212046621 , 0.6780373845]<br>
(load3): [0.1310882876 , 0.3214917686 , 0.5532023879 ,
0.766881627 , 1.485446003]<br>
<br>
**number of PCG iterations**<br>
(load1): [5, 8, 11, 13, 13]<br>
(load2): [7, 10, 12, 13, 13]<br>
(load3): [8, 10, 12, 13, 13]<br>
<br>
It can be observed that both the number of linear solver iterations
and the PCG stage timings scale (weakly)<br>
remarkably well, but <b>there is a significant time increase in the
preconditioner setup stage when scaling the problem<br>
from 2097 to 16777 MPI tasks</b> (e.g., 1.916243652 vs.
16.02904088 sec. with 40**3 cells per MPI task).<br>
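(To put the jump in numbers, the weak-scaling efficiency of the setup stage
can be computed from the load3 timings above; plain Python, times copied
verbatim. Perfect weak scaling would keep the efficiency at 1.0.)<br>

```python
# Preconditioner setup times (s) for load3, copied from the table above,
# for 4, 32, 262, 2097, and 16777 MPI tasks respectively.
setup_load3 = [0.6565349903, 0.9435049873, 1.299908397,
               1.916243652, 16.02904088]

# Weak-scaling efficiency relative to the smallest run.
efficiency = [setup_load3[0] / t for t in setup_load3]
print([round(e, 3) for e in efficiency])  # -> [1.0, 0.696, 0.505, 0.343, 0.041]
```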
I gathered the combined output of -ksp_view and -log_view (only) for
all the points of the load3 weak scaling<br>
test (attached to this message). Please note that within
each run I execute these two stages up to<br>
three times, which influences the absolute timings reported by
-log_view.<br>
<br>
Looking at the output of -log_view, it is very strange to me, e.g.,
that the stage labelled "Graph"<br>
does not scale properly, as it is just a call to MatDuplicate when the
block size of the matrix is 1 (our case), and<br>
I would expect it to be a purely local operation that does not require any
communication.<br>
What am I missing here? The load does not seem to be unbalanced,
judging by the "Ratio" column.<br>
<br>
I wonder whether the observed behaviour is to be expected, or whether
it results from a misconfiguration of the solver on our side.<br>
I played (quite a lot) with several parameter-value combinations,
and the configuration above is the one that led to the fastest<br>
execution (among those tested, which may be an incomplete set; I can
provide further details if helpful).<br>
Any feedback from your experience that helps us find
the cause(s) of this issue and a way to mitigate it<br>
would be highly appreciated.<br>
<br>
Thanks very much in advance!<br>
Best regards,<br>
Alberto.<br>
<pre class="moz-signature" cols="72">--
Alberto F. Martín-Huertas
Senior Researcher, PhD. Computational Science
Centre Internacional de Mètodes Numèrics a l'Enginyeria (CIMNE)
Parc Mediterrani de la Tecnologia, UPC
Esteve Terradas 5, Building C3, Office 215,
08860 Castelldefels (Barcelona, Spain)
Tel.: (+34) 9341 34223
<a class="moz-txt-link-abbreviated" href="mailto:amartin@cimne.upc.edu">e-mail: amartin@cimne.upc.edu</a>
FEMPAR project co-founder
web: <a class="moz-txt-link-freetext" href="http://www.fempar.org">http://www.fempar.org</a>
________________
IMPORTANT NOTICE
All personal data contained on this mail will be processed confidentially and registered in a file property of CIMNE in
order to manage corporate communications. You may exercise the rights of access, rectification, erasure and object by
letter sent to Ed. C1 Campus Norte UPC. Gran Capitán s/n Barcelona.
</pre>
</body>
</html>