[petsc-users] Scaling with number of cores

Thu Nov 5 20:47:39 CST 2015

Hi,

I have removed the nullspace and attached the new logs.

Thank you

Yours sincerely,

TAY wee-beng

On 6/11/2015 12:07 AM, Barry Smith wrote:
>> On Nov 5, 2015, at 9:58 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>
>> Sorry, I realised that I hadn't used gamg, and that's why. When I do use gamg, the 8-core case works, but the 64-core case shows that p diverged.
>     Where is the log file for the 8-core case? And where is the full output from the 64-core run where it fails? Include -ksp_monitor_true_residual and -ksp_converged_reason.
>
>    Barry
>
>> Why is this so? By the way, I have also added the null space in my code.
>>
>> Thank you.
>>
>> Yours sincerely,
>>
>> TAY wee-beng
>>
>> On 5/11/2015 12:03 PM, Barry Smith wrote:
>>>    There is a problem here. The -log_summary output doesn't show all the events associated with the -pc_type gamg preconditioner; it should have rows like
>>>
>>> VecDot                 2 1.0 6.1989e-06 1.0 1.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1613
>>> VecMDot              134 1.0 5.4145e-04 1.0 1.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  3025
>>> VecNorm              154 1.0 2.4176e-04 1.0 3.82e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1578
>>> VecScale             148 1.0 1.6928e-04 1.0 1.76e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1039
>>> VecCopy              106 1.0 1.2255e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecSet               474 1.0 5.1236e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecAXPY               54 1.0 1.3471e-04 1.0 2.35e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1742
>>> VecAYPX              384 1.0 5.7459e-04 1.0 4.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   860
>>> VecAXPBYCZ           192 1.0 4.7398e-04 1.0 9.88e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  2085
>>> VecWAXPY               2 1.0 7.8678e-06 1.0 5.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   636
>>> VecMAXPY             148 1.0 8.1539e-04 1.0 1.96e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  3  0  0  0   1  3  0  0  0  2399
>>> VecPointwiseMult      66 1.0 1.1253e-04 1.0 6.79e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   604
>>> VecScatterBegin       45 1.0 6.3419e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecSetRandom           6 1.0 3.0994e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecReduceArith         4 1.0 1.3113e-05 1.0 2.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1525
>>> VecReduceComm          2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecNormalize         148 1.0 4.4799e-04 1.0 5.27e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1177
>>> MatMult              424 1.0 8.9276e-03 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00  7 37  0  0  0   7 37  0  0  0  2343
>>> MatMultAdd            48 1.0 5.0926e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  2069
>>> MatMultTranspose      48 1.0 9.8586e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0  1069
>>> MatSolve              16 1.0 2.2173e-05 1.0 1.02e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   460
>>> MatSOR               354 1.0 1.0547e-02 1.0 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00  9 31  0  0  0   9 31  0  0  0  1631
>>> MatLUFactorSym         2 1.0 4.7922e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatLUFactorNum         2 1.0 2.5272e-05 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   307
>>> MatScale              18 1.0 1.7142e-04 1.0 1.50e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   874
>>> MatResidual           48 1.0 1.0548e-03 1.0 2.33e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0   1  4  0  0  0  2212
>>> MatAssemblyBegin      57 1.0 4.7684e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatAssemblyEnd        57 1.0 1.9786e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>>> MatGetRow          21616 1.0 1.8497e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>>> MatGetRowIJ            2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatGetOrdering         2 1.0 6.0797e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatCoarsen             6 1.0 9.3222e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatZeroEntries         2 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatAXPY                6 1.0 1.7998e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>>> MatFDColorCreate       1 1.0 3.2902e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatFDColorSetUp        1 1.0 1.6739e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>> MatFDColorApply        2 1.0 1.3199e-03 1.0 2.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0   1  4  0  0  0  1826
>>> MatFDColorFunc        42 1.0 7.4601e-04 1.0 2.20e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0   1  4  0  0  0  2956
>>> MatMatMult             6 1.0 5.1048e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00  4  2  0  0  0   4  2  0  0  0   241
>>> MatMatMultSym          6 1.0 3.2601e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
>>> MatMatMultNum          6 1.0 1.8158e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0   679
>>> MatPtAP                6 1.0 2.1328e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11  0  0  0  18 11  0  0  0   283
>>> MatPtAPSymbolic        6 1.0 1.0073e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  8  0  0  0  0   8  0  0  0  0     0
>>> MatPtAPNumeric         6 1.0 1.1230e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00  9 11  0  0  0   9 11  0  0  0   537
>>> MatTrnMatMult          2 1.0 7.2789e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0    75
>>> MatTrnMatMultSym       2 1.0 5.7006e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatTrnMatMultNum       2 1.0 1.5473e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   352
>>> MatGetSymTrans         8 1.0 3.1638e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> KSPGMRESOrthog       134 1.0 1.3156e-03 1.0 3.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  6  0  0  0   1  6  0  0  0  2491
>>> KSPSetUp              24 1.0 4.6754e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> KSPSolve               2 1.0 1.1291e-01 1.0 5.32e+07 1.0 0.0e+00 0.0e+00 0.0e+00 94 95  0  0  0  94 95  0  0  0   471
>>> PCGAMGGraph_AGG        6 1.0 1.2108e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10  0  0  0  0  10  0  0  0  0     2
>>> PCGAMGCoarse_AGG       6 1.0 1.1127e-03 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0    49
>>> PCGAMGProl_AGG         6 1.0 4.1062e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34  0  0  0  0  34  0  0  0  0     0
>>> PCGAMGPOpt_AGG         6 1.0 1.1200e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00  9 11  0  0  0   9 11  0  0  0   534
>>> GAMG: createProl       6 1.0 6.5530e-02 1.0 6.06e+06 1.0 0.0e+00 0.0e+00 0.0e+00 55 11  0  0  0  55 11  0  0  0    92
>>>    Graph               12 1.0 1.1692e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10  0  0  0  0  10  0  0  0  0     2
>>>    MIS/Agg              6 1.0 1.4496e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>    SA: col data         6 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>    SA: frmProl0         6 1.0 4.0917e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34  0  0  0  0  34  0  0  0  0     0
>>>    SA: smooth           6 1.0 1.1198e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00  9 11  0  0  0   9 11  0  0  0   534
>>> GAMG: partLevel        6 1.0 2.1341e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11  0  0  0  18 11  0  0  0   283
>>> PCSetUp                4 1.0 8.8020e-02 1.0 1.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 74 22  0  0  0  74 22  0  0  0   137
>>> PCSetUpOnBlocks       16 1.0 1.8382e-04 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    42
>>> PCApply               16 1.0 2.3858e-02 1.0 3.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 20 70  0  0  0  20 70  0  0  0  1637
>>>
>>>
>>> Are you sure you ran with -pc_type gamg? If you run with -info, does it print anything about gamg? Does -ksp_view indicate that it is using the gamg preconditioner?
>>>
>>>
>>>> On Nov 4, 2015, at 9:30 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I have attached the 2 logs.
>>>>
>>>> Thank you
>>>>
>>>> Yours sincerely,
>>>>
>>>> TAY wee-beng
>>>>
>>>> On 4/11/2015 1:11 AM, Barry Smith wrote:
>>>>>     OK, the convergence looks good. Now run on 8 and 64 processes as before, with -log_summary and without -ksp_monitor, to see how it scales.
>>>>>
>>>>>    Barry
>>>>>
>>>>>> On Nov 3, 2015, at 6:49 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I tried and have attached the log.
>>>>>>
>>>>>> Yes, my Poisson equation has Neumann boundary conditions. Do I need to specify the null space, e.g. with KSPSetNullSpace or MatNullSpaceCreate?
>>>>>>
>>>>>> Thank you
>>>>>>
>>>>>> Yours sincerely,
>>>>>>
>>>>>> TAY wee-beng
>>>>>>
>>>>>> On 3/11/2015 12:45 PM, Barry Smith wrote:
>>>>>>>> On Nov 2, 2015, at 10:37 PM, TAY wee-beng<zonexo at gmail.com>  wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I tried :
>>>>>>>>
>>>>>>>> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg
>>>>>>>>
>>>>>>>> 2. -poisson_pc_type gamg
>>>>>>>     Run with -poisson_ksp_monitor_true_residual -poisson_ksp_converged_reason
>>>>>>> Does your Poisson problem have Neumann boundary conditions? Do you have any zeros on the diagonal of the matrix? (You shouldn't.)
>>>>>>>
>>>>>>>    There may be something wrong with your Poisson discretization that was also messing up hypre.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Both options give:
>>>>>>>>
>>>>>>>>     1      0.00150000      0.00000000      0.00000000 1.00000000             NaN             NaN             NaN
>>>>>>>> M Diverged but why?, time =            2
>>>>>>>> reason =           -9
>>>>>>>>
>>>>>>>> How can I check what's wrong?
>>>>>>>>
>>>>>>>> Thank you
>>>>>>>>
>>>>>>>> Yours sincerely,
>>>>>>>>
>>>>>>>> TAY wee-beng
>>>>>>>>
>>>>>>>> On 3/11/2015 3:18 AM, Barry Smith wrote:
>>>>>>>>>     hypre is just not scaling well here, and I do not know why. Since hypre is a black box for us, there is no way to determine the cause of the poor scaling.
>>>>>>>>>
>>>>>>>>>     If you make the same two runs with -pc_type gamg, there will be a lot more information in the log summary about which routines are scaling well or poorly.
>>>>>>>>>
>>>>>>>>>    Barry
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Nov 2, 2015, at 3:17 AM, TAY wee-beng<zonexo at gmail.com>  wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have attached the 2 files.
>>>>>>>>>>
>>>>>>>>>> Thank you
>>>>>>>>>>
>>>>>>>>>> Yours sincerely,
>>>>>>>>>>
>>>>>>>>>> TAY wee-beng
>>>>>>>>>>
>>>>>>>>>> On 2/11/2015 2:55 PM, Barry Smith wrote:
>>>>>>>>>>>    Run a (158/2)x(266/2)x(150/2) grid on 8 processes and then a 158x266x150 grid on 64 processes, and send the two -log_summary results.
>>>>>>>>>>>
>>>>>>>>>>>    Barry
>>>>>>>>>>>
>>>>>>>>>>>   
>>>>>>>>>>>> On Nov 2, 2015, at 12:19 AM, TAY wee-beng<zonexo at gmail.com>  wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I have attached the new results.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you
>>>>>>>>>>>>
>>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>>
>>>>>>>>>>>> TAY wee-beng
>>>>>>>>>>>>
>>>>>>>>>>>> On 2/11/2015 12:27 PM, Barry Smith wrote:
>>>>>>>>>>>>>    Run without the -momentum_ksp_view -poisson_ksp_view and send the new results
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>    You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time meaning that it is reusing the preconditioner and not rebuilding it each time.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Barry
>>>>>>>>>>>>>
>>>>>>>>>>>>>    Something makes no sense with the output: it gives
>>>>>>>>>>>>>
>>>>>>>>>>>>> KSPSolve             199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24  90100 66100 24   165
>>>>>>>>>>>>>
>>>>>>>>>>>>> 90% of the time is in the solve, but no significant time is recorded in the other events of the code, which is just not possible. I hope it is due to your I/O.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Nov 1, 2015, at 10:02 PM, TAY wee-beng<zonexo at gmail.com>  wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have attached the new run with 100 time steps for 48 and 96 cores.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Only the Poisson equation's RHS changes; the LHS doesn't. So if I want to reuse the preconditioner, what must I do, or what must I avoid doing?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? It seems to be the case for my new run too.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> TAY wee-beng
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2/11/2015 9:49 AM, Barry Smith wrote:
>>>>>>>>>>>>>>>    If you are doing many time steps with the same linear solver, then you MUST do your weak scaling studies with MANY time steps, since the AMG setup only takes place in the first timestep. So run both 48 and 96 processes with the same large number of time steps.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    Barry
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng<zonexo at gmail.com>  wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sorry, I forgot and used the old a.out. I have attached the new log for 48 cores (log48), together with the 96-core log (log96).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Only the Poisson equation's RHS changes; the LHS doesn't. So if I want to reuse the preconditioner, what must I do, or what must I avoid doing?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Lastly, I only simulated 2 time steps previously. Now I have run 10 time steps (log48_10). Is it building the preconditioner at every time step?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also, what about the momentum equation? Is it working well?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I will also try gamg later.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thank you
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> TAY wee-beng
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2/11/2015 12:30 AM, Barry Smith wrote:
>>>>>>>>>>>>>>>>>    You used gmres with 48 processes but richardson with 96. You need to be careful not to change the solvers when you change the number of processors, since you can get very different, inconsistent results.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     Anyway, all the time is being spent in the BoomerAMG algebraic multigrid setup, and it is scaling badly. When you doubled the problem size and number of processes, it went from 3.2445e+01 to 4.3599e+02 seconds.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> PCSetUp                3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62  8  0  0  4  62  8  0  0  5    11
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> PCSetUp                3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18  0  0  6  85 18  0  0  6     2
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    Is the Poisson problem changing at each timestep, or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large setup time that often doesn't matter if you have many time steps, but if you have to rebuild it each timestep it may be too large.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    Barry
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng<zonexo at gmail.com>  wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 1/11/2015 10:00 AM, Barry Smith wrote:
>>>>>>>>>>>>>>>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng<zonexo at gmail.com>  wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote:
>>>>>>>>>>>>>>>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng<zonexo at gmail.com>  wrote:
>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I understand that, as mentioned in the FAQ, the scaling is not linear due to memory limitations. So, I am trying to write a proposal to use a supercomputer.
>>>>>>>>>>>>>>>>>>>>> Its specs are:
>>>>>>>>>>>>>>>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 8 cores / processor
>>>>>>>>>>>>>>>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus) Interconnect
>>>>>>>>>>>>>>>>>>>>> Each cabinet contains 96 computing nodes.
>>>>>>>>>>>>>>>>>>>>> One of the requirements is to give the performance of my current code with my current data set, and there is a formula to calculate the estimated parallel efficiency when using the new, larger data set.
>>>>>>>>>>>>>>>>>>>>> There are 2 ways to give performance:
>>>>>>>>>>>>>>>>>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed problem.
>>>>>>>>>>>>>>>>>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a fixed problem size per processor.
>>>>>>>>>>>>>>>>>>>>> I ran my cases with 48 and 96 cores on my current cluster, taking 140 and 90 min respectively. This is classified as strong scaling.
>>>>>>>>>>>>>>>>>>>>> Cluster specs:
>>>>>>>>>>>>>>>>>>>>> CPU: AMD 6234 2.4GHz
>>>>>>>>>>>>>>>>>>>>> 8 cores / processor (CPU)
>>>>>>>>>>>>>>>>>>>>> 6 CPU / node
>>>>>>>>>>>>>>>>>>>>> So 48 cores / node
>>>>>>>>>>>>>>>>>>>>> Not sure about the memory / node
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The parallel efficiency ‘En’ for a given degree of parallelism ‘n’ indicates how much the program is
>>>>>>>>>>>>>>>>>>>>> efficiently accelerated by parallel processing. ‘En’ is given by the following formulae. Although their
>>>>>>>>>>>>>>>>>>>>> derivation processes are different depending on strong and weak scaling, derived formulae are the
>>>>>>>>>>>>>>>>>>>>> same.
>>>>>>>>>>>>>>>>>>>>> From the estimated times, my parallel efficiency on the current (old) cluster using Amdahl's law was 52.7%.
>>>>>>>>>>>>>>>>>>>>> So are my results acceptable?
>>>>>>>>>>>>>>>>>>>>> For the large data set, if using 2205 nodes (2205 x 8 cores), my expected parallel efficiency is only 0.5%. The proposal recommends a value of > 50%.
>>>>>>>>>>>>>>>>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function
>>>>>>>>>>>>>>>>>>>>> of problem size, so you cannot take the strong scaling from one problem and apply it to another without a
>>>>>>>>>>>>>>>>>>>>> model of this dependence.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current
>>>>>>>>>>>>>>>>>>>>> cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific
>>>>>>>>>>>>>>>>>>>>> applications, but neither does requiring a certain parallel efficiency.
>>>>>>>>>>>>>>>>>>>> OK, I checked the results for weak scaling, and the expected parallel efficiency is even worse. From the formula used, it obviously applies some sort of exponential decrease when extrapolating. So unless I can achieve nearly > 90% speedup when I double the cores and problem size in my current 48/96-core setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> However, the FAQ mentions that, due to memory requirements, it is impossible to get >90% speedup when I double the cores and problem size (i.e. a linear increase in performance), which means that I can't get >90% speedup when I double the cores and problem size in my current 48/96-core setup. Is that so?
>>>>>>>>>>>>>>>>>>>    What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>    Barry
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have attached the output
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 48 cores: log48
>>>>>>>>>>>>>>>>>> 96 cores: log96
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> There are 2 solvers - The momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Problem size doubled from 158x266x150 to 158x266x300.
>>>>>>>>>>>>>>>>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather in the way the linear equations are solved?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>>>>>>    Thanks,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>       Matt
>>>>>>>>>>>>>>>>>>>>> Is this kind of scaling (>50% efficiency) possible in PETSc when using 17640 (2205 x 8) cores?
>>>>>>>>>>>>>>>>>>>>> Btw, I do not have access to the system.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> <log48.txt><log96.txt>
>>>>>>>>>>>>>>>> <log48_10.txt><log48.txt><log96.txt>
>>>>>>>>>>>>>> <log96_100.txt><log48_100.txt>
>>>>>>>>>>>> <log96_100_2.txt><log48_100_2.txt>
>>>>>>>>>> <log64_100.txt><log8_100.txt>
>>>>>> <log.txt>
>>>> <log64_100_2.txt><log8_100_2.txt>

-------------- next part --------------
  0.000000000000000E+000  0.353000000000000       0.000000000000000E+000
   90.0000000000000       0.000000000000000E+000  0.000000000000000E+000
   1.00000000000000       0.400000000000000                0     -400000
 AB,AA,BB   -2.00050000002375        2.00050000002375     
   2.61200002906844        2.53550002543489     
 size_x,size_y,size_z           79         133          75
 body_cg_ini  0.523700833348298       0.778648765134454     
   7.03282656467989     
 Warning - length difference between element and cell
 max_element_length,min_element_length,min_delta
  0.000000000000000E+000   10000000000.0000       4.300000000000000E-002
 maximum ngh_surfaces and ngh_vertics are          149          68
 minimum ngh_surfaces and ngh_vertics are           54          22
 body_cg_ini  0.896813342835977      -0.976707581163755     
   7.03282656467989     
 Warning - length difference between element and cell
 max_element_length,min_element_length,min_delta
  0.000000000000000E+000   10000000000.0000       4.300000000000000E-002
 maximum ngh_surfaces and ngh_vertics are          149          68
 minimum ngh_surfaces and ngh_vertics are           54          22
 min IIB_cell_no           0
 max IIB_cell_no         265
 final initial IIB_cell_no        1325
 min I_cell_no           0
 max I_cell_no          94
 final initial I_cell_no         470
 size(IIB_cell_u),size(I_cell_u),size(IIB_equal_cell_u),size(I_equal_cell_u)
        1325         470        1325         470
 IIB_I_cell_no_uvw_total1         265         270         255          94
          91          95
 IIB_I_cell_no_uvw_total2         273         280         267          97
          94          98
    1      0.00150000      0.14647311      0.14738627      1.08799969  0.19042093E+02  0.17803989E+00  0.78750668E+06
 escape_time reached, so abort
 body 1
 implicit forces and moment 1
  0.869079172306081      -0.476901556137086       8.158436217625725E-002
  0.428147637893997       0.558124953374670      -0.928673815311215     
 body 2
 implicit forces and moment 2
  0.551071812724179       0.775546545440679       0.135476357173946     
 -0.634587321947283       0.290234875219091       0.936523266880710     

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./a.out on a petsc-3.6.2_shared_rel named n12-04 with 8 processors, by wtay Fri Nov  6 03:01:51 2015
Using Petsc Release Version 3.6.2, Oct, 02, 2015 

                         Max       Max/Min        Avg      Total 
Time (sec):           3.802e+03      1.00000   3.802e+03
Objects:              5.560e+02      1.00361   5.542e+02
Flops:                1.594e+12      1.12475   1.484e+12  1.187e+13
Flops/sec:            4.192e+08      1.12475   3.902e+08  3.122e+09
MPI Messages:         3.857e+06      2.54046   2.913e+06  2.331e+07
MPI Message Lengths:  7.441e+10      2.00621   2.218e+04  5.169e+11
MPI Reductions:       2.063e+05      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 3.8019e+03 100.0%  1.1869e+13 100.0%  2.331e+07 100.0%  2.218e+04      100.0%  2.063e+05 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult          1066750 1.0 1.6362e+03 1.2 5.96e+11 1.2 1.9e+07 2.6e+04 0.0e+00 37 37 81 95  0  37 37 81 95  0  2695
MatMultAdd        164088 1.0 1.4247e+02 1.2 3.77e+10 1.2 2.3e+06 5.8e+03 0.0e+00  3  2 10  3  0   3  2 10  3  0  1961
MatMultTranspose  164088 1.0 2.0693e+02 1.4 3.77e+10 1.2 2.3e+06 5.8e+03 0.0e+00  4  2 10  3  0   4  2 10  3  0  1350
MatSolve           82341 277.2 2.5719e+00 1.4 1.16e+09 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  3316
MatSOR            984572 1.0 1.6424e+03 1.2 5.00e+11 1.1 0.0e+00 0.0e+00 0.0e+00 39 31  0  0  0  39 31  0  0  0  2275
MatLUFactorSym         1 1.0 3.5048e-05 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum       100 1.0 5.1047e+00 1.2 6.34e+08 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   919
MatILUFactorSym        1 1.0 3.7481e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatConvert             4 1.0 1.7271e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatScale              12 1.0 1.4342e-02 1.2 3.23e+06 1.2 7.4e+01 2.4e+04 0.0e+00  0  0  0  0  0   0  0  0  0  0  1665
MatResidual       164088 1.0 2.8096e+02 1.3 9.48e+10 1.2 3.0e+06 2.4e+04 0.0e+00  6  6 13 14  0   6  6 13 14  0  2493
MatAssemblyBegin     153 1.0 2.7167e+00 8.5 0.00e+00 0.0 1.1e+02 6.9e+03 2.5e+02  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd       153 1.0 1.3169e+00 1.1 0.00e+00 0.0 7.5e+02 6.5e+03 1.8e+02  0  0  0  0  0   0  0  0  0  0     0
MatGetRow         462356 1.1 1.5225e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            2 2.0 9.0599e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrix        4 1.0 3.0942e-03 1.0 0.00e+00 0.0 1.2e+02 1.6e+03 6.4e+01  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         2 2.0 6.7458e-03 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCoarsen             4 1.0 5.3186e-02 1.3 0.00e+00 0.0 6.4e+02 1.3e+04 1.6e+01  0  0  0  0  0   0  0  0  0  0     0
MatAXPY                4 1.0 1.0154e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMatMult             4 1.0 1.4702e-01 1.0 2.31e+06 1.2 4.5e+02 1.2e+04 6.4e+01  0  0  0  0  0   0  0  0  0  0   116
MatMatMultSym          4 1.0 9.9177e-02 1.0 0.00e+00 0.0 3.8e+02 1.0e+04 5.6e+01  0  0  0  0  0   0  0  0  0  0     0
MatMatMultNum          4 1.0 4.8954e-02 1.0 2.31e+06 1.2 7.4e+01 2.4e+04 8.0e+00  0  0  0  0  0   0  0  0  0  0   349
MatPtAP                4 1.0 6.1620e-01 1.0 2.66e+07 1.4 7.5e+02 4.0e+04 6.8e+01  0  0  0  0  0   0  0  0  0  0   307
MatPtAPSymbolic        4 1.0 2.3832e-01 1.0 0.00e+00 0.0 4.5e+02 4.6e+04 2.8e+01  0  0  0  0  0   0  0  0  0  0     0
MatPtAPNumeric         4 1.0 3.7817e-01 1.0 2.66e+07 1.4 3.0e+02 3.0e+04 4.0e+01  0  0  0  0  0   0  0  0  0  0   500
MatTrnMatMult          1 1.0 6.1167e-01 1.0 1.05e+07 1.2 8.4e+01 1.7e+05 1.9e+01  0  0  0  0  0   0  0  0  0  0   127
MatTrnMatMultSym       1 1.0 2.9542e-01 1.0 0.00e+00 0.0 7.0e+01 9.2e+04 1.7e+01  0  0  0  0  0   0  0  0  0  0     0
MatTrnMatMultNum       1 1.0 3.1680e-01 1.0 1.05e+07 1.2 1.4e+01 5.8e+05 2.0e+00  0  0  0  0  0   0  0  0  0  0   246
MatGetLocalMat        14 1.0 4.1928e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol         12 1.0 3.7147e-02 1.4 0.00e+00 0.0 5.2e+02 4.1e+04 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSymTrans         8 1.0 7.8440e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPGMRESOrthog     80756 1.0 3.9613e+02 1.8 2.55e+11 1.1 0.0e+00 0.0e+00 8.1e+04  8 16  0  0 39   8 16  0  0 39  4833
KSPSetUp             213 1.0 4.7638e-02 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  0   0  0  0  0  0     0
KSPSolve             199 1.0 3.7658e+03 1.0 1.59e+12 1.1 2.3e+07 2.2e+04 2.0e+05 99100100100 99  99100100100 99  3152
VecDot               198 1.0 7.2550e-01 2.1 1.25e+08 1.1 0.0e+00 0.0e+00 2.0e+02  0  0  0  0  0   0  0  0  0  0  1290
VecDotNorm2           99 1.0 5.3831e-01 3.5 1.25e+08 1.1 0.0e+00 0.0e+00 9.9e+01  0  0  0  0  0   0  0  0  0  0  1739
VecMDot            80756 1.0 3.0185e+02 2.8 1.28e+11 1.1 0.0e+00 0.0e+00 8.1e+04  5  8  0  0 39   5  8  0  0 39  3171
VecNorm           123352 1.0 4.6411e+01 4.4 8.75e+09 1.1 0.0e+00 0.0e+00 1.2e+05  1  1  0  0 60   1  1  0  0 60  1414
VecScale          123124 1.0 3.6298e+00 1.3 4.31e+09 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  8911
VecCopy           206684 1.0 1.3012e+01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet            699500 1.0 1.0120e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY            43666 1.0 6.9073e-01 1.2 5.55e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  6020
VecAYPX          1312704 1.0 1.2019e+02 1.3 4.74e+10 1.1 0.0e+00 0.0e+00 0.0e+00  3  3  0  0  0   3  3  0  0  0  2960
VecAXPBYCZ        656550 1.0 8.2020e+01 1.3 9.51e+10 1.1 0.0e+00 0.0e+00 0.0e+00  2  6  0  0  0   2  6  0  0  0  8696
VecWAXPY             198 1.0 3.3118e-01 1.4 1.25e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2827
VecMAXPY          123154 1.0 1.1971e+02 1.2 1.36e+11 1.1 0.0e+00 0.0e+00 0.0e+00  3  9  0  0  0   3  9  0  0  0  8519
VecAssemblyBegin     416 1.0 5.0510e-01 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+03  0  0  0  0  1   0  0  0  0  1     0
VecAssemblyEnd       416 1.0 1.0114e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult      44 1.0 7.5576e-03 1.9 1.27e+06 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1262
VecScatterBegin  1394945 1.0 4.2240e+01 2.3 0.00e+00 0.0 2.3e+07 2.2e+04 0.0e+00  1  0100100  0   1  0100100  0     0
VecScatterEnd    1394945 1.0 7.3695e+02 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 11  0  0  0  0  11  0  0  0  0     0
VecSetRandom           4 1.0 3.2659e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize      123154 1.0 4.8155e+01 3.5 1.29e+10 1.1 0.0e+00 0.0e+00 1.2e+05  1  1  0  0 60   1  1  0  0 60  2015
PCGAMGGraph_AGG        4 1.0 3.8426e-01 1.0 2.31e+06 1.2 2.2e+02 1.2e+04 4.8e+01  0  0  0  0  0   0  0  0  0  0    44
PCGAMGCoarse_AGG       4 1.0 7.2042e-01 1.0 1.05e+07 1.2 7.9e+02 4.2e+04 5.1e+01  0  0  0  0  0   0  0  0  0  0   108
PCGAMGProl_AGG         4 1.0 1.6819e-01 1.0 0.00e+00 0.0 3.4e+02 2.0e+04 9.6e+01  0  0  0  0  0   0  0  0  0  0     0
PCGAMGPOpt_AGG         4 1.0 4.5917e-01 1.0 5.82e+07 1.1 1.2e+03 2.0e+04 2.0e+02  0  0  0  0  0   0  0  0  0  0   945
GAMG: createProl       4 1.0 1.7316e+00 1.0 7.10e+07 1.1 2.5e+03 2.6e+04 4.0e+02  0  0  0  0  0   0  0  0  0  0   305
  Graph                8 1.0 3.8322e-01 1.0 2.31e+06 1.2 2.2e+02 1.2e+04 4.8e+01  0  0  0  0  0   0  0  0  0  0    45
  MIS/Agg              4 1.0 5.3304e-02 1.3 0.00e+00 0.0 6.4e+02 1.3e+04 1.6e+01  0  0  0  0  0   0  0  0  0  0     0
  SA: col data         4 1.0 2.2745e-02 1.1 0.00e+00 0.0 1.5e+02 4.0e+04 4.0e+01  0  0  0  0  0   0  0  0  0  0     0
  SA: frmProl0         4 1.0 1.3956e-01 1.0 0.00e+00 0.0 1.9e+02 4.6e+03 4.0e+01  0  0  0  0  0   0  0  0  0  0     0
  SA: smooth           4 1.0 4.5917e-01 1.0 5.82e+07 1.1 1.2e+03 2.0e+04 2.0e+02  0  0  0  0  0   0  0  0  0  0   945
GAMG: partLevel        4 1.0 6.2054e-01 1.0 2.66e+07 1.4 8.9e+02 3.3e+04 1.7e+02  0  0  0  0  0   0  0  0  0  0   305
  repartition          2 1.0 2.2912e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
  Invert-Sort          2 1.0 3.1805e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  0   0  0  0  0  0     0
  Move A               2 1.0 1.4431e-03 1.1 0.00e+00 0.0 6.9e+01 2.6e+03 3.4e+01  0  0  0  0  0   0  0  0  0  0     0
  Move P               2 1.0 1.9579e-03 1.0 0.00e+00 0.0 4.8e+01 6.1e+01 3.4e+01  0  0  0  0  0   0  0  0  0  0     0
PCSetUp              200 1.0 7.5039e+00 1.2 7.32e+08 1.2 3.4e+03 2.8e+04 5.9e+02  0  0  0  0  0   0  0  0  0  0   721
PCSetUpOnBlocks    41121 1.0 5.1800e+00 1.2 6.34e+08 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   905
PCApply            41319 1.0 3.3777e+03 1.1 1.26e+12 1.1 2.3e+07 2.1e+04 1.2e+05 87 79 98 91 60  87 79 98 91 60  2771
SFSetGraph             4 1.0 2.0399e-03 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFBcastBegin          24 1.0 1.8663e-02 4.9 0.00e+00 0.0 6.4e+02 1.3e+04 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFBcastEnd            24 1.0 1.0988e-02 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix   106            106    220273512     0
      Matrix Coarsen     4              4         2512     0
       Krylov Solver    18             18       287944     0
              Vector   294            294     93849960     0
      Vector Scatter    29             29        31760     0
           Index Set    78             78      2936420     0
      Preconditioner    18             18        17444     0
Star Forest Bipartite Graph     4              4         3424     0
         PetscRandom     4              4         2496     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 1.90735e-07
Average time for MPI_Barrier(): 3.19481e-06
Average time for zero size MPI_Send(): 1.31428e-05
#PETSc Option Table entries:
-log_summary
-poisson_pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-mpi-dir=/opt/ud/openmpi-1.8.8/ --with-blas-lapack-dir=/opt/ud/intel_xe_2013sp1/mkl/lib/intel64/ --with-debugging=0 --download-hypre=1 --prefix=/home/wtay/Lib/petsc-3.6.2_shared_rel --known-mpi-shared=1 --with-shared-libraries --with-fortran-interfaces=1
-----------------------------------------
Libraries compiled on Sun Oct 18 17:34:07 2015 on hpc12 
Machine characteristics: Linux-3.10.0-123.20.1.el7.x86_64-x86_64-with-centos-7.1.1503-Core
Using PETSc directory: /home/wtay/Codes/petsc-3.6.2
Using PETSc arch: petsc-3.6.2_shared_rel
-----------------------------------------

Using C compiler: /opt/ud/openmpi-1.8.8/bin/mpicc  -fPIC -wd1572 -O3  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /opt/ud/openmpi-1.8.8/bin/mpif90  -fPIC -O3   ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include -I/home/wtay/Lib/petsc-3.6.2_shared_rel/include -I/opt/ud/openmpi-1.8.8/include
-----------------------------------------

Using C linker: /opt/ud/openmpi-1.8.8/bin/mpicc
Using Fortran linker: /opt/ud/openmpi-1.8.8/bin/mpif90
Using libraries: -Wl,-rpath,/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -L/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -lpetsc -Wl,-rpath,/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -L/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -lHYPRE -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -lmpi_cxx -Wl,-rpath,/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -L/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lX11 -lhwloc -lssl -lcrypto -lmpi_usempi -lmpi_mpifh -lifport -lifcore -lm -lmpi_cxx -ldl -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -lmpi -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -limf -lsvml -lirng -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lpthread -lirc_s -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib -Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -ldl 
-----------------------------------------
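In the event table above, the Ratio column after each maximum is the max/min of that quantity over all MPI ranks. A time ratio far above 1.0 therefore points at load imbalance or at ranks sitting idle while they wait for messages. The sketch below is a minimal Python parser for that table. It assumes the log has been saved as plain text, and the helper names `parse_event` and `imbalanced` and the 2.0 threshold are illustrative choices, not part of PETSc. It reads only the leading fields of each row because the later percentage columns can run together (e.g. `99100100100 99`).

```python
import re

# Leading fields of a -log_summary event row:
#   name, count (max), count ratio, time in seconds (max), time ratio, ...
# Names may contain spaces ("GAMG: createProl"), so the name is matched lazily.
EVENT_RE = re.compile(
    r"^\s*(?P<name>.+?)\s+(?P<count>\d+)\s+(?P<count_ratio>\d+\.\d+)"
    r"\s+(?P<time>\d+\.\d+e[+-]\d+)\s+(?P<time_ratio>\d+\.\d+)\s"
)


def parse_event(line):
    """Return (name, count, max_time_s, time_ratio), or None for non-event lines."""
    m = EVENT_RE.match(line)
    if m is None:
        return None
    return (m.group("name"), int(m.group("count")),
            float(m.group("time")), float(m.group("time_ratio")))


def imbalanced(lines, threshold=2.0):
    """Names of events whose max/min time ratio across ranks exceeds threshold."""
    names = []
    for line in lines:
        event = parse_event(line)
        if event is not None and event[3] > threshold:
            names.append(event[0])
    return names
```

On the 64-core table above, this would flag VecScatterEnd (ratio 3.9, 11% of the run time) and VecNorm (ratio 4.4), among others, while KSPSolve itself reports a ratio of 1.0.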
-------------- next part --------------
  0.000000000000000E+000  0.353000000000000       0.000000000000000E+000
   90.0000000000000       0.000000000000000E+000  0.000000000000000E+000
   1.00000000000000       0.400000000000000                0     -400000
 z grid divid too small!
 myid,each procs z size          23           2
 z grid divid too small!
 myid,each procs z size          50           2
 z grid divid too small!
 myid,each procs z size          31           2
 z grid divid too small!
 myid,each procs z size          25           2
 z grid divid too small!
 myid,each procs z size          56           2
 z grid divid too small!
 myid,each procs z size          28           2
 z grid divid too small!
 myid,each procs z size          32           2
 z grid divid too small!
 myid,each procs z size          57           2
 z grid divid too small!
 myid,each procs z size          62           2
 z grid divid too small!
 myid,each procs z size          39           2
 z grid divid too small!
 myid,each procs z size          49           2
 z grid divid too small!
 myid,each procs z size          47           2
 z grid divid too small!
 myid,each procs z size          33           2
 z grid divid too small!
 myid,each procs z size          43           2
 z grid divid too small!
 myid,each procs z size          46           2
 z grid divid too small!
 myid,each procs z size          37           2
 z grid divid too small!
 myid,each procs z size          40           2
 z grid divid too small!
 myid,each procs z size          44           2
 z grid divid too small!
 myid,each procs z size          38           2
 z grid divid too small!
 myid,each procs z size          41           2
 z grid divid too small!
 myid,each procs z size          51           2
 z grid divid too small!
 myid,each procs z size          59           2
 z grid divid too small!
 myid,each procs z size          48           2
 z grid divid too small!
 myid,each procs z size          36           2
 z grid divid too small!
 myid,each procs z size          42           2
 z grid divid too small!
 myid,each procs z size          45           2
 z grid divid too small!
 myid,each procs z size          29           2
 z grid divid too small!
 myid,each procs z size          27           2
 z grid divid too small!
 myid,each procs z size          35           2
 z grid divid too small!
 myid,each procs z size          34           2
 z grid divid too small!
 myid,each procs z size          54           2
 z grid divid too small!
 myid,each procs z size          60           2
 z grid divid too small!
 myid,each procs z size          53           2
 z grid divid too small!
 myid,each procs z size          58           2
 z grid divid too small!
 myid,each procs z size          63           2
 z grid divid too small!
 myid,each procs z size          52           2
 z grid divid too small!
 myid,each procs z size          61           2
 z grid divid too small!
 myid,each procs z size          55           2
 z grid divid too small!
 myid,each procs z size          30           2
 AB,AA,BB   -2.47900002275128        2.50750002410496     
   3.46600006963126        3.40250006661518     
 size_x,size_y,size_z          158         266         150
 z grid divid too small!
 myid,each procs z size          24           2
 z grid divid too small!
 myid,each procs z size          26           2
 z grid divid too small!
 myid,each procs z size          22           2
 body_cg_ini  0.523700833348298       0.778648765134454     
   7.03282656467989     
 Warning - length difference between element and cell
 max_element_length,min_element_length,min_delta
  0.000000000000000E+000   10000000000.0000       1.800000000000000E-002
 maximum ngh_surfaces and ngh_vertics are           42          22
 minimum ngh_surfaces and ngh_vertics are           28          10
 body_cg_ini  0.896813342835977      -0.976707581163755     
   7.03282656467989     
 Warning - length difference between element and cell
 max_element_length,min_element_length,min_delta
  0.000000000000000E+000   10000000000.0000       1.800000000000000E-002
 maximum ngh_surfaces and ngh_vertics are           42          22
 minimum ngh_surfaces and ngh_vertics are           28          10
 min IIB_cell_no           0
 max IIB_cell_no         429
 final initial IIB_cell_no        2145
 min I_cell_no           0
 max I_cell_no         460
 final initial I_cell_no        2300
 size(IIB_cell_u),size(I_cell_u),size(IIB_equal_cell_u),size(I_equal_cell_u)
        2145        2300        2145        2300
 IIB_I_cell_no_uvw_total1        3090        3094        3078        3080
        3074        3073
 IIB_I_cell_no_uvw_total2        3102        3108        3089        3077
        3060        3086
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
Linear poisson_ solve did not converge due to DIVERGED_ITS iterations 10000
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
 P Diverged
--------------------------------------------------------------------------
mpiexec has exited due to process rank 37 with PID 0 on
node n12-10 exiting improperly. There are three reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one.

This may have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here).

You can avoid this message by specifying -quiet on the mpiexec command line.

--------------------------------------------------------------------------
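The "z grid divid too small!" warnings in the 64-core output come from exactly 42 ranks, 22 through 63, each reporting a z size of 2. This matches an even block split of size_z = 150 over 64 ranks: 150 = 64 × 2 + 22, so ranks 0 to 21 get three z-planes and ranks 22 to 63 get two. The sketch below assumes the application splits z with the same rule PETSc uses for `PETSC_DECIDE` ownership: every rank gets n // size cells, and the first n % size ranks get one extra. The function name `split_ownership` is hypothetical.

```python
def split_ownership(n, size):
    """Per-rank local sizes for an even 1-D block split of n cells over size ranks.

    Every rank gets n // size cells; the first n % size ranks get one extra.
    """
    return [n // size + (1 if rank < n % size else 0) for rank in range(size)]


# size_z = 150 on 64 ranks, as printed in the run above.
sizes = split_ownership(150, 64)
two_plane_ranks = [rank for rank, s in enumerate(sizes) if s == 2]
```

Here `sizes[0]` is 3 and `sizes[63]` is 2. This shows that at 64 cores each rank holds only two or three z-planes of the 158 × 266 × 150 grid.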