[petsc-users] Investigate parallel code to improve parallelism

Sun Feb 28 20:26:36 CST 2016

On 29/2/2016 9:41 AM, Barry Smith wrote:
>> On Feb 28, 2016, at 7:08 PM, TAY Wee Beng <zonexo at gmail.com> wrote:
>>
>> Hi,
>>
>> I've attached the files for x cells running y procs. hypre is called natively, so I'm not sure whether PETSc catches it.
>    So you are directly creating hypre matrices and calling the hypre solver in another piece of your code?

Yes, because I'm using the simple structured (struct) layout for Cartesian 
grids. It's about twice as fast as BoomerAMG. I can't create a 
PETSc matrix and use the hypre struct layout, right?
>
>     In the PETSc part of the code, if you compare the 2x_y to the x_y, you see that doubling the problem size resulted in 2.2 times as much time for the KSPSolve. Most of this large increase is due to the increased time in the scatter, which went up by 150/54 = 2.78, but the amount of data transferred only increased by 1e5/6.4e4 = 1.5625. Normally I would not expect to see this behavior and would not expect such a large increase in the communication time.
>
> Barry
>
>
>
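The ratios quoted above can be rechecked with a few lines of Python. This is only a sketch of the arithmetic: the four input numbers are the ones from the message, the variable names are made up here, and the units are whatever the log reports.

```python
# Figures quoted in the message above (scatter in KSPSolve, x_y vs 2x_y runs).
scatter_time_small = 54.0    # scatter time reported for the x_y run
scatter_time_large = 150.0   # scatter time reported for the 2x_y run
data_small = 6.4e4           # data transferred, x_y run
data_large = 1.0e5           # data transferred, 2x_y run

time_ratio = scatter_time_large / scatter_time_small
data_ratio = data_large / data_small

# How much faster the scatter time grew than the data volume.
# A value near 1.0 would mean communication time tracks the data moved.
excess = time_ratio / data_ratio

print(f"scatter time ratio : {time_ratio:.2f}")
print(f"data volume ratio  : {data_ratio:.4f}")
print(f"time growth / data growth: {excess:.2f}")
```

The scatter time grows noticeably faster than the data volume, which is the unexpected behaviour being pointed out.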
So ideally it should be 2 instead of 2.2, is that so?

May I know where you are looking? I can't find the numbers.

So where do you think the error comes from? Or how can I troubleshoot 
further?

Thanks
>> Thanks
>>
>> On 29/2/2016 1:11 AM, Barry Smith wrote:
>>>    As I said before, send the -log_summary output for the two processor sizes and we'll look at where it is spending its time and how it could possibly be improved.
>>>
>>>    Barry
>>>
>>>> On Feb 28, 2016, at 10:29 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>>
>>>>
>>>> On 27/2/2016 12:53 AM, Barry Smith wrote:
>>>>>> On Feb 26, 2016, at 10:27 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>> On 26/2/2016 11:32 PM, Barry Smith wrote:
>>>>>>>> On Feb 26, 2016, at 9:28 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have a 3D code. When I ran it with 48 procs and 11 million cells, it ran for 83 min. When I ran it with 96 procs and 22 million cells, it ran for 99 min.
>>>>>>>     This is actually pretty good!
>>>>>> But if I'm not wrong, if I increase the no. of cells, the parallelism will keep on decreasing. I hope it scales up to maybe 300 - 400 procs.
>>>> Hi,
>>>>
>>>> I think I may have mentioned this before: I need to submit a proposal to request computing nodes. In the proposal, I'm supposed to run some simulations to estimate the time it takes to run my code. An Excel file will then use my input to estimate the efficiency when I run my code with more cells. They use 2 methods to estimate:
>>>>
>>>> 1. strong scaling, whereby I run 2 cases - 1st with n cells and x procs, then with n cells and 2x procs. From there, they can estimate my expected efficiency when I have y procs. The formula is attached in the pdf.
>>>>
>>>> 2. weak scaling, whereby I run 2 cases - 1st with n cells and x procs, then with 2n cells and 2x procs. From there, they can estimate my expected efficiency when I have y procs. The formula is attached in the pdf.
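The two kinds of run described above map onto the usual textbook definitions of two-run scaling efficiency, sketched below. This is not the formula from the attached pdf, which is not reproduced in this thread, so it says nothing about how the spreadsheet extrapolates to y procs. The function names are made up for illustration, and the strong-scaling timings are hypothetical. The weak-scaling timings are the 83 min and 99 min runs reported earlier in the thread.

```python
def strong_scaling_efficiency(t_base, p_base, t_scaled, p_scaled):
    """Fixed problem size: ideally time falls in proportion to process count."""
    return (t_base * p_base) / (t_scaled * p_scaled)


def weak_scaling_efficiency(t_base, t_scaled):
    """Problem size grows with process count: ideally time stays constant."""
    return t_base / t_scaled


# Weak-scaling pair from the thread:
# 48 procs / 11 million cells took 83 min, 96 procs / 22 million cells took 99 min.
weak = weak_scaling_efficiency(83.0, 99.0)
print(f"weak scaling efficiency, 48 -> 96 procs: {weak:.1%}")

# Hypothetical strong-scaling pair (made-up timings, for illustration only):
# same problem, 100 min on 48 procs and 62.5 min on 96 procs.
strong = strong_scaling_efficiency(100.0, 48, 62.5, 96)
print(f"strong scaling efficiency, 48 -> 96 procs: {strong:.1%}")
```

With these definitions the 48 to 96 proc weak-scaling pair comes out at roughly 84%.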
>>>>
>>>> So if I use 48 and 96 procs and get maybe 80% efficiency, by the time I hit 800 procs, I get 32% efficiency for strong scaling. They expect at least 50% efficiency for my code. To reach that, I need to achieve 89% efficiency when I use 48 and 96 procs.
>>>>
>>>> So now my question is: how accurate is this type of calculation, especially with respect to PETSc?
>>>>
>>>> Similarly, for weak scaling, is it accurate?
>>>>
>>>> Can I argue that this estimation is not suitable for PETSc or hypre?
>>>>
>>>> Thanks
>>>>
>>>>
>>>>>>>> So it's not that parallel. I want to find out which part of the code I need to improve, and also whether PETSc and hypre are working well in parallel. What's the best way to do it?
>>>>>>>    Run both with -log_summary and send the output for each case. This will show where the time is being spent and which parts are scaling less well.
>>>>>>>
>>>>>>>     Barry
>>>>>> That's only for the PETSc part, right? So for other parts of the code, including hypre part, I will not be able to find out. If so, what can I use to check these parts?
>>>>>     You will still be able to see what percentage of the time is spent in hypre and if it increases with the problem size and how much. So the information will still be useful.
>>>>>
>>>>>    Barry
>>>>>
>>>>>>>> I thought of doing profiling, but if the code is compiled with optimization, I wonder whether profiling still works well.
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>> Yours sincerely,
>>>>>>>>
>>>>>>>> TAY wee-beng
>>>>>>>>
>>>> <temp.pdf>
>> -- 
>> Thank you
>>
>> Yours sincerely,
>>
>> TAY wee-beng
>>
>> <2x_2y.txt><2x_y.txt><4x_2y.txt><x_y.txt>