<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <br>
    <div class="moz-cite-prefix">On 3/11/2015 8:52 PM, Matthew Knepley
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAMYG4GkiiAtJaJUm-uotK1BsaTsO1V=_eQfGPCqHLybbEUcLTw@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">On Tue, Nov 3, 2015 at 6:49 AM, TAY
            wee-beng <span dir="ltr"><<a moz-do-not-send="true"
                href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
              <br>
              I tried and have attached the log.<br>
              <br>
              Ya, my Poisson eqn has Neumann boundary conditions. Do I need to
              specify some null space stuff, like KSPSetNullSpace or
              MatNullSpaceCreate?</blockquote>
            <div><br>
            </div>
            <div>Yes, you need to attach the constant null space to the
              matrix.</div>
            <div><br>
            </div>
            <div>  Thanks,</div>
            <div><br>
            </div>
            <div>     Matt</div>
          </div>
        </div>
      </div>
    </blockquote>
    OK, so could you point me to a suitable example so that I know specifically
    which one to use?<br>
    <br>
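    My guess from the manual pages (untested, in C for illustration, and
    assuming A is the assembled Poisson matrix and ksp its KSP) is the sketch
    below; please correct me if this is not what you meant:<br>
    <pre>
/* Rough sketch only: attach the constant null space to the singular
   Neumann Poisson matrix so the Krylov solver can handle it. */
MatNullSpace nullsp;
MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nullsp);
MatSetNullSpace(A, nullsp);   /* or KSPSetNullSpace(ksp, nullsp) on older versions */
MatNullSpaceDestroy(&nullsp);
</pre>
    <br>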
    Thanks.<br>
    <blockquote
cite="mid:CAMYG4GkiiAtJaJUm-uotK1BsaTsO1V=_eQfGPCqHLybbEUcLTw@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex"><span
                class="im HOEnZb"><br>
                Thank you<br>
                <br>
                Yours sincerely,<br>
                <br>
                TAY wee-beng<br>
                <br>
              </span>
              <div class="HOEnZb">
                <div class="h5">
                  On 3/11/2015 12:45 PM, Barry Smith wrote:<br>
                  <blockquote class="gmail_quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    <blockquote class="gmail_quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      On Nov 2, 2015, at 10:37 PM, TAY wee-beng<<a
                        moz-do-not-send="true"
                        href="mailto:zonexo@gmail.com" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>> 
                      wrote:<br>
                      <br>
                      Hi,<br>
                      <br>
                      I tried :<br>
                      <br>
                      1. -poisson_pc_gamg_agg_nsmooths 1
                      -poisson_pc_type gamg<br>
                      <br>
                      2. -poisson_pc_type gamg<br>
                    </blockquote>
                        Run with -poisson_ksp_monitor_true_residual
                    -poisson_ksp_monitor_converged_reason<br>
                    Does your Poisson have Neumann boundary conditions? Do you
                    have any zeros on the diagonal of the matrix (you
                    shouldn't)?<br>
                    <br>
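                    One quick way to check for zeros on the diagonal (a rough
                    sketch in C, assuming the assembled Poisson matrix is
                    called A) is to pull out the diagonal and look at its
                    smallest entry in magnitude:<br>
                    <pre>
/* Sketch only: report the smallest |diagonal| entry of A. */
Vec       diag;
PetscReal dmin;
PetscInt  loc;
MatCreateVecs(A, &diag, NULL);
MatGetDiagonal(A, diag);
VecAbs(diag);
VecMin(diag, &loc, &dmin);
PetscPrintf(PETSC_COMM_WORLD, "smallest |diagonal| entry: %g\n", (double)dmin);
VecDestroy(&diag);
</pre>
                    <br>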
                       There may be something wrong with your Poisson
                    discretization that was also messing up hypre.<br>
                    <br>
                    <br>
                    <br>
                    <blockquote class="gmail_quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      Both options give:<br>
                      <br>
                          1      0.00150000      0.00000000     
                      0.00000000 1.00000000             NaN           
                       NaN             NaN<br>
                      M Diverged but why?, time =            2<br>
                      reason =           -9<br>
                      <br>
                      How can I check what's wrong?<br>
                      <br>
                      Thank you<br>
                      <br>
                      Yours sincerely,<br>
                      <br>
                      TAY wee-beng<br>
                      <br>
                      On 3/11/2015 3:18 AM, Barry Smith wrote:<br>
                      <blockquote class="gmail_quote" style="margin:0 0
                        0 .8ex;border-left:1px #ccc
                        solid;padding-left:1ex">
                            hypre is just not scaling well here. I do not know
                        why. Since hypre is a black box for us, there is no way
                        to determine why the scaling is poor.<br>
                        <br>
                            If you make the same two runs with -pc_type gamg,
                        there will be a lot more information in the log summary
                        about which routines are scaling well or poorly.<br>
                        <br>
                           Barry<br>
                        <br>
                        <br>
                        <br>
                        <blockquote class="gmail_quote" style="margin:0
                          0 0 .8ex;border-left:1px #ccc
                          solid;padding-left:1ex">
                          On Nov 2, 2015, at 3:17 AM, TAY wee-beng<<a
                            moz-do-not-send="true"
                            href="mailto:zonexo@gmail.com"
                            target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>> 
                          wrote:<br>
                          <br>
                          Hi,<br>
                          <br>
                          I have attached the 2 files.<br>
                          <br>
                          Thank you<br>
                          <br>
                          Yours sincerely,<br>
                          <br>
                          TAY wee-beng<br>
                          <br>
                          On 2/11/2015 2:55 PM, Barry Smith wrote:<br>
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex">
                               Run a (158/2)x(266/2)x(150/2) grid on 8
                            processes and then (158)x(266)x(150) on 64
                            processes and send the two -log_summary
                            results.<br>
                            <br>
                               Barry<br>
                            <br>
                              <br>
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex">
                              On Nov 2, 2015, at 12:19 AM, TAY
                              wee-beng<<a moz-do-not-send="true"
                                href="mailto:zonexo@gmail.com"
                                target="_blank">zonexo@gmail.com</a>> 
                              wrote:<br>
                              <br>
                              Hi,<br>
                              <br>
                              I have attached the new results.<br>
                              <br>
                              Thank you<br>
                              <br>
                              Yours sincerely,<br>
                              <br>
                              TAY wee-beng<br>
                              <br>
                              On 2/11/2015 12:27 PM, Barry Smith wrote:<br>
                              <blockquote class="gmail_quote"
                                style="margin:0 0 0 .8ex;border-left:1px
                                #ccc solid;padding-left:1ex">
                                   Run without the -momentum_ksp_view
                                -poisson_ksp_view and send the new
                                results.<br>
                                <br>
                                <br>
                                   You can see from the log summary that the
                                PCSetUp is taking a much smaller percentage of
                                the time, meaning that it is reusing the
                                preconditioner and not rebuilding it each
                                time.<br>
                                <br>
                                Barry<br>
                                <br>
                                   Something makes no sense with the
                                output: it gives<br>
                                <br>
                                KSPSolve             199 1.0 2.3298e+03
                                1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02
                                90100 66100 24  90100 66100 24   165<br>
                                <br>
                                90% of the time is in the solve, but there is
                                no significant amount of time in other events
                                of the code, which is just not possible. I hope
                                it is due to your IO.<br>
                                <br>
                                <br>
                                <br>
                                <blockquote class="gmail_quote"
                                  style="margin:0 0 0
                                  .8ex;border-left:1px #ccc
                                  solid;padding-left:1ex">
                                  On Nov 1, 2015, at 10:02 PM, TAY
                                  wee-beng<<a moz-do-not-send="true"
                                    href="mailto:zonexo@gmail.com"
                                    target="_blank">zonexo@gmail.com</a>> 
                                  wrote:<br>
                                  <br>
                                  Hi,<br>
                                  <br>
                                  I have attached the new run with 100
                                  time steps for 48 and 96 cores.<br>
                                  <br>
                                  Only the Poisson eqn's RHS changes; the LHS
                                  doesn't. So if I want to reuse the
                                  preconditioner, what must I do? Or what must
                                  I not do?<br>
                                  <br>
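                                  My current understanding (which may well be
                                  wrong) is that if the matrix passed to
                                  KSPSetOperators is set once and never
                                  modified, the preconditioner is only built on
                                  the first solve, roughly as in the sketch
                                  below (ksp, A, b, x are assumed to already
                                  exist and nsteps is just a placeholder):<br>
                                  <pre>
/* Sketch of my understanding (untested): set the operators once and only
   change the RHS vector b between solves, so the PC is built a single time. */
PetscInt step;
KSPSetOperators(ksp, A, A);
for (step = 0; step < nsteps; step++) {
  /* ... update the RHS vector b for this timestep ... */
  KSPSolve(ksp, b, x);   /* preconditioner is reused after the first solve */
}
/* KSPSetReusePreconditioner(ksp, PETSC_TRUE) can also force reuse even if
   the operators are marked as changed. */
</pre>
                                  <br>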
                                  Why does the number of processes increase so
                                  much? Is there something wrong with my
                                  coding? It seems to be so for my new run
                                  too.<br>
                                  <br>
                                  Thank you<br>
                                  <br>
                                  Yours sincerely,<br>
                                  <br>
                                  TAY wee-beng<br>
                                  <br>
                                  On 2/11/2015 9:49 AM, Barry Smith
                                  wrote:<br>
                                  <blockquote class="gmail_quote"
                                    style="margin:0 0 0
                                    .8ex;border-left:1px #ccc
                                    solid;padding-left:1ex">
                                       If you are doing many time steps with
                                    the same linear solver, then you MUST do
                                    your weak scaling studies with MANY time
                                    steps, since the setup time of AMG only
                                    takes place in the first timestep. So run
                                    both 48 and 96 processes with the same
                                    large number of time steps.<br>
                                    <br>
                                       Barry<br>
                                    <br>
                                    <br>
                                    <br>
                                    <blockquote class="gmail_quote"
                                      style="margin:0 0 0
                                      .8ex;border-left:1px #ccc
                                      solid;padding-left:1ex">
                                      On Nov 1, 2015, at 7:35 PM, TAY
                                      wee-beng<<a
                                        moz-do-not-send="true"
                                        href="mailto:zonexo@gmail.com"
                                        target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>> 
                                      wrote:<br>
                                      <br>
                                      Hi,<br>
                                      <br>
                                      Sorry, I forgot and used the old a.out. I
                                      have attached the new log for 48 cores
                                      (log48), together with the 96 cores log
                                      (log96).<br>
                                      <br>
                                      Why does the number of processes
                                      increase so much? Is there
                                      something wrong with my coding?<br>
                                      <br>
                                      Only the Poisson eqn's RHS changes; the
                                      LHS doesn't. So if I want to reuse the
                                      preconditioner, what must I do? Or what
                                      must I not do?<br>
                                      <br>
                                      Lastly, I only simulated 2 time steps
                                      previously. Now I run for 10 timesteps
                                      (log48_10). Is it building the
                                      preconditioner at every timestep?<br>
                                      <br>
                                      Also, what about the momentum eqn? Is it
                                      working well?<br>
                                      <br>
                                      I will try the gamg later too.<br>
                                      <br>
                                      Thank you<br>
                                      <br>
                                      Yours sincerely,<br>
                                      <br>
                                      TAY wee-beng<br>
                                      <br>
                                      On 2/11/2015 12:30 AM, Barry Smith
                                      wrote:<br>
                                      <blockquote class="gmail_quote"
                                        style="margin:0 0 0
                                        .8ex;border-left:1px #ccc
                                        solid;padding-left:1ex">
                                           You used gmres with 48 processes
                                        but richardson with 96. You need to be
                                        careful and make sure you don't change
                                        the solvers when you change the number
                                        of processors, since you can get very
                                        different, inconsistent results.<br>
                                        <br>
                                            Anyway, all the time is being spent
                                        in the BoomerAMG algebraic multigrid
                                        setup, and it is scaling badly. When
                                        you double the problem size and number
                                        of processes, it went from 3.2445e+01
                                        to 4.3599e+02 seconds.<br>
                                        <br>
                                        PCSetUp                3 1.0
                                        3.2445e+01 1.0 9.58e+06 2.0
                                        0.0e+00 0.0e+00 4.0e+00 62  8 
                                        0  0  4  62  8  0  0  5    11<br>
                                        <br>
                                        PCSetUp                3 1.0
                                        4.3599e+02 1.0 9.58e+06 2.0
                                        0.0e+00 0.0e+00 4.0e+00 85 18 
                                        0  0  6  85 18  0  0  6     2<br>
                                        <br>
                                           Now, is the Poisson problem changing
                                        at each timestep, or can you use the
                                        same preconditioner built with
                                        BoomerAMG for all the time steps?
                                        Algebraic multigrid has a large setup
                                        time that often doesn't matter if you
                                        have many time steps, but if you have
                                        to rebuild it each timestep it is too
                                        large.<br>
                                        <br>
                                           You might also try -pc_type
                                        gamg and see how PETSc's
                                        algebraic multigrid scales for
                                        your problem/machine.<br>
                                        <br>
                                           Barry<br>
                                        <br>
                                        <br>
                                        <br>
                                        <blockquote class="gmail_quote"
                                          style="margin:0 0 0
                                          .8ex;border-left:1px #ccc
                                          solid;padding-left:1ex">
                                          On Nov 1, 2015, at 7:30 AM,
                                          TAY wee-beng<<a
                                            moz-do-not-send="true"
                                            href="mailto:zonexo@gmail.com"
                                            target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>> 
                                          wrote:<br>
                                          <br>
                                          <br>
                                          On 1/11/2015 10:00 AM, Barry
                                          Smith wrote:<br>
                                          <blockquote
                                            class="gmail_quote"
                                            style="margin:0 0 0
                                            .8ex;border-left:1px #ccc
                                            solid;padding-left:1ex">
                                            <blockquote
                                              class="gmail_quote"
                                              style="margin:0 0 0
                                              .8ex;border-left:1px #ccc
                                              solid;padding-left:1ex">
                                              On Oct 31, 2015, at 8:43
                                              PM, TAY wee-beng<<a
                                                moz-do-not-send="true"
                                                href="mailto:zonexo@gmail.com"
                                                target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>> 
                                              wrote:<br>
                                              <br>
                                              <br>
                                              On 1/11/2015 12:47 AM,
                                              Matthew Knepley wrote:<br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0 0 0
                                                .8ex;border-left:1px
                                                #ccc
                                                solid;padding-left:1ex">
                                                On Sat, Oct 31, 2015 at
                                                11:34 AM, TAY
                                                wee-beng<<a
                                                  moz-do-not-send="true"
href="mailto:zonexo@gmail.com" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>> 
                                                wrote:<br>
                                                Hi,<br>
                                                <br>
                                                I understand that, as mentioned
                                                in the FAQ, due to the
                                                limitations in memory, the
                                                scaling is not linear. So I am
                                                trying to write a proposal to
                                                use a supercomputer.<br>
                                                Its specs are:<br>
                                                Compute nodes: 82,944 nodes
                                                (SPARC64 VIIIfx; 16GB of memory
                                                per node)<br>
                                                <br>
                                                8 cores / processor<br>
                                                Interconnect: Tofu
                                                (6-dimensional mesh/torus)<br>
                                                Each cabinet contains 96
                                                computing nodes.<br>
                                                One of the requirements is to
                                                give the performance of my
                                                current code with my current
                                                set of data, and there is a
                                                formula to calculate the
                                                estimated parallel efficiency
                                                when using the new large set of
                                                data.<br>
                                                There are 2 ways to give
                                                performance:<br>
                                                1. Strong scaling, which is
                                                defined as how the elapsed time
                                                varies with the number of
                                                processors for a fixed
                                                problem.<br>
                                                2. Weak scaling, which is
                                                defined as how the elapsed time
                                                varies with the number of
                                                processors for a fixed problem
                                                size per processor.<br>
                                                I ran my cases with 48 and 96
                                                cores on my current cluster,
                                                giving 140 and 90 mins
                                                respectively. This is
                                                classified as strong
                                                scaling.<br>
                                                Cluster specs:<br>
                                                CPU: AMD 6234 2.4GHz<br>
                                                8 cores / processor (CPU)<br>
                                                6 CPU / node<br>
                                                So 48 cores / node<br>
                                                Not sure about the memory /
                                                node<br>
                                                <br>
                                                The parallel efficiency ‘En’
                                                for a given degree of
                                                parallelism ‘n’ indicates how
                                                much the program is efficiently
                                                accelerated by parallel
                                                processing. ‘En’ is given by
                                                the following formulae.
                                                Although their derivation
                                                processes are different
                                                depending on strong and weak
                                                scaling, the derived formulae
                                                are the
                                                same.<br>
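                                                (Roughly, I believe the
                                                standard forms are the ones
                                                sketched below; T_1 and T_n are
                                                the elapsed times on 1 and n
                                                processors, and the proposal's
                                                exact notation may differ.)<br>
                                                <pre>
Strong scaling:  E_n = T_1 / (n * T_n)              (fixed total problem size)
  With Amdahl's law, T_n = T_1 * (f + (1 - f)/n), where f is the serial
  fraction, so     E_n = 1 / (f * (n - 1) + 1)
Weak scaling:    E_n = T_1 / T_n                    (problem size grows with n)
</pre>
                                                <br>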
                                                 From the estimated time, my
                                                parallel efficiency using
                                                Amdahl's law on the current old
                                                cluster was 52.7%.<br>
                                                So are my results
                                                acceptable?<br>
                                                For the large data set, if
                                                using 2205 nodes (2205 x 8
                                                cores), my expected parallel
                                                efficiency is only 0.5%. The
                                                proposal recommends a value of
                                                > 50%.<br>
                                                The problem with this analysis
                                                is that the estimated serial
                                                fraction from Amdahl's Law
                                                changes as a function of
                                                problem size, so you cannot
                                                take the strong scaling from
                                                one problem and apply it to
                                                another without a model of this
                                                dependence.<br>
                                                <br>
                                                Weak scaling does model changes
                                                with problem size, so I would
                                                measure weak scaling on your
                                                current cluster, and
                                                extrapolate to the big machine.
                                                I realize that this does not
                                                make sense for many scientific
                                                applications, but neither does
                                                requiring a certain parallel
                                                efficiency.<br>
                                              </blockquote>
                                              OK, I checked the results for my
                                              weak scaling; the expected
                                              parallel efficiency is even
                                              worse. From the formula used,
                                              it's obvious it's doing some sort
                                              of exponential extrapolation
                                              decrease. So unless I can achieve
                                              a near > 90% speed up when I
                                              double the cores and problem size
                                              for my current 48/96 cores setup,
                                              extrapolating from about 96 nodes
                                              to 10,000 nodes will give a much
                                              lower expected parallel
                                              efficiency for the new case.<br>
                                              <br>
                                              However, it's mentioned in the
                                              FAQ that, due to memory
                                              requirements, it's impossible to
                                              get >90% speed up when I double
                                              the cores and problem size (i.e.
                                              a linear increase in
                                              performance), which means that I
                                              can't get >90% speed up when I
                                              double the cores and problem size
                                              for my current 48/96 cores setup.
                                              Is that so?<br>
                                            </blockquote>
                                               What is the output of
                                            -ksp_view -log_summary on
                                            the problem and then on the
                                            problem doubled in size and
                                            number of processors?<br>
                                            <br>
                                               Barry<br>
                                          </blockquote>
                                          Hi,<br>
                                          <br>
                                          I have attached the output<br>
                                          <br>
                                          48 cores: log48<br>
                                          96 cores: log96<br>
                                          <br>
                                          There are 2 solvers: the momentum
                                          linear eqn uses bcgs, while the
                                          Poisson eqn uses hypre BoomerAMG.<br>
                                          <br>
                                          Problem size doubled from
                                          158x266x150 to 158x266x300.<br>
                                          <blockquote
                                            class="gmail_quote"
                                            style="margin:0 0 0
                                            .8ex;border-left:1px #ccc
                                            solid;padding-left:1ex">
                                            <blockquote
                                              class="gmail_quote"
                                              style="margin:0 0 0
                                              .8ex;border-left:1px #ccc
                                              solid;padding-left:1ex">
                                              So is it fair to say that the
                                              main problem does not lie in my
                                              programming skills, but rather in
                                              the way the linear equations are
                                              solved?<br>
                                              <br>
                                              Thanks.<br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0 0 0
                                                .8ex;border-left:1px
                                                #ccc
                                                solid;padding-left:1ex">
                                                   Thanks,<br>
                                                <br>
                                                      Matt<br>
                                                Is this type of scaling (>50%)
                                                possible in PETSc when using
                                                17640 (2205 x 8) cores?<br>
                                                Btw, I do not have access to
                                                the system.<br>
                                                <br>
                                                <br>
                                                <br>
                                                <br>
                                                <br>
                                                <br>
                                                -- <br>
                                                What most experimenters
                                                take for granted before
                                                they begin their
                                                experiments is
                                                infinitely more
                                                interesting than any
                                                results to which their
                                                experiments lead.<br>
                                                -- Norbert Wiener<br>
                                              </blockquote>
                                            </blockquote>
                                          </blockquote>
<log48.txt><log96.txt><br>
                                        </blockquote>
                                      </blockquote>
<log48_10.txt><log48.txt><log96.txt><br>
                                    </blockquote>
                                  </blockquote>
<log96_100.txt><log48_100.txt><br>
                                </blockquote>
                              </blockquote>
<log96_100_2.txt><log48_100_2.txt><br>
                            </blockquote>
                          </blockquote>
                          <log64_100.txt><log8_100.txt><br>
                        </blockquote>
                      </blockquote>
                    </blockquote>
                  </blockquote>
                  <br>
                </div>
              </div>
            </blockquote>
          </div>
          <br>
          <br clear="all">
          <div><br>
          </div>
          -- <br>
          <div class="gmail_signature">What most experimenters take for
            granted before they begin their experiments is infinitely
            more interesting than any results to which their experiments
            lead.<br>
            -- Norbert Wiener</div>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>