<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <br>
    <div class="moz-cite-prefix">On 3/11/2015 8:52 PM, Matthew Knepley
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAMYG4GkiiAtJaJUm-uotK1BsaTsO1V=_eQfGPCqHLybbEUcLTw@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">On Tue, Nov 3, 2015 at 6:49 AM, TAY
            wee-beng <span dir="ltr"><<a moz-do-not-send="true"
                href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
              <br>
              I tried and have attached the log.<br>
              <br>
              Ya, my Poisson eqn has Neumann boundary conditions. Do I need to
              specify some null space stuff, like KSPSetNullSpace or
              MatNullSpaceCreate?</blockquote>
            <div><br>
            </div>
            <div>Yes, you need to attach the constant null space to the
              matrix.</div>
            <div><br>
            </div>
            <div>  Thanks,</div>
            <div><br>
            </div>
            <div>     Matt</div>
          </div>
        </div>
      </div>
    </blockquote>
    OK, so could you point me to a suitable example so that I know specifically
    which one to use?<br>
    <br>
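    My guess from the manual pages (untested, in C for illustration, and
    assuming A is the assembled Poisson matrix and ksp its KSP) is the sketch
    below; please correct me if this is not what you meant:<br>
    <pre>
/* Rough sketch only: attach the constant null space to the singular
   Neumann Poisson matrix so the Krylov solver can handle it. */
MatNullSpace nullsp;
MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nullsp);
MatSetNullSpace(A, nullsp);   /* or KSPSetNullSpace(ksp, nullsp) on older versions */
MatNullSpaceDestroy(&nullsp);
</pre>
    <br>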
    Thanks.<br>
    <blockquote
cite="mid:CAMYG4GkiiAtJaJUm-uotK1BsaTsO1V=_eQfGPCqHLybbEUcLTw@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex"><span
                class="im HOEnZb"><br>
                Thank you<br>
                <br>
                Yours sincerely,<br>
                <br>
                TAY wee-beng<br>
                <br>
              </span>
              <div class="HOEnZb">
                <div class="h5">
                  On 3/11/2015 12:45 PM, Barry Smith wrote:<br>
                  <blockquote class="gmail_quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    <blockquote class="gmail_quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      On Nov 2, 2015, at 10:37 PM, TAY wee-beng<<a
                        moz-do-not-send="true"
                        href="mailto:zonexo@gmail.com" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>> 
                      wrote:<br>
                      <br>
                      Hi,<br>
                      <br>
                      I tried :<br>
                      <br>
                      1. -poisson_pc_gamg_agg_nsmooths 1
                      -poisson_pc_type gamg<br>
                      <br>
                      2. -poisson_pc_type gamg<br>
                    </blockquote>
                        Run with -poisson_ksp_monitor_true_residual
                    -poisson_ksp_monitor_converged_reason<br>
                    Does your Poisson have Neumann boundary conditions? Do you
                    have any zeros on the diagonal of the matrix (you
                    shouldn't)?<br>
                    <br>
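                    One quick way to check for zeros on the diagonal (a rough
                    sketch in C, assuming the assembled Poisson matrix is
                    called A) is to pull out the diagonal and look at its
                    smallest entry in magnitude:<br>
                    <pre>
/* Sketch only: report the smallest |diagonal| entry of A. */
Vec       diag;
PetscReal dmin;
PetscInt  loc;
MatCreateVecs(A, &diag, NULL);
MatGetDiagonal(A, diag);
VecAbs(diag);
VecMin(diag, &loc, &dmin);
PetscPrintf(PETSC_COMM_WORLD, "smallest |diagonal| entry: %g\n", (double)dmin);
VecDestroy(&diag);
</pre>
                    <br>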
                       There may be something wrong with your Poisson
                    discretization that was also messing up hypre.<br>
                    <br>
                    <br>
                    <br>
                    <blockquote class="gmail_quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      Both options give:<br>
                      <br>
                          1      0.00150000      0.00000000     
                      0.00000000 1.00000000             NaN           
                       NaN             NaN<br>
                      M Diverged but why?, time =            2<br>
                      reason =           -9<br>
                      <br>
                      How can I check what's wrong?<br>
                      <br>
                      Thank you<br>
                      <br>
                      Yours sincerely,<br>
                      <br>
                      TAY wee-beng<br>
                      <br>
                      On 3/11/2015 3:18 AM, Barry Smith wrote:<br>
                      <blockquote class="gmail_quote" style="margin:0 0
                        0 .8ex;border-left:1px #ccc
                        solid;padding-left:1ex">
                            hypre is just not scaling well here. I do not know
                        why. Since hypre is a black box for us, there is no way
                        to determine why the scaling is poor.<br>
                        <br>
                            If you make the same two runs with -pc_type gamg,
                        there will be a lot more information in the log summary
                        about which routines are scaling well or poorly.<br>
                        <br>
                           Barry<br>
                        <br>
                        <br>
                        <br>
                        <blockquote class="gmail_quote" style="margin:0
                          0 0 .8ex;border-left:1px #ccc
                          solid;padding-left:1ex">
                          On Nov 2, 2015, at 3:17 AM, TAY wee-beng<<a
                            moz-do-not-send="true"
                            href="mailto:zonexo@gmail.com"
                            target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>> 
                          wrote:<br>
                          <br>
                          Hi,<br>
                          <br>
                          I have attached the 2 files.<br>
                          <br>
                          Thank you<br>
                          <br>
                          Yours sincerely,<br>
                          <br>
                          TAY wee-beng<br>
                          <br>
                          On 2/11/2015 2:55 PM, Barry Smith wrote:<br>
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex">
                               Run a (158/2)x(266/2)x(150/2) grid on 8
                            processes and then (158)x(266)x(150) on 64
                            processes and send the two -log_summary
                            results.<br>
                            <br>
                               Barry<br>
                            <br>
                              <br>
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex">
                              On Nov 2, 2015, at 12:19 AM, TAY
                              wee-beng<<a moz-do-not-send="true"
                                href="mailto:zonexo@gmail.com"
                                target="_blank">zonexo@gmail.com</a>> 
                              wrote:<br>
                              <br>
                              Hi,<br>
                              <br>
                              I have attached the new results.<br>
                              <br>
                              Thank you<br>
                              <br>
                              Yours sincerely,<br>
                              <br>
                              TAY wee-beng<br>
                              <br>
                              On 2/11/2015 12:27 PM, Barry Smith wrote:<br>
                              <blockquote class="gmail_quote"
                                style="margin:0 0 0 .8ex;border-left:1px
                                #ccc solid;padding-left:1ex">
                                   Run without the -momentum_ksp_view
                                -poisson_ksp_view and send the new
                                results.<br>
                                <br>
                                <br>
                                   You can see from the log summary that the
                                PCSetUp is taking a much smaller percentage of
                                the time, meaning that it is reusing the
                                preconditioner and not rebuilding it each
                                time.<br>
                                <br>
                                Barry<br>
                                <br>
                                   Something makes no sense with the
                                output: it gives<br>
                                <br>
                                KSPSolve             199 1.0 2.3298e+03
                                1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02
                                90100 66100 24  90100 66100 24   165<br>
                                <br>
                                90% of the time is in the solve, but there is
                                no significant amount of time in other events
                                of the code, which is just not possible. I hope
                                it is due to your IO.<br>
                                <br>
                                <br>
                                <br>
                                <blockquote class="gmail_quote"
                                  style="margin:0 0 0
                                  .8ex;border-left:1px #ccc
                                  solid;padding-left:1ex">
                                  On Nov 1, 2015, at 10:02 PM, TAY
                                  wee-beng<<a moz-do-not-send="true"
                                    href="mailto:zonexo@gmail.com"
                                    target="_blank">zonexo@gmail.com</a>> 
                                  wrote:<br>
                                  <br>
                                  Hi,<br>
                                  <br>
                                  I have attached the new run with 100
                                  time steps for 48 and 96 cores.<br>
                                  <br>
                                  Only the Poisson eqn's RHS changes; the LHS
                                  doesn't. So if I want to reuse the
                                  preconditioner, what must I do? Or what must
                                  I not do?<br>
                                  <br>
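                                  My current understanding (which may well be
                                  wrong) is that if the matrix passed to
                                  KSPSetOperators is set once and never
                                  modified, the preconditioner is only built on
                                  the first solve, roughly as in the sketch
                                  below (ksp, A, b, x are assumed to already
                                  exist and nsteps is just a placeholder):<br>
                                  <pre>
/* Sketch of my understanding (untested): set the operators once and only
   change the RHS vector b between solves, so the PC is built a single time. */
PetscInt step;
KSPSetOperators(ksp, A, A);
for (step = 0; step < nsteps; step++) {
  /* ... update the RHS vector b for this timestep ... */
  KSPSolve(ksp, b, x);   /* preconditioner is reused after the first solve */
}
/* KSPSetReusePreconditioner(ksp, PETSC_TRUE) can also force reuse even if
   the operators are marked as changed. */
</pre>
                                  <br>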
                                  Why does the number of processes increase so
                                  much? Is there something wrong with my
                                  coding? It seems to be so for my new run
                                  too.<br>
                                  <br>
                                  Thank you<br>
                                  <br>
                                  Yours sincerely,<br>
                                  <br>
                                  TAY wee-beng<br>
                                  <br>
                                  On 2/11/2015 9:49 AM, Barry Smith
                                  wrote:<br>
                                  <blockquote class="gmail_quote"
                                    style="margin:0 0 0
                                    .8ex;border-left:1px #ccc
                                    solid;padding-left:1ex">
                                       If you are doing many time steps with
                                    the same linear solver, then you MUST do
                                    your weak scaling studies with MANY time
                                    steps, since the setup time of AMG only
                                    takes place in the first timestep. So run
                                    both 48 and 96 processes with the same
                                    large number of time steps.<br>
                                    <br>
                                       Barry<br>
                                    <br>
                                    <br>
                                    <br>
                                    <blockquote class="gmail_quote"
                                      style="margin:0 0 0
                                      .8ex;border-left:1px #ccc
                                      solid;padding-left:1ex">
                                      On Nov 1, 2015, at 7:35 PM, TAY
                                      wee-beng<<a
                                        moz-do-not-send="true"
                                        href="mailto:zonexo@gmail.com"
                                        target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>> 
                                      wrote:<br>
                                      <br>
                                      Hi,<br>
                                      <br>
                                      Sorry, I forgot and used the old a.out. I
                                      have attached the new log for 48 cores
                                      (log48), together with the 96 cores log
                                      (log96).<br>
                                      <br>
                                      Why does the number of processes
                                      increase so much? Is there
                                      something wrong with my coding?<br>
                                      <br>
                                      Only the Poisson eqn's RHS changes; the
                                      LHS doesn't. So if I want to reuse the
                                      preconditioner, what must I do? Or what
                                      must I not do?<br>
                                      <br>
                                      Lastly, I only simulated 2 time steps
                                      previously. Now I run for 10 timesteps
                                      (log48_10). Is it building the
                                      preconditioner at every timestep?<br>
                                      <br>
                                      Also, what about the momentum eqn? Is it
                                      working well?<br>
                                      <br>
                                      I will try the gamg later too.<br>
                                      <br>
                                      Thank you<br>
                                      <br>
                                      Yours sincerely,<br>
                                      <br>
                                      TAY wee-beng<br>
                                      <br>
                                      On 2/11/2015 12:30 AM, Barry Smith
                                      wrote:<br>
                                      <blockquote class="gmail_quote"
                                        style="margin:0 0 0
                                        .8ex;border-left:1px #ccc
                                        solid;padding-left:1ex">
                                           You used gmres with 48 processes
                                        but richardson with 96. You need to be
                                        careful and make sure you don't change
                                        the solvers when you change the number
                                        of processors, since you can get very
                                        different, inconsistent results.<br>
                                        <br>
                                            Anyway, all the time is being spent
                                        in the BoomerAMG algebraic multigrid
                                        setup, and it is scaling badly. When
                                        you double the problem size and number
                                        of processes, it went from 3.2445e+01
                                        to 4.3599e+02 seconds.<br>
                                        <br>
                                        PCSetUp                3 1.0
                                        3.2445e+01 1.0 9.58e+06 2.0
                                        0.0e+00 0.0e+00 4.0e+00 62  8 
                                        0  0  4  62  8  0  0  5    11<br>
                                        <br>
                                        PCSetUp                3 1.0
                                        4.3599e+02 1.0 9.58e+06 2.0
                                        0.0e+00 0.0e+00 4.0e+00 85 18 
                                        0  0  6  85 18  0  0  6     2<br>
                                        <br>
                                           Now, is the Poisson problem changing
                                        at each timestep, or can you use the
                                        same preconditioner built with
                                        BoomerAMG for all the time steps?
                                        Algebraic multigrid has a large setup
                                        time that often doesn't matter if you
                                        have many time steps, but if you have
                                        to rebuild it each timestep it is too
                                        large.<br>
                                        <br>
                                           You might also try -pc_type
                                        gamg and see how PETSc's
                                        algebraic multigrid scales for
                                        your problem/machine.<br>
                                        <br>
                                           Barry<br>
                                        <br>
                                        <br>
                                        <br>
                                        <blockquote class="gmail_quote"
                                          style="margin:0 0 0
                                          .8ex;border-left:1px #ccc
                                          solid;padding-left:1ex">
                                          On Nov 1, 2015, at 7:30 AM,
                                          TAY wee-beng<<a
                                            moz-do-not-send="true"
                                            href="mailto:zonexo@gmail.com"
                                            target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>> 
                                          wrote:<br>
                                          <br>
                                          <br>
                                          On 1/11/2015 10:00 AM, Barry
                                          Smith wrote:<br>
                                          <blockquote
                                            class="gmail_quote"
                                            style="margin:0 0 0
                                            .8ex;border-left:1px #ccc
                                            solid;padding-left:1ex">
                                            <blockquote
                                              class="gmail_quote"
                                              style="margin:0 0 0
                                              .8ex;border-left:1px #ccc
                                              solid;padding-left:1ex">
                                              On Oct 31, 2015, at 8:43
                                              PM, TAY wee-beng<<a
                                                moz-do-not-send="true"
                                                href="mailto:zonexo@gmail.com"
                                                target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>> 
                                              wrote:<br>
                                              <br>
                                              <br>
                                              On 1/11/2015 12:47 AM,
                                              Matthew Knepley wrote:<br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0 0 0
                                                .8ex;border-left:1px
                                                #ccc
                                                solid;padding-left:1ex">
                                                On Sat, Oct 31, 2015 at
                                                11:34 AM, TAY
                                                wee-beng<<a
                                                  moz-do-not-send="true"
href="mailto:zonexo@gmail.com" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>> 
                                                wrote:<br>
                                                Hi,<br>
                                                <br>
                                                I understand that, as mentioned
                                                in the FAQ, due to the
                                                limitations in memory, the
                                                scaling is not linear. So I am
                                                trying to write a proposal to
                                                use a supercomputer.<br>
                                                Its specs are:<br>
                                                Compute nodes: 82,944 nodes
                                                (SPARC64 VIIIfx; 16GB of memory
                                                per node)<br>
                                                <br>
                                                8 cores / processor<br>
                                                Interconnect: Tofu
                                                (6-dimensional mesh/torus)<br>
                                                Each cabinet contains 96
                                                computing nodes.<br>
                                                One of the requirements is to
                                                give the performance of my
                                                current code with my current
                                                set of data, and there is a
                                                formula to calculate the
                                                estimated parallel efficiency
                                                when using the new large set of
                                                data.<br>
                                                There are 2 ways to give
                                                performance:<br>
                                                1. Strong scaling, which is
                                                defined as how the elapsed time
                                                varies with the number of
                                                processors for a fixed
                                                problem.<br>
                                                2. Weak scaling, which is
                                                defined as how the elapsed time
                                                varies with the number of
                                                processors for a fixed problem
                                                size per processor.<br>
                                                I ran my cases with 48 and 96
                                                cores on my current cluster,
                                                giving 140 and 90 mins
                                                respectively. This is
                                                classified as strong
                                                scaling.<br>
                                                Cluster specs:<br>
                                                CPU: AMD 6234 2.4GHz<br>
                                                8 cores / processor (CPU)<br>
                                                6 CPU / node<br>
                                                So 48 cores / node<br>
                                                Not sure about the memory /
                                                node<br>
                                                <br>
                                                The parallel efficiency ‘En’
                                                for a given degree of
                                                parallelism ‘n’ indicates how
                                                much the program is efficiently
                                                accelerated by parallel
                                                processing. ‘En’ is given by
                                                the following formulae.
                                                Although their derivation
                                                processes are different
                                                depending on strong and weak
                                                scaling, the derived formulae
                                                are the
                                                same.<br>
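                                                (Roughly, I believe the
                                                standard forms are the ones
                                                sketched below; T_1 and T_n are
                                                the elapsed times on 1 and n
                                                processors, and the proposal's
                                                exact notation may differ.)<br>
                                                <pre>
Strong scaling:  E_n = T_1 / (n * T_n)              (fixed total problem size)
  With Amdahl's law, T_n = T_1 * (f + (1 - f)/n), where f is the serial
  fraction, so     E_n = 1 / (f * (n - 1) + 1)
Weak scaling:    E_n = T_1 / T_n                    (problem size grows with n)
</pre>
                                                <br>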
                                                 From the estimated time, my
                                                parallel efficiency using
                                                Amdahl's law on the current old
                                                cluster was 52.7%.<br>
                                                So are my results
                                                acceptable?<br>
                                                For the large data set, if
                                                using 2205 nodes (2205 x 8
                                                cores), my expected parallel
                                                efficiency is only 0.5%. The
                                                proposal recommends a value of
                                                > 50%.<br>
                                                The problem with this analysis
                                                is that the estimated serial
                                                fraction from Amdahl's Law
                                                changes as a function of
                                                problem size, so you cannot
                                                take the strong scaling from
                                                one problem and apply it to
                                                another without a model of this
                                                dependence.<br>
                                                <br>
                                                Weak scaling does model changes
                                                with problem size, so I would
                                                measure weak scaling on your
                                                current cluster, and
                                                extrapolate to the big machine.
                                                I realize that this does not
                                                make sense for many scientific
                                                applications, but neither does
                                                requiring a certain parallel
                                                efficiency.<br>
                                              </blockquote>
                                              OK, I checked the results for my
                                              weak scaling; the expected
                                              parallel efficiency is even
                                              worse. From the formula used,
                                              it's obvious it's doing some sort
                                              of exponential extrapolation
                                              decrease. So unless I can achieve
                                              a near > 90% speed up when I
                                              double the cores and problem size
                                              for my current 48/96 cores setup,
                                              extrapolating from about 96 nodes
                                              to 10,000 nodes will give a much
                                              lower expected parallel
                                              efficiency for the new case.<br>
                                              <br>
                                              However, it's mentioned in the
                                              FAQ that, due to memory
                                              requirements, it's impossible to
                                              get >90% speed up when I double
                                              the cores and problem size (i.e.
                                              a linear increase in
                                              performance), which means that I
                                              can't get >90% speed up when I
                                              double the cores and problem size
                                              for my current 48/96 cores setup.
                                              Is that so?<br>
                                            </blockquote>
                                               What is the output of
                                            -ksp_view -log_summary on
                                            the problem and then on the
                                            problem doubled in size and
                                            number of processors?<br>
                                            <br>
                                               Barry<br>
                                          </blockquote>
                                          Hi,<br>
                                          <br>
                                          I have attached the output<br>
                                          <br>
                                          48 cores: log48<br>
                                          96 cores: log96<br>
                                          <br>
                                          There are 2 solvers: the momentum
                                          linear eqn uses bcgs, while the
                                          Poisson eqn uses hypre BoomerAMG.<br>
                                          <br>
                                          Problem size doubled from
                                          158x266x150 to 158x266x300.<br>
                                          <blockquote
                                            class="gmail_quote"
                                            style="margin:0 0 0
                                            .8ex;border-left:1px #ccc
                                            solid;padding-left:1ex">
                                            <blockquote
                                              class="gmail_quote"
                                              style="margin:0 0 0
                                              .8ex;border-left:1px #ccc
                                              solid;padding-left:1ex">
                                              So is it fair to say that the
                                              main problem does not lie in my
                                              programming skills, but rather in
                                              the way the linear equations are
                                              solved?<br>
                                              <br>
                                              Thanks.<br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0 0 0
                                                .8ex;border-left:1px
                                                #ccc
                                                solid;padding-left:1ex">
                                                   Thanks,<br>
                                                <br>
                                                      Matt<br>
                                                Is this type of scaling (>50%)
                                                possible in PETSc when using
                                                17640 (2205 x 8) cores?<br>
                                                Btw, I do not have access to
                                                the system.<br>
                                                <br>
                                                <br>
                                                <br>
                                                <br>
                                                <br>
                                                <br>
                                                -- <br>
                                                What most experimenters
                                                take for granted before
                                                they begin their
                                                experiments is
                                                infinitely more
                                                interesting than any
                                                results to which their
                                                experiments lead.<br>
                                                -- Norbert Wiener<br>
                                              </blockquote>
                                            </blockquote>
                                          </blockquote>
<log48.txt><log96.txt><br>
                                        </blockquote>
                                      </blockquote>
<log48_10.txt><log48.txt><log96.txt><br>
                                    </blockquote>
                                  </blockquote>
<log96_100.txt><log48_100.txt><br>
                                </blockquote>
                              </blockquote>
<log96_100_2.txt><log48_100_2.txt><br>
                            </blockquote>
                          </blockquote>
                          <log64_100.txt><log8_100.txt><br>
                        </blockquote>
                      </blockquote>
                    </blockquote>
                  </blockquote>
                  <br>
                </div>
              </div>
            </blockquote>
          </div>
          <br>
          <br clear="all">
          <div><br>
          </div>
          -- <br>
          <div class="gmail_signature">What most experimenters take for
            granted before they begin their experiments is infinitely
            more interesting than any results to which their experiments
            lead.<br>
            -- Norbert Wiener</div>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>