<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <br>
    <div class="moz-cite-prefix">On 3/11/2015 9:01 PM, Matthew Knepley
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAMYG4GnKGhFuSokeczdFFaBhMWhhgkES03OAUgFjO2ETJb4LHA@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">On Tue, Nov 3, 2015 at 6:58 AM, TAY
            wee-beng <span dir="ltr"><<a moz-do-not-send="true"
                href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000"> <br>
                <div>On 3/11/2015 8:52 PM, Matthew Knepley wrote:<br>
                </div>
                <blockquote type="cite">
                  <div dir="ltr">
                    <div class="gmail_extra">
                      <div class="gmail_quote">On Tue, Nov 3, 2015 at
                        6:49 AM, TAY wee-beng <span dir="ltr"><<a
                            moz-do-not-send="true"
                            href="mailto:zonexo@gmail.com"
                            target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>></span>
                        wrote:<br>
                        <blockquote class="gmail_quote"
                          style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Hi,<br>
                          <br>
                          I tried and have attached the log.<br>
                          <br>
                          Ya, my Poisson eqn has Neumann boundary
                          condition. Do I need to specify some null
                          space stuff?  Like KSPSetNullSpace or
                          MatNullSpaceCreate?</blockquote>
                        <div><br>
                        </div>
                        <div>Yes, you need to attach the constant null
                          space to the matrix.</div>
                        <div><br>
                        </div>
                        <div>  Thanks,</div>
                        <div><br>
                        </div>
                        <div>     Matt</div>
                      </div>
                    </div>
                  </div>
                </blockquote>
                Ok so can you point me to a suitable example so that I
                know which one to use specifically?<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div><a moz-do-not-send="true"
href="https://bitbucket.org/petsc/petsc/src/9ae8fd060698c4d6fc0d13188aca8a1828c138ab/src/snes/examples/tutorials/ex12.c?at=master&fileviewer=file-view-default#ex12.c-761">https://bitbucket.org/petsc/petsc/src/9ae8fd060698c4d6fc0d13188aca8a1828c138ab/src/snes/examples/tutorials/ex12.c?at=master&fileviewer=file-view-default#ex12.c-761</a><br>
            </div>
            <div><br>
            </div>
            <div>  Matt</div>
          </div>
        </div>
      </div>
    </blockquote>
    Hi,<br>
    <br>
    Actually, I realised that for my Poisson eqn, I have both Neumann and
    Dirichlet BCs. The Dirichlet BC is imposed at the output grids by
    specifying pressure = 0. So do I still need the null space?<br>
    <br>
    My Poisson eqn's LHS is fixed, but the RHS changes at every timestep.<br>
    <br>
    If I need to use a null space, how do I know whether the null space
    contains the constant vector and how many vectors it has? I followed
    the example given and added (a fuller sketch follows below):<br>
    <br>
    call MatNullSpaceCreate(MPI_COMM_WORLD,PETSC_TRUE,0,PETSC_NULL_OBJECT,nullsp,ierr)<br>
    <br>
    call MatSetNullSpace(A,nullsp,ierr)<br>
    <br>
    call MatNullSpaceDestroy(nullsp,ierr)<br>
    <br>
    Is that all?<br>
    <br>
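    To be concrete, here is a minimal sketch of the whole sequence I have
    in mind. This is only an assumption on my part, using the PETSc
    3.6-era Fortran names (PETSC_NULL_OBJECT in particular); ksp, b, x,
    step and nsteps are placeholder names, not my actual code:<br>
    <pre>
! Create the null space holding only the constant vector (has_cnst = PETSC_TRUE,
! no extra vectors). Done once, since the LHS matrix A never changes.
call MatNullSpaceCreate(MPI_COMM_WORLD,PETSC_TRUE,0,PETSC_NULL_OBJECT,nullsp,ierr)
call MatSetNullSpace(A,nullsp,ierr)
call MatNullSpaceDestroy(nullsp,ierr)   ! A keeps its own reference to nullsp

call KSPSetOperators(ksp,A,A,ierr)      ! same operator for every timestep
call KSPSetFromOptions(ksp,ierr)

do step = 1, nsteps
   ! ... rebuild only the RHS vector b here ...
   call KSPSolve(ksp,b,x,ierr)          ! preconditioner is set up once and reused
end do</pre>
    <br>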
    Before this, I was using HYPRE's geometric solver, and the matrix /
    vector in that subroutine were written based on HYPRE's own
    interface. It worked pretty well and was fast.<br>
    <br>
    However, it's a black box and it's hard to diagnose problems.<br>
    <br>
    I have always had a PETSc subroutine to solve my Poisson eqn, but I
    used KSPBCGS or KSPGMRES with HYPRE's BoomerAMG as the PC. It worked,
    but it was slow. <br>
    <br>
    Matt: Thanks, I will see how it goes using the null space, and may try
    "<i style="color:rgb(0,0,0);white-space:pre-wrap">-mg_coarse_pc_type svd</i>"
    later.<br>
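    For the record, roughly the run-time options I plan to try first (just
    a sketch collecting the suggestions above; "poisson_" is the options
    prefix of my Poisson KSP):<br>
    <pre>
-poisson_pc_type gamg -poisson_pc_gamg_agg_nsmooths 1 \
-poisson_mg_coarse_pc_type svd \
-poisson_ksp_monitor_true_residual -poisson_ksp_converged_reason \
-log_summary</pre>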
    <blockquote
cite="mid:CAMYG4GnKGhFuSokeczdFFaBhMWhhgkES03OAUgFjO2ETJb4LHA@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000"> Thanks.<br>
                <blockquote type="cite">
                  <div dir="ltr">
                    <div class="gmail_extra">
                      <div class="gmail_quote">
                        <div> </div>
                        <blockquote class="gmail_quote"
                          style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span><br>
                            Thank you<br>
                            <br>
                            Yours sincerely,<br>
                            <br>
                            TAY wee-beng<br>
                            <br>
                          </span>
                          <div>
                            <div> On 3/11/2015 12:45 PM, Barry Smith
                              wrote:<br>
                              <blockquote class="gmail_quote"
                                style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                <blockquote class="gmail_quote"
                                  style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                  On Nov 2, 2015, at 10:37 PM, TAY
                                  wee-beng<<a moz-do-not-send="true"
                                    href="mailto:zonexo@gmail.com"
                                    target="_blank">zonexo@gmail.com</a>> 

                                  wrote:<br>
                                  <br>
                                  Hi,<br>
                                  <br>
                                  I tried :<br>
                                  <br>
                                  1. -poisson_pc_gamg_agg_nsmooths 1
                                  -poisson_pc_type gamg<br>
                                  <br>
                                  2. -poisson_pc_type gamg<br>
                                </blockquote>
                                    Run with
                                -poisson_ksp_monitor_true_residual
                                -poisson_ksp_monitor_converged_reason<br>
                                Does your Poisson problem have Neumann
                                boundary conditions? Do you have any zeros
                                on the diagonal of the matrix? (You
                                shouldn't.)<br>
                                <br>
                                   There may be something wrong with your
                                Poisson discretization that was also
                                messing up hypre.<br>
                                <br>
                                <br>
                                <br>
                                <blockquote class="gmail_quote"
                                  style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                  Both options give:<br>
                                  <br>
                                      1      0.00150000      0.00000000 
                                      0.00000000 1.00000000           
                                   NaN             NaN             NaN<br>
                                  M Diverged but why?, time =           
                                  2<br>
                                  reason =           -9<br>
                                  <br>
                                  How can I check what's wrong?<br>
                                  <br>
                                  Thank you<br>
                                  <br>
                                  Yours sincerely,<br>
                                  <br>
                                  TAY wee-beng<br>
                                  <br>
                                  On 3/11/2015 3:18 AM, Barry Smith
                                  wrote:<br>
                                  <blockquote class="gmail_quote"
                                    style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                        hypre is just not scaling well
                                    here. I do not know why. Since hypre
                                    is a black box for us, there is no
                                    way to determine why the scaling is
                                    poor.<br>
                                    <br>
                                        If you make the same two runs
                                    with -pc_type gamg there will be a
                                    lot more information in the log
                                    summary about in what routines it is
                                    scaling well or poorly.<br>
                                    <br>
                                       Barry<br>
                                    <br>
                                    <br>
                                    <br>
                                    <blockquote class="gmail_quote"
                                      style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                      On Nov 2, 2015, at 3:17 AM, TAY
                                      wee-beng<<a
                                        moz-do-not-send="true"
                                        href="mailto:zonexo@gmail.com"
                                        target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>> 

                                      wrote:<br>
                                      <br>
                                      Hi,<br>
                                      <br>
                                      I have attached the 2 files.<br>
                                      <br>
                                      Thank you<br>
                                      <br>
                                      Yours sincerely,<br>
                                      <br>
                                      TAY wee-beng<br>
                                      <br>
                                      On 2/11/2015 2:55 PM, Barry Smith
                                      wrote:<br>
                                      <blockquote class="gmail_quote"
                                        style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                           Run (158/2)x(266/2)x(150/2)
                                        grid on 8 processes  and then
                                        (158)x(266)x(150) on 64
                                        processors  and send the two
                                        -log_summary results<br>
                                        <br>
                                           Barry<br>
                                        <br>
                                          <br>
                                        <blockquote class="gmail_quote"
                                          style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                          On Nov 2, 2015, at 12:19 AM,
                                          TAY wee-beng<<a
                                            moz-do-not-send="true"
                                            href="mailto:zonexo@gmail.com"
                                            target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>> 

                                          wrote:<br>
                                          <br>
                                          Hi,<br>
                                          <br>
                                          I have attached the new
                                          results.<br>
                                          <br>
                                          Thank you<br>
                                          <br>
                                          Yours sincerely,<br>
                                          <br>
                                          TAY wee-beng<br>
                                          <br>
                                          On 2/11/2015 12:27 PM, Barry
                                          Smith wrote:<br>
                                          <blockquote
                                            class="gmail_quote"
                                            style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                               Run without the
                                            -momentum_ksp_view
                                            -poisson_ksp_view and send
                                            the new results<br>
                                            <br>
                                            <br>
                                               You can see from the log
                                            summary that the PCSetUp is
                                            taking a much smaller
                                            percentage of the time
                                            meaning that it is reusing
                                            the preconditioner and not
                                            rebuilding it each time.<br>
                                            <br>
                                            Barry<br>
                                            <br>
                                               Something makes no sense
                                            with the output: it gives<br>
                                            <br>
                                            KSPSolve             199 1.0
                                            2.3298e+03 1.0 5.20e+09 1.8
                                            3.8e+04 9.9e+05 5.0e+02
                                            90100 66100 24  90100 66100
                                            24   165<br>
                                            <br>
                                            90% of the time is in the
                                            solve but there is no
                                            significant amount of time
                                            in other events of the code
                                            which is just not possible.
                                            I hope it is due to your IO.<br>
                                            <br>
                                            <br>
                                            <br>
                                            <blockquote
                                              class="gmail_quote"
                                              style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                              On Nov 1, 2015, at 10:02
                                              PM, TAY wee-beng<<a
                                                moz-do-not-send="true"
                                                href="mailto:zonexo@gmail.com"
                                                target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>> 

                                              wrote:<br>
                                              <br>
                                              Hi,<br>
                                              <br>
                                              I have attached the new
                                              run with 100 time steps
                                              for 48 and 96 cores.<br>
                                              <br>
                                               Only the Poisson eqn's
                                              RHS changes, the LHS
                                              doesn't. So if I want to
                                              reuse the preconditioner,
                                              what must I do? Or what
                                              must I not do?<br>
                                              <br>
                                              Why does the number of
                                              processes increase so
                                              much? Is there something
                                              wrong with my coding?
                                              Seems to be so too for my
                                              new run.<br>
                                              <br>
                                              Thank you<br>
                                              <br>
                                              Yours sincerely,<br>
                                              <br>
                                              TAY wee-beng<br>
                                              <br>
                                              On 2/11/2015 9:49 AM,
                                              Barry Smith wrote:<br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0px 0px
                                                0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                                   If you are doing many
                                                time steps with the same
                                                linear solver then you
                                                MUST do your weak
                                                scaling studies with
                                                MANY time steps since
                                                the setup time of AMG
                                                only takes place in the
                                                 first timestep. So run
                                                both 48 and 96 processes
                                                with the same large
                                                number of time steps.<br>
                                                <br>
                                                   Barry<br>
                                                <br>
                                                <br>
                                                <br>
                                                <blockquote
                                                  class="gmail_quote"
                                                  style="margin:0px 0px
                                                  0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                                  On Nov 1, 2015, at
                                                  7:35 PM, TAY
                                                  wee-beng<<a
                                                    moz-do-not-send="true"
href="mailto:zonexo@gmail.com" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>> 
                                                  wrote:<br>
                                                  <br>
                                                  Hi,<br>
                                                  <br>
                                                   Sorry, I forgot and used
                                                  the old a.out. I have
                                                  attached the new log
                                                  for 48cores (log48),
                                                  together with the
                                                  96cores log (log96).<br>
                                                  <br>
                                                  Why does the number of
                                                  processes increase so
                                                  much? Is there
                                                  something wrong with
                                                  my coding?<br>
                                                  <br>
                                                   Only the Poisson eqn's
                                                   RHS changes, the
                                                  LHS doesn't. So if I
                                                  want to reuse the
                                                  preconditioner, what
                                                  must I do? Or what
                                                  must I not do?<br>
                                                  <br>
                                                  Lastly, I only
                                                  simulated 2 time steps
                                                  previously. Now I run
                                                  for 10 timesteps
                                                  (log48_10). Is it
                                                  building the
                                                  preconditioner at
                                                  every timestep?<br>
                                                  <br>
                                                  Also, what about
                                                  momentum eqn? Is it
                                                  working well?<br>
                                                  <br>
                                                  I will try the gamg
                                                  later too.<br>
                                                  <br>
                                                  Thank you<br>
                                                  <br>
                                                  Yours sincerely,<br>
                                                  <br>
                                                  TAY wee-beng<br>
                                                  <br>
                                                  On 2/11/2015 12:30 AM,
                                                  Barry Smith wrote:<br>
                                                  <blockquote
                                                    class="gmail_quote"
                                                    style="margin:0px
                                                    0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                                       You used gmres
                                                    with 48 processes
                                                    but richardson with
                                                    96. You need to be
                                                    careful and make
                                                    sure you don't
                                                    change the solvers
                                                    when you change the
                                                    number of processors
                                                    since you can get
                                                    very different
                                                    inconsistent results<br>
                                                    <br>
                                                        Anyways all the
                                                    time is being spent
                                                    in the BoomerAMG
                                                    algebraic multigrid
                                                     setup and it is
                                                    scaling badly. When
                                                    you double the
                                                    problem size and
                                                    number of processes
                                                    it went from
                                                    3.2445e+01 to
                                                    4.3599e+02 seconds.<br>
                                                    <br>
                                                    PCSetUp             
                                                      3 1.0 3.2445e+01
                                                    1.0 9.58e+06 2.0
                                                    0.0e+00 0.0e+00
                                                    4.0e+00 62  8  0  0 
                                                    4  62  8  0  0  5   
                                                    11<br>
                                                    <br>
                                                    PCSetUp             
                                                      3 1.0 4.3599e+02
                                                    1.0 9.58e+06 2.0
                                                    0.0e+00 0.0e+00
                                                    4.0e+00 85 18  0  0 
                                                    6  85 18  0  0  6   
                                                     2<br>
                                                    <br>
                                                       Now is the
                                                    Poisson problem
                                                    changing at each
                                                    timestep or can you
                                                    use the same
                                                    preconditioner built
                                                    with BoomerAMG for
                                                    all the time steps?
                                                     Algebraic multigrid
                                                     has a large setup time
                                                     that often doesn't
                                                     matter if you have many
                                                     time steps, but if you
                                                     have to rebuild it at
                                                     each timestep it may be
                                                     too large.<br>
                                                    <br>
                                                       You might also
                                                    try -pc_type gamg
                                                    and see how PETSc's
                                                    algebraic multigrid
                                                    scales for your
                                                    problem/machine.<br>
                                                    <br>
                                                       Barry<br>
                                                    <br>
                                                    <br>
                                                    <br>
                                                    <blockquote
                                                      class="gmail_quote"
                                                      style="margin:0px
                                                      0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                                      On Nov 1, 2015, at
                                                      7:30 AM, TAY
                                                      wee-beng<<a
                                                        moz-do-not-send="true"
href="mailto:zonexo@gmail.com" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>> 
                                                      wrote:<br>
                                                      <br>
                                                      <br>
                                                      On 1/11/2015 10:00
                                                      AM, Barry Smith
                                                      wrote:<br>
                                                      <blockquote
                                                        class="gmail_quote"
                                                        style="margin:0px
                                                        0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                                        <blockquote
                                                          class="gmail_quote"
                                                          style="margin:0px
                                                          0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                                          On Oct 31,
                                                          2015, at 8:43
                                                          PM, TAY
                                                          wee-beng<<a
moz-do-not-send="true" href="mailto:zonexo@gmail.com" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>> 

                                                          wrote:<br>
                                                          <br>
                                                          <br>
                                                          On 1/11/2015
                                                          12:47 AM,
                                                          Matthew
                                                          Knepley wrote:<br>
                                                          <blockquote
                                                          class="gmail_quote"
                                                          style="margin:0px
                                                          0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                                          On Sat, Oct
                                                          31, 2015 at
                                                          11:34 AM, TAY
                                                          wee-beng<<a
moz-do-not-send="true" href="mailto:zonexo@gmail.com" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>> 

                                                          wrote:<br>
                                                          Hi,<br>
                                                          <br>
                                                          I understand
                                                          that as
                                                          mentioned in
                                                           the FAQ, due
                                                          to the
                                                          limitations in
                                                          memory, the
                                                          scaling is not
                                                          linear. So, I
                                                          am trying to
                                                          write a
                                                          proposal to
                                                          use a
                                                          supercomputer.<br>
                                                          Its specs are:<br>
                                                          Compute nodes:
                                                          82,944 nodes
                                                          (SPARC64
                                                          VIIIfx; 16GB
                                                          of memory per
                                                          node)<br>
                                                          <br>
                                                          8 cores /
                                                          processor<br>
                                                           Interconnect:
                                                           Tofu
                                                           (6-dimensional
                                                           mesh/torus)<br>
                                                           Each cabinet contains 96 computing nodes.<br>
                                                           One of the requirements is to give the performance of
                                                           my current code with my current set of data, and there
                                                           is a formula to calculate the estimated parallel
                                                           efficiency when using the new, larger set of data.<br>
                                                          There are 2
                                                          ways to give
                                                          performance:<br>
                                                          1. Strong
                                                          scaling, which
                                                          is defined as
                                                          how the
                                                          elapsed time
                                                          varies with
                                                          the number of
                                                          processors for
                                                          a fixed<br>
                                                          problem.<br>
                                                          2. Weak
                                                          scaling, which
                                                          is defined as
                                                          how the
                                                          elapsed time
                                                          varies with
                                                          the number of
                                                          processors for
                                                          a<br>
                                                          fixed problem
                                                          size per
                                                          processor.<br>
                                                          I ran my cases
                                                          with 48 and 96
                                                          cores with my
                                                          current
                                                          cluster,
                                                          giving 140 and
                                                          90 mins
                                                          respectively.
                                                          This is
                                                          classified as
                                                          strong
                                                          scaling.<br>
                                                          Cluster specs:<br>
                                                           CPU: AMD 6234 2.4GHz<br>
                                                           8 cores / processor (CPU)<br>
                                                           6 CPUs / node<br>
                                                           So 48 cores / node<br>
                                                           Not sure about the memory / node<br>
                                                          <br>
                                                           The parallel efficiency 'En' for a given degree of
                                                           parallelism 'n' indicates how efficiently the program
                                                           is accelerated by parallel processing. 'En' is given by
                                                           the following formulae. Although their derivations
                                                           differ between strong and weak scaling, the derived
                                                           formulae are the same.<br>
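                                                           (The proposal's exact formulae are not reproduced here;
                                                           the standard definitions I am assuming, writing T_1 for
                                                           the single-process time and T_n for the time on n
                                                           processes, are:)<br>
                                                           <pre>
E_n = T_1 / (n * T_n)          (strong scaling: fixed total problem size)
E_n = T_1 / T_n                (weak scaling: fixed problem size per process)
S(n) = 1 / ((1 - p) + p / n)   (Amdahl's law, with p the parallel fraction)</pre>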
                                                           From the estimated time, my parallel efficiency using
                                                           Amdahl's law on the current old cluster was 52.7%.<br>
                                                           So are my results acceptable?<br>
                                                           For the large data set, if using 2205 nodes
                                                           (2205 x 8 cores), my expected parallel efficiency is
                                                           only 0.5%. The proposal recommends a value of &gt; 50%.<br>
                                                          The problem
                                                          with this
                                                          analysis is
                                                          that the
                                                          estimated
                                                          serial
                                                          fraction from
                                                          Amdahl's Law 
                                                          changes as a
                                                          function<br>
                                                          of problem
                                                          size, so you
                                                          cannot take
                                                          the strong
                                                          scaling from
                                                          one problem
                                                          and apply it
                                                          to another
                                                          without a<br>
                                                          model of this
                                                          dependence.<br>
                                                          <br>
                                                          Weak scaling
                                                          does model
                                                          changes with
                                                          problem size,
                                                          so I would
                                                          measure weak
                                                          scaling on
                                                          your current<br>
                                                          cluster, and
                                                          extrapolate to
                                                          the big
                                                          machine. I
                                                          realize that
                                                          this does not
                                                          make sense for
                                                          many
                                                          scientific<br>
                                                          applications,
                                                          but neither
                                                          does requiring
                                                          a certain
                                                          parallel
                                                          efficiency.<br>
                                                          </blockquote>
                                                           Ok, I checked the results for my weak scaling; the
                                                           expected parallel efficiency is even worse. From the
                                                           formula used, it's obvious it's doing some sort of
                                                           exponential decrease in the extrapolation. So unless I
                                                           can achieve nearly &gt; 90% speed-up when I double the
                                                           cores and problem size for my current 48/96-core setup,
                                                           extrapolating from about 96 nodes to 10,000 nodes will
                                                           give a much lower expected parallel efficiency for the
                                                           new case.<br>
                                                          <br>
                                                           However, it's mentioned in the FAQ that, due to memory
                                                           limitations, it's impossible to get &gt; 90% speed-up when
                                                           I double the cores and problem size (i.e. a linear
                                                           increase in performance), which means that I can't get
                                                           &gt; 90% speed-up when I double the cores and problem size
                                                           for my current 48/96-core setup. Is that so?<br>
                                                        </blockquote>
                                                           What is the
                                                        output of
                                                        -ksp_view
                                                        -log_summary on
                                                        the problem and
                                                        then on the
                                                        problem doubled
                                                        in size and
                                                        number of
                                                        processors?<br>
                                                        <br>
                                                           Barry<br>
                                                      </blockquote>
                                                      Hi,<br>
                                                      <br>
                                                      I have attached
                                                      the output<br>
                                                      <br>
                                                      48 cores: log48<br>
                                                      96 cores: log96<br>
                                                      <br>
                                                      There are 2
                                                      solvers - The
                                                      momentum linear
                                                      eqn uses bcgs,
                                                      while the Poisson
                                                      eqn uses hypre
                                                      BoomerAMG.<br>
                                                      <br>
                                                      Problem size
                                                      doubled from
                                                      158x266x150 to
                                                      158x266x300.<br>
                                                      <blockquote
                                                        class="gmail_quote"
                                                        style="margin:0px
                                                        0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                                        <blockquote
                                                          class="gmail_quote"
                                                          style="margin:0px
                                                          0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                                          So is it fair
                                                          to say that
                                                          the main
                                                          problem does
                                                          not lie in my
                                                          programming
                                                          skills, but
                                                           rather in the way
                                                          the linear
                                                          equations are
                                                          solved?<br>
                                                          <br>
                                                          Thanks.<br>
                                                          <blockquote
                                                          class="gmail_quote"
                                                          style="margin:0px
                                                          0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                                             Thanks,<br>
                                                          <br>
                                                                Matt<br>
                                                           Is this type of scaling (&gt; 50%) possible in PETSc when
                                                           using 17640 (2205 x 8) cores?<br>
                                                          Btw, I do not
                                                          have access to
                                                          the system.<br>
                                                          <br>
                                                          <br>
                                                          <br>
                                                          <br>
                                                          <br>
                                                          <br>
                                                          -- <br>
                                                          What most
                                                          experimenters
                                                          take for
                                                          granted before
                                                          they begin
                                                          their
                                                          experiments is
                                                          infinitely
                                                          more
                                                          interesting
                                                          than any
                                                          results to
                                                          which their
                                                          experiments
                                                          lead.<br>
                                                          -- Norbert
                                                          Wiener<br>
                                                          </blockquote>
                                                        </blockquote>
                                                      </blockquote>
<log48.txt><log96.txt><br>
                                                    </blockquote>
                                                  </blockquote>
<log48_10.txt><log48.txt><log96.txt><br>
                                                </blockquote>
                                              </blockquote>
<log96_100.txt><log48_100.txt><br>
                                            </blockquote>
                                          </blockquote>
<log96_100_2.txt><log48_100_2.txt><br>
                                        </blockquote>
                                      </blockquote>
<log64_100.txt><log8_100.txt><br>
                                    </blockquote>
                                  </blockquote>
                                </blockquote>
                              </blockquote>
                              <br>
                            </div>
                          </div>
                        </blockquote>
                      </div>
                      <br>
                      <br clear="all">
                      <span class=""><font color="#888888">
                          <div><br>
                          </div>
                          -- <br>
                          <div>What most experimenters take for granted
                            before they begin their experiments is
                            infinitely more interesting than any results
                            to which their experiments lead.<br>
                            -- Norbert Wiener</div>
                        </font></span></div>
                  </div>
                </blockquote>
                <br>
              </div>
            </blockquote>
          </div>
          <br>
          <br clear="all">
          <div><br>
          </div>
          -- <br>
          <div class="gmail_signature">What most experimenters take for
            granted before they begin their experiments is infinitely
            more interesting than any results to which their experiments
            lead.<br>
            -- Norbert Wiener</div>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>