<div dir="ltr">Hi Frank,<br><br><div class="gmail_extra"><br><div class="gmail_quote">On 11 July 2016 at 19:14, frank <span dir="ltr"><<a href="mailto:hengjiew@uci.edu" target="_blank">hengjiew@uci.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000">

    Hi Dave,<br>

    <br>

    I re-ran the test using bjacobi as the preconditioner on the coarse
    mesh of Telescope. The grid is 3072*256*768 and the process mesh is
    96*8*24. The PETSc options file is attached.<br>

    I still got the "Out of Memory" error. The error occurred before the
    linear solver finished one iteration, so I don't have the full output from
    -ksp_view. The output from -ksp_view_pre is attached.</div></blockquote><div><br></div><div>Okay, that is essentially useless (sorry).<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"> <br>

    It seems to me that the error occurred when the decomposition was
    about to change.<br></div></blockquote><div><br></div><div>Based on what information?<br></div><div>Running with -info would give us more clues, but it will create a ton of output.<br></div><div>Please try running the case which failed with -info.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">

    I ran another test with a grid of 1536*128*384 and the same process
    mesh as above. There was no error. The -ksp_view output is attached for
    comparison.<br>

    Thank you.</div></blockquote><div><br></div><div><br>[3] Here is my crude estimate of your memory usage. <br>I'll target only the
 biggest memory hogs, to get an order-of-magnitude estimate.<br><br><div>* The fine-grid operator contains 4223139840 non-zeros, i.e. 32*32*32 rows x 7 entries, about 229,000 non-zeros per MPI rank --> 1.8 MB per MPI rank assuming double precision.<br></div><div>The column indices for the AIJ format could amount to another ~0.9 MB (assuming 32-bit integers).<br></div><div><br>* You use 5 multigrid levels, so the other operators should represent (collectively)  <br>2.7/8 + 2.7/8^2 + 2.7/8^3 + 2.7/8^4  ~ 0.4 MB per MPI rank on the communicator with 18432 ranks.<br></div><div>The coarse grid should consume ~0.7 KB per MPI rank on the communicator with 18432 ranks.</div><div><br>*
 You use a reduction factor of 64, so the new communicator has 288
MPI ranks. <br>PCTelescope will first gather a temporary matrix associated
with your coarse-level operator, laid out for a comm size of 288 but living on
the comm of size 18432. <br>This matrix will require approximately 0.67 KB *
64 ~ 43 KB per core on the 288 ranks. <br>This matrix is then used to form a
 new MPIAIJ matrix on the subcomm, which requires another ~43 KB per rank.
<br>The temporary matrix is now destroyed.<br></div><div><br>* Because a DMDA is
 detected, a permutation matrix is assembled. <br>This requires 2 doubles
per point in the DMDA. <br>Your coarse DMDA contains 192 x 16 x 48 points. <br>Thus the permutation matrix will require about 8 KB per MPI rank on the sub-comm.<br><br></div><div>* Lastly, the matrix is permuted. This uses MatPtAP(), but the resulting operator will have the same memory footprint as the unpermuted matrix (~43 KB). At any stage in PCTelescope, only 2 operators of size ~43 KB are held in memory when the DMDA is provided.<br></div><div><br></div><div>From my rough estimates, the worst-case memory footprint for any given core, given your options, is approximately <br></div><div>2.7 MB + 0.4 MB + 43 KB + 43 KB + 8 KB ~ 3.2 MB<br></div><div>This is way below 8 GB.<br><br>Note

 this estimate completely ignores:<br>(1) the memory required for the restriction operator, <br>(2) the potential growth in the number of non-zeros per row due
to Galerkin coarsening (I wish -ksp_view_pre reported the output from MatView so we could see the number of non-zeros required by the coarse-level operators)<br></div><div>(3) all temporary vectors required by the CG solver, and those required by the smoothers.<br></div><div>(4) internal memory allocated by MatPtAP<br></div><div>(5) memory associated with IS's used within PCTelescope<br></div><div><br></div>So either I am completely off in my estimates, or you have not carefully estimated the memory usage of your application code. Hopefully others will examine/correct my rough estimates.<br></div><div><div><br>Since I don't have your code, I cannot assess the latter.<br>Since I don't have access to the same machine you are running on, I think we need to take a step back.<br></div><br>[1] What machine are you running on? Send me a URL if it's available.<br></div><div><br>[2] What discretization are you using? (I am guessing a scalar 7-point FD stencil.)<br></div><div>If it's a 7-point FD stencil, we should be able to examine the memory usage of your solver configuration using a standard, lightweight, existing PETSc example, run on your machine at the same scale. <br></div><div>This would hopefully enable us to correctly evaluate the actual memory usage required by the solver configuration you are using.<br></div><div><br></div><div>Thanks,<br></div><div>  Dave<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><span><font color="#888888"><br>

    <br>

    Frank</font></span><div><div><br>

    <br>

    <br>

    <br>

    <div>On 07/08/2016 10:38 PM, Dave May wrote:<br>

    </div>

    <blockquote type="cite"><br>

      <br>

      On Saturday, 9 July 2016, frank <<a>hengjiew@uci.edu</a>> wrote:<br>

      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Barry and

        Dave,<br>

        <br>

        Thank you both for the advice.<br>

        <br>

        @Barry<br>

        I made a mistake in the file names in my last email. I have attached the
        correct files this time.<br>

        For all three tests, 'Telescope' is used as the coarse
        preconditioner.<br>

        <br>

        == Test1:   Grid: 1536*128*384,   Process Mesh: 48*4*12<br>
        Part of the memory usage:<br>
<pre>Object Type   Creations   Destructions   Memory (bytes)   Descendants' Mem.
     Vector         125            124          3971904   0.
     Matrix         101            101          9462372   0</pre>

        <br>

        == Test2: Grid: 1536*128*384,   Process Mesh: 96*8*24<br>
        Part of the memory usage:<br>
<pre>Object Type   Creations   Destructions   Memory (bytes)   Descendants' Mem.
     Vector         125            124           681672   0.
     Matrix         101            101          1462180   0.</pre>

        <br>

        In theory, the memory usage in Test1 should be 8 times that of Test2
        (32*32*32 vs. 16*16*16 grid points per process). In my case, it is about 6 times.<br>
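<div>The 8x expectation can be checked directly from the grid and process meshes. This Python sketch is arithmetic only; it assumes, as a simplification, that memory is proportional to the number of grid points each process owns, and the byte counts are the "Memory" values quoted above for Test1 and Test2. The helper name is hypothetical.</div>

```python
# Expected vs. reported memory ratio between Test1 and Test2.
# Simplifying assumption: memory is proportional to grid points per process.

def points_per_process(grid, procs):
    """Grid points owned by each process when the grid splits evenly."""
    return (grid[0] // procs[0]) * (grid[1] // procs[1]) * (grid[2] // procs[2])


grid = (1536, 128, 384)
test1 = points_per_process(grid, (48, 4, 12))  # 32*32*32 per process
test2 = points_per_process(grid, (96, 8, 24))  # 16*16*16 per process
expected = test1 / test2

# "Memory" values (bytes) quoted for each test.
reported = {
    "Vector": 3971904 / 681672,
    "Matrix": 9462372 / 1462180,
}

print(f"expected ratio: {expected:.2f}")
for name, ratio in reported.items():
    print(f"{name} reported ratio: {ratio:.2f}")
```

With these numbers the reported ratios are roughly 5.8 (vectors) and 6.5 (matrices) against an expected 8. Quantities that scale with sub-domain surface rather than volume, such as ghost points, are one reason a ratio below 8 is possible.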

        <br>

        == Test3: Grid: 3072*256*768,   Process Mesh: 96*8*24.

        Sub-domain per process: 32*32*32<br>

        Here I get the out of memory error.<br>

        <br>

        I tried to use -mg_coarse jacobi. That way, I don't need to
        set -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly,
        right?<br>
        The linear solver didn't work in this case. PETSc printed some
        errors.<br>

        <br>

        @Dave<br>

        In Test3, I use only one instance of 'Telescope'. On the coarse
        mesh of 'Telescope', I used LU as the preconditioner instead of
        SVD.<br>
        If I set the levels correctly, then on the last coarse mesh of
        MG, where it calls 'Telescope', the sub-domain per process is
        2*2*2.<br>

        On the last coarse mesh of 'Telescope', there is only one grid

        point per process.<br>

        I still got the OOM error. The detailed PETSc options file is
        attached.</blockquote>
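<div>The level-by-level sizes described for Test3 can be checked with a few lines of Python. This is a sketch under stated assumptions: every multigrid level halves each dimension of the 3072*256*768 grid, the 96*8*24 process mesh is kept on all levels, and Telescope gathers the coarsest grid onto 1/64 of the ranks (the reduction factor of 64 is taken from Dave's reading of the options file). The helper names are hypothetical.</div>

```python
# Global and per-process grid sizes on each MG level, plus the number of
# points per rank after Telescope gathers the coarsest level.
# Assumptions: factor-2 coarsening in every direction, fixed process mesh.

def level_sizes(grid, procs, nlevels):
    """Return (global_size, local_size) tuples, finest level first."""
    sizes = []
    for level in range(nlevels):
        factor = 2 ** level
        global_size = tuple(n // factor for n in grid)
        local_size = tuple(g // p for g, p in zip(global_size, procs))
        sizes.append((global_size, local_size))
    return sizes


def points_after_gather(global_size, nranks, reduction):
    """Points per rank once the grid lives on nranks // reduction ranks."""
    npoints = global_size[0] * global_size[1] * global_size[2]
    return npoints // (nranks // reduction)


grid = (3072, 256, 768)
procs = (96, 8, 24)
sizes = level_sizes(grid, procs, nlevels=5)
for level, (global_size, local_size) in enumerate(sizes):
    print(f"level {level}: global {global_size}, per process {local_size}")

coarse_global = sizes[-1][0]
nranks = procs[0] * procs[1] * procs[2]
print("points per rank inside Telescope:",
      points_after_gather(coarse_global, nranks, reduction=64))
```

Under these assumptions the coarsest MG level is 192*16*48 globally and 2*2*2 per process, and after the gather each of the 288 Telescope ranks holds 512 points (8*8*8). Whether the innermost solve reaches one point per process depends on how many levels are configured inside Telescope, which these messages do not show.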

      <div><br>

      </div>

      <div>Do you understand the expected memory usage for the

        particular parallel LU implementation you are using? I don't

        (seriously). Replace LU with bjacobi and re-run this test. My

        point about solver debugging is still valid. </div>

      <div><br>

      </div>

      <div>And please send the result of KSPView so we can see what is
        actually used in the computations.</div>

      <div><br>

      </div>

      <div>Thanks</div>

      <div>  Dave</div>

      <div> </div>

      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

        <br>

        <br>

        Thank you so much.<br>

        <br>

        Frank<br>

        <br>

        <br>

        <br>

        On 07/06/2016 02:51 PM, Barry Smith wrote:<br>

        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            On Jul 6, 2016, at 4:19 PM, frank <<a href="mailto:hengjiew@uci.edu" target="_blank">hengjiew@uci.edu</a>> wrote:<br>

            <br>

            Hi Barry,<br>

            <br>

            Thank you for your advice.<br>
            I ran three tests. In the 1st test, the grid is
            3072*256*768 and the process mesh is 96*8*24.<br>
            The linear solver is 'cg', the preconditioner is 'mg', and
            'telescope' is used as the preconditioner on the coarse
            mesh.<br>
            The system gives me the "Out of Memory" error before the
            linear system is completely solved.<br>
            The output from '-ksp_view_pre' is attached. It seems to me
            that the error occurs when it reaches the coarse mesh.<br>
            <br>
            The 2nd test uses a grid of 1536*128*384 and the process mesh is
            96*8*24. The 3rd test uses the same grid but a different
            process mesh, 48*4*12.<br>

          </blockquote>

              Are you sure this is right? The total matrix and vector

          memory usage goes from 2nd test<br>

<pre>          Vector   384   383    8,193,712   0.
          Matrix   103   103   11,508,688   0.</pre>
          to 3rd test<br>
<pre>          Vector   384   383    1,590,520   0.
          Matrix   103   103    3,508,664   0.</pre>

          That is, the memory usage got smaller, but if you have only
          1/8th the processes and the same grid, it should have gotten
          about 8 times bigger. Did you maybe cut the grid by a factor
          of 8 also? If so, that still doesn't explain it, because the
          memory usage changed by a factor of 5 something for the
          vectors and 3 something for the matrices.<br>
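<div>Barry's sanity check can be written out as arithmetic. This Python sketch uses only the byte counts quoted above and the two process meshes (96*8*24 for the 2nd test, 48*4*12 for the 3rd); like Barry, it assumes per-process memory should grow in proportion to the number of points each process owns.</div>

```python
# Same global grid, 1/8 as many processes: per-process memory should grow ~8x.
ranks_test2 = 96 * 8 * 24   # 18432 processes
ranks_test3 = 48 * 4 * 12   # 2304 processes
expected_growth = ranks_test2 / ranks_test3

# Memory values (bytes) quoted for each test.
test2 = {"Vector": 8193712, "Matrix": 11508688}
test3 = {"Vector": 1590520, "Matrix": 3508664}

for name in test2:
    ratio = test2[name] / test3[name]
    print(f"{name}: test2/test3 = {ratio:.2f}, "
          f"expected test3/test2 = {expected_growth:.0f}")
```

The observed numbers move in the opposite direction: test 2 is about 5.2x (vectors) and 3.3x (matrices) larger than test 3, instead of test 3 being about 8x larger.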

          <br>

          <br>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            The linear solver and PETSc options in the 2nd and 3rd tests are
            the same as in the 1st test. The linear solver works fine in both
            tests.<br>
            I attached the memory usage of the 2nd and 3rd tests. The
            memory info is from the option '-log_summary'. I tried to
            use '-memory_info' as you suggested, but in my case PETSc
            treated it as an unused option. It printed nothing about
            memory. Do I need to add something to my code so I can use
            '-memory_info'?<br>

          </blockquote>

              Sorry, my mistake: the option is -memory_view.<br>

          <br>

             Can you run the one case with -memory_view and -mg_coarse

          jacobi -ksp_max_it 1 (just so it doesn't iterate forever) to

          see how much memory is used without the telescope? Also run

          case 2 the same way.<br>

          <br>

             Barry<br>

          <br>

          <br>

          <br>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            In both tests the memory usage is not large.<br>

            <br>

            It seems to me that it might be the 'telescope' 

            preconditioner that allocated a lot of memory and caused the

            error in the 1st test.<br>

            Is there a way to show how much memory it allocated?<br>

            <br>

            Frank<br>

            <br>

            On 07/05/2016 03:37 PM, Barry Smith wrote:<br>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                 Frank,<br>

              <br>

                   You can run with -ksp_view_pre to have it "view" the

              KSP before the solve so hopefully it gets that far.<br>

              <br>

                    Please run the problem that does fit with
              -memory_info; when the problem completes, it will show the
              "high water mark" for PETSc-allocated memory and total
              memory used. We first want to look at these numbers to see
              if it is using more memory than you expect. You could also
              run with, say, half the grid spacing to see how the memory
              usage scales with the increase in grid points. Make the
              runs also with -log_view and send all the output from
              these options.<br>

              <br>

                  Barry<br>

              <br>

              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                On Jul 5, 2016, at 5:23 PM, frank <<a href="mailto:hengjiew@uci.edu" target="_blank">hengjiew@uci.edu</a>> wrote:<br>

                <br>

                Hi,<br>

                <br>

                I am using the CG KSP solver and the multigrid
                preconditioner to solve a linear system in parallel.<br>
                I chose 'Telescope' as the preconditioner on
                the coarse mesh because of its good performance.<br>

                The petsc options file is attached.<br>

                <br>

                The domain is a 3d box.<br>

                It works well when the grid is 1536*128*384 and the
                process mesh is 96*8*24. When I double the grid size in each
                direction and keep the same process mesh and PETSc options, I get
                an "out of memory" error from the super-cluster I am
                using.<br>
                Each process has access to at least 8 GB of memory, which
                should be more than enough for my application. I am sure
                that all the other parts of my code (except the linear
                solver) do not use much memory, so I suspect there is
                something wrong with the linear solver.<br>
                The error occurs before the linear system is completely
                solved, so I don't have the output from -ksp_view. I am not
                able to reproduce the error with a smaller problem
                either.<br>
                In addition, I tried block Jacobi as the
                preconditioner with the same grid and the same
                decomposition. The linear solver runs extremely slowly, but
                there is no memory error.<br>

                <br>

                How can I diagnose what exactly causes the error?<br>

                Thank you so much.<br>

                <br>

                Frank<br>

                <petsc_options.txt><br>

              </blockquote>

            </blockquote>

<ksp_view_pre.txt><memory_test2.txt><memory_test3.txt><petsc_options.txt><br>

          </blockquote>

        </blockquote>

        <br>

      </blockquote>

    </blockquote>

    <br>

  </div></div></div>

</blockquote></div><br></div></div>
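<div>Dave's per-rank estimate at the top of this thread can be redone as a short Python sketch. It is arithmetic only and rests on assumptions that are not confirmed by the thread: a scalar 7-point stencil that is periodic in x and z but not in y (this choice reproduces the quoted total of 4223139840 non-zeros exactly), 8-byte values and 4-byte column indices (row pointers ignored), 1/8 of the entries on each coarser level, factor-2 coarsening of every dimension, and a Telescope reduction factor of 64. All function and key names are hypothetical. With 32*32*32 rows per rank, the fine-grid values alone come to about 1.8 MB per rank, and the whole estimate to roughly 3.2 MB.</div>

```python
# Back-of-the-envelope per-rank memory estimate for CG + PCMG + PCTelescope.
# Pure arithmetic; nothing here calls PETSc. Hypothetical assumptions:
#   * scalar 7-point stencil, periodic in x and z, non-periodic in y
#   * 8-byte values and 4-byte column indices (row pointers ignored)
#   * each coarser MG level holds 1/8 of the entries of the level above
#   * every MG level halves each grid dimension

MB = 1e6


def stencil_nnz(nx, ny, nz):
    """7 entries per row, minus the missing neighbours on the two y-boundary planes."""
    return 7 * nx * ny * nz - 2 * nx * nz


def estimate(grid, procs, levels, reduction):
    nx, ny, nz = grid
    nranks = procs[0] * procs[1] * procs[2]
    nnz = stencil_nnz(nx, ny, nz)

    fine = (8 + 4) * nnz / nranks                           # bytes per rank
    coarser = sum(fine / 8**k for k in range(1, levels))    # levels 1 .. levels-1
    coarsest = fine / 8 ** (levels - 1)                     # per rank, full comm

    sub_ranks = nranks // reduction
    gathered = coarsest * reduction                         # per rank, sub-comm

    f = 2 ** (levels - 1)
    coarse_pts = (nx // f) * (ny // f) * (nz // f)
    permutation = 2 * 8 * coarse_pts / sub_ranks            # "2 doubles per point"

    # Worst case: fine + coarser levels + two gathered copies + permutation.
    total = fine + coarser + 2 * gathered + permutation
    return {
        "nnz": nnz,
        "nranks": nranks,
        "sub_ranks": sub_ranks,
        "coarse_pts": coarse_pts,
        "fine_MB": fine / MB,
        "coarser_MB": coarser / MB,
        "gathered_bytes": gathered,
        "permutation_bytes": permutation,
        "total_MB": total / MB,
    }


est = estimate((3072, 256, 768), (96, 8, 24), levels=5, reduction=64)
for key, value in est.items():
    print(f"{key:18s} {value:,.3f}")
```

The fine level dominates, and the Telescope-related terms are tens of kilobytes, so under these assumptions the matrices counted here are nowhere near 8 GB per rank. The items Dave lists as ignored (restriction operators, Galerkin fill-in, Krylov and smoother work vectors, MatPtAP internals, index sets) are not included.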