[petsc-users] Performance of the Telescope Multigrid Preconditioner
frank
hengjiew at uci.edu
Tue Oct 4 14:14:39 CDT 2016
Hi,
I attached two ksp_view outputs for the two grid sizes. The major difference
between the KSP solvers in those runs is the number of MG levels. Apart from
that, the ksp_view outputs are quite similar.
I also attached the log_view output for all eight runs. I hope it is not
too messy.
Thank you.
Frank
On 10/04/2016 11:36 AM, Barry Smith wrote:
> -ksp_view in both cases?
>
>> On Oct 4, 2016, at 1:13 PM, frank <hengjiew at uci.edu> wrote:
>>
>> Hi,
>>
>> This question is a follow-up to the thread "Question about memory usage in Multigrid preconditioner".
>> I used to have the "Out of Memory (OOM)" problem when using the CG+Telescope MG solver with 32768 cores. Adding the options "-matrap 0 -matptap_scalable" did solve that problem.
>>
>> Then I tested the scalability by solving a 3D Poisson equation for 1 step. I used one sub-communicator in all the tests. The PETSc options differ between those tests in two ways: (1) the pc_telescope_reduction_factor; (2) the number of multigrid levels in the up/down solver. The function "ksp_solve" is timed. It is rather slow and does not scale at all.
>>
>> Test1: 512^3 grid points
>> Core#    telescope_reduction_factor    MG levels# for up/down solver    Time for KSPSolve (s)
>> 512      8                             4 / 3                            6.2466
>> 4096     64                            5 / 3                            0.9361
>> 32768    64                            4 / 3                            4.8914
>>
>> Test2: 1024^3 grid points
>> Core#    telescope_reduction_factor    MG levels# for up/down solver    Time for KSPSolve (s)
>> 4096     64                            5 / 4                            3.4139
>> 8192     128                           5 / 4                            2.4196
>> 16384    32                            5 / 3                            5.4150
>> 32768    64                            5 / 3                            5.6067
>> 65536    128                           5 / 3                            6.5219
>>
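A rough way to quantify "doesn't scale" is to turn the timings in the two tables above into speedup and parallel efficiency relative to the smallest run of each test. This is only a sketch: the reduction factor and the number of MG levels also change between runs, so it is not a clean strong-scaling study, and the function name `strong_scaling` is made up for illustration.

```python
# Timings copied from the two tables above: (cores, KSPSolve seconds).
test1_512 = [(512, 6.2466), (4096, 0.9361), (32768, 4.8914)]
test2_1024 = [(4096, 3.4139), (8192, 2.4196), (16384, 5.4150),
              (32768, 5.6067), (65536, 6.5219)]


def strong_scaling(runs):
    """Return (cores, speedup, efficiency) relative to the first (smallest) run."""
    base_cores, base_time = runs[0]
    rows = []
    for cores, seconds in runs:
        speedup = base_time / seconds      # measured speedup over the baseline
        ideal = cores / base_cores         # speedup expected from perfect scaling
        rows.append((cores, speedup, speedup / ideal))
    return rows


for name, runs in (("512^3", test1_512), ("1024^3", test2_1024)):
    print(name)
    for cores, speedup, eff in strong_scaling(runs):
        print(f"  {cores:6d} cores: speedup {speedup:5.2f}, efficiency {eff:6.1%}")
```

In both tests the time drops only for the first increase in core count and then rises again, which is what the efficiency column makes visible.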
>> I guess I didn't set the MG levels properly. What would be an efficient way to arrange the MG levels?
>> Also, which preconditioner should I use on the coarse mesh of the 2nd communicator to improve the performance?
>>
>> I attached the test code and the petsc options file for the 1024^3 cube with 32768 cores.
>>
>> Thank you.
>>
>> Regards,
>> Frank
>>
>>
>> On 09/15/2016 03:35 AM, Dave May wrote:
>>> Hi all,
>>>
>>> The only unexpected memory usage I can see is associated with the call to MatPtAP().
>>> Here is something you can try immediately.
>>> Run your code with the additional options
>>> -matrap 0 -matptap_scalable
>>>
>>> I didn't realize this before, but the default behaviour of MatPtAP in parallel is actually to explicitly form the transpose of P (e.g. assemble R = P^T) and then compute R.A.P.
>>> You don't want to do this. The option -matrap 0 resolves this issue.
>>>
>>> The implementation of P^T.A.P has two variants.
>>> The scalable implementation (with respect to memory usage) is selected via the second option -matptap_scalable.
>>>
>>> Try it out - I see a significant memory reduction using these options for particular mesh sizes / partitions.
>>>
>>> I've attached a cleaned up version of the code you sent me.
>>> There were a number of memory leaks and other issues.
>>> The main points being
>>> * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End}
>>> * You should call PetscFinalize(), otherwise the option -log_summary (-log_view) will not display anything once the program has completed.
>>>
>>>
>>> Thanks,
>>> Dave
>>>
>>>
>>> On 15 September 2016 at 08:03, Hengjie Wang <hengjiew at uci.edu> wrote:
>>> Hi Dave,
>>>
>>> Sorry, I should have added more comments to explain the code.
>>> The number of processes in each dimension is the same: Px = Py = Pz = P. The same holds for the domain size.
>>> So if you want to run the code for 512^3 grid points on 16^3 cores, you need to set "-N 512 -P 16" on the command line.
>>> I added more comments and also fixed an error in the attached code. (The error only affects the accuracy of the solution, not the memory usage.)
>>>
>>> Thank you.
>>> Frank
>>>
>>>
>>> On 9/14/2016 9:05 PM, Dave May wrote:
>>>>
>>>> On Thursday, 15 September 2016, Dave May <dave.mayhem23 at gmail.com> wrote:
>>>>
>>>>
>>>> On Thursday, 15 September 2016, frank <hengjiew at uci.edu> wrote:
>>>> Hi,
>>>>
>>>> I wrote a simple code to reproduce the error. I hope this can help to diagnose the problem.
>>>> The code just solves a 3d poisson equation.
>>>>
>>>> Why is the stencil width a runtime parameter?? And why is the default value 2? For 7-pnt FD Laplace, you only need a stencil width of 1.
>>>>
>>>> Was this choice made to mimic something in the real application code?
>>>>
>>>> Please ignore - I misunderstood your usage of the param set by -P
>>>>
>>>>
>>>>
>>>> I ran the code on a 1024^3 mesh. The process partition is 32 * 32 * 32. That's when I reproduced the OOM error. Each core has about 2G memory.
>>>> I also ran the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp solver works fine.
>>>> I attached the code, ksp_view_pre's output and my petsc option file.
>>>>
>>>> Thank you.
>>>> Frank
>>>>
>>>> On 09/09/2016 06:38 PM, Hengjie Wang wrote:
>>>>> Hi Barry,
>>>>>
>>>>> I checked. On the supercomputer, I had the option "-ksp_view_pre" but it is not in the file I sent you. I am sorry for the confusion.
>>>>>
>>>>> Regards,
>>>>> Frank
>>>>>
>>>>> On Friday, September 9, 2016, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>>>
>>>>>> On Sep 9, 2016, at 3:11 PM, frank <hengjiew at uci.edu> wrote:
>>>>>>
>>>>>> Hi Barry,
>>>>>>
>>>>>> I think the first KSP view output is from -ksp_view_pre. Before I submitted the test, I was not sure whether there would be OOM error or not. So I added both -ksp_view_pre and -ksp_view.
>>>>> But the options file you sent specifically does NOT list the -ksp_view_pre so how could it be from that?
>>>>>
>>>>> Sorry to be pedantic but I've spent too much time in the past trying to debug from incorrect information and want to make sure that the information I have is correct before thinking. Please recheck exactly what happened. Rerun with the exact input file you emailed if that is needed.
>>>>>
>>>>> Barry
>>>>>
>>>>>> Frank
>>>>>>
>>>>>>
>>>>>> On 09/09/2016 12:38 PM, Barry Smith wrote:
>>>>>>> Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt has only one KSPView in it? Did you run two different solves in the 2 case but not the one?
>>>>>>>
>>>>>>> Barry
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On Sep 9, 2016, at 10:56 AM, frank <hengjiew at uci.edu> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I want to continue digging into the memory problem here.
>>>>>>>> I did find a workaround in the past, which is to use fewer cores per node so that each core has 8G memory. However, this is inefficient and expensive. I hope to locate the place that uses the most memory.
>>>>>>>>
>>>>>>>> Here is a brief summary of the tests I did in past:
>>>>>>>>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12
>>>>>>>> Maximum (over computational time) process memory: total 7.0727e+08
>>>>>>>> Current process memory: total 7.0727e+08
>>>>>>>> Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11
>>>>>>>> Current space PetscMalloc()ed: total 1.8275e+09
>>>>>>>>
>>>>>>>>> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24
>>>>>>>> Maximum (over computational time) process memory: total 5.9431e+09
>>>>>>>> Current process memory: total 5.9431e+09
>>>>>>>> Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12
>>>>>>>> Current space PetscMalloc()ed: total 5.4844e+09
>>>>>>>>
>>>>>>>>> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24
>>>>>>>> OOM( Out Of Memory ) killer of the supercomputer terminated the job during "KSPSolve".
>>>>>>>>
>>>>>>>> I attached the output of ksp_view( the third test's output is from ksp_view_pre ), memory_view and also the petsc options.
>>>>>>>>
>>>>>>>> In all the tests, each core can access about 2G memory. In test3, there are 4223139840 non-zeros in the matrix. This will consume about 1.74 MB per process, using double precision. Even allowing for some extra memory to store the integer indices, 2G memory should still be more than enough.
>>>>>>>>
>>>>>>>> Is there a way to find out which part of KSPSolve uses the most memory?
>>>>>>>> Thank you so much.
>>>>>>>>
>>>>>>>> BTW, there are 4 options that remain unused and I don't understand why they are omitted:
>>>>>>>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly
>>>>>>>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
>>>>>>>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1
>>>>>>>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Frank
>>>>>>>>
>>>>>>>> On 07/13/2016 05:47 PM, Dave May wrote:
>>>>>>>>> On 14 July 2016 at 01:07, frank <hengjiew at uci.edu> wrote:
>>>>>>>>> Hi Dave,
>>>>>>>>>
>>>>>>>>> Sorry for the late reply.
>>>>>>>>> Thank you so much for your detailed reply.
>>>>>>>>>
>>>>>>>>> I have a question about the estimation of the memory usage. There are 4223139840 allocated non-zeros and 18432 MPI processes. Double precision is used. So the memory per process is:
>>>>>>>>> 4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ?
>>>>>>>>> Did I do something wrong here? Because this seems too small.
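The per-process figure quoted above can be replayed in a few lines. This sketch counts only the stored matrix values, plus (as a separate, assumed term) one 32-bit column index per non-zero as mentioned elsewhere in the thread; it ignores row pointers, vectors and all solver work space, so it is a lower bound rather than a measurement.

```python
nonzeros = 4223139840        # allocated non-zeros of the fine-grid operator
ranks = 18432                # 96 * 8 * 24 MPI processes
bytes_per_value = 8          # double precision
bytes_per_index = 4          # assumption: 32-bit column indices

value_bytes_per_rank = nonzeros * bytes_per_value / ranks
index_bytes_per_rank = nonzeros * bytes_per_index / ranks

value_mib = value_bytes_per_rank / 1024**2   # divide by 1024^2, as in the message
value_mb = value_bytes_per_rank / 1e6        # divide by 1e6, as suggested in the reply
index_mib = index_bytes_per_rank / 1024**2

print(f"values : {value_mib:.3f} MiB ({value_mb:.3f} MB) per rank")
print(f"indices: {index_mib:.3f} MiB per rank")
```

Either unit convention gives a value below 2 MB per rank, so the fine-grid operator by itself is far from the 2G available per core.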
>>>>>>>>>
>>>>>>>>> No - I totally f***ed it up. You are correct. That'll teach me for fumbling around with my iphone calculator and not using my brain. (Note that to convert to MB just divide by 1e6, not 1024^2 - although I apparently cannot convert between units correctly....)
>>>>>>>>>
>>>>>>>>> From the PETSc objects associated with the solver, It looks like it _should_ run with 2GB per MPI rank. Sorry for my mistake. Possibilities are: somewhere in your usage of PETSc you've introduced a memory leak; PETSc is doing a huge over allocation (e.g. as per our discussion of MatPtAP); or in your application code there are other objects you have forgotten to log the memory for.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I am running this job on Blue Waters.
>>>>>>>>> I am using the 7-point FD stencil in 3D.
>>>>>>>>>
>>>>>>>>> I thought so on both counts.
>>>>>>>>>
>>>>>>>>> I apologize that I made a stupid mistake in computing the memory per core. With my settings, each core can access only 2G memory on average instead of the 8G which I mentioned in the previous email. I re-ran the job with 8G memory per core on average and there is no "Out Of Memory" error. I will do more tests to see if there is still some memory issue.
>>>>>>>>>
>>>>>>>>> Ok. I'd still like to know where the memory was being used since my estimates were off.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Dave
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Frank
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 07/11/2016 01:18 PM, Dave May wrote:
>>>>>>>>>> Hi Frank,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11 July 2016 at 19:14, frank <hengjiew at uci.edu> wrote:
>>>>>>>>>> Hi Dave,
>>>>>>>>>>
>>>>>>>>>> I re-run the test using bjacobi as the preconditioner on the coarse mesh of telescope. The Grid is 3072*256*768 and process mesh is 96*8*24. The petsc option file is attached.
>>>>>>>>>> I still got the "Out Of Memory" error. The error occurred before the linear solver finished one step. So I don't have the full info from ksp_view. The info from ksp_view_pre is attached.
>>>>>>>>>>
>>>>>>>>>> Okay - that is essentially useless (sorry)
>>>>>>>>>>
>>>>>>>>>> It seems to me that the error occurred when the decomposition was going to be changed.
>>>>>>>>>>
>>>>>>>>>> Based on what information?
>>>>>>>>>> Running with -info would give us more clues, but will create a ton of output.
>>>>>>>>>> Please try running the case which failed with -info
>>>>>>>>>> I had another test with a grid of 1536*128*384 and the same process mesh as above. There was no error. The ksp_view info is attached for comparison.
>>>>>>>>>> Thank you.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [3] Here is my crude estimate of your memory usage.
>>>>>>>>>> I'll target the biggest memory hogs only to get an order of magnitude estimate
>>>>>>>>>>
>>>>>>>>>> * The Fine grid operator contains 4223139840 non-zeros --> 1.8 GB per MPI rank assuming double precision.
>>>>>>>>>> The indices for the AIJ could amount to another 0.3 GB (assuming 32 bit integers)
>>>>>>>>>>
>>>>>>>>>> * You use 5 levels of coarsening, so the other operators should represent (collectively)
>>>>>>>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ 300 MB per MPI rank on the communicator with 18432 ranks.
>>>>>>>>>> The coarse grid should consume ~ 0.5 MB per MPI rank on the communicator with 18432 ranks.
>>>>>>>>>>
>>>>>>>>>> * You use a reduction factor of 64, making the new communicator with 288 MPI ranks.
>>>>>>>>>> PCTelescope will first gather a temporary matrix associated with your coarse level operator assuming a comm size of 288 living on the comm with size 18432.
>>>>>>>>>> This matrix will require approximately 0.5 * 64 = 32 MB per core on the 288 ranks.
>>>>>>>>>> This matrix is then used to form a new MPIAIJ matrix on the subcomm, thus require another 32 MB per rank.
>>>>>>>>>> The temporary matrix is now destroyed.
>>>>>>>>>>
>>>>>>>>>> * Because a DMDA is detected, a permutation matrix is assembled.
>>>>>>>>>> This requires 2 doubles per point in the DMDA.
>>>>>>>>>> Your coarse DMDA contains 192 x 16 x 48 points.
>>>>>>>>>> Thus the permutation matrix will require < 1 MB per MPI rank on the sub-comm.
>>>>>>>>>>
>>>>>>>>>> * Lastly, the matrix is permuted. This uses MatPtAP(), but the resulting operator will have the same memory footprint as the unpermuted matrix (32 MB). At any stage in PCTelescope, only 2 operators of size 32 MB are held in memory when the DMDA is provided.
>>>>>>>>>>
>>>>>>>>>> From my rough estimates, the worst case memory footprint for any given core, given your options, is approximately
>>>>>>>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB
>>>>>>>>>> This is way below 8 GB.
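The arithmetic of the estimate above can be replayed as a short script. The inputs are the figures exactly as stated in the estimate (including the 2100 MB fine-grid term, which a later message in this thread revises to about 1.74 MB per rank); the variable names are illustrative only.

```python
fine_mb = 2100.0            # fine-grid operator plus indices, as estimated above

# Four coarser levels, each assumed to be 8x smaller than the previous one.
coarser_mb = sum(fine_mb / 8**k for k in range(1, 5))

telescope_gathered_mb = 32.0   # temporary matrix gathered by PCTelescope
telescope_subcomm_mb = 32.0    # MPIAIJ matrix on the sub-communicator
permutation_mb = 1.0           # upper bound quoted for the permutation matrix

# The estimate above rounds the coarse-level term to 300 MB before summing.
total_mb = fine_mb + 300.0 + telescope_gathered_mb + telescope_subcomm_mb + permutation_mb

print(f"coarser levels: {coarser_mb:.1f} MB")
print(f"total         : {total_mb:.0f} MB")
```

The geometric series shows why the coarse levels add only about one seventh of the fine-grid cost when the non-zeros per row stay constant, which is the assumption item (2) in the list of ignored effects calls into question.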
>>>>>>>>>>
>>>>>>>>>> Note this estimate completely ignores:
>>>>>>>>>> (1) the memory required for the restriction operator,
>>>>>>>>>> (2) the potential growth in the number of non-zeros per row due to Galerkin coarsening (I wished -ksp_view_pre reported the output from MatView so we could see the number of non-zeros required by the coarse level operators)
>>>>>>>>>> (3) all temporary vectors required by the CG solver, and those required by the smoothers.
>>>>>>>>>> (4) internal memory allocated by MatPtAP
>>>>>>>>>> (5) memory associated with IS's used within PCTelescope
>>>>>>>>>>
>>>>>>>>>> So either I am completely off in my estimates, or you have not carefully estimated the memory usage of your application code. Hopefully others might examine/correct my rough estimates
>>>>>>>>>>
>>>>>>>>>> Since I don't have your code I cannot assess the latter.
>>>>>>>>>> Since I don't have access to the same machine you are running on, I think we need to take a step back.
>>>>>>>>>>
>>>>>>>>>> [1] What machine are you running on? Send me a URL if it's available
>>>>>>>>>>
>>>>>>>>>> [2] What discretization are you using? (I am guessing a scalar 7 point FD stencil)
>>>>>>>>>> If it's a 7 point FD stencil, we should be able to examine the memory usage of your solver configuration using a standard, light weight existing PETSc example, run on your machine at the same scale.
>>>>>>>>>> This would hopefully enable us to correctly evaluate the actual memory usage required by the solver configuration you are using.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Dave
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Frank
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 07/08/2016 10:38 PM, Dave May wrote:
>>>>>>>>>>> On Saturday, 9 July 2016, frank <hengjiew at uci.edu> wrote:
>>>>>>>>>>> Hi Barry and Dave,
>>>>>>>>>>>
>>>>>>>>>>> Thank both of you for the advice.
>>>>>>>>>>>
>>>>>>>>>>> @Barry
>>>>>>>>>>> I made a mistake in the file names in last email. I attached the correct files this time.
>>>>>>>>>>> For all the three tests, 'Telescope' is used as the coarse preconditioner.
>>>>>>>>>>>
>>>>>>>>>>> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12
>>>>>>>>>>> Part of the memory usage: Vector 125 124 3971904 0.
>>>>>>>>>>> Matrix 101 101 9462372 0
>>>>>>>>>>>
>>>>>>>>>>> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24
>>>>>>>>>>> Part of the memory usage: Vector 125 124 681672 0.
>>>>>>>>>>> Matrix 101 101 1462180 0.
>>>>>>>>>>>
>>>>>>>>>>> In theory, the memory usage in Test1 should be 8 times that of Test2. In my case, it is about 6 times.
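The "about 6 times" can be checked directly from the byte counts quoted for Test1 and Test2 above. This is a minimal sketch that assumes the third number on each Vector/Matrix line is the memory in bytes reported for that object class.

```python
# Byte counts copied from the Vector/Matrix lines of Test1 and Test2 above.
test1 = {"Vector": 3971904, "Matrix": 9462372}   # process mesh 48*4*12
test2 = {"Vector": 681672, "Matrix": 1462180}    # process mesh 96*8*24

# Same grid on 8x fewer processes, so ideally 8x more memory per process.
expected_ratio = (96 * 8 * 24) / (48 * 4 * 12)

ratios = {name: test1[name] / test2[name] for name in test1}

print(f"expected: {expected_ratio:.1f}x")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```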
>>>>>>>>>>>
>>>>>>>>>>> == Test3: Grid: 3072*256*768, Process Mesh: 96*8*24. Sub-domain per process: 32*32*32
>>>>>>>>>>> Here I get the out of memory error.
>>>>>>>>>>>
>>>>>>>>>>> I tried to use -mg_coarse jacobi. In this way, I don't need to set -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right?
>>>>>>>>>>> The linear solver didn't work in this case. Petsc output some errors.
>>>>>>>>>>>
>>>>>>>>>>> @Dave
>>>>>>>>>>> In test3, I use only one instance of 'Telescope'. On the coarse mesh of 'Telescope', I used LU as the preconditioner instead of SVD.
>>>>>>>>>>> If I set the levels correctly, then on the last coarse mesh of MG where it calls 'Telescope', the sub-domain per process is 2*2*2.
>>>>>>>>>>> On the last coarse mesh of 'Telescope', there is only one grid point per process.
>>>>>>>>>>> I still got the OOM error. The detailed petsc option file is attached.
>>>>>>>>>>>
>>>>>>>>>>> Do you understand the expected memory usage for the particular parallel LU implementation you are using? I don't (seriously). Replace LU with bjacobi and re-run this test. My point about solver debugging is still valid.
>>>>>>>>>>>
>>>>>>>>>>> And please send the result of KSPView so we can see what is actually used in the computations
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Dave
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thank you so much.
>>>>>>>>>>>
>>>>>>>>>>> Frank
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote:
>>>>>>>>>>> On Jul 6, 2016, at 4:19 PM, frank <hengjiew at uci.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Barry,
>>>>>>>>>>>
>>>>>>>>>>> Thank you for your advice.
>>>>>>>>>>> I tried three tests. In the 1st test, the grid is 3072*256*768 and the process mesh is 96*8*24.
>>>>>>>>>>> The linear solver is 'cg', the preconditioner is 'mg', and 'telescope' is used as the preconditioner on the coarse mesh.
>>>>>>>>>>> The system gives me the "Out of Memory" error before the linear system is completely solved.
>>>>>>>>>>> The info from '-ksp_view_pre' is attached. It seems to me that the error occurs when it reaches the coarse mesh.
>>>>>>>>>>>
>>>>>>>>>>> The 2nd test uses a grid of 1536*128*384 and process mesh is 96*8*24. The 3rd test uses the same grid but a different process mesh 48*4*12.
>>>>>>>>>>> Are you sure this is right? The total matrix and vector memory usage goes from 2nd test
>>>>>>>>>>> Vector 384 383 8,193,712 0.
>>>>>>>>>>> Matrix 103 103 11,508,688 0.
>>>>>>>>>>> to 3rd test
>>>>>>>>>>> Vector 384 383 1,590,520 0.
>>>>>>>>>>> Matrix 103 103 3,508,664 0.
>>>>>>>>>>> that is the memory usage got smaller but if you have only 1/8th the processes and the same grid it should have gotten about 8 times bigger. Did you maybe cut the grid by a factor of 8 also? If so that still doesn't explain it because the memory usage changed by a factor of 5 something for the vectors and 3 something for the matrices.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The linear solver and petsc options in the 2nd and 3rd tests are the same as in the 1st test. The linear solver works fine in both tests.
>>>>>>>>>>> I attached the memory usage of the 2nd and 3rd tests. The memory info is from the option '-log_summary'. I tried to use '-memory_info' as you suggested, but in my case petsc treated it as an unused option. It output nothing about the memory. Do I need to add something to my code so I can use '-memory_info'?
>>>>>>>>>>> Sorry, my mistake the option is -memory_view
>>>>>>>>>>>
>>>>>>>>>>> Can you run the one case with -memory_view and -mg_coarse jacobi -ksp_max_it 1 (just so it doesn't iterate forever) to see how much memory is used without the telescope? Also run case 2 the same way.
>>>>>>>>>>>
>>>>>>>>>>> Barry
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> In both tests the memory usage is not large.
>>>>>>>>>>>
>>>>>>>>>>> It seems to me that it might be the 'telescope' preconditioner that allocated a lot of memory and caused the error in the 1st test.
>>>>>>>>>>> Is there a way to show how much memory it allocated?
>>>>>>>>>>>
>>>>>>>>>>> Frank
>>>>>>>>>>>
>>>>>>>>>>> On 07/05/2016 03:37 PM, Barry Smith wrote:
>>>>>>>>>>> Frank,
>>>>>>>>>>>
>>>>>>>>>>> You can run with -ksp_view_pre to have it "view" the KSP before the solve so hopefully it gets that far.
>>>>>>>>>>>
>>>>>>>>>>> Please run the problem that does fit with -memory_info when the problem completes it will show the "high water mark" for PETSc allocated memory and total memory used. We first want to look at these numbers to see if it is using more memory than you expect. You could also run with say half the grid spacing to see how the memory usage scaled with the increase in grid points. Make the runs also with -log_view and send all the output from these options.
>>>>>>>>>>>
>>>>>>>>>>> Barry
>>>>>>>>>>>
>>>>>>>>>>> On Jul 5, 2016, at 5:23 PM, frank <hengjiew at uci.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I am using the CG ksp solver and Multigrid preconditioner to solve a linear system in parallel.
>>>>>>>>>>> I chose to use the 'Telescope' as the preconditioner on the coarse mesh for its good performance.
>>>>>>>>>>> The petsc options file is attached.
>>>>>>>>>>>
>>>>>>>>>>> The domain is a 3d box.
>>>>>>>>>>> It works well when the grid is 1536*128*384 and the process mesh is 96*8*24. When I double the size of grid and keep the same process mesh and petsc options, I get an "out of memory" error from the super-cluster I am using.
>>>>>>>>>>> Each process has access to at least 8G memory, which should be more than enough for my application. I am sure that all the other parts of my code (except the linear solver) do not use much memory. So I suspect there is something wrong with the linear solver.
>>>>>>>>>>> The error occurs before the linear system is completely solved so I don't have the info from ksp view. I am not able to reproduce the error with a smaller problem either.
>>>>>>>>>>> In addition, I tried to use block Jacobi as the preconditioner with the same grid and same decomposition. The linear solver runs extremely slowly but there is no memory error.
>>>>>>>>>>>
>>>>>>>>>>> How can I diagnose what exactly causes the error?
>>>>>>>>>>> Thank you so much.
>>>>>>>>>>>
>>>>>>>>>>> Frank
>>>>>>>>>>> <petsc_options.txt>
>>>>>>>>>>> <ksp_view_pre.txt><memory_test2.txt><memory_test3.txt><petsc_options.txt>
>>>>>>>>>>>
>>>>>>>> <ksp_view1.txt><ksp_view2.txt><ksp_view3.txt><memory1.txt><memory2.txt><petsc_options1.txt><petsc_options2.txt><petsc_options3.txt>
>>>>
>>>
>> <petsc_options_32768.txt><test_ksp.f90>
-------------- next part --------------
Linear solve converged due to CONVERGED_RTOL iterations 7
KSP Object: 4096 MPI processes
type: cg
maximum iterations=10000
tolerances: relative=1e-07, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using UNPRECONDITIONED norm type for convergence test
PC Object: 4096 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=5 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 4096 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 4096 MPI processes
type: telescope
Telescope: parent comm size reduction factor = 64
Telescope: comm_size = 4096 , subcomm_size = 64
Telescope: DMDA detected
DMDA Object: (repart_) 64 MPI processes
M 32 N 32 P 32 m 4 n 4 p 4 dof 1 overlap 1
KSP Object: (mg_coarse_telescope_) 64 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_) 64 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=3 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_telescope_mg_coarse_) 64 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_coarse_) 64 MPI processes
type: redundant
Redundant preconditioner: First (color=0) of 64 PCs follows
linear system matrix = precond matrix:
Mat Object: 64 MPI processes
type: mpiaij
rows=512, cols=512
total: nonzeros=13824, allocated nonzeros=13824
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 2 nodes, limit used is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_coarse_telescope_mg_levels_1_) 64 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_levels_1_) 64 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 64 MPI processes
type: mpiaij
rows=4096, cols=4096
total: nonzeros=110592, allocated nonzeros=110592
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_coarse_telescope_mg_levels_2_) 64 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_levels_2_) 64 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 64 MPI processes
type: mpiaij
rows=32768, cols=32768
total: nonzeros=884736, allocated nonzeros=884736
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 64 MPI processes
type: mpiaij
rows=32768, cols=32768
total: nonzeros=884736, allocated nonzeros=884736
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
KSP Object: (mg_coarse_telescope_mg_coarse_redundant_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_coarse_redundant_) 1 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 5., needed 8.69575
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=512, cols=512
package used to perform factorization: petsc
total: nonzeros=120210, allocated nonzeros=120210
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=512, cols=512
total: nonzeros=13824, allocated nonzeros=13824
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 4096 MPI processes
type: mpiaij
rows=32768, cols=32768
total: nonzeros=884736, allocated nonzeros=884736
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 2 nodes, limit used is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 4096 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 4096 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 4096 MPI processes
type: mpiaij
rows=262144, cols=262144
total: nonzeros=7077888, allocated nonzeros=7077888
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 4096 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 4096 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 4096 MPI processes
type: mpiaij
rows=2097152, cols=2097152
total: nonzeros=56623104, allocated nonzeros=56623104
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 4096 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 4096 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 4096 MPI processes
type: mpiaij
rows=16777216, cols=16777216
total: nonzeros=452984832, allocated nonzeros=452984832
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 4096 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 4096 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 4096 MPI processes
type: mpiaij
rows=134217728, cols=134217728
total: nonzeros=939524096, allocated nonzeros=939524096
total number of mallocs used during MatSetValues calls =0
has attached null space
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 4096 MPI processes
type: mpiaij
rows=134217728, cols=134217728
total: nonzeros=939524096, allocated nonzeros=939524096
total number of mallocs used during MatSetValues calls =0
has attached null space
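The row and non-zero counts in the view above follow a simple pattern that can be checked by hand or with a short script. The sketch below assumes a cubic grid coarsened by a factor of 2 per direction on each level, and simply compares against the numbers printed in the view; the observation that the Galerkin levels carry 27 non-zeros per row while the finest level carries 7 is read off the output, not a statement about how PETSc builds them. The helper `level_rows` is hypothetical.

```python
def level_rows(n_fine, levels):
    """Rows per level (coarsest first) for an n^3 grid halved per direction each level."""
    return [(n_fine // 2**k) ** 3 for k in reversed(range(levels))]


# Outer MG on 4096 ranks: 5 levels starting from the 512^3 fine grid.
outer = level_rows(512, 5)
# MG inside PCTelescope on 64 ranks: 3 levels starting from the 32^3 DMDA.
inner = level_rows(32, 3)

# (rows, nonzeros) pairs copied from the ksp_view output above.
reported = {
    512: 13824, 4096: 110592, 32768: 884736, 262144: 7077888,
    2097152: 56623104, 16777216: 452984832, 134217728: 939524096,
}

for rows in sorted(set(inner + outer)):
    print(f"rows={rows:>10d}  nonzeros/row={reported[rows] / rows:.0f}")

# Sub-communicator size implied by the reduction factor in the view.
subcomm_size = 4096 // 64
print("subcomm_size =", subcomm_size)
```

The jump from 7 to 27 non-zeros per row on every coarser level is the kind of growth from Galerkin coarsening that the earlier memory estimate in this thread left out.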
-------------- next part --------------
Linear solve converged due to CONVERGED_RTOL iterations 8
KSP Object: 8192 MPI processes
type: cg
maximum iterations=10000
tolerances: relative=1e-07, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using UNPRECONDITIONED norm type for convergence test
PC Object: 8192 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=5 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 8192 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 8192 MPI processes
type: telescope
Telescope: parent comm size reduction factor = 128
Telescope: comm_size = 8192 , subcomm_size = 64
Telescope: DMDA detected
DMDA Object: (repart_) 64 MPI processes
M 64 N 64 P 64 m 4 n 4 p 4 dof 1 overlap 1
KSP Object: (mg_coarse_telescope_) 64 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_) 64 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=4 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_telescope_mg_coarse_) 64 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_coarse_) 64 MPI processes
type: redundant
Redundant preconditioner: First (color=0) of 64 PCs follows
linear system matrix = precond matrix:
Mat Object: 64 MPI processes
type: mpiaij
rows=512, cols=512
total: nonzeros=13824, allocated nonzeros=13824
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 2 nodes, limit used is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_coarse_telescope_mg_levels_1_) 64 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_levels_1_) 64 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 64 MPI processes
type: mpiaij
rows=4096, cols=4096
total: nonzeros=110592, allocated nonzeros=110592
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_coarse_telescope_mg_levels_2_) 64 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_levels_2_) 64 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 64 MPI processes
type: mpiaij
rows=32768, cols=32768
total: nonzeros=884736, allocated nonzeros=884736
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_coarse_telescope_mg_levels_3_) 64 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_levels_3_) 64 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 64 MPI processes
type: mpiaij
rows=262144, cols=262144
total: nonzeros=7077888, allocated nonzeros=7077888
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 64 MPI processes
type: mpiaij
rows=262144, cols=262144
total: nonzeros=7077888, allocated nonzeros=7077888
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
KSP Object: (mg_coarse_telescope_mg_coarse_redundant_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_coarse_redundant_) 1 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 5., needed 8.69575
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=512, cols=512
package used to perform factorization: petsc
total: nonzeros=120210, allocated nonzeros=120210
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=512, cols=512
total: nonzeros=13824, allocated nonzeros=13824
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=262144, cols=262144
total: nonzeros=7077888, allocated nonzeros=7077888
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 16 nodes, limit used is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 8192 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=2097152, cols=2097152
total: nonzeros=56623104, allocated nonzeros=56623104
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 8192 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=16777216, cols=16777216
total: nonzeros=452984832, allocated nonzeros=452984832
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 8192 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=134217728, cols=134217728
total: nonzeros=3623878656, allocated nonzeros=3623878656
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 8192 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=1073741824, cols=1073741824
total: nonzeros=7516192768, allocated nonzeros=7516192768
total number of mallocs used during MatSetValues calls =0
has attached null space
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=1073741824, cols=1073741824
total: nonzeros=7516192768, allocated nonzeros=7516192768
total number of mallocs used during MatSetValues calls =0
has attached null space
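The row counts in the view above follow from halving the grid in each direction per level. The sketch below reproduces them for the 1024^3 run. The factor-of-2 coarsening is an assumption inferred from the printed sizes, and `level_sizes` is a hypothetical helper, not a PETSc call.

```python
def level_sizes(n_fine, levels, factor=2):
    """Unknowns per MG level (coarsest first) for an n_fine^3 grid.

    Assumes uniform coarsening by `factor` in each direction, which is
    inferred from the row counts printed by -ksp_view.
    """
    sizes = []
    n = n_fine
    for _ in range(levels):
        sizes.append(n ** 3)
        n //= factor
    return sizes[::-1]


# Outer MG: 5 levels on 8192 ranks, finest grid 1024^3.
outer = level_sizes(1024, 5)
# Inner MG behind PCTelescope: 4 levels on 64 ranks, finest grid 64^3.
inner = level_sizes(64, 4)

for name, sizes, ranks in (("outer", outer, 8192), ("inner", inner, 64)):
    for lvl, rows in enumerate(sizes):
        print(f"{name} level {lvl}: rows={rows}, rows/rank={rows / ranks:g}")
```

The coarsest outer level (64^3 = 262144 rows) is the matrix handed to the telescope sub-communicator, so it also appears as the finest inner level.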
-------------- next part --------------
Linear solve converged due to CONVERGED_RTOL iterations 7
1 step time: 6.2466299533843994
norm1 error: 1.2135791829058829E-005
norm inf error: 1.0512737852365958E-002
Summary of Memory Usage in PETSc
Maximum (over computational time) process memory: total 8.0407e+07 max 1.9696e+05 min 1.5078e+05
Current process memory: total 8.0407e+07 max 1.9696e+05 min 1.5078e+05
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./test_ksp.exe on a gnu-opt named . with 512 processors, by wang11 Tue Oct 4 05:04:05 2016
Using Petsc Development GIT revision: v3.6.3-2059-geab7831 GIT Date: 2016-01-20 10:58:35 -0600
Max Max/Min Avg Total
Time (sec): 7.128e+00 1.00215 7.121e+00
Objects: 3.330e+02 1.72539 2.105e+02
Flops: 2.508e+09 9.15893 5.530e+08 2.832e+11
Flops/sec: 3.521e+08 9.16346 7.765e+07 3.976e+10
MPI Messages: 3.918e+03 2.07713 2.157e+03 1.104e+06
MPI Message Lengths: 1.003e+07 1.17554 4.064e+03 4.488e+09
MPI Reductions: 4.310e+02 1.60223
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 7.1208e+00 100.0% 2.8316e+11 100.0% 1.104e+06 100.0% 4.064e+03 100.0% 2.882e+02 66.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSidedF 1 1.0 2.5056e-02 17.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecTDot 14 1.0 6.0542e-02 1.6 7.34e+06 1.0 0.0e+00 0.0e+00 1.4e+01 1 1 0 0 3 1 1 0 0 5 62074
VecNorm 8 1.0 3.5572e-02 3.1 4.19e+06 1.0 0.0e+00 0.0e+00 8.0e+00 0 1 0 0 2 0 1 0 0 3 60370
VecScale 28 2.0 2.1243e-04 1.8 7.35e+04 1.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 144250
VecCopy 9 1.0 3.8947e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 193 1.8 1.6343e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 28 1.0 1.0030e-01 1.1 1.47e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 74940
VecAYPX 48 1.4 6.3155e-02 1.6 7.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 57380
VecAssemblyBegin 1 1.0 2.5080e-02 17.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 1 1.0 2.2888e-05 12.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 194 1.6 3.9131e-02 1.6 0.00e+00 0.0 7.2e+05 4.1e+03 0.0e+00 0 0 65 65 0 0 0 65 65 0 0
VecScatterEnd 194 1.6 3.4133e+00 68.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 42 0 0 0 0 42 0 0 0 0 0
MatMult 56 1.3 5.0448e-01 1.2 8.70e+07 1.0 2.9e+05 8.2e+03 0.0e+00 6 15 26 53 0 6 15 26 53 0 86737
MatMultAdd 35 1.7 8.0332e-02 1.2 1.43e+07 1.0 8.2e+04 1.5e+03 0.0e+00 1 3 7 3 0 1 3 7 3 0 90220
MatMultTranspose 47 1.5 1.1686e-01 1.4 1.64e+07 1.0 1.1e+05 1.4e+03 0.0e+00 1 3 10 3 0 1 3 10 3 0 70913
MatSolve 7 0.0 5.4884e-02 0.0 4.38e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 51106
MatSOR 70 1.7 7.4662e-01 1.1 8.85e+07 1.0 2.1e+05 1.2e+03 1.8e+00 10 15 19 5 0 10 15 19 5 1 58271
MatLUFactorSym 1 0.0 1.3002e-01 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 0.0 3.0343e+00 0.0 2.18e+09 0.0 0.0e+00 0.0e+00 0.0e+00 5 49 0 0 0 5 49 0 0 0 46035
MatConvert 1 0.0 1.4801e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatResidual 35 1.7 2.5246e-01 1.3 4.14e+07 1.0 2.3e+05 4.1e+03 0.0e+00 3 7 21 21 0 3 7 21 21 0 80802
MatAssemblyBegin 29 1.5 6.2687e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.1e+01 1 0 0 0 5 1 0 0 0 7 0
MatAssemblyEnd 29 1.5 2.8406e-01 1.0 0.00e+00 0.0 1.5e+05 5.4e+02 7.7e+01 4 0 14 2 18 4 0 14 2 27 0
MatGetRowIJ 1 0.0 1.1208e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 2 2.0 4.1284e-02 9.3 0.00e+00 0.0 2.2e+03 3.4e+04 3.5e+00 0 0 0 2 1 0 0 0 2 1 0
MatGetOrdering 1 0.0 7.9041e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatPtAP 6 1.5 1.0306e+00 1.0 4.18e+07 1.0 3.1e+05 4.4e+03 7.2e+01 14 7 28 30 17 14 7 28 30 25 20208
MatPtAPSymbolic 6 1.5 4.9107e-01 1.0 0.00e+00 0.0 1.8e+05 5.3e+03 3.0e+01 7 0 16 21 7 7 0 16 21 10 0
MatPtAPNumeric 6 1.5 5.3958e-01 1.0 4.18e+07 1.0 1.3e+05 3.0e+03 4.2e+01 7 7 11 9 10 7 7 11 9 15 38597
MatRedundantMat 1 0.0 2.7650e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e-01 0 0 0 0 0 0 0 0 0 0 0
MatMPIConcateSeq 1 0.0 1.6951e-02 0.0 0.00e+00 0.0 3.3e+03 1.4e+02 1.9e+00 0 0 0 0 0 0 0 0 0 1 0
MatGetLocalMat 6 1.5 4.7763e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatGetBrAoCol 6 1.5 4.1229e-02 1.2 0.00e+00 0.0 1.4e+05 5.5e+03 0.0e+00 1 0 13 17 0 1 0 13 17 0 0
MatGetSymTrans 12 1.5 1.4412e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
DMCoarsen 5 1.7 8.8470e-03 1.4 0.00e+00 0.0 2.0e+04 8.4e+02 3.6e+01 0 0 2 0 8 0 0 2 0 12 0
DMCreateInterpolation 5 1.7 2.1848e-01 1.0 2.05e+06 1.0 3.5e+04 7.5e+02 5.2e+01 3 0 3 1 12 3 0 3 1 18 4739
KSPSetUp 10 2.0 1.9465e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 3 0 0 0 0 4 0
KSPSolve 1 1.0 6.2467e+00 1.0 2.51e+09 9.2 1.1e+06 4.0e+03 2.6e+02 88 100 99 98 60 88 100 99 98 90 45330
PCSetUp 2 2.0 4.5211e+00 3.6 2.23e+09 52.3 3.8e+05 3.8e+03 2.1e+02 23 57 35 33 48 23 57 35 33 72 35732
PCApply 7 1.0 4.6845e+00 1.0 2.42e+09 13.0 7.2e+05 3.1e+03 3.0e+01 66 84 65 50 7 66 84 65 50 11 50783
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 133 133 29053936 0.
Vector Scatter 24 24 2464384 0.
Matrix 58 58 118369764 0.
Matrix Null Space 1 1 592 0.
Distributed Mesh 7 7 34944 0.
Star Forest Bipartite Graph 14 14 11872 0.
Discrete System 7 7 5992 0.
Index Set 54 54 1628276 0.
IS L to G Mapping 7 7 1367088 0.
Krylov Solver 11 11 13640 0.
DMKSP interface 5 5 3240 0.
Preconditioner 11 11 11008 0.
Viewer 1 0 0 0.
========================================================================================================================
Average time to get PetscTime(): 1.90735e-07
Average time for MPI_Barrier(): 1.87874e-05
Average time for zero size MPI_Send(): 1.10432e-05
#PETSc Option Table entries:
-ksp_converged_reason
-ksp_initial_guess_nonzero yes
-ksp_norm_type unpreconditioned
-ksp_rtol 1e-7
-ksp_type cg
-log_view
-matptap_scalable
-matrap 0
-memory_view
-mg_coarse_ksp_type preonly
-mg_coarse_pc_telescope_reduction_factor 8
-mg_coarse_pc_type telescope
-mg_coarse_telescope_ksp_type preonly
-mg_coarse_telescope_mg_coarse_ksp_type preonly
-mg_coarse_telescope_mg_coarse_pc_type redundant
-mg_coarse_telescope_mg_levels_ksp_max_it 1
-mg_coarse_telescope_mg_levels_ksp_type richardson
-mg_coarse_telescope_pc_mg_galerkin
-mg_coarse_telescope_pc_mg_levels 3
-mg_coarse_telescope_pc_type mg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_type richardson
-N 512
-options_left 1
-pc_mg_galerkin
-pc_mg_levels 4
-pc_type mg
-ppe_max_iter 20
-px 8
-py 8
-pz 8
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --known-has-attribute-aligned=1 --with-batch="1 " --known-mpi-shared="0 " --known-mpi-shared-libraries=0 --known-memcmp-ok --with-blas-lapack-lib=/opt/acml/5.3.1/gfortran64/lib/libacml.a --COPTFLAGS="-march=bdver1 -O3 -ffast-math -fPIC " --FOPTFLAGS="-march=bdver1 -O3 -ffast-math -fPIC " --CXXOPTFLAGS="-march=bdver1 -O3 -ffast-math -fPIC " --with-x="0 " --with-debugging="0 " --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries="0 " --with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " --download-hypre="1 " --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " PETSC_ARCH=gnu-opt
-----------------------------------------
Libraries compiled on Tue Feb 16 12:57:46 2016 on h2ologin3
Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /mnt/a/u/sciteam/wang11/Sftw/petsc
Using PETSc arch: gnu-opt
-----------------------------------------
Using C compiler: cc -march=bdver1 -O3 -ffast-math -fPIC ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn -march=bdver1 -O3 -ffast-math -fPIC ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/include -I/mnt/a/u/sciteam/wang11/Sftw/petsc/include -I/mnt/a/u/sciteam/wang11/Sftw/petsc/include -I/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -L/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -lpetsc -Wl,-rpath,/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -L/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -lsuperlu_dist_4.3 -lHYPRE -lscalapack -Wl,-rpath,/opt/acml/5.3.1/gfortran64/lib -L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lssl -lcrypto -ldl
-----------------------------------------
There is one unused database option. It is:
Option left: name:-ppe_max_iter value: 20
Application 48712763 resources: utime ~3749s, stime ~789s, Rss ~196960, inblocks ~781565, outblocks ~505751
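As a cross-check on how to read the last column of the log above, the KSPSolve rate can be recomputed from the stage flop total and the maximum KSPSolve time. This is a minimal sketch: `total_mflops` is a hypothetical helper, and it assumes the scale factor printed in the header as "10e-6" is meant as 1e-6 (flops to Mflops), which is what the printed numbers are consistent with.

```python
def total_mflops(total_flops, max_time_s):
    """Aggregate rate in Mflop/s: sum of flops over all ranks / max time.

    Assumes a 1e-6 scale factor (flops -> Mflops); the log header prints
    "10e-6", but the reported values match 1e-6.
    """
    return 1e-6 * total_flops / max_time_s


# Values copied from the 512-rank log: the Main Stage flop total
# (KSPSolve accounts for 100% of it) and the max KSPSolve time.
ksp_rate = total_mflops(2.8316e11, 6.2467)
print(f"KSPSolve: {ksp_rate:.0f} Mflop/s")
```

The result agrees with the 45330 Mflop/s reported for KSPSolve to within the rounding of the printed inputs.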
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log_512_4096.txt
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20161004/536323ff/attachment-0006.txt>
-------------- next part --------------
Linear solve converged due to CONVERGED_RTOL iterations 7
1 step time: 4.8914160728454590
norm1 error: 8.6827845637092041E-008
norm inf error: 4.1127664509280201E-003
Summary of Memory Usage in PETSc
Maximum (over computational time) process memory: total 1.9679e+09 max 1.1249e+05 min 4.1456e+04
Current process memory: total 1.9679e+09 max 1.1249e+05 min 4.1456e+04
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./test_ksp.exe on a gnu-opt named . with 32768 processors, by wang11 Tue Oct 4 03:50:16 2016
Using Petsc Development GIT revision: v3.6.3-2059-geab7831 GIT Date: 2016-01-20 10:58:35 -0600
Max Max/Min Avg Total
Time (sec): 5.221e+00 1.00192 5.215e+00
Objects: 3.330e+02 1.72539 1.952e+02
Flops: 2.232e+09 531.65406 3.900e+07 1.278e+12
Flops/sec: 4.277e+08 531.89802 7.473e+06 2.449e+11
MPI Messages: 8.594e+03 4.55579 2.011e+03 6.589e+07
MPI Message Lengths: 1.078e+06 1.95814 2.782e+02 1.833e+10
MPI Reductions: 4.310e+02 1.60223
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 5.2149e+00 100.0% 1.2779e+12 100.0% 6.589e+07 100.0% 2.782e+02 100.0% 2.705e+02 62.8%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSidedF 1 1.0 6.2082e-02 6.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecTDot 14 1.0 1.5901e-02 2.1 1.15e+05 1.0 0.0e+00 0.0e+00 1.4e+01 0 0 0 0 3 0 0 0 0 5 236313
VecNorm 8 1.0 8.2795e-02 99.5 6.55e+04 1.0 0.0e+00 0.0e+00 8.0e+00 1 0 0 0 2 1 0 0 0 3 25937
VecScale 28 2.0 4.6015e-04 17.9 8.96e+03 2.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 324014
VecCopy 9 1.0 2.4486e-04 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 193 1.8 5.3072e-04 4.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 28 1.0 6.1011e-04 2.5 2.29e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 12319342
VecAYPX 48 1.4 4.3058e-04 2.8 1.15e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 8416119
VecAssemblyBegin 1 1.0 6.2096e-02 6.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAssemblyEnd 1 1.0 6.3896e-05 67.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 194 1.6 2.2339e-02 8.0 0.00e+00 0.0 4.3e+07 2.8e+02 0.0e+00 0 0 65 66 0 0 0 65 66 0 0
VecScatterEnd 194 1.6 3.7815e+00 39.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 71 0 0 0 0 71 0 0 0 0 0
MatMult 56 1.3 7.7610e-02 7.5 1.55e+06 1.2 1.7e+07 5.6e+02 0.0e+00 0 3 26 53 0 0 3 26 53 0 563808
MatMultAdd 35 1.7 1.1928e-02 9.2 2.48e+05 1.1 4.9e+06 1.1e+02 0.0e+00 0 1 7 3 0 0 1 7 3 0 607627
MatMultTranspose 47 1.5 2.6726e-02 13.3 2.84e+05 1.1 6.5e+06 9.9e+01 0.0e+00 0 1 10 3 0 0 1 10 3 0 310054
MatSolve 7 0.0 5.5102e-02 0.0 4.38e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 407368
MatSOR 70 1.7 2.0535e-02 3.7 1.70e+06 1.4 1.2e+07 9.8e+01 2.2e-01 0 3 18 7 0 0 3 18 7 0 1976428
MatLUFactorSym 1 0.0 1.4304e-01 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 0.0 3.0453e+00 0.0 2.18e+09 0.0 0.0e+00 0.0e+00 0.0e+00 1 87 0 0 0 1 87 0 0 0 366959
MatConvert 1 0.0 1.3890e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatResidual 35 1.7 7.3063e-02 11.3 8.37e+05 1.4 1.3e+07 3.0e+02 0.0e+00 0 2 20 22 0 0 2 20 22 0 279200
MatAssemblyBegin 29 1.5 1.1239e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01 2 0 0 0 5 2 0 0 0 7 0
MatAssemblyEnd 29 1.5 3.6328e-01 1.1 0.00e+00 0.0 8.9e+06 4.1e+01 7.3e+01 6 0 14 2 17 6 0 14 2 27 0
MatGetRowIJ 1 0.0 1.1570e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 2 2.0 1.0665e-01 4.9 0.00e+00 0.0 1.6e+05 5.4e+02 3.1e+00 1 0 0 0 1 1 0 0 0 1 0
MatGetOrdering 1 0.0 8.1892e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatPtAP 6 1.5 4.1852e-01 1.0 7.98e+05 1.2 1.9e+07 3.0e+02 6.9e+01 8 2 28 30 16 8 2 28 30 25 50373
MatPtAPSymbolic 6 1.5 2.2612e-01 1.0 0.00e+00 0.0 1.1e+07 3.7e+02 2.8e+01 4 0 16 22 7 4 0 16 22 10 0
MatPtAPNumeric 6 1.5 1.9413e-01 1.0 7.98e+05 1.2 7.7e+06 2.0e+02 4.0e+01 4 2 12 8 9 4 2 12 8 15 108597
MatRedundantMat 1 0.0 2.9847e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.2e-02 0 0 0 0 0 0 0 0 0 0 0
MatMPIConcateSeq 1 0.0 7.8937e-02 0.0 0.00e+00 0.0 2.7e+04 4.0e+01 2.3e-01 0 0 0 0 0 0 0 0 0 0 0
MatGetLocalMat 6 1.5 7.7701e-04 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 6 1.5 1.9681e-02 3.1 0.00e+00 0.0 8.3e+06 3.9e+02 0.0e+00 0 0 13 18 0 0 0 13 18 0 0
MatGetSymTrans 12 1.5 2.0599e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
DMCoarsen 5 1.7 9.4588e-02 1.0 0.00e+00 0.0 1.2e+06 5.8e+01 3.3e+01 2 0 2 0 8 2 0 2 0 12 0
DMCreateInterpolation 5 1.7 2.1863e-01 1.0 3.54e+04 1.1 2.1e+06 5.8e+01 4.8e+01 4 0 3 1 11 4 0 3 1 18 4736
KSPSetUp 10 2.0 2.9837e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 1 0 0 0 3 1 0 0 0 4 0
KSPSolve 1 1.0 4.8916e+00 1.0 2.23e+09 531.7 6.5e+07 2.8e+02 2.4e+02 94 100 99 98 56 94 100 99 98 89 261253
PCSetUp 2 2.0 4.6506e+00 4.8 2.18e+09 3247.5 2.3e+07 2.5e+02 1.9e+02 20 89 35 32 44 20 89 35 32 71 245045
PCApply 7 1.0 3.7972e+00 1.0 2.23e+09 794.1 4.2e+07 2.2e+02 1.6e+01 73 96 63 51 4 73 96 63 51 6 324561
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 133 133 850544 0.
Vector Scatter 24 24 68032 0.
Matrix 58 58 42186948 0.
Matrix Null Space 1 1 592 0.
Distributed Mesh 7 7 34944 0.
Star Forest Bipartite Graph 14 14 11872 0.
Discrete System 7 7 5992 0.
Index Set 54 54 152244 0.
IS L to G Mapping 7 7 37936 0.
Krylov Solver 11 11 13640 0.
DMKSP interface 5 5 3240 0.
Preconditioner 11 11 11008 0.
Viewer 1 0 0 0.
========================================================================================================================
Average time to get PetscTime(): 1.90735e-07
Average time for MPI_Barrier(): 6.00338e-05
Average time for zero size MPI_Send(): 1.25148e-05
#PETSc Option Table entries:
-ksp_converged_reason
-ksp_initial_guess_nonzero yes
-ksp_norm_type unpreconditioned
-ksp_rtol 1e-7
-ksp_type cg
-log_view
-matptap_scalable
-matrap 0
-memory_view
-mg_coarse_ksp_type preonly
-mg_coarse_pc_telescope_reduction_factor 64
-mg_coarse_pc_type telescope
-mg_coarse_telescope_ksp_type preonly
-mg_coarse_telescope_mg_coarse_ksp_type preonly
-mg_coarse_telescope_mg_coarse_pc_type redundant
-mg_coarse_telescope_mg_levels_ksp_max_it 1
-mg_coarse_telescope_mg_levels_ksp_type richardson
-mg_coarse_telescope_pc_mg_galerkin
-mg_coarse_telescope_pc_mg_levels 3
-mg_coarse_telescope_pc_type mg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_type richardson
-N 512
-options_left 1
-pc_mg_galerkin
-pc_mg_levels 4
-pc_type mg
-ppe_max_iter 20
-px 32
-py 32
-pz 32
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --known-has-attribute-aligned=1 --with-batch="1 " --known-mpi-shared="0 " --known-mpi-shared-libraries=0 --known-memcmp-ok --with-blas-lapack-lib=/opt/acml/5.3.1/gfortran64/lib/libacml.a --COPTFLAGS="-march=bdver1 -O3 -ffast-math -fPIC " --FOPTFLAGS="-march=bdver1 -O3 -ffast-math -fPIC " --CXXOPTFLAGS="-march=bdver1 -O3 -ffast-math -fPIC " --with-x="0 " --with-debugging="0 " --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries="0 " --with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " --download-hypre="1 " --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " PETSC_ARCH=gnu-opt
-----------------------------------------
Libraries compiled on Tue Feb 16 12:57:46 2016 on h2ologin3
Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /mnt/a/u/sciteam/wang11/Sftw/petsc
Using PETSc arch: gnu-opt
-----------------------------------------
Using C compiler: cc -march=bdver1 -O3 -ffast-math -fPIC ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn -march=bdver1 -O3 -ffast-math -fPIC ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/include -I/mnt/a/u/sciteam/wang11/Sftw/petsc/include -I/mnt/a/u/sciteam/wang11/Sftw/petsc/include -I/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -L/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -lpetsc -Wl,-rpath,/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -L/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -lsuperlu_dist_4.3 -lHYPRE -lscalapack -Wl,-rpath,/opt/acml/5.3.1/gfortran64/lib -L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lssl -lcrypto -ldl
-----------------------------------------
#PETSc Option Table entries:
-ksp_converged_reason
-ksp_initial_guess_nonzero yes
-ksp_norm_type unpreconditioned
-ksp_rtol 1e-7
-ksp_type cg
-log_view
-matptap_scalable
-matrap 0
-memory_view
-mg_coarse_ksp_type preonly
-mg_coarse_pc_telescope_reduction_factor 64
-mg_coarse_pc_type telescope
-mg_coarse_telescope_ksp_type preonly
-mg_coarse_telescope_mg_coarse_ksp_type preonly
-mg_coarse_telescope_mg_coarse_pc_type redundant
-mg_coarse_telescope_mg_levels_ksp_max_it 1
-mg_coarse_telescope_mg_levels_ksp_type richardson
-mg_coarse_telescope_pc_mg_galerkin
-mg_coarse_telescope_pc_mg_levels 3
-mg_coarse_telescope_pc_type mg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_type richardson
-N 512
-options_left 1
-pc_mg_galerkin
-pc_mg_levels 4
-pc_type mg
-ppe_max_iter 20
-px 32
-py 32
-pz 32
#End of PETSc Option Table entries
There is one unused database option. It is:
Option left: name:-ppe_max_iter value: 20
Application 48712514 resources: utime ~274648s, stime ~36467s, Rss ~112492, inblocks ~29956998, outblocks ~32114238
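
For reference, the option table above (-N 512, -px/-py/-pz 32, -pc_mg_levels 4, -mg_coarse_pc_telescope_reduction_factor 64, -mg_coarse_telescope_pc_mg_levels 3) implies a grid hierarchy that can be estimated with a few lines of arithmetic. The sketch below is only a rough illustration and not part of the original run: it assumes each coarsening halves the number of points per dimension (the actual DMDA coarsening depends on the boundary type and interpolation), and the helper name level_sizes is hypothetical.

```python
# Rough estimate of the grid hierarchy implied by the option table above.
# ASSUMPTION: each coarsening halves the points per dimension; the real
# DMDA coarsening may differ depending on boundary type and interpolation.

def level_sizes(n_fine, levels, factor=2):
    """Points per dimension on each level, finest first (hypothetical helper)."""
    return [n_fine // factor**k for k in range(levels)]

N = 512                      # -N 512
ranks = 32 * 32 * 32         # -px 32 -py 32 -pz 32
reduction = 64               # -mg_coarse_pc_telescope_reduction_factor 64

outer = level_sizes(N, 4)           # -pc_mg_levels 4
inner = level_sizes(outer[-1], 3)   # -mg_coarse_telescope_pc_mg_levels 3
sub_ranks = ranks // reduction      # ranks in the telescope sub-communicator

print("outer MG levels (points per dim):", outer)
print("inner MG levels (points per dim):", inner)
print("ranks: outer", ranks, "/ telescope", sub_ranks)
print("unknowns per rank, outer coarsest:", outer[-1] ** 3 // ranks)
print("unknowns per rank, inner coarsest:", inner[-1] ** 3 // sub_ranks)
print("unknowns on the redundant coarse solve:", inner[-1] ** 3)
```

Under the halving assumption, the per-rank counts on the coarsest levels come out very small, which is the kind of imbalance between work and communication worth checking when tuning the number of levels and the reduction factor.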
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log_1024_4096.txt
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20161004/536323ff/attachment-0007.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log_1024_8192.txt
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20161004/536323ff/attachment-0008.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log_1024_16384.txt
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20161004/536323ff/attachment-0009.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log_1024_32768.txt
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20161004/536323ff/attachment-0010.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log_1024_65536.txt
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20161004/536323ff/attachment-0011.txt>