[petsc-users] Question about memory usage in Multigrid preconditioner
Hengjie Wang
hengjiew at uci.edu
Thu Sep 15 01:03:39 CDT 2016
Hi Dave,
Sorry, I should have added more comments to explain the code.
The number of processes in each dimension is the same: Px = Py = Pz = P, and so is
the domain size in each dimension (Nx = Ny = Nz = N).
So if you want to run the code on a 512^3 grid with 16^3
cores, you need to set "-N 512 -P 16" in the command line.
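For example, to run the 512^3 case on 16^3 = 4096 cores, the invocation is the same as the one shown in the comment block of the attached code:

  mpirun -n 4096 ./test_ksp.exe -N 512 -P 16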
I added more comments and also fixed an error in the attached code. (The
error only affects the accuracy of the solution, not the memory usage.)
Thank you.
Frank
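
P.S. To narrow down where KSPSolve itself uses the memory, the test code could also query PETSc's memory counters right around the solve. Below is a minimal sketch, assuming the Fortran interfaces for PetscMemoryGetCurrentUsage and PetscMallocGetCurrentUsage are available in this PETSc build (the variable names are just placeholders); -memory_view still reports the overall high-water mark.

  PetscLogDouble :: rss0, rss1, mal0, mal1

  CALL PetscMemoryGetCurrentUsage( rss0, ierr )  ! resident set size before the solve (bytes)
  CALL PetscMallocGetCurrentUsage( mal0, ierr )  ! space currently PetscMalloc()ed (bytes)
  CALL KSPSolve( ksp, b, x, ierr )
  CALL PetscMemoryGetCurrentUsage( rss1, ierr )
  CALL PetscMallocGetCurrentUsage( mal1, ierr )
  WRITE(*,'(A,I6,2(A,ES12.4))') ' rank', rank, &
       ': RSS growth over KSPSolve (bytes) =', rss1 - rss0, &
       ', PetscMalloc growth (bytes) =', mal1 - mal0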
On 9/14/2016 9:05 PM, Dave May wrote:
>
>
> On Thursday, 15 September 2016, Dave May <dave.mayhem23 at gmail.com> wrote:
>
>
>
> On Thursday, 15 September 2016, frank <hengjiew at uci.edu> wrote:
>
> Hi,
>
> I wrote a simple code to reproduce the error. I hope this can
> help to diagnose the problem.
> The code just solves a 3D Poisson equation.
>
>
> Why is the stencil width a runtime parameter?? And why is the
> default value 2? For 7-pnt FD Laplace, you only need a stencil
> width of 1.
>
> Was this choice made to mimic something in the real application code?
>
>
> Please ignore - I misunderstood your usage of the param set by -P
>
>
> I run the code on a 1024^3 mesh. The process partition is 32 *
> 32 * 32. That's when I reproduce the OOM error. Each core has
> about 2 GB of memory.
> I also run the code on a 512^3 mesh with 16 * 16 * 16
> processes. The KSP solver works fine.
> I attached the code, ksp_view_pre's output and my petsc option
> file.
>
> Thank you.
> Frank
>
> On 09/09/2016 06:38 PM, Hengjie Wang wrote:
>> Hi Barry,
>>
>> I checked. On the supercomputer, I had the option
>> "-ksp_view_pre" but it is not in the file I sent you. I am sorry
>> for the confusion.
>>
>> Regards,
>> Frank
>>
>> On Friday, September 9, 2016, Barry Smith
>> <bsmith at mcs.anl.gov> wrote:
>>
>>
>> > On Sep 9, 2016, at 3:11 PM, frank <hengjiew at uci.edu> wrote:
>> >
>> > Hi Barry,
>> >
>> > I think the first KSP view output is from
>> -ksp_view_pre. Before I submitted the test, I was not
>> sure whether there would be OOM error or not. So I added
>> both -ksp_view_pre and -ksp_view.
>>
>> But the options file you sent specifically does NOT
>> list the -ksp_view_pre so how could it be from that?
>>
>> Sorry to be pedantic but I've spent too much time in
>> the past trying to debug from incorrect information and
>> want to make sure that the information I have is correct
>> before thinking. Please recheck exactly what happened.
>> Rerun with the exact input file you emailed if that is
>> needed.
>>
>> Barry
>>
>> >
>> > Frank
>> >
>> >
>> > On 09/09/2016 12:38 PM, Barry Smith wrote:
>> >> Why does ksp_view2.txt have two KSP views in it
>> while ksp_view1.txt has only one KSP view in it? Did you
>> run two different solves in the second case but not the first?
>> >>
>> >> Barry
>> >>
>> >>
>> >>
>> >>> On Sep 9, 2016, at 10:56 AM, frank <hengjiew at uci.edu>
>> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I want to continue digging into the memory problem here.
>> >>> I did find a workaround in the past, which is to use
>> fewer cores per node so that each core has 8 GB of memory.
>> However, this is inefficient and expensive. I hope to locate
>> the place that uses the most memory.
>> >>>
>> >>> Here is a brief summary of the tests I did in the past:
>> >>>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12
>> >>> Maximum (over computational time) process memory:        total 7.0727e+08
>> >>> Current process memory:                                  total 7.0727e+08
>> >>> Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11
>> >>> Current space PetscMalloc()ed:                           total 1.8275e+09
>> >>>
>> >>>> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24
>> >>> Maximum (over computational time) process memory:        total 5.9431e+09
>> >>> Current process memory:                                  total 5.9431e+09
>> >>> Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12
>> >>> Current space PetscMalloc()ed:                           total 5.4844e+09
>> >>>
>> >>>> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24
>> >>> OOM (Out Of Memory) killer of the supercomputer terminated the job during "KSPSolve".
>> >>>
>> >>> I attached the output of ksp_view( the third test's
>> output is from ksp_view_pre ), memory_view and also the
>> petsc options.
>> >>>
>> >>> In all the tests, each core can access about 2 GB of
>> memory. In test3, there are 4223139840 non-zeros in the
>> matrix. This will consume about 1.74 MB per process, using
>> double precision. Even considering some extra memory used
>> to store the integer indices, 2 GB of memory should still
>> be more than enough.
>> >>>
>> >>> Is there a way to find out which part of KSPSolve
>> uses the most memory?
>> >>> Thank you so much.
>> >>>
>> >>> BTW, there are 4 options that remain unused and I don't
>> understand why they are omitted:
>> >>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly
>> >>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
>> >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1
>> >>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson
>> >>>
>> >>>
>> >>> Regards,
>> >>> Frank
>> >>>
>> >>> On 07/13/2016 05:47 PM, Dave May wrote:
>> >>>>
>> >>>> On 14 July 2016 at 01:07, frank <hengjiew at uci.edu>
>> wrote:
>> >>>> Hi Dave,
>> >>>>
>> >>>> Sorry for the late reply.
>> >>>> Thank you so much for your detailed reply.
>> >>>>
>> >>>> I have a question about the estimation of the memory
>> usage. There are 4223139840 allocated non-zeros and 18432
>> MPI processes. Double precision is used. So the memory
>> per process is:
>> >>>> 4223139840 * 8 bytes / 18432 / 1024 / 1024 = 1.74 MB ?
>> >>>> Did I do something wrong here? Because this seems too small.
>> >>>>
>> >>>> No - I totally f***ed it up. You are correct.
>> That'll teach me for fumbling around with my iphone
>> calculator and not using my brain. (Note that to convert
>> to MB just divide by 1e6, not 1024^2 - although I
>> apparently cannot convert between units correctly....)
>> >>>>
>> >>>> From the PETSc objects associated with the solver,
>> it looks like it _should_ run with 2 GB per MPI rank.
>> Sorry for my mistake. Possibilities are: somewhere in
>> your usage of PETSc you've introduced a memory leak;
>> PETSc is doing a huge over-allocation (e.g. as per our
>> discussion of MatPtAP); or in your application code there
>> are other objects you have forgotten to log the memory for.
>> >>>>
>> >>>>
>> >>>>
>> >>>> I am running this job on Blue Waters.
>> >>>> I am using the 7-point FD stencil in 3D.
>> >>>>
>> >>>> I thought so on both counts.
>> >>>>
>> >>>> I apologize that I made a stupid mistake in
>> computing the memory per core. My settings meant each
>> core could access only 2 GB of memory on average instead
>> of the 8 GB I mentioned in a previous email. I re-ran the
>> job with 8 GB of memory per core on average and there is
>> no "Out Of Memory" error. I will do more tests to see if
>> there is still some memory issue.
>> >>>>
>> >>>> Ok. I'd still like to know where the memory was
>> being used since my estimates were off.
>> >>>>
>> >>>>
>> >>>> Thanks,
>> >>>> Dave
>> >>>>
>> >>>> Regards,
>> >>>> Frank
>> >>>>
>> >>>>
>> >>>>
>> >>>> On 07/11/2016 01:18 PM, Dave May wrote:
>> >>>>> Hi Frank,
>> >>>>>
>> >>>>>
>> >>>>> On 11 July 2016 at 19:14, frank <hengjiew at uci.edu>
>> wrote:
>> >>>>> Hi Dave,
>> >>>>>
>> >>>>> I re-ran the test using bjacobi as the
>> preconditioner on the coarse mesh of telescope. The grid
>> is 3072*256*768 and the process mesh is 96*8*24. The petsc
>> option file is attached.
>> >>>>> I still got the "Out Of Memory" error. The error
>> occurred before the linear solver finished one step. So I
>> don't have the full info from ksp_view. The info from
>> ksp_view_pre is attached.
>> >>>>>
>> >>>>> Okay - that is essentially useless (sorry)
>> >>>>>
>> >>>>> It seems to me that the error occurred when the
>> decomposition was going to be changed.
>> >>>>>
>> >>>>> Based on what information?
>> >>>>> Running with -info would give us more clues, but
>> will create a ton of output.
>> >>>>> Please try running the case which failed with -info
>> >>>>> I had another test with a grid of 1536*128*384 and
>> the same process mesh as above. There was no error. The
>> ksp_view info is attached for comparison.
>> >>>>> Thank you.
>> >>>>>
>> >>>>>
>> >>>>> [3] Here is my crude estimate of your memory usage.
>> >>>>> I'll target the biggest memory hogs only to get an
>> order of magnitude estimate
>> >>>>>
>> >>>>> * The Fine grid operator contains 4223139840
>> non-zeros --> 1.8 GB per MPI rank assuming double precision.
>> >>>>> The indices for the AIJ could amount to another 0.3
>> GB (assuming 32 bit integers)
>> >>>>>
>> >>>>> * You use 5 levels of coarsening, so the other
>> operators should represent (collectively)
>> >>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ 300 MB per
>> MPI rank on the communicator with 18432 ranks.
>> >>>>> The coarse grid should consume ~ 0.5 MB per MPI
>> rank on the communicator with 18432 ranks.
>> >>>>>
>> >>>>> * You use a reduction factor of 64, making the new
>> communicator with 288 MPI ranks.
>> >>>>> PCTelescope will first gather a temporary matrix
>> associated with your coarse level operator assuming a
>> comm size of 288 living on the comm with size 18432.
>> >>>>> This matrix will require approximately 0.5 * 64 =
>> 32 MB per core on the 288 ranks.
>> >>>>> This matrix is then used to form a new MPIAIJ
>> matrix on the subcomm, thus require another 32 MB per rank.
>> >>>>> The temporary matrix is now destroyed.
>> >>>>>
>> >>>>> * Because a DMDA is detected, a permutation matrix
>> is assembled.
>> >>>>> This requires 2 doubles per point in the DMDA.
>> >>>>> Your coarse DMDA contains 92 x 16 x 48 points.
>> >>>>> Thus the permutation matrix will require < 1 MB per
>> MPI rank on the sub-comm.
>> >>>>>
>> >>>>> * Lastly, the matrix is permuted. This uses
>> MatPtAP(), but the resulting operator will have the same
>> memory footprint as the unpermuted matrix (32 MB). At any
>> stage in PCTelescope, only 2 operators of size 32 MB are
>> held in memory when the DMDA is provided.
>> >>>>>
>> >>>>> From my rough estimates, the worst case memory foot
>> print for any given core, given your options is approximately
>> >>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB
>> >>>>> This is way below 8 GB.
>> >>>>>
>> >>>>> Note this estimate completely ignores:
>> >>>>> (1) the memory required for the restriction operator,
>> >>>>> (2) the potential growth in the number of non-zeros
>> per row due to Galerkin coarsening (I wished
>> -ksp_view_pre reported the output from MatView so we
>> could see the number of non-zeros required by the coarse
>> level operators)
>> >>>>> (3) all temporary vectors required by the CG
>> solver, and those required by the smoothers.
>> >>>>> (4) internal memory allocated by MatPtAP
>> >>>>> (5) memory associated with IS's used within PCTelescope
>> >>>>>
>> >>>>> So either I am completely off in my estimates, or
>> you have not carefully estimated the memory usage of your
>> application code. Hopefully others might examine/correct
>> my rough estimates
>> >>>>>
>> >>>>> Since I don't have your code I cannot access the
>> latter.
>> >>>>> Since I don't have access to the same machine you
>> are running on, I think we need to take a step back.
>> >>>>>
>> >>>>> [1] What machine are you running on? Send me a URL
>> if its available
>> >>>>>
>> >>>>> [2] What discretization are you using? (I am
>> guessing a scalar 7 point FD stencil)
>> >>>>> If it's a 7 point FD stencil, we should be able to
>> examine the memory usage of your solver configuration
>> using a standard, light weight existing PETSc example,
>> run on your machine at the same scale.
>> >>>>> This would hopefully enable us to correctly
>> evaluate the actual memory usage required by the solver
>> configuration you are using.
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Dave
>> >>>>>
>> >>>>>
>> >>>>> Frank
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On 07/08/2016 10:38 PM, Dave May wrote:
>> >>>>>>
>> >>>>>> On Saturday, 9 July 2016, frank <hengjiew at uci.edu>
>> wrote:
>> >>>>>> Hi Barry and Dave,
>> >>>>>>
>> >>>>>> Thank both of you for the advice.
>> >>>>>>
>> >>>>>> @Barry
>> >>>>>> I made a mistake in the file names in the last email.
>> I attached the correct files this time.
>> >>>>>> For all three tests, 'Telescope' is used as
>> the coarse preconditioner.
>> >>>>>>
>> >>>>>> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12
>> >>>>>> Part of the memory usage:  Vector  125  124  3971904  0.
>> >>>>>>                            Matrix  101  101  9462372  0.
>> >>>>>>
>> >>>>>> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24
>> >>>>>> Part of the memory usage:  Vector  125  124  681672   0.
>> >>>>>>                            Matrix  101  101  1462180  0.
>> >>>>>>
>> >>>>>> In theory, the memory usage per process in Test1 should be 8
>> times that of Test2. In my case, it is about 6 times.
>> >>>>>>
>> >>>>>> == Test3: Grid: 3072*256*768, Process Mesh:
>> 96*8*24. Sub-domain per process: 32*32*32
>> >>>>>> Here I get the out of memory error.
>> >>>>>>
>> >>>>>> I tried to use -mg_coarse jacobi. In this way, I
>> don't need to set -mg_coarse_ksp_type and
>> -mg_coarse_pc_type explicitly, right?
>> >>>>>> The linear solver didn't work in this case. PETSc
>> output some errors.
>> >>>>>>
>> >>>>>> @Dave
>> >>>>>> In test3, I use only one instance of 'Telescope'.
>> On the coarse mesh of 'Telescope', I used LU as the
>> preconditioner instead of SVD.
>> >>>>>> If I set the levels correctly, then on the last
>> coarse mesh of MG where it calls 'Telescope', the
>> sub-domain per process is 2*2*2.
>> >>>>>> On the last coarse mesh of 'Telescope', there is
>> only one grid point per process.
>> >>>>>> I still got the OOM error. The detailed petsc
>> option file is attached.
>> >>>>>>
>> >>>>>> Do you understand the expected memory usage for
>> the particular parallel LU implementation you are using?
>> I don't (seriously). Replace LU with bjacobi and re-run
>> this test. My point about solver debugging is still valid.
>> >>>>>>
>> >>>>>> And please send the result of KSPView so we can
>> see what is actually used in the computations
>> >>>>>>
>> >>>>>> Thanks
>> >>>>>> Dave
>> >>>>>>
>> >>>>>>
>> >>>>>> Thank you so much.
>> >>>>>>
>> >>>>>> Frank
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote:
>> >>>>>> On Jul 6, 2016, at 4:19 PM, frank
>> <hengjiew at uci.edu> wrote:
>> >>>>>>
>> >>>>>> Hi Barry,
>> >>>>>>
>> >>>>>> Thank you for your advice.
>> >>>>>> I tried three tests. In the 1st test, the grid is
>> 3072*256*768 and the process mesh is 96*8*24.
>> >>>>>> The linear solver is 'cg', the preconditioner is
>> 'mg', and 'telescope' is used as the preconditioner on the
>> coarse mesh.
>> >>>>>> The system gives me the "Out of Memory" error
>> before the linear system is completely solved.
>> >>>>>> The info from '-ksp_view_pre' is attached. It seems
>> to me that the error occurs when it reaches the coarse mesh.
>> >>>>>>
>> >>>>>> The 2nd test uses a grid of 1536*128*384 and
>> a process mesh of 96*8*24. The 3rd
>> test uses the same grid but a different
>> process mesh, 48*4*12.
>> >>>>>> Are you sure this is right? The total matrix
>> and vector memory usage goes from the 2nd test
>> >>>>>>   Vector  384  383  8,193,712   0.
>> >>>>>>   Matrix  103  103  11,508,688  0.
>> >>>>>> to the 3rd test
>> >>>>>>   Vector  384  383  1,590,520   0.
>> >>>>>>   Matrix  103  103  3,508,664   0.
>> >>>>>> that is, the memory usage got smaller, but if you
>> have only 1/8th the processes and the same grid it should
>> have gotten about 8 times bigger. Did you maybe cut the
>> grid by a factor of 8 also? If so, that still doesn't
>> explain it, because the memory usage changed by a factor
>> of 5 or so for the vectors and 3 or so for the
>> matrices.
>> >>>>>>
>> >>>>>>
>> >>>>>> The linear solver and petsc options in the 2nd and 3rd
>> tests are the same as in the 1st test. The linear solver works
>> fine in both tests.
>> >>>>>> I attached the memory usage of the 2nd and 3rd
>> tests. The memory info is from the option '-log_summary'.
>> I tried to use '-memory_info' as you suggested, but in my
>> case petsc treated it as an unused option. It output
>> nothing about the memory. Do I need to add something to my
>> code so I can use '-memory_info'?
>> >>>>>> Sorry, my mistake; the option is -memory_view.
>> >>>>>>
>> >>>>>> Can you run the one case with -memory_view and
>> -mg_coarse jacobi -ksp_max_it 1 (just so it doesn't
>> iterate forever) to see how much memory is used without
>> the telescope? Also run case 2 the same way.
>> >>>>>>
>> >>>>>> Barry
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> In both tests the memory usage is not large.
>> >>>>>>
>> >>>>>> It seems to me that it might be the 'telescope'
>> preconditioner that allocated a lot of memory and caused
>> the error in the 1st test.
>> >>>>>> Is there a way to show how much memory it
>> allocated?
>> >>>>>>
>> >>>>>> Frank
>> >>>>>>
>> >>>>>> On 07/05/2016 03:37 PM, Barry Smith wrote:
>> >>>>>> Frank,
>> >>>>>>
>> >>>>>> You can run with -ksp_view_pre to have it
>> "view" the KSP before the solve so hopefully it gets that
>> far.
>> >>>>>>
>> >>>>>> Please run the problem that does fit with
>> -memory_info; when the problem completes it will show the
>> "high water mark" for PETSc allocated memory and total
>> memory used. We first want to look at these numbers to
>> see if it is using more memory than you expect. You could
>> also run with, say, half the grid spacing to see how the
>> memory usage scales with the increase in grid points.
>> Make the runs also with -log_view and send all the output
>> from these options.
>> >>>>>>
>> >>>>>> Barry
>> >>>>>>
>> >>>>>> On Jul 5, 2016, at 5:23 PM, frank
>> <hengjiew at uci.edu> wrote:
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> I am using the CG ksp solver and Multigrid
>> preconditioner to solve a linear system in parallel.
>> >>>>>> I chose to use the 'Telescope' as the
>> preconditioner on the coarse mesh for its good performance.
>> >>>>>> The petsc options file is attached.
>> >>>>>>
>> >>>>>> The domain is a 3d box.
>> >>>>>> It works well when the grid is 1536*128*384 and
>> the process mesh is 96*8*24. When I double the size of
>> the grid and keep the same
>> process mesh and petsc options, I get an "out of memory"
>> error from the super-cluster I am using.
>> >>>>>> Each process has access to at least 8 GB of memory,
>> which should be more than enough for my application. I am
>> sure that all the other parts of my code (except the
>> linear solver) do not use much memory. So I suspect
>> there is something wrong with the linear solver.
>> >>>>>> The error occurs before the linear system is
>> completely solved, so I don't have the info from ksp_view.
>> I am not able to reproduce the error with a smaller
>> problem either.
>> >>>>>> In addition, I tried to use block jacobi as
>> the preconditioner with the same grid and same
>> decomposition. The linear solver runs extremely slowly but
>> there is no memory error.
>> >>>>>>
>> >>>>>> How can I diagnose what exactly causes the error?
>> >>>>>> Thank you so much.
>> >>>>>>
>> >>>>>> Frank
>> >>>>>> <petsc_options.txt>
>> >>>>>>
>> <ksp_view_pre.txt><memory_test2.txt><memory_test3.txt><petsc_options.txt>
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>> <ksp_view1.txt><ksp_view2.txt><ksp_view3.txt><memory1.txt><memory2.txt><petsc_options1.txt><petsc_options2.txt><petsc_options3.txt>
>> >
>>
>
-------------- next part --------------
! Purpose: Test the KSP solver with a 3D Poisson equation.
!
! Descrip: The domain is a periodic cube. The exact solution is sin(x)*sin(y)*sin(z).
!          The equation is discretized with a 2nd-order central difference.
!          The rhs of the equation is -3*sin(x)*sin(y)*sin(z) * (-h**2), where h is the space step.
!          In the matrix, the diagonal element is 6 and the other 6 non-zero elements in the row are -1.
!
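!          For reference, with the diagonal 6 and off-diagonal -1 entries above, each
!          interior row of the matrix enforces the 7-point relation
!              6*u(i,j,k) - u(i-1,j,k) - u(i+1,j,k) - u(i,j-1,k) - u(i,j+1,k)
!                         - u(i,j,k-1) - u(i,j,k+1) = h**2 * f(i,j,k)
!          where f is the right-hand side described above.
!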
! Usage:   The length of the cube is set by command-line input.
!          The number of processes in each dimension is the same and is also set by command-line input.
!          To run the code on 512^3 grid points and 16^3 cores:
!          mpirun -n 4096 ./test_ksp.exe -N 512 -P 16
!          PETSc options are passed in a file "petsc_options.txt" in the working directory.
!
! Author: Frank
! hengjiew at uci.edu
!
! Date: 09/14/2016
PROGRAM test_ksp
#include <petsc/finclude/petscdmdef.h>
#include <petsc/finclude/petscmatdef.h>
#include <petsc/finclude/petscvecdef.h>
#include <petsc/finclude/petsckspdef.h>
USE petscdmda
USE petscsys
IMPLICIT NONE
INTEGER, PARAMETER :: pr = 8
CHARACTER(*), PARAMETER :: petsc_options_file="petsc_options.txt"
DM :: decomp
DMBoundaryType :: BType(3)
INTEGER :: N ! length of cube
integer :: P ! process # in each dimension
INTEGER :: is, js ,ks, nxl, nyl, nzl ! subdomain index
INTEGER :: rank
INTEGER :: i, j, k
REAL(pr) :: h ! space step
Vec :: x,b ! solution and rhs
REAL(pr), POINTER :: ptr_vec(:,:,:) => NULL() ! pointer to petsc vector
Mat :: A
MatNullSpace :: nullspace
MatStencil :: row(4), col(4,7)
PetscInt :: i1 = 1, i7 = 7
REAL(pr) :: value(7)
KSP :: ksp
PetscErrorCode :: ierr
REAL(pr) :: q2 = 0.5_pr
REAL(pr) :: erri = 0, err1 = 0
LOGICAL :: is_converged, is_set
CALL PetscInitialize( petsc_options_file, ierr )
CALL MPI_Comm_rank( PETSC_COMM_WORLD, rank, ierr )
! default values
N = 64
P = 2
BType = DM_BOUNDARY_PERIODIC
! read domain size and process# from command line
CALL PetscOptionsGetInt( PETSC_NULL_OBJECT, PETSC_NULL_CHARACTER, '-N', N, is_set, ierr )
CALL PetscOptionsGetInt( PETSC_NULL_OBJECT, PETSC_NULL_CHARACTER, '-P', P, is_set, ierr )
CALL DMDACreate3d( PETSC_COMM_WORLD, BType(1), BType(2), BType(3), &
& DMDA_STENCIL_STAR, N, N, N, P, P, P, 1, 1, &
& PETSC_NULL_INTEGER, PETSC_NULL_INTEGER, PETSC_NULL_INTEGER, &
& decomp, ierr )
CALL DMDAGetCorners( decomp, is, js, ks, nxl, nyl, nzl, ierr)
!WRITE(*,'(7I6)') rank, is, js, ks, nxl, nyl, nzl
! create vector for rhs and solution
CALL DMCreateGlobalVector( decomp, b, ierr )
CALL VecDuplicate( b, x, ierr )
CALL VecSet( x, 0.0_pr, ierr )
! create matrix
CALL DMSetMatType( decomp, MATAIJ, ierr )
CALL DMCreateMatrix( decomp, A, ierr )
! create ksp solver
CALL KSPCreate( PETSC_COMM_WORLD, ksp, ierr )
CALL KSPSetDM( ksp, decomp, ierr )
CALL KSPSetDMActive( ksp, PETSC_FALSE, ierr )
CALL KSPSetFromOptions( ksp, ierr )
! create nullspace, periodic bc give the matrix a nullspace
CALL MatNullSpaceCreate( PETSC_COMM_WORLD, PETSC_TRUE, PETSC_NULL_INTEGER, &
& PETSC_NULL_INTEGER, nullspace, ierr )
! set rhs
! the space step h must be set before it is used in the loop below
h = 1.0_pr / N
CALL DMDAVecGetArrayF90( decomp, b, ptr_vec, ierr )
DO i = is, is+nxl-1
   DO j = js, js+nyl-1
      DO k = ks, ks+nzl-1
         ptr_vec(i,j,k) = -3 * SIN((i+q2)*h) * SIN((j+q2)*h) * SIN((k+q2)*h)
      END DO
   END DO
END DO
! scale by -h**2 to match the matrix entries (diagonal 6, off-diagonals -1)
ptr_vec = - h**2 * ptr_vec
! assemble rhs
CALL VecAssemblyBegin( b, ierr )
CALL VecAssemblyEnd( b, ierr )
CALL DMDAVecRestoreArrayF90( decomp, b, ptr_vec, ierr )
! set matrix
DO i = is, is+nxl-1
   DO j = js, js+nyl-1
      DO k = ks, ks+nzl-1
         ! row index of current point
         row(MatStencil_i) = i
         row(MatStencil_j) = j
         row(MatStencil_k) = k
         ! column index of current point and its neighbors
         col(MatStencil_i,1) = i;   col(MatStencil_j,1) = j;   col(MatStencil_k,1) = k
         col(MatStencil_i,2) = i-1; col(MatStencil_j,2) = j;   col(MatStencil_k,2) = k
         col(MatStencil_i,3) = i+1; col(MatStencil_j,3) = j;   col(MatStencil_k,3) = k
         col(MatStencil_i,4) = i;   col(MatStencil_j,4) = j-1; col(MatStencil_k,4) = k
         col(MatStencil_i,5) = i;   col(MatStencil_j,5) = j+1; col(MatStencil_k,5) = k
         col(MatStencil_i,6) = i;   col(MatStencil_j,6) = j;   col(MatStencil_k,6) = k-1
         col(MatStencil_i,7) = i;   col(MatStencil_j,7) = j;   col(MatStencil_k,7) = k+1
         ! set values at current point and its neighbors
         value(1) = 6.0_pr
         value(2:) = -1.0_pr
         ! set the matrix's elements
         CALL MatSetValuesStencil( A, i1, row, i7, col, value, INSERT_VALUES, ierr )
      END DO
   END DO
END DO
! assemble the matrix
CALL MatAssemblyBegin( A, MAT_FINAL_ASSEMBLY, ierr )
CALL MatAssemblyEnd( A, MAT_FINAL_ASSEMBLY, ierr )
! attach the nullspace to the matrix and remove it from the rhs
CALL MatSetNullSpace( A, nullspace, ierr )
CALL MatNullSpaceRemove( nullspace, b, PETSC_NULL_OBJECT, ierr )
! Solve system
CALL KSPSetOperators( ksp, A, A, ierr )
!CALL KSPSetReusePreconditioner( ksp, reuse_pc, ierr )
CALL KSPSolve( ksp, b, x, ierr )
!CALL KSPGetIterationNumber( ksp, niter, ierr )
!CALL KSPGetConvergedReason( ksp, is_converged, ierr )
! get solution
CALL VecAssemblyBegin( x, ierr )
CALL VecAssemblyEnd( x, ierr )
CALL DMDAVecGetArrayF90( decomp, x, ptr_vec, ierr )
! check the error
DO i = is, is+nxl-1
   DO j = js, js+nyl-1
      DO k = ks, ks+nzl-1
         erri = MAX(erri, ABS(ptr_vec(i,j,k) - SIN((i+q2)*h)*SIN((j+q2)*h)*SIN((k+q2)*h)))
         err1 = err1 + ABS(ptr_vec(i,j,k) - SIN((i+q2)*h)*SIN((j+q2)*h)*SIN((k+q2)*h))
      END DO
   END DO
END DO
CALL DMDAVecRestoreArrayF90( decomp, x, ptr_vec, ierr )
! MPI_IN_PLACE must be passed as the send buffer for an in-place reduction
CALL MPI_Allreduce( MPI_IN_PLACE, erri, 1, MPI_REAL8, MPI_MAX, PETSC_COMM_WORLD, ierr )
CALL MPI_Allreduce( MPI_IN_PLACE, err1, 1, MPI_REAL8, MPI_SUM, PETSC_COMM_WORLD, ierr )
IF( rank == 0 ) THEN
   PRINT*, 'norm1 error: ', err1 / N**3
   PRINT*, 'norm inf error:', erri
END IF
! clean up PETSc objects and finalize
CALL VecDestroy( x, ierr )
CALL VecDestroy( b, ierr )
CALL MatDestroy( A, ierr )
CALL MatNullSpaceDestroy( nullspace, ierr )
CALL KSPDestroy( ksp, ierr )
CALL DMDestroy( decomp, ierr )
CALL PetscFinalize( ierr )
END PROGRAM test_ksp