[petsc-users] Strange efficiency in PETSc-dev using OpenMP
Danyang Su
danyang.su at gmail.com
Sun Sep 22 14:43:45 CDT 2013
Hi Barry,
Thanks, please find the answers below.
On 22/09/2013 9:53 AM, Barry Smith wrote:
> 1) # WARNING!!! #
> # #
> # This code was compiled with a debugging option, #
> # To get timing results run ./configure #
> # using --with-debugging=no, the performance will #
> # be generally two or three times faster. #
>
> Never time without optimization; it can give very misleading information because different parts of the code speed up very differently when optimized
With optimization, the problem still exists. See the attached
logs for runs with 1 thread and 4 threads.
> 2) Where are the 4 MPI processes being placed on your system? Are they being placed on 4 cores on the same CPU (as with the OpenMP run) or possibly on different CPUs?
Yes. The system information is as follows:
OS: Windows 7 X64 Pro, CYGWIN
Processor: Intel Xeon E5-2620 2.0GHz, 6cores/12threads
Memory: 16GB
Compiler: Intel Visual Fortran V13.1.
> 3) Do you have any OpenMP pragmas in your code? Make a run where you take them all out
Currently I have no OpenMP pragmas in the code. It is the
same code I used for the PETSc MPI version, and it works fine
when using MPI.
>
> 4) Both runs are actually taking very little time in the solver;
>
> KSPSolve 1 1.0 9.2897e-002
> KSPSolve 1 1.0 2.9056e-001
>
> How are you getting your matrix? From a file?
Yes, the matrix is currently read from files. Could this
be the problem? The timing starts after the matrix is read. I
did not see any speedup for the solver; the KSPSolve runtimes are
almost the same.
nthreads = 1, KSPSolve 1 1.0 6.2800e-002
nthreads = 4, KSPSolve 1 1.0 5.5090e-002
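One way to separate the assembly and solve phases more cleanly in -log_summary (a minimal sketch, not taken from the attached code; the stage names are made up) is to wrap the two phases in PETSc logging stages:

  PetscLogStage :: stage_assembly, stage_solve

  call PetscLogStageRegister("Assembly", stage_assembly, ierr)
  call PetscLogStageRegister("Solve", stage_solve, ierr)

  call PetscLogStagePush(stage_assembly, ierr)
  ! ... MatSetValues loop, MatAssemblyBegin/End ...
  call PetscLogStagePop(ierr)

  call PetscLogStagePush(stage_solve, ierr)
  call KSPSolve(ksp, b, x, ierr)
  call PetscLogStagePop(ierr)

-log_summary then reports the time spent in each stage separately.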
The main question is that the program gets stuck in the following code
when run with OpenMP, but there is no problem when run with MPI.
do i = istart, iend - 1
ii = ia_in(i+1)
jj = ia_in(i+2)
call MatSetValues(a, ione, i, jj-ii, ja_in(ii:jj-1)-1, a_in(ii:jj-1), Insert_Values, ierr)
end do
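For what it is worth, an equivalent formulation that copies each row into plain temporary arrays before calling MatSetValues might help rule out the Fortran array sections as a factor. This is only a sketch: ncol, cols and vals are new names, and nd_nzrow is the maximum row length already computed in the attached test code.

  PetscInt :: ncol
  PetscInt :: cols(nd_nzrow)
  PetscScalar :: vals(nd_nzrow)

  do i = istart, iend - 1
    ii = ia_in(i+1)
    jj = ia_in(i+2)
    ncol = jj - ii                      ! number of nonzeros in global row i
    cols(1:ncol) = ja_in(ii:jj-1) - 1   ! convert 1-based CSR columns to 0-based
    vals(1:ncol) = a_in(ii:jj-1)
    call MatSetValues(a, ione, i, ncol, cols, vals, Insert_Values, ierr)
  end do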
The test code is also attached. I have removed some
unnecessary code, but some is still there.
>
> Barry
>
>
> On Sep 21, 2013, at 11:06 PM, Danyang Su <danyang.su at gmail.com> wrote:
>
>> Hi Shri,
>>
>> Thanks for your info. It works with the option -threadcomm_type openmp. But another problem arises, as described below.
>>
>> The sparse matrix is 53760*53760 with 1067392 non-zero entries. If the code is compiled using PETSc-3.4.2, it works fine: the equations are solved quickly and I can see the speedup. But if the code is compiled using PETSc-dev with the OpenMP option, it takes a long time to solve the equations and I cannot see any speedup when more processors are used.
>>
>> For PETSc-3.4.2, run by "mpiexec -n 4 ksp_inhm_d -log_summary log_mpi4_petsc3.4.2.log", the iteration and runtime are:
>> Iterations 6 time_assembly 0.4137E-01 time_ksp 0.9296E-01
>>
>> For PETSc-dev, run by "mpiexec -n 1 ksp_inhm_d -threadcomm_type openmp -threadcomm_nthreads 4 -log_summary log_openmp_petsc_dev.log", the iteration and runtime are:
>> Iterations 6 time_assembly 0.3595E+03 time_ksp 0.2907E+00
>>
>> Most of the time 'time_assembly 0.3595E+03' is spent in the following code
>> do i = istart, iend - 1
>> ii = ia_in(i+1)
>> jj = ia_in(i+2)
>> call MatSetValues(a, ione, i, jj-ii, ja_in(ii:jj-1)-1, a_in(ii:jj-1), Insert_Values, ierr)
>> end do
>>
>> The log files for both PETSc-3.4.2 and PETSc-dev are attached.
>>
>> Is there anything wrong with my code or with the run options? The above code works fine when using MPICH.
>>
>> Thanks and regards,
>>
>> Danyang
>>
>> On 21/09/2013 2:09 PM, Shri wrote:
>>> There are three thread communicator types in PETSc. The default is "no thread" which is basically a non-threaded version. The other two types are "openmp" and "pthread". If you want to use OpenMP then use the option -threadcomm_type openmp.
>>>
>>> Shri
>>>
>>> On Sep 21, 2013, at 3:46 PM, Danyang Su <danyang.su at gmail.com> wrote:
>>>
>>>> Hi Barry,
>>>>
>>>> Thanks for the quick reply.
>>>>
>>>> After changing
>>>> #if defined(PETSC_HAVE_PTHREADCLASSES) || defined (PETSC_HAVE_OPENMP)
>>>> to
>>>> #if defined(PETSC_HAVE_PTHREADCLASSES)
>>>> and comment out
>>>> #elif defined(PETSC_HAVE_OPENMP)
>>>> PETSC_EXTERN PetscStack *petscstack;
>>>>
>>>> It can be compiled and validated with "make test".
>>>>
>>>> But I still have questions about running the examples. After rebuilding the code (e.g., ksp_ex2f.f), I can run it with "mpiexec -n 1 ksp_ex2f", or "mpiexec -n 4 ksp_ex2f", or "mpiexec -n 1 ksp_ex2f -threadcomm_nthreads 1", but if I run it with "mpiexec -n 1 ksp_ex2f -threadcomm_nthreads 4", there is a lot of error output (attached).
>>>>
>>>> The code is not modified and there are no OpenMP routines in it. For the current development in my project, I want to keep the OpenMP code that calculates the matrix values, but solve the system with PETSc (OpenMP). Is that possible?
>>>>
>>>> Thanks and regards,
>>>>
>>>> Danyang
>>>>
>>>>
>>>>
>>>> On 21/09/2013 7:26 AM, Barry Smith wrote:
>>>>> Danyang,
>>>>>
>>>>> I don't think the || defined (PETSC_HAVE_OPENMP) belongs in the code below.
>>>>>
>>>>> /* Linux functions CPU_SET and others don't work if sched.h is not included before
>>>>> including pthread.h. Also, these functions are active only if either _GNU_SOURCE
>>>>> or __USE_GNU is not set (see /usr/include/sched.h and /usr/include/features.h), hence
>>>>> set these first.
>>>>> */
>>>>> #if defined(PETSC_HAVE_PTHREADCLASSES) || defined (PETSC_HAVE_OPENMP)
>>>>>
>>>>> Edit include/petscerror.h and locate these lines and remove that part and then rerun make all. Let us know if it works or not.
>>>>>
>>>>> Barry
>>>>>
>>>>> i.e. replace
>>>>>
>>>>> #if defined(PETSC_HAVE_PTHREADCLASSES) || defined (PETSC_HAVE_OPENMP)
>>>>>
>>>>> with
>>>>>
>>>>> #if defined(PETSC_HAVE_PTHREADCLASSES)
>>>>>
>>>>> On Sep 21, 2013, at 6:53 AM, Matthew Knepley
>>>>> <petsc-maint at mcs.anl.gov>
>>>>> wrote:
>>>>>
>>>>>
>>>>>> On Sat, Sep 21, 2013 at 12:18 AM, Danyang Su <danyang.su at gmail.com>
>>>>>> wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I got errors when compiling petsc-dev with OpenMP in Cygwin. Previously I successfully compiled petsc-3.4.2 and it works fine.
>>>>>> The log files have been attached.
>>>>>>
>>>>>> The OpenMP configure test is wrong. It clearly fails to find pthread.h, but the test passes. Then in petscerror.h
>>>>>> we guard pthread.h using PETSC_HAVE_OPENMP. Can someone who knows OpenMP fix this?
>>>>>>
>>>>>> Matt
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Danyang
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>>>> -- Norbert Wiener
>>>>>>
>>>> <error.txt>
>> <log_mpi4_petsc3.4.2.log><log_openmp_petsc_dev.log>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: log_openmp_petsc_dev_opt_1.log
Type: application/octet-stream
Size: 10116 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20130922/94acc282/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: log_openmp_petsc_dev_opt_4.log
Type: application/octet-stream
Size: 10145 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20130922/94acc282/attachment-0003.obj>
-------------- next part --------------
program petsc_unsym_f
implicit none
! The following include statements are required for KSP Fortran programs:
! petscsys.h - base PETSc routines
! petscvec.h - vectors
! petscmat.h - matrices
! petscpc.h - preconditioners
! petscksp.h - Krylov subspace methods
! Additional include statements may be needed if using additional
! PETSc routines in a Fortran program, e.g.,
! petscviewer.h - viewers
! petscis.h - index sets
!
#include <finclude/petscsys.h>
#include <finclude/petscvec.h>
#include <finclude/petscvec.h90>
#include <finclude/petscmat.h>
#include <finclude/petscpc.h>
#include <finclude/petscksp.h>
#include <finclude/petscis.h>
#include <finclude/petscdmda.h>
#include <finclude/petscdmda.h90>
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Variable declarations
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!
! Variables:
! ksp - linear solver context
! ksp - Krylov subspace method context
! pc - preconditioner context
! x, b, u - approx solution, right-hand-side, exact solution vectors
! A - matrix that defines linear system
! its - iterations for convergence
! norm - norm of error in solution
! rctx - random number generator context
!
! Note that vectors are declared as PETSc "Vec" objects. These vectors
! are mathematical objects that contain more than just an array of
! double precision numbers. I.e., vectors in PETSc are not just
! double precision x(*).
! However, local vector data can be easily accessed via VecGetArray().
! See the Fortran section of the PETSc users manual for details.
!
#define xx_a(ib) xx_v(xx_i + (ib))
PetscReal :: norm, norm1, norm2, norm1_ref, norm2_ref
PetscInt :: i,j,k,ii,jj,kk,m,n,its
PetscInt :: istart,iend,ione
PetscErrorCode :: ierr
PetscMPIInt :: rank,nprcs
PetscBool :: flg
PetscScalar :: v,one,neg_one
Vec :: x,b,u, x_ref, u_ref
Mat :: a
KSP :: ksp
PetscRandom :: rctx
PetscLogDouble :: tstart,tend
PetscReal :: ttotal_assembly, ttotal_ksp
PetscViewer :: viewer
! Convergence parameters
PetscReal :: rtol !default 1.0E-5
PetscReal :: abstol !default 1.0E-50
PetscReal :: dtol !default 1.0E5
PetscInt :: maxits !default 1.0E5
!c.. all other variables
integer, allocatable :: ia_in(:)
integer, allocatable :: ja_in(:)
PetscScalar, allocatable :: a_in(:)
PetscScalar, allocatable :: b_in(:)
PetscScalar, allocatable :: x_out(:)
PetscReal, allocatable :: x_ref_in(:)
PetscInt, allocatable :: ix(:), d_nnz(:), o_nnz(:)
PetscInt :: nia, nb, nb_sqrt, nnz, nd_nzrow, nlocal
PetscReal :: err_max, err_min, err_norm
PetscScalar :: xx_v(1)
PetscOffset :: xx_i
!PetscInt :: ixs(4)
!PetscInt :: ix_in(2)
!PetscInt :: ix_out(2)
PetscInt :: ng
PetscInt, allocatable :: gindices(:)
ISLocalToGlobalMapping :: mapping
PetscBool :: flgUseDMDA !Whether to use DMDA; only valid for structured 1D, 2D and 3D meshes.
PetscInt :: nDimDMDA
DM :: da
PetscInt :: xs, ys, zs, xm, ym, zm, mx, my, mz, nstep
PetscScalar, pointer :: vecpointer(:)
MatPartitioning :: part
IS :: is, isg
character(256) :: strFolder(100)
character(256) :: strXOut
character(256) :: strNameAppend, strResultRefAppend
character(16) :: strKSPType
character(12) :: strnum
integer :: iunit, ifolder, imatrix
! These variables are not currently used.
! PC pc
! PCType ptype
! double precision tol
! Note: Any user-defined Fortran routines (such as MyKSPMonitor)
! MUST be declared as external.
!external MyKSPMonitor,MyKSPConverged
ione = 1
one = 1.0
neg_one = -1.0
! Convergence parameters
rtol = 1.0E-10 !default 1.0E-5
abstol = 1.0E-8 !default 1.0E-50
dtol = 1.0E5 !default 1.0E5
maxits = 100 !default 1.0E5
strKSPType = "KSPGMRES"
!strKSPType = "KSPBCGS"
flgUseDMDA = .false.
nDimDMDA = 1
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Beginning of program
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
call PetscInitialize(Petsc_Null_Character,ierr)
call MPI_Comm_rank(Petsc_Comm_World,rank,ierr)
call MPI_Comm_size(Petsc_Comm_World,nprcs,ierr)
!read data
strFolder(1) = "D:\dsu\ResearchAtUBC\Min3P_Parallel_Test\Solver_Matrix_Test\chrom-bio"
strFolder(2) = "D:\dsu\ResearchAtUBC\Min3P_Parallel_Test\Solver_Matrix_Test\chrom-bio-restart"
strFolder(3) = "D:\dsu\ResearchAtUBC\Min3P_Parallel_Test\Solver_Matrix_Test\Decal-concrete-level2"
nd_nzrow = 0 !number of non-zero elements in a row, maximum.
!strNameAppend = "_"
strNameAppend = "_reactran_rt_"
strResultRefAppend = ""
iunit = 21
do ifolder = 3, 3
if(rank == 0) then
write(*,"(a)") trim(strFolder(ifolder))
end if
!read sparse matrix structure ia
open(unit = iunit, file = trim(adjustl(strFolder(ifolder))) // "\ia" //trim(strNameAppend)//"1.txt")
read(iunit,*) nia
if(.not. allocated(ia_in)) then
allocate( ia_in ( nia ) )
end if
if(size(ia_in,1) /= nia) then
deallocate(ia_in)
allocate(ia_in(nia))
end if
read(iunit,*) ia_in(1)
nd_nzrow = ia_in(1)
do i = 2, nia
read(iunit,*) ia_in(i)
if(ia_in(i) - ia_in(i-1) > nd_nzrow) then
nd_nzrow = ia_in(i) - ia_in(i-1)
end if
end do
close(iunit)
write(*,*) "read ia finished, size of ia", size(ia_in,1)
!read sparse matrix structure ja
open(unit = iunit, file = trim(adjustl(strFolder(ifolder))) // "\ja" //trim(strNameAppend)//"1.txt")
read(iunit,*) nnz
if(.not. allocated(ja_in)) then
allocate( ja_in( nnz) )
end if
if(size(ja_in,1) /= nnz) then
deallocate(ja_in)
allocate(ja_in(nnz))
end if
do i = 1, nnz
read(iunit,*) ja_in(i)
end do
close(iunit)
write(*,*) "read ja finished. size of ja", size(ja_in,1)
nb = nia - 1
!initialize indices
if(.not. allocated(ix)) then
allocate( ix( nb) )
end if
if(size(ix,1) /= nb) then
deallocate(ix)
allocate(ix(nb))
end if
do i = 1, nb
ix(i)=i-1
end do
!initialize
if(.not. allocated(x_out)) then
allocate( x_out( nb) )
end if
if(size(x_out,1) /= nb) then
deallocate(x_out)
allocate(x_out(nb))
end if
!initialize d_nnz, o_nnz
if(.not. allocated(d_nnz)) then
allocate( d_nnz( nb) )
end if
if(size(d_nnz,1) /= nb) then
deallocate(d_nnz)
allocate(d_nnz(nb))
end if
d_nnz = 0
if(.not. allocated(o_nnz)) then
allocate( o_nnz( nb) )
end if
if(size(o_nnz,1) /= nb) then
deallocate(o_nnz)
allocate(o_nnz(nb))
end if
o_nnz = 0
call MatCreate(Petsc_Comm_World, a, ierr)
call MatSetSizes(a, Petsc_Decide, Petsc_Decide, nb, nb, ierr)
call MatSetFromOptions(a, ierr)
call MatSetUp(a, ierr)
call MatGetOwnershipRange(a,istart,iend,ierr)
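!Count, for each locally owned row, how many nonzeros fall in the
!diagonal block (columns owned by this process) and how many in the
!off-diagonal block; these counts are passed to MatMPIAIJSetPreallocation.
!Note istart/iend are 0-based while ia_in/ja_in use 1-based CSR indexing.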
do i = istart + 1, iend
do j = 1, ia_in(i+1) - ia_in(i)
k = ja_in(ia_in(i)+j-1)
if(k>istart .and. k<=iend) then
d_nnz(i) = d_nnz(i) + 1
else
o_nnz(i) = o_nnz(i) + 1
end if
end do
end do
call MatMPIAIJSetPreallocation(a,Petsc_Null_Integer,d_nnz(istart+1:iend), Petsc_Null_Integer,o_nnz(istart+1:iend), ierr)
!!Create stiffness matrix
!Currently, all PETSc parallel matrix formats are partitioned by
!contiguous chunks of rows across the processors. Determine which
!rows of the matrix are locally owned.
!call MatGetOwnershipRange(a,istart,iend,ierr)
!Create local element variables and RHS
call VecCreateMPI(Petsc_Comm_World, Petsc_Decide, nb, b, ierr)
call VecDuplicate(b, x, ierr)
call VecDuplicate(x, u, ierr)
call VecDuplicate(u, x_ref, ierr)
call VecDuplicate(x_ref, u_ref, ierr)
write(*,*) "Mat preallocation finished."
!!local to global mapping
call VecGetSize(b, m, ierr)
call VecGetOwnershipRange(b,istart,iend,ierr)
!!Method 3, consider ghost node index
ng = iend-istart + 2
if(.not. allocated(gindices)) then
allocate( gindices(ng) )
end if
if(size(gindices,1) /= ng) then
deallocate(gindices)
allocate(gindices(ng))
end if
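!Build the local-to-global index map with one ghost point on each side:
!local index 1 maps to the global row just before istart, local indices
!2..ng-1 map to the locally owned rows, and local index ng maps to the
!row just after the last owned one (wrapped periodically below).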
gindices(1)=istart-1
do i = 1, ng-1
gindices(i+1)=gindices(i)+1
end do
!map the first and last point as periodic
if(gindices(1) == -1) then
gindices(1) = m-1
end if
if(gindices(ng) == m) then
gindices(ng) = 0
end if
call ISLocalToGlobalMappingCreate(Petsc_Comm_Self,ng,gindices,Petsc_Copy_Values,mapping,ierr)
call VecSetLocalToGlobalMapping(b,mapping,ierr)
call ISLocalToGlobalMappingDestroy(mapping,ierr)
write(*,*) "Local global mapping finished."
!Create linear solver context
call KSPCreate(Petsc_Comm_World,ksp,ierr)
!Set KSP type
if(trim(strKSPType) == "KSPGMRES") then
call KSPSetType(ksp, KSPGMRES, ierr)
else if(trim(strKSPType) == "KSPBCGS") then
call KSPSetType(ksp, KSPBCGS, ierr)
end if
!Set linear solver defaults for this problem (optional).
call KSPSetTolerances(ksp,rtol, abstol, dtol, maxits,ierr)
call KSPSetFromOptions(ksp,ierr)
write(*,*) "Create KSP solver finished."
nstep = 0
!Read matrix values and right-hand side, construct the linear system
do imatrix = 1, 10
nstep = nstep + 1
write(*,*) "Begin matrix ", imatrix
write(strnum, *) imatrix
strXOut = trim(adjustl(strFolder(ifolder))) // "\x_PETSc"// trim(strNameAppend)//trim(adjustl(strnum))//".txt"
!read matrix value a
open(unit = iunit, file = trim(adjustl(strFolder(ifolder))) // "\a" // trim(strNameAppend)//trim(adjustl(strnum))//".txt")
read(iunit,*) nnz
if(.not. allocated(a_in)) then
allocate( a_in(nnz) )
end if
if(size(a_in,1) /= nnz) then
deallocate(a_in)
allocate(a_in(nnz))
end if
do i = 1, nnz
read(iunit,*) a_in(i)
end do
close(iunit)
write(*,*) "read a finished, size of a", size(a_in,1)
!read right hand side b
open(unit = iunit, file = trim(adjustl(strFolder(ifolder))) // "\b"//trim(strNameAppend)//trim(adjustl(strnum))//".txt")
read(iunit,*) nb
if(.not. allocated(b_in)) then
allocate(b_in(nb))
end if
if(size(b_in,1) /= nb ) then
deallocate(b_in)
allocate(b_in(nb))
end if
do i = 1, nb
read(iunit,*) b_in(i)
end do
close(iunit)
write(*,*) "read b finished, size of b", size(b_in,1)
!referenced result
open(unit = iunit, file = trim(adjustl(strFolder(ifolder))) // "\x"//trim(strResultRefAppend)//trim(strNameAppend)//trim(adjustl(strnum))//".txt")
read(iunit,*) nb
if(.not. allocated(x_ref_in)) then
allocate(x_ref_in(nb))
end if
if(size(x_ref_in,1)/=nb) then
deallocate(x_ref_in)
allocate(x_ref_in(nb))
end if
do i = 1, nb
read(iunit,*) x_ref_in(i)
end do
close(iunit)
write(*,*) "read x_ref finished, size of x_ref", size(x_ref_in,1)
!Set the solution from WatSolv (ws209)
call VecGetOwnershipRange(x_ref,istart,iend,ierr)
do i = istart, iend - 1
call VecSetValue(x_ref, i, x_ref_in(i+1), Insert_Values, ierr)
end do
call VecAssemblyBegin(x_ref,ierr)
call VecAssemblyEnd(x_ref,ierr)
write(*,*) "Assembly reference result finished"
call PetscTime(tstart, ierr)
!Set matrix elements
! - Each processor needs to insert only elements that it owns
! locally (but any non-local elements will be sent to the
! appropriate processor during matrix assembly).
! - Always specify global row and columns of matrix entries.
! - Note that MatSetValues() uses 0-based row and column numbers
! in Fortran as well as in C.
call MatZeroEntries(a, ierr)
write(*,*) "mark 1"
call MatGetOwnershipRange(a,istart,iend,ierr)
write(*,*) "mark 2"
do i = istart, iend - 1
ii = ia_in(i+1)
jj = ia_in(i+2)
call MatSetValues(a, ione, i, jj-ii, ja_in(ii:jj-1)-1, a_in(ii:jj-1), Insert_Values, ierr)
end do
write(*,*) "mark 3"
call MatAssemblyBegin(a, Mat_Final_Assembly, ierr)
call MatAssemblyEnd(a, Mat_Final_Assembly, ierr)
write(*,*) "mark 4"
if(nstep == 1) then
call MatSetOption(a,Mat_New_Nonzero_location_Err,Petsc_True,ierr)
!!Test matrix partition using ParMetis
!call MatPartitioningCreate(Petsc_Comm_World, part, ierr)
!call MatPartitioningSetAdjacency(part, a, ierr)
!call MatPartitioningSetFromOptions(part, ierr)
!call MatPartitioningApply(part, is, ierr)
!call ISPartitioningToNumbering(is, isg, ierr)
!call ISView(isg, PETSC_VIEWER_STDOUT_WORLD, ierr)
!call ISDestroy(is, ierr)
!call ISDestroy(isg, ierr)
!call MatPartitioningDestroy(part, ierr)
end if
!call MatView(a, PETSC_VIEWER_STDOUT_WORLD, ierr)
write(*,*) "Assembly matrix a finished"
!!Set right hand side b
call VecGetOwnershipRange(b,istart,iend,ierr)
!!Method 1
!do i = istart, iend - 1
! call VecSetValue(b, i, b_in(i+1), Insert_Values, ierr)
!end do
!!Method 2
!!Use VecSetValues instead, it is faster.
!call VecSetValues(b, iend-istart, ix(istart+1:iend), b_in(istart+1:iend), Insert_Values, ierr)
!!Method 3-1, set the values using the local ordering, without inserting ghost values
!do i = 2, ng-1
! call VecSetValuesLocal(b,1,i-1,b_in(istart+i-1:istart+i-1),Insert_Values,ierr)
!end do
!!Method 3-2, set the values using the local ordering, without inserting ghost values
call VecSetValuesLocal(b,ng-2,ix(2:ng-1),b_in(istart+1:istart+ng-2),Insert_Values,ierr)
!!Assemble matrix
call VecAssemblyBegin(b,ierr)
call VecAssemblyEnd(b,ierr)
!call VecView(b,PETSC_VIEWER_STDOUT_WORLD,ierr)
write(*,*) "Assembly right-hand-side b finished"
call PetscTime(tend, ierr)
ttotal_assembly = tend - tstart
call PetscTime(tstart, ierr)
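!Reuse the solver across matrices: SAME_NONZERO_PATTERN tells PETSc that
!the sparsity pattern is unchanged between solves, so preconditioner data
!structures can be reused.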
!call KSPSetOperators(ksp,a,a,SAME_PRECONDITIONER,ierr) !Fail for NWMO_Basin
call KSPSetOperators(ksp,a,a,SAME_NONZERO_PATTERN,ierr)
!call KSPSetOperators(ksp,a,a,DIFFERENT_NONZERO_PATTERN,ierr)
call KSPSolve(ksp,b,x,ierr)
!call VecView(x,PETSC_VIEWER_STDOUT_WORLD,ierr)
call PetscTime(tend, ierr)
ttotal_ksp = tend - tstart
write(*,*) "KSP solver finished"
!!calculate residual r=ax-b, not required
call MatMult(a, x, u, ierr)
call VecAXPY(u, neg_one, b, ierr)
!!calculate norms of the residual, not required
call VecNormBegin(u, Norm_1, norm1, ierr)
call VecNormBegin(u, Norm_2, norm2, ierr)
call VecNormEnd(u, Norm_1, norm1, ierr)
call VecNormEnd(u, Norm_2, norm2, ierr)
!!Compute the residual norm of the reference solution and compare the PETSc solution with WatSolv (ws209)
call MatMult(a,x_ref,u_ref,ierr)
call VecAXPY(u_ref, neg_one, b, ierr)
call VecNormBegin(u_ref, Norm_1, norm1_ref, ierr)
call VecNormBegin(u_ref, Norm_2, norm2_ref, ierr)
call VecNormEnd(u_ref, Norm_1, norm1_ref, ierr)
call VecNormEnd(u_ref, Norm_2, norm2_ref, ierr)
call VecAXPY(x_ref, neg_one, x, ierr)
call VecMax(x_ref, PETSC_NULL_INTEGER, err_max, ierr)
call VecMin(x_ref, PETSC_NULL_INTEGER, err_min, ierr)
call VecNorm(x_ref, Norm_2, err_norm, ierr)
!!get solver residual
call KSPGetResidualNorm(ksp,norm,ierr)
call KSPGetIterationNumber(ksp,its,ierr)
call KSPGetErrorIfNotConverged(ksp,flg,ierr)
!!Get solution vector to array data
!call VecGetOwnershipRange(x,istart,iend,ierr)
!call VecGetValues(x, iend-istart,ix(istart+1:iend),x_out(istart+1:iend), ierr)
!write(*, 90) istart+1, x_out(istart+1), iend, x_out(iend)
!!Get solution vector by pointer,
!call VecGetArray(x, xx_v, xx_i, ierr)
!call VecGetLocalSize(x,nlocal,ierr) !you can not use ownership range istart, iend here, it's buggy.
!write(*, 90) istart+1, xx_a(1), iend, xx_a(nlocal)
!call VecRestoreArray(x, xx_v, xx_i, ierr)
90 format ('x(', i6, ') = ',e11.4, ' x(', i6, ') = ',e11.4)
write(*,*) "Get solution residual finished"
!!Output the solution solved by PETSc
call PetscViewerASCIIOpen(Petsc_Comm_World, trim(strXOut) , viewer, ierr)
call VecView(x, viewer, ierr)
call PetscViewerDestroy(viewer, ierr)
if (rank == 0) then
if(flg) then
write(*,*) "Solver does not converge"
else
write(*,100) its, norm, norm1, norm2, ttotal_assembly, ttotal_ksp
end if
end if
100 format('Iterations ',i5, ' norm ', e11.4, ' norm1 ', e11.4, ' norm2 ', e11.4, &
' time_assembly ', e11.4, ' time_ksp ', e11.4)
end do
!release local arrays
if (allocated(ia_in)) deallocate(ia_in)
if (allocated(ja_in)) deallocate(ja_in)
if (allocated(a_in)) deallocate(a_in)
if (allocated(b_in)) deallocate(b_in)
if (allocated(x_out)) deallocate(x_out)
if (allocated(x_ref_in)) deallocate(x_ref_in)
if (allocated(ix)) deallocate(ix)
if (allocated(gindices)) deallocate(gindices)
if (allocated(d_nnz)) deallocate(d_nnz)
if (allocated(o_nnz)) deallocate(o_nnz)
!release Petsc objects
call KSPDestroy(ksp,ierr)
call VecDestroy(b, ierr)
call VecDestroy(x, ierr)
call VecDestroy(u, ierr)
call VecDestroy(x_ref, ierr)
call VecDestroy(u_ref, ierr)
call MatDestroy(a, ierr)
if(flgUseDMDA) then
call DMDestroy(da,ierr)
end if
end do
call PetscFinalize(ierr)
end program