[petsc-users] Strange efficiency in PETSc-dev using OpenMP

Danyang Su danyang.su at gmail.com
Sun Sep 22 14:43:45 CDT 2013


Hi Barry,

Thanks, please find the answers below.

On 22/09/2013 9:53 AM, Barry Smith wrote:
> 1)      #                          WARNING!!!                    #
>        #                                                        #
>        #   This code was compiled with a debugging option,      #
>        #   To get timing results run ./configure                #
>        #   using --with-debugging=no, the performance will      #
>        #   be generally two or three times faster.              #
>
>      Never time without optimization; it can give very misleading information because different parts of the code speed up very differently when optimized
              With optimization, the problem still exists. See the attached 
logs for runs with 1 thread and 4 threads.
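(The optimized build here was configured without debugging, i.e. along the lines of

       ./configure --with-debugging=no --with-openmp

where --with-openmp is only illustrative of an OpenMP-enabled petsc-dev build; the exact configure options may differ.)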
> 2) Where are the 4 MPI processes being placed on your system? Are they being placed on 4 cores on the same CPU (as with the OpenMP run) or possibly on different CPUs?
             Yes, the system information is as follows:
             OS: Windows 7 x64 Pro, Cygwin
             Processor: Intel Xeon E5-2620 2.0 GHz, 6 cores/12 threads
             Memory: 16 GB
             Compiler: Intel Visual Fortran V13.1
> 3) Do you have any OpenMP pragmas in your code? Make a run where you take them all out.
             Currently, I have no OpenMP pragmas in the code. The code is the 
same as the one I used for the PETSc MPI version, and it works fine when 
using MPI.
>
> 4) Both runs are actually taking very little time in the solver;
>
>    KSPSolve               1 1.0 9.2897e-002
> KSPSolve               1 1.0 2.9056e-001
>
>     How are you getting your matrix? From a file?
             Yes, the matrix is currently read from files. Could this be the 
problem? The timing starts after the matrix has been read. And I did not see 
any speedup for the solver; the KSPSolve runtimes are almost the same:

       nthreads = 1,  KSPSolve               1 1.0   6.2800e-002

       nthreads = 4,  KSPSolve               1 1.0   5.5090e-002

The main question is that the program gets stuck in the following code when 
run with OpenMP, but there is no problem when run with MPI.

                 do i = istart, iend - 1
                    ii = ia_in(i+1)
                    jj = ia_in(i+2)
                    call MatSetValues(a, ione, i, jj-ii, ja_in(ii:jj-1)-1, a_in(ii:jj-1), Insert_Values, ierr)
                 end do
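
For reference, here is a small diagnostic variant of the same loop (only a 
sketch, not what is in the attached code): it copies the column indices and 
values into plain scratch arrays before each call, so no Fortran array-section 
temporaries such as ja_in(ii:jj-1)-1 are built at the call site. nrow, cols 
and vals are extra local variables introduced for this sketch, and nd_nzrow is 
assumed to hold the maximum number of nonzeros per row:

                 PetscInt :: nrow
                 PetscInt, allocatable :: cols(:)
                 PetscScalar, allocatable :: vals(:)

                 allocate(cols(nd_nzrow), vals(nd_nzrow))   !nd_nzrow = max nonzeros per row
                 do i = istart, iend - 1
                    ii = ia_in(i+1)
                    jj = ia_in(i+2)
                    nrow = jj - ii                           !nonzeros in global row i
                    cols(1:nrow) = ja_in(ii:jj-1) - 1        !convert 1-based ja to 0-based columns
                    vals(1:nrow) = a_in(ii:jj-1)
                    call MatSetValues(a, ione, i, nrow, cols, vals, Insert_Values, ierr)
                 end do
                 deallocate(cols, vals)

If this variant shows the same behaviour with OpenMP, the array sections are 
not the issue and the time is really spent inside MatSetValues itself.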

The test code has also been attached. I have removed some unnecessary code, 
but some of it is still there.
>
>     Barry
>
>
> On Sep 21, 2013, at 11:06 PM, Danyang Su <danyang.su at gmail.com> wrote:
>
>> Hi Shri,
>>
>> Thanks for your info. It works with the option -threadcomm_type openmp. But another problem arises, as described below.
>>
>> The sparse matrix is 53760*53760 with 1067392 non-zero entries. If the code is compiled with PETSc-3.4.2, it works fine: the equations are solved quickly and I can see the speedup. But if the code is compiled with PETSc-dev with the OpenMP option, it takes a long time to solve the equations and I cannot see any speedup when more processors are used.
>>
>> For PETSc-3.4.2, run by "mpiexec -n 4 ksp_inhm_d -log_summary log_mpi4_petsc3.4.2.log", the iteration count and runtimes are:
>> Iterations     6 time_assembly  0.4137E-01 time_ksp  0.9296E-01
>>
>> For PETSc-dev, run by "mpiexec -n 1 ksp_inhm_d -threadcomm_type openmp -threadcomm_nthreads 4 -log_summary log_openmp_petsc_dev.log", the iteration count and runtimes are:
>> Iterations     6 time_assembly  0.3595E+03 time_ksp  0.2907E+00
>>
>> Most of the time ('time_assembly  0.3595E+03') is spent in the following code:
>>                  do i = istart, iend - 1
>>                     ii = ia_in(i+1)
>>                     jj = ia_in(i+2)
>>                     call MatSetValues(a, ione, i, jj-ii, ja_in(ii:jj-1)-1, a_in(ii:jj-1), Insert_Values, ierr)
>>                  end do
>>
>> The log files for both PETSc-3.4.2 and PETSc-dev are attached.
>>
>> Is there anything wrong with my code or with the run options? The above code works fine when using MPICH.
>>
>> Thanks and regards,
>>
>> Danyang
>>
>> On 21/09/2013 2:09 PM, Shri wrote:
>>> There are three thread communicator types in PETSc. The default is "no thread" which is basically a non-threaded version. The other two types are "openmp" and "pthread". If you want to use OpenMP then use the option -threadcomm_type openmp.
>>>
>>> Shri
>>>
>>> On Sep 21, 2013, at 3:46 PM, Danyang Su <danyang.su at gmail.com> wrote:
>>>
>>>> Hi Barry,
>>>>
>>>> Thanks for the quick reply.
>>>>
>>>> After changing
>>>> #if defined(PETSC_HAVE_PTHREADCLASSES) || defined (PETSC_HAVE_OPENMP)
>>>> to
>>>> #if defined(PETSC_HAVE_PTHREADCLASSES)
>>>> and commenting out
>>>> #elif defined(PETSC_HAVE_OPENMP)
>>>> PETSC_EXTERN PetscStack *petscstack;
>>>>
>>>> It can be compiled and validated with "make test".
>>>>
>>>> But I still have questions about running the examples. After rebuilding the code (e.g., ksp_ex2f.f), I can run it with "mpiexec -n 1 ksp_ex2f", "mpiexec -n 4 ksp_ex2f", or "mpiexec -n 1 ksp_ex2f -threadcomm_nthreads 1", but if I run it with "mpiexec -n 1 ksp_ex2f -threadcomm_nthreads 4", there is a lot of error information (attached).
>>>>
>>>> The code is not modified and there are no OpenMP routines in it. For the current development in my project, I want to keep the OpenMP code that calculates the matrix values, but solve the system with PETSc (OpenMP). Is that possible?
>>>>
>>>> Thanks and regards,
>>>>
>>>> Danyang
>>>>
>>>>
>>>>
>>>> On 21/09/2013 7:26 AM, Barry Smith wrote:
>>>>>    Danyang,
>>>>>
>>>>>       I don't think the  || defined (PETSC_HAVE_OPENMP)   belongs in the code below.
>>>>>
>>>>> /*  Linux functions CPU_SET and others don't work if sched.h is not included before
>>>>>      including pthread.h. Also, these functions are active only if either _GNU_SOURCE
>>>>>      or __USE_GNU is not set (see /usr/include/sched.h and /usr/include/features.h), hence
>>>>>      set these first.
>>>>> */
>>>>> #if defined(PETSC_HAVE_PTHREADCLASSES) || defined (PETSC_HAVE_OPENMP)
>>>>>
>>>>> Edit include/petscerror.h and locate these lines and remove that part and then rerun make all.  Let us know if it works or not.
>>>>>
>>>>>     Barry
>>>>>
>>>>> i.e. replace
>>>>>
>>>>> #if defined(PETSC_HAVE_PTHREADCLASSES) || defined (PETSC_HAVE_OPENMP)
>>>>>
>>>>> with
>>>>>
>>>>> #if defined(PETSC_HAVE_PTHREADCLASSES)
>>>>>
>>>>> On Sep 21, 2013, at 6:53 AM, Matthew Knepley <petsc-maint at mcs.anl.gov> wrote:
>>>>>
>>>>>
>>>>>> On Sat, Sep 21, 2013 at 12:18 AM, Danyang Su <danyang.su at gmail.com> wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I got errors when compiling petsc-dev with OpenMP in Cygwin. Previously, I successfully compiled petsc-3.4.2 and it works fine.
>>>>>> The log files have been attached.
>>>>>>
>>>>>> The OpenMP configure test is wrong. It clearly fails to find pthread.h, but the test passes. Then in petscerror.h
>>>>>> we guard pthread.h using PETSC_HAVE_OPENMP. Can someone who knows OpenMP fix this?
>>>>>>
>>>>>>      Matt
>>>>>>   
>>>>>> Thanks,
>>>>>>
>>>>>> Danyang
>>>>>>
>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>>>> -- Norbert Wiener
>>>>>>
>>>> <error.txt>
>> <log_mpi4_petsc3.4.2.log><log_openmp_petsc_dev.log>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: log_openmp_petsc_dev_opt_1.log
Type: application/octet-stream
Size: 10116 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20130922/94acc282/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: log_openmp_petsc_dev_opt_4.log
Type: application/octet-stream
Size: 10145 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20130922/94acc282/attachment-0003.obj>
-------------- next part --------------
      program petsc_unsym_f
    
      implicit none
      
!  The following include statements are required for KSP Fortran programs:
!     petscsys.h       - base PETSc routines
!     petscvec.h    - vectors
!     petscmat.h    - matrices
!     petscpc.h     - preconditioners
!     petscksp.h    - Krylov subspace methods
!  Additional include statements may be needed if using additional
!  PETSc routines in a Fortran program, e.g.,
!     petscviewer.h - viewers
!     petscis.h     - index sets
!      
#include <finclude/petscsys.h>
#include <finclude/petscvec.h>
#include <finclude/petscvec.h90>
#include <finclude/petscmat.h>
#include <finclude/petscpc.h>
#include <finclude/petscksp.h>  

#include <finclude/petscis.h>

#include <finclude/petscdmda.h>
#include <finclude/petscdmda.h90>

! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!                   Variable declarations
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!
!  Variables:
!     ksp     - linear solver context
!     ksp      - Krylov subspace method context
!     pc       - preconditioner context
!     x, b, u  - approx solution, right-hand-side, exact solution vectors
!     A        - matrix that defines linear system
!     its      - iterations for convergence
!     norm     - norm of error in solution
!     rctx     - random number generator context
!
!  Note that vectors are declared as PETSc "Vec" objects.  These vectors
!  are mathematical objects that contain more than just an array of
!  double precision numbers. I.e., vectors in PETSc are not just
!        double precision x(*).
!  However, local vector data can be easily accessed via VecGetArray().
!  See the Fortran section of the PETSc users manual for details.
!
#define xx_a(ib) xx_v(xx_i + (ib))
      PetscReal :: norm, norm1, norm2, norm1_ref, norm2_ref
      PetscInt :: i,j,k,ii,jj,kk,m,n,its
      PetscInt :: istart,iend,ione
      PetscErrorCode :: ierr
      PetscMPIInt ::    rank,nprcs
      PetscBool ::  flg
      PetscScalar :: v,one,neg_one
      Vec ::        x,b,u, x_ref, u_ref
      Mat ::        a
      KSP ::        ksp
      PetscRandom :: rctx
      PetscLogDouble  :: tstart,tend
      PetscReal :: ttotal_assembly, ttotal_ksp
      PetscViewer :: viewer
      
!     Convergence parameters
      PetscReal :: rtol     !default 1.0E-5
      PetscReal :: abstol   !default 1.0E-50
      PetscReal :: dtol     !default 1.0E5
      PetscInt  :: maxits   !default 1.0E5
      
!c.. all other variables
      integer, allocatable :: ia_in(:) 
      integer, allocatable :: ja_in(:)
      PetscScalar, allocatable :: a_in(:)
      PetscScalar, allocatable :: b_in(:) 
      PetscScalar, allocatable :: x_out(:)
      PetscReal, allocatable :: x_ref_in(:)
      PetscInt, allocatable :: ix(:), d_nnz(:), o_nnz(:)
      PetscInt :: nia, nb, nb_sqrt, nnz, nd_nzrow, nlocal
      PetscReal :: err_max, err_min, err_norm
      
      PetscScalar :: xx_v(1)
      PetscOffset :: xx_i
      
      !PetscInt  :: ixs(4)
      !PetscInt  :: ix_in(2)
      !PetscInt  :: ix_out(2)
      PetscInt :: ng
      PetscInt, allocatable :: gindices(:)
      ISLocalToGlobalMapping :: mapping
      
      PetscBool :: flgUseDMDA   !Whether to use DMDA; only valid for structured 1D, 2D and 3D meshes.
      PetscInt  :: nDimDMDA
      DM :: da
      PetscInt :: xs, ys, zs, xm, ym, zm, mx, my, mz, nstep
      PetscScalar, pointer :: vecpointer(:)

      MatPartitioning :: part
      IS :: is, isg
      


      character(256) :: strFolder(100)
      character(256) :: strXOut
      character(256) :: strNameAppend, strResultRefAppend
      
      character(16) :: strKSPType
      
      character(12) :: strnum
      integer :: iunit, ifolder, imatrix  
      

!  These variables are not currently used.
!      PC          pc
!      PCType      ptype
!      double precision tol


!  Note: Any user-defined Fortran routines (such as MyKSPMonitor)
!  MUST be declared as external.

     !external MyKSPMonitor,MyKSPConverged
     ione    = 1 
     one     = 1.0
     neg_one = -1.0
     
!     Convergence parameters
     rtol    = 1.0E-10 !default 1.0E-5
     abstol  = 1.0E-8 !default 1.0E-50
     dtol    = 1.0E5 !default 1.0E5
     maxits  = 100 !default 1.0E5
     
     strKSPType = "KSPGMRES"
     !strKSPType = "KSPBCGS"
     
     flgUseDMDA = .false.  
     nDimDMDA = 1    
     

! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!                 Beginning of program
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -     
 
     call PetscInitialize(Petsc_Null_Character,ierr)
     
     call MPI_Comm_rank(Petsc_Comm_World,rank,ierr)
     call MPI_Comm_size(Petsc_Comm_World,nprcs,ierr)
     
     !read data     
    
      strFolder(1) = "D:\dsu\ResearchAtUBC\Min3P_Parallel_Test\Solver_Matrix_Test\chrom-bio"
      strFolder(2) = "D:\dsu\ResearchAtUBC\Min3P_Parallel_Test\Solver_Matrix_Test\chrom-bio-restart"
      strFolder(3) = "D:\dsu\ResearchAtUBC\Min3P_Parallel_Test\Solver_Matrix_Test\Decal-concrete-level2"
      
      nd_nzrow = 0          !number of non-zero elements in a row, maximum.
      !strNameAppend = "_"
      strNameAppend = "_reactran_rt_"
     
      strResultRefAppend = ""

      
      iunit = 21  
      
      do ifolder = 3, 3
          if(rank == 0) then
              write(*,"(a)") trim(strFolder(ifolder))
          end if
                   
          !read sparse matrix structure ia
          open(unit = iunit, file = trim(adjustl(strFolder(ifolder))) // "\ia" //trim(strNameAppend)//"1.txt")
          read(iunit,*) nia
          if(.not. allocated(ia_in)) then
              allocate( ia_in ( nia ) )    
          end if
          if(size(ia_in,1) /= nia) then
              deallocate(ia_in)
              allocate(ia_in(nia))
          end if
        
          read(iunit,*) ia_in(1)    
          nd_nzrow = ia_in(1) 
          do i = 2, nia
              read(iunit,*) ia_in(i)  
              if(ia_in(i) - ia_in(i-1) > nd_nzrow) then
                  nd_nzrow = ia_in(i) - ia_in(i-1)
              end if
          end do
          close(iunit)
          
          write(*,*) "read ia finished, size of ia", size(ia_in,1)            
       
          !read sparse matrix structure ja
          open(unit = iunit, file = trim(adjustl(strFolder(ifolder))) // "\ja" //trim(strNameAppend)//"1.txt")
          read(iunit,*) nnz
          if(.not. allocated(ja_in)) then
              allocate( ja_in( nnz) )    
          end if
          if(size(ja_in,1) /= nnz) then
              deallocate(ja_in)
              allocate(ja_in(nnz))
          end if
        
          do i = 1, nnz
              read(iunit,*) ja_in(i)
          end do
          close(iunit)  
          write(*,*) "read ja finished. size of ja", size(ja_in,1) 
         
          nb = nia - 1
          !initialize indices
          if(.not. allocated(ix)) then
              allocate( ix( nb) )    
          end if
          if(size(ix,1) /= nb) then
              deallocate(ix)
              allocate(ix(nb))
          end if
          do i = 1, nb
              ix(i)=i-1
          end do
          
          !initialize 
          if(.not. allocated(x_out)) then
              allocate( x_out( nb) )    
          end if
          if(size(x_out,1) /= nb) then
              deallocate(x_out)
              allocate(x_out(nb))
          end if
          
          !initialize d_nnz, o_nnz
          if(.not. allocated(d_nnz)) then
              allocate( d_nnz( nb) )    
          end if
          if(size(d_nnz,1) /= nb) then
              deallocate(d_nnz)
              allocate(d_nnz(nb))
          end if  
          d_nnz = 0
          
          if(.not. allocated(o_nnz)) then
              allocate( o_nnz( nb) )    
          end if
          if(size(o_nnz,1) /= nb) then
              deallocate(o_nnz)
              allocate(o_nnz(nb))
          end if  
          o_nnz = 0          
          
          call MatCreate(Petsc_Comm_World, a, ierr)
          call MatSetSizes(a, Petsc_Decide, Petsc_Decide, nb, nb, ierr)
          call MatSetFromOptions(a, ierr)
          call MatSetUp(a, ierr)
          call MatGetOwnershipRange(a,istart,iend,ierr)              
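          !count, for each locally owned row, how many nonzeros fall in the
          !diagonal block (columns owned by this process) and how many in the
          !off-diagonal block, for MatMPIAIJSetPreallocation below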
          do i = istart + 1, iend
              do j = 1, ia_in(i+1) - ia_in(i)
                  k = ja_in(ia_in(i)+j-1)
                  if(k>istart .and. k<=iend) then
                      d_nnz(i) = d_nnz(i) + 1
                  else
                      o_nnz(i) = o_nnz(i) + 1
                  end if
              end do
          end do 
          call MatMPIAIJSetPreallocation(a,Petsc_Null_Integer,d_nnz(istart+1:iend), Petsc_Null_Integer,o_nnz(istart+1:iend), ierr)
            
          !!Create stiffness matrix
            
          !Currently, all PETSc parallel matrix formats are partitioned by
          !contiguous chunks of rows across the processors.  Determine which
          !rows of the matrix are locally owned.
          !call MatGetOwnershipRange(a,istart,iend,ierr)     
          
          !Create local element variables and RHS
          call VecCreateMPI(Petsc_Comm_World, Petsc_Decide, nb, b, ierr)
          
          call VecDuplicate(b, x, ierr) 
          call VecDuplicate(x, u, ierr)
          call VecDuplicate(u, x_ref, ierr)
          call VecDuplicate(x_ref, u_ref, ierr)          
          
          write(*,*) "Mat preallocation finished."          
          
          !!local to global mapping          
          call VecGetSize(b, m, ierr)
          call VecGetOwnershipRange(b,istart,iend,ierr)
          
          !!Method 3, consider ghost node index
          ng = iend-istart + 2
          if(.not. allocated(gindices)) then
              allocate( gindices(ng) )    
          end if
          if(size(gindices,1) /= ng) then
              deallocate(gindices)
              allocate(gindices(ng))
          end if
          gindices(1)=istart-1
          do i = 1, ng-1
              gindices(i+1)=gindices(i)+1
          end do          
          !map the first and last point as periodic          
          if(gindices(1) == -1) then
              gindices(1) = m-1
          end if
          if(gindices(ng) == m) then
              gindices(ng) = 0
          end if  
          
          call ISLocalToGlobalMappingCreate(Petsc_Comm_Self,ng,gindices,Petsc_Copy_Values,mapping,ierr)
          call VecSetLocalToGlobalMapping(b,mapping,ierr);
          call ISLocalToGlobalMappingDestroy(mapping,ierr) 
          
          write(*,*) "Local global mapping finished."  
          
          !Create linear solver context
          call KSPCreate(Petsc_Comm_World,ksp,ierr)
          
          !Set KSP type
          if(trim(strKSPType) == "KSPGMRES") then
            call KSPSetType(ksp, KSPGMRES, ierr)
          else if(trim(strKSPType) == "KSPBCGS") then
            call KSPSetType(ksp, KSPBCGS, ierr)
          end if
          
          !Set linear solver defaults for this problem (optional).
          call KSPSetTolerances(ksp,rtol, abstol, dtol, maxits,ierr)
          
          call KSPSetFromOptions(ksp,ierr)
          
          write(*,*) "Create KSP solver finished."  
          
          nstep = 0
          
          !Read matrix values and right-hand side, and construct the linear system
          do imatrix = 1, 10
              
            nstep = nstep + 1
            
            write(*,*) "Begin matrix ", imatrix
              
            write(strnum, *) imatrix
            
            strXOut = trim(adjustl(strFolder(ifolder))) // "\x_PETSc"// trim(strNameAppend)//trim(adjustl(strnum))//".txt"
            
            !read matrix value a
            open(unit = iunit, file = trim(adjustl(strFolder(ifolder))) // "\a" // trim(strNameAppend)//trim(adjustl(strnum))//".txt")
            read(iunit,*) nnz
            if(.not. allocated(a_in)) then
                allocate( a_in(nnz) )    
            end if
            if(size(a_in,1) /= nnz) then
                deallocate(a_in)
                allocate(a_in(nnz))
            end if
        
            do i = 1, nnz
                read(iunit,*) a_in(i)
            end do
            close(iunit) 
            write(*,*) "read a finished, size of a", size(a_in,1)        
        
            !read right hand side b
            open(unit = iunit, file = trim(adjustl(strFolder(ifolder))) // "\b"//trim(strNameAppend)//trim(adjustl(strnum))//".txt")
            read(iunit,*) nb
            if(.not. allocated(b_in)) then
                allocate(b_in(nb))    
            end if
            if(size(b_in,1) /= nb ) then
                deallocate(b_in)
                allocate(b_in(nb))
            end if        
            do i = 1, nb
                read(iunit,*) b_in(i)
            end do
            close(iunit)        
            write(*,*) "read b finished, size of b", size(b_in,1) 
            
            !referenced result
            open(unit = iunit, file = trim(adjustl(strFolder(ifolder))) // "\x"//trim(strResultRefAppend)//trim(strNameAppend)//trim(adjustl(strnum))//".txt")
            read(iunit,*) nb
            if(.not. allocated(x_ref_in)) then
                allocate(x_ref_in(nb))
            end if
            if(size(x_ref_in,1)/=nb) then
                deallocate(x_ref_in)
                allocate(x_ref_in(nb))
            end if         
            do i = 1, nb
                read(iunit,*) x_ref_in(i)
            end do
            close(iunit)        
            write(*,*) "read x_ref finished, size of x_ref", size(x_ref_in,1)
            
            !Set the solution from WatSolv (ws209)
            call VecGetOwnershipRange(x_ref,istart,iend,ierr)
            do i = istart, iend - 1
               call VecSetValue(x_ref, i, x_ref_in(i+1), Insert_Values, ierr)
            end do
            call VecAssemblyBegin(x_ref,ierr)
            call VecAssemblyEnd(x_ref,ierr)
            
            write(*,*) "Assembly reference result finished"
            
            call PetscTime(tstart, ierr)            
            !Set matrix elements
            ! - Each processor needs to insert only elements that it owns
            !   locally (but any non-local elements will be sent to the
            !   appropriate processor during matrix assembly).
            ! - Always specify global row and columns of matrix entries.
            ! - Note that MatSetValues() uses 0-based row and column numbers
            !   in Fortran as well as in C.
            call MatZeroEntries(a, ierr)
            write(*,*) "mark 1"
            call MatGetOwnershipRange(a,istart,iend,ierr)
            write(*,*) "mark 2"
            do i = istart, iend - 1
                ii = ia_in(i+1)
                jj = ia_in(i+2)
                call MatSetValues(a, ione, i, jj-ii, ja_in(ii:jj-1)-1, a_in(ii:jj-1), Insert_Values, ierr)
            end do 
            write(*,*) "mark 3"
            call MatAssemblyBegin(a, Mat_Final_Assembly, ierr)
            call MatAssemblyEnd(a, Mat_Final_Assembly, ierr)
            write(*,*) "mark 4"
            if(nstep == 1) then
                call MatSetOption(a,Mat_New_Nonzero_location_Err,Petsc_True,ierr)
                
                !!Test matrix partition using ParMetis
                !call MatPartitioningCreate(Petsc_Comm_World, part, ierr)
                !call MatPartitioningSetAdjacency(part, a, ierr)
                !call MatPartitioningSetFromOptions(part, ierr)
                !call MatPartitioningApply(part, is, ierr)
                !call ISPartitioningToNumbering(is, isg, ierr)
                !call ISView(isg, PETSC_VIEWER_STDOUT_WORLD, ierr)
                !call ISDestroy(is, ierr)
                !call ISDestroy(isg, ierr)
                !call MatPartitioningDestroy(part, ierr)
            end if
            !call MatView(a, PETSC_VIEWER_STDOUT_WORLD, ierr)
            
            write(*,*) "Assembly matrix a finished"        
            
            !!Set right hand side b
            call VecGetOwnershipRange(b,istart,iend,ierr)
            !!Method 1            
            !do i = istart, iend - 1
            !   call VecSetValue(b, i, b_in(i+1), Insert_Values, ierr)          
            !end do
            
            !!Method 2
            !!Use VecSetValues instead, it is faster.
            !call VecSetValues(b, iend-istart, ix(istart+1:iend), b_in(istart+1:iend), Insert_Values, ierr)            

            !!Method 3-1, set the values using the local ordering, without inserting ghost values
            !do i = 2, ng-1
            !    call VecSetValuesLocal(b,1,i-1,b_in(istart+i-1:istart+i-1),Insert_Values,ierr)
            !end do
            !!Method 3-2, set the values using the local ordering, without inserting ghost values
            call VecSetValuesLocal(b,ng-2,ix(2:ng-1),b_in(istart+1:istart+ng-2),Insert_Values,ierr)
            
            !!Assemble matrix
            call VecAssemblyBegin(b,ierr)
            call VecAssemblyEnd(b,ierr)
            !call VecView(b,PETSC_VIEWER_STDOUT_WORLD,ierr) 
            
            write(*,*) "Assembly right-hand-side b finished"   
            
             
            call PetscTime(tend, ierr)
            ttotal_assembly = tend - tstart
            
            call PetscTime(tstart, ierr)  
            !call KSPSetOperators(ksp,a,a,SAME_PRECONDITIONER,ierr)     !Fail for NWMO_Basin
            call KSPSetOperators(ksp,a,a,SAME_NONZERO_PATTERN,ierr)
            !call KSPSetOperators(ksp,a,a,DIFFERENT_NONZERO_PATTERN,ierr)
            
            call KSPSolve(ksp,b,x,ierr)             
            !call VecView(x,PETSC_VIEWER_STDOUT_WORLD,ierr)
            
            call PetscTime(tend, ierr)
            ttotal_ksp = tend - tstart     
            
            write(*,*) "KSP solver finished" 

            !!calculate residual r=ax-b, not required
            call MatMult(a, x, u, ierr)
            call VecAXPY(u, neg_one, b, ierr)
            
            !!calculate norm of residual a, not required
            call VecNormBegin(u, Norm_1, norm1, ierr)
            call VecNormBegin(u, Norm_2, norm2, ierr) 
            call VecNormEnd(u, Norm_1, norm1, ierr)
            call VecNormEnd(u, Norm_2, norm2, ierr)              
                       
            !!Calculate norm of PETSc solution and compare solution PETSc vs WatSolv (ws209)
            call MatMult(a,x_ref,u_ref,ierr)
            call VecAXPY(u_ref, neg_one, b, ierr)
            call VecNormBegin(u_ref, Norm_1, norm1_ref, ierr)
            call VecNormBegin(u_ref, Norm_2, norm2_ref, ierr) 
            call VecNormEnd(u_ref, Norm_1, norm1_ref, ierr)
            call VecNormEnd(u_ref, Norm_2, norm2_ref, ierr)            
            
            call VecAXPY(x_ref, neg_one, x, ierr)
            call VecMax(x_ref, PETSC_NULL_INTEGER, err_max, ierr)
            call VecMin(x_ref, PETSC_NULL_INTEGER, err_min, ierr)
            call VecNorm(x_ref, Norm_2, err_norm, ierr)
            
            !!get solver residual
            call KSPGetResidualNorm(ksp,norm,ierr)
            call KSPGetIterationNumber(ksp,its,ierr)
            call KSPGetErrorIfNotConverged(ksp,flg,ierr) 
            
            !!Get solution vector to array data
            !call VecGetOwnershipRange(x,istart,iend,ierr)
            !call VecGetValues(x, iend-istart,ix(istart+1:iend),x_out(istart+1:iend), ierr)
            !write(*, 90) istart+1, x_out(istart+1), iend, x_out(iend)
            
            !!Get solution vector by pointer, 
            !call VecGetArray(x, xx_v, xx_i, ierr)
            !call VecGetLocalSize(x,nlocal,ierr)     !you can not use ownership range istart, iend here, it's buggy.
            !write(*, 90) istart+1, xx_a(1), iend, xx_a(nlocal)
            !call VecRestoreArray(x, xx_v, xx_i, ierr)
            
90          format ('x(', i6, ') = ',e11.4, ' x(', i6, ') = ',e11.4)
            
            write(*,*) "Get solution residual finished" 
            
            !!Output the solution solved by PETSc
            call PetscViewerASCIIOpen(Petsc_Comm_World, trim(strXOut) , viewer, ierr)
            call VecView(x, viewer, ierr)
            call PetscViewerDestroy(viewer, ierr)            
            
            if (rank == 0) then 
                if(flg) then
                    write(*,*) "Solver does not converge"
                else
                    write(*,100) its, norm, norm1, norm2, ttotal_assembly, ttotal_ksp
                end if
            end if
            
100         format('Iterations ',i5, ' norm ', e11.4, ' norm1 ', e11.4, ' norm2 ', e11.4, & 
                   ' time_assembly ', e11.4, ' time_ksp ', e11.4)         
      
          end do
          
          !release local arrays
          if (allocated(ia_in))      deallocate(ia_in)
          if (allocated(ja_in))      deallocate(ja_in)
          if (allocated(a_in))       deallocate(a_in)
          if (allocated(b_in))       deallocate(b_in)
          if (allocated(x_out))      deallocate(x_out)
          if (allocated(x_ref_in)) deallocate(x_ref_in)
          if (allocated(ix))         deallocate(ix)
          if (allocated(gindices))   deallocate(gindices)
          if (allocated(d_nnz))      deallocate(d_nnz)
          if (allocated(o_nnz))      deallocate(o_nnz)
          
          !release Petsc objects
          call KSPDestroy(ksp,ierr)
          call VecDestroy(b, ierr)
          call VecDestroy(x, ierr)
          call VecDestroy(u, ierr)
          call VecDestroy(x_ref, ierr)
          call VecDestroy(u_ref, ierr)
          call MatDestroy(a, ierr)
          if(flgUseDMDA) then
              call DMDestroy(da,ierr)
          end if
      
      end do
      
      call PetscFinalize(ierr) 
      
     end program


