[petsc-users] Can't expand MemType 1: jcol 16104

Anthony Haas aph at email.arizona.edu
Mon Jul 13 14:52:31 CDT 2015


Hi Barry,

If you look in module_baseflows.F90 line 467-473, I have:

     call KSPDestroy(ksp,ierr)
     call VecDestroy(frhs,ierr)
     call VecDestroy(frhs2,ierr)
     call VecDestroy(sol,ierr)
     call VecDestroy(sol_seq,ierr)
     call MatDestroy(DXX,ierr)
     call MatDestroy(DYY,ierr)

I checked again and all the vectors that I create (see 
module_baseflows.F90 lines 87-88) are destroyed there. Also when I use 
VecScatterCreateToZero, I always have a VecScatterDestroy after.
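
For reference, the pattern I follow in module_baseflows.F90 is roughly this (a simplified sketch of what is already in the code):

     call VecScatterCreateToZero(sol,ctx,sol_seq,ierr)
     ! scatter as many times as needed
     call VecScatterBegin(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr)
     call VecScatterEnd(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr)
     ! destroy the scatter context when no longer needed
     call VecScatterDestroy(ctx,ierr)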

Thanks,

Anthony



On 07/13/2015 12:15 PM, Barry Smith wrote:
>> On Jul 13, 2015, at 2:05 PM, Anthony Haas <aph at email.arizona.edu> wrote:
>>
>> Hi,
>>
>> I ran my program under Valgrind with 2 processors using the following:
>>
>> ${PETSC_DIR}/${PETSC_ARCH}/bin/mpiexec -n 2 valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --num-callers=20 --log-file=valgrind.log.%p ./BiGlobal_Spectral_Petsc.x -malloc off -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist -eps_view -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package superlu_dist
>>
>> Note that in my program, I use once PETSc alone to solve a linear system of equations (see module_baseflows.F90) and then later in the code I use SLEPc to get the eigenvalues (see module_petsc.F90). My main program is called BiGlobal_Spectral_Petsc.F90
>>
>> I have attached the log files from Valgrind. It seems that quite a few errors occur in pzgstrf. See valgrind.log.31371 and 31372. Do I have any control over these errors?
>     These are likely bugs in SuperLU_Dist.
>
>> Then, starting from line 403 in valgrind.log.31371 and line 458 in valgrind.log.31372, I have a series of errors that seem to be related to a VecCreateSeq and to a VecScatterCreateToZero (see module_baseflows.F90). Is there something I can do to avoid these errors?
>     These are because you are not destroying all your vectors at the end.
>
>> Thanks
>>
>> Anthony
>>
>>
>>
>>
>>
>>
>>
>> On 07/08/2015 01:49 PM, Barry Smith wrote:
>>>     You should try running under valgrind, see:  http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>
>>>      You can also run in the debugger (yes, this is tricky on a batch system but possible) and see exactly what triggers the floating point exception, or, when it "hangs", interrupt the program to see where it is hanging.
>>>
>>>
>>>    Barry
>>>
>>>
>>>
>>>> On Jul 8, 2015, at 12:34 PM, Anthony Paul Haas <aph at email.arizona.edu> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I have used the switch -mat_superlu_dist_parsymbfact in my PBS script. However, although my program worked fine with sequential symbolic factorization, I get one of the following two behaviors when I run with parallel symbolic factorization (depending on the number of processors that I use):
>>>>
>>>> 1) the program just hangs (it seems stuck in some subroutine ==> see test.out-hangs)
>>>> 2) I get a floating point exception ==> see test.out-floating-point-exception
>>>>
>>>> Note that, as suggested in the SuperLU manual, I use a power-of-2 number of processes. Are there any tunable parameters for the parallel symbolic factorization? Note that when I build my sparse matrix, most elements I add are nonzero of course, but to simplify the programming I also add a few zero elements to the sparse matrix. I was thinking that maybe, if the parallel symbolic factorization proceeds by blocks, there could be some blocks where the pivot would be zero, hence creating the FPE?
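>>>>
>>>> For completeness, the only way I enable it at the moment is that runtime switch in my PBS script; I assume the equivalent in-code form would be something like:
>>>>
>>>>     call PetscOptionsInsertString("-mat_superlu_dist_parsymbfact",ierr)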
>>>>
>>>> Thanks,
>>>>
>>>> Anthony
>>>>
>>>>
>>>>
>>>> On Wed, Jul 8, 2015 at 6:46 AM, Xiaoye S. Li <xsli at lbl.gov> wrote:
>>>> Did you find out how to change the option to use parallel symbolic factorization?  Perhaps the PETSc team can help.
>>>>
>>>> Sherry
>>>>
>>>>
>>>> On Tue, Jul 7, 2015 at 3:58 PM, Xiaoye S. Li <xsli at lbl.gov> wrote:
>>>> Is there an inquiry function that tells you all the available options?
>>>>
>>>> Sherry
>>>>
>>>> On Tue, Jul 7, 2015 at 3:25 PM, Anthony Paul Haas <aph at email.arizona.edu> wrote:
>>>> Hi Sherry,
>>>>
>>>> Thanks for your message. I have used the superlu_dist default options. I did not realize that I was doing serial symbolic factorization. That is probably the cause of my problem.
>>>> Each node on Garnet has 60GB of usable memory and I can run with 1, 2, 4, 8, 16 or 32 cores per node.
>>>>
>>>> So I should use:
>>>>
>>>> -mat_superlu_dist_r 20
>>>> -mat_superlu_dist_c 32
>>>>
>>>> How do you specify the parallel symbolic factorization option? Is it -mat_superlu_dist_matinput 1?
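>>>>
>>>> In other words, would the full set of runtime switches be something like this (just my guess at the option names)?
>>>>
>>>>     -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_r 20 -mat_superlu_dist_c 32 -mat_superlu_dist_matinput 1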
>>>>
>>>> Thanks,
>>>>
>>>> Anthony
>>>>
>>>>
>>>> On Tue, Jul 7, 2015 at 3:08 PM, Xiaoye S. Li <xsli at lbl.gov> wrote:
>>>> The superlu_dist failure occurs during symbolic factorization.  Since you are using serial symbolic factorization, it requires the entire graph of A to be available in the memory of one MPI task. How much memory do you have for each MPI task?
>>>>
>>>> It won't help even if you use more processes.  You should try the parallel symbolic factorization option.
>>>>
>>>> Another point.  You set up process grid as:
>>>>         Process grid nprow 32 x npcol 20
>>>> For better performance, you should swap the grid dimensions. That is, it's better to use 20 x 32; never make nprow larger than npcol.
>>>>
>>>>
>>>> Sherry
>>>>
>>>>
>>>> On Tue, Jul 7, 2015 at 1:27 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>>
>>>>     I would suggest running a sequence of problems, 101 by 101, 111 by 111, etc., and getting the memory usage in each case (once you run out of memory you get NO useful information about memory needs). You can then plot memory usage as a function of problem size to get a handle on how much memory it is using.  You can also run on more and more processes (which have a total of more memory) to see how large a problem you may be able to reach.
>>>>
>>>>     MUMPS also has an "out of core" version (which we have never used) that could, in theory anyway, let you get to large problems if you have lots of disk space, but you are on your own figuring out how to use it.
>>>>
>>>>    Barry
>>>>
>>>>> On Jul 7, 2015, at 2:37 PM, Anthony Paul Haas <aph at email.arizona.edu> wrote:
>>>>>
>>>>> Hi Jose,
>>>>>
>>>>> In my code, I use PETSc once to solve a linear system to get the baseflow (without using SLEPc) and then I use SLEPc to do the stability analysis of that baseflow. This is why there are some SLEPc options that are not used in test.out-superlu_dist-151x151 (when I am solving for the baseflow with PETSc only). I have attached a 101x101 case for which I get the eigenvalues. That case works fine. However, if I increase to 151x151, I get the error that you can see in test.out-superlu_dist-151x151 (similar error with MUMPS: see test.out-mumps-151x151, line 2918). If you look at the very end of the files test.out-superlu_dist-151x151 and test.out-mumps-151x151, you will see that the last info message printed is:
>>>>>
>>>>> On Processor (after EPSSetFromOptions)  0    memory:    0.65073152000E+08          =====>  (see line 807 of module_petsc.F90)
>>>>>
>>>>> This means that the memory error probably occurs in the call to EPSSolve (see module_petsc.F90 line 810). I would like to evaluate how much memory is required by the most memory-intensive operation within EPSSolve. Since I am solving a generalized EVP, I would imagine that it would be the LU decomposition. But is there an accurate way of measuring it?
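>>>>>
>>>>> So far the only thing I do is bracket the major calls with PetscMemoryGetCurrentUsage and print the result on rank 0, roughly as in module_petsc.F90 (sketch):
>>>>>
>>>>>     call PetscMemoryGetCurrentUsage(mem,ierr)
>>>>>     if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (after EPSSetFromOptions)',rank,'    memory:',mem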
>>>>>
>>>>> Before starting with iterative solvers, I would like to exploit direct solvers as much as I can. I tried GMRES with the default preconditioner at some point but I had convergence problems. What solver/preconditioner would you recommend for a generalized non-Hermitian (EPS_GNHEP) EVP?
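>>>>>
>>>>> For reference, the runtime options I currently pass for the direct solve are along these lines:
>>>>>
>>>>>     -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package superlu_dist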
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Anthony
>>>>>
>>>>> On Tue, Jul 7, 2015 at 12:17 AM, Jose E. Roman <jroman at dsic.upv.es> wrote:
>>>>>
>>>>> El 07/07/2015, a las 02:33, Anthony Haas escribió:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am computing eigenvalues using PETSc/SLEPc and superlu_dist for the LU decomposition (my problem is a generalized eigenvalue problem). The code runs fine on a 101x101 grid, but when I increase to 151x151 I get the following error:
>>>>>>
>>>>>> Can't expand MemType 1: jcol 16104   (and then [NID 00037] 2015-07-06 19:19:17 Apid 31025976: OOM killer terminated this process.)
>>>>>>
>>>>>> It seems to be a memory problem. I monitor the memory usage as far as I can, and it seems that memory usage is pretty low. The most memory-intensive part of the program is probably the LU decomposition in the context of the generalized EVP. Is there a way to evaluate how much memory will be required for that step? I am currently running the debug version of the code, which I would assume uses more memory?
>>>>>>
>>>>>> I have attached the output of the job. Note that the program uses PETSc twice: 1) to solve a linear system, for which no problem occurs, and 2) to solve the generalized EVP with SLEPc, where I get the error.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Anthony
>>>>>> <test.out-superlu_dist-151x151>
>>>>> In the output you are attaching, there are no SLEPc objects in the report and the SLEPc options are not used. It seems that the SLEPc calls are skipped?
>>>>>
>>>>> Do you get the same error with MUMPS? Have you tried to solve linear systems with a preconditioned iterative solver?
>>>>>
>>>>> Jose
>>>>>
>>>>>
>>>>> <module_petsc.F90><test.out-mumps-151x151><test.out_superlu_dist-101x101><test.out-superlu_dist-151x151>
>>>>
>>>>
>>>>
>>>>
>>>> <test.out-hangs><test.out-floating-point-exception>
>>
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !! SPECTRAL TEMPORAL BIGLOBAL SOLVER !
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>
>> PROGRAM Biglobal_Spectral
>>
>>   USE PETSC
>>   USE TYPEDEF
>>   USE MAPPINGS
>>   USE BASEFLOW
>>   USE WRITE_PRINT
>>   USE FOURIER_CHEBYSHEV
>>
>>   IMPLICIT NONE
>>
>>   include 'mpif.h'
>>
>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!
>> !                           Variables Definition                         !
>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!
>>
>> !...Testing and dummy variables
>>     real(dp) :: dummy
>>
>> !...Problem parameters
>>     integer(i4b),parameter :: nx=11,ny=11,nxy=nx*ny,N=4*nxy
>>
>>     !Grid size for 3D Poiseuille baseflow
>>     integer(i4b),parameter :: nnx=nx,nny=ny
>>
>>     real(dp) :: Re,alpha,beta,Lx,xmin,xmax,ymin,ymax,AA,xll,yll
>>
>> !...Problem Flags
>>     integer(i4b) :: fourier,testcase,formulation,algorithm
>>
>>     !Mapping Flags
>>     integer(i4b) :: map,mapx,mapy
>>
>> !...Indices
>>     integer(i4b) :: i,j,iii,jjj,ij,skip,imin,imax,jmin,jmax
>>
>> !...Output files
>>     integer(i4b)   :: file_num
>>     character(128) :: filetype,filename,file_name
>>
>> !...Boundary conditions variables
>>     integer(i4b) :: ct1,ct2,ct3,ct4
>>     integer(i4b),dimension(:),allocatable :: fst_bc,wall_bc,outlet_bc,inlet_bc
>>     integer(i4b),dimension(:),allocatable :: bcond_px,bcond_py,bcond_pz
>>     integer(i4b),dimension(:),allocatable :: bcond_u,bcond_v,bcond_w,bcond_p
>>     integer(i4b),dimension(:),allocatable :: bcond
>>
>> !...Chebyshev related
>>     real(dp),dimension(:),allocatable   :: xi,yi,x,y,xfourier
>>
>> !...Baseflow variables
>>     real(dp),dimension(:,:),allocatable :: usol,uxsol,uysol
>>     real(dp),dimension(:,:),allocatable :: vsol,vxsol,vysol
>>     real(dp),dimension(:,:),allocatable :: wsol,wxsol,wysol
>>
>>     !For Hiemenz Baseflow
>>     integer(i4b) :: npts_hz
>>     real(dp) :: ymax_hz,epstol,si
>>
>>     real(dp),dimension(:),allocatable :: eta
>>     real(dp),dimension(:),allocatable :: u0,v0,v1,v2,v3
>>     real(dp),dimension(:),allocatable :: w0,w1,w2
>>     real(dp),dimension(:),allocatable :: etarev,u0rev,v0rev,w0rev
>>
>>     real(dp),dimension(:,:),allocatable :: Ubar,Vbar,Wbar
>>
>> !...Grid variables
>>     real(dp),dimension(:,:),allocatable :: xx,yy,xx1,yy1
>>
>> !...Vectorization array
>>     integer(i4b),dimension(:),allocatable :: li
>>
>> !...MPI
>>     integer(i4b) :: rank,size,ierr
>>
>> !...Spline interpolation
>>     !real(dp),dimension(:),allocatable :: bsp,csp,dsp
>>     !real(dp),dimension(:),allocatable :: interp_u0,interp_v0,interp_w0
>>
>>     real(dp) :: seval
>>
>> !...External Subroutines
>>     external seval
>>
>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!
>> !                           Beginning of program                         !
>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!
>>
>>     call MPI_INIT(ierr)
>>
>>     call MPI_COMM_SIZE(MPI_COMM_WORLD,size,ierr)
>>     call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
>>
>>     if (rank==0) call cpu_time(start_total)
>>
>> !...Parameters for BiGlobal Computation
>>
>>     testcase=2      ! 1 --> Plane Poiseuille Flow
>>                     ! 2 --> 3D Poiseuille Flow
>>                     ! 3 --> Attachment line flow (Hiemenz)
>>                     ! 4 --> Laminar Separation Bubble
>>                     ! 5 --> Artificially parallel boundary-layer (cf. Rodriguez thesis)
>>
>>     formulation = 1 ! 1 --> x,y inhomogeneous
>>                     ! 2 --> y,z inhomogeneous
>>
>>     algorithm = 1   ! 1 --> QZ algorithm
>>                     ! 2 --> Arnoldi algorithm
>>
>>
>> !...Arrays allocation
>>
>>     allocate(li(nx))
>>     allocate(xi(nx),yi(ny))
>>     allocate(x(nx),y(ny))
>>     allocate(xfourier(nx))
>>
>>     allocate(uvec(nxy),uxvec(nxy),uyvec(nxy))
>>     allocate(vvec(nxy),vxvec(nxy),vyvec(nxy))
>>     allocate(wvec(nxy),wxvec(nxy),wyvec(nxy))
>>
>>     allocate(xx(nx,ny),yy(nx,ny))
>>     allocate(D1X(nx,nx),D2X(nx,nx))
>>     allocate(D1Y(ny,ny),D2Y(ny,ny))
>>     allocate(DMX(nx,nx,2),DMY(ny,ny,2))
>>
>>     !allocate(interp_u0(ny),interp_v0(ny),interp_w0(ny))
>>
>>     li       = 0
>>     xi       = 0.0d0
>>     yi       = 0.0d0
>>     x        = 0.0d0
>>     y        = 0.0d0
>>     xfourier = 0.0d0
>>     uvec     = 0.0d0
>>     uxvec    = 0.0d0
>>     uyvec    = 0.0d0
>>     vvec     = 0.0d0
>>     vxvec    = 0.0d0
>>     vyvec    = 0.0d0
>>     wvec     = 0.0d0
>>     wxvec    = 0.0d0
>>     wyvec    = 0.0d0
>>     xx       = 0.0d0
>>     yy       = 0.0d0
>>     D1X      = 0.0d0
>>     D2X      = 0.0d0
>>     D1Y      = 0.0d0
>>     D2Y      = 0.0d0
>>     DMX      = 0.0d0
>>     DMY      = 0.0d0
>>
>> !...Test Cases Parameters
>>
>>     if ( testcase == 2 ) then
>>
>>        mapx=1
>>        mapy=0
>>
>>        Re=1000
>>
>>        if (formulation == 1)then
>>           beta = pi
>>        elseif (formulation == 2) then
>>           alpha = pi
>>        endif
>>
>>        AA=1.0
>>
>>        xmin=-AA
>>        xmax=AA
>>
>>        ymin=-1.0
>>        ymax=1.0
>>
>>        !grid for baseflow
>>        !nnx=nx
>>        !nny=ny
>>
>>     endif
>>
>> !...Working Array for Converting 2D Indices to 1D
>>
>>     li=0
>>
>>     do i=1,nx
>>        li(i)=(i-1)*ny
>>     enddo
>>
>> !...Get Collocation Points and Chebyshev Differentiation Matrices
>>
>>     call cheby(nx-1,xi,DMX) ! (cf. Spectral Methods in Matlab pages 53-55)
>>     call cheby(ny-1,yi,DMY)
>>
>>     xll=0
>>
>>     ! To get matrices D1X,D2X,D1Y,D2Y
>>     call set_mappings(nx,ny,mapx,mapy,xmin,xmax,ymin,ymax,xi,yi,xll,yll,x,y)
>>
>>     do i=1,nx
>>        do j=1,ny
>>
>>           xx(i,j)=x(i) ! not equivalent to matlab meshgrid!!!
>>           yy(i,j)=y(j) ! not equivalent to matlab meshgrid!!!
>>
>>        enddo
>>     enddo
>>
>>
>> !...Compute Baseflow (3D Poiseuille)
>>
>>     if ( testcase == 2 ) then
>>
>>        allocate(xx1(nnx,nny),yy1(nnx,nny))
>>
>>        allocate(usol(nnx,nny),uxsol(nnx,nny),uysol(nnx,nny))
>>        allocate(vsol(nnx,nny),vxsol(nnx,nny),vysol(nnx,nny))
>>        allocate(wsol(nnx,nny),wxsol(nnx,nny),wysol(nnx,nny))
>>
>>        xx1   = 0.0d0
>>        yy1   = 0.0d0
>>        usol  = 0.0d0
>>        uxsol = 0.0d0
>>        uysol = 0.0d0
>>        vsol  = 0.0d0
>>        vxsol = 0.0d0
>>        vysol = 0.0d0
>>        wsol  = 0.0d0
>>        wxsol = 0.0d0
>>        wysol = 0.0d0
>>
>>        if (rank == 0) then
>>           WRITE(*,*)
>>           WRITE(*,'(A,I3,A,I3)')'3D POISEUILLE BASEFLOW COMPUTED ON GRID WITH NX =',nnx,' NY =',nny
>>           WRITE(*,*)
>>           call cpu_time(start)
>>        endif
>>
>>        !currently I am taking same grid for baseflow as for biglobal ==> no interpolation
>>        call Poiseuille_3D_PETSC(nnx,nny,AA,usol,xx1,yy1)
>>
>>        if (rank==0) then
>>           call cpu_time(finish)
>>           print '("Time for baseflow computation = ",f15.4," seconds.")',finish-start
>>
>>           file_num=10
>>           file_name='check_baseflow_3D_poiseuille.dat'
>>
>>           call write_to_tecplot(file_num,file_name,nnx,nny,xx1,yy1,usol,usol,usol)
>>
>>        endif
>>
>>        vsol=0*usol
>>        wsol=0*usol
>>
>>        if (formulation==1) then
>>
>>           wsol=usol
>>           usol=0.*wsol
>>           vsol=0.*wsol
>>
>>        endif
>>
>>        !Compute derivatives
>>        uxsol = MATMUL(D1X,usol)
>>        uysol = MATMUL(usol,TRANSPOSE(D1Y))
>>
>>        vxsol = MATMUL(D1X,vsol)
>>        vysol = MATMUL(vsol,TRANSPOSE(D1Y))
>>
>>        wxsol = MATMUL(D1X,wsol)
>>        wysol = MATMUL(wsol,TRANSPOSE(D1Y))
>>
>>        !Vectorize the matrices
>>        do i=1,nx
>>           do j=1,ny
>>
>>              ij=li(i)+j
>>
>>              uvec(ij) = usol(i,j)
>>              vvec(ij) = vsol(i,j)
>>              wvec(ij) = wsol(i,j)
>>
>>              uxvec(ij) = uxsol(i,j)
>>              vxvec(ij) = vxsol(i,j)
>>              wxvec(ij) = wxsol(i,j)
>>
>>              uyvec(ij) = uysol(i,j)
>>              vyvec(ij) = vysol(i,j)
>>              wyvec(ij) = wysol(i,j)
>>
>>           enddo
>>        enddo
>>
>>     endif
>>
>>     !call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>
>> !...Build Matrices, Set BCs and Solve Problem
>>
>>     call SetupSolveGEVP(nx,ny,nxy,alpha,beta,Re)
>>
>>     if (rank==0) then
>>        call cpu_time(finish_total)
>>        print '("Total time = ",f15.4," seconds.")',finish_total-start_total
>>     endif
>>
>>     deallocate(li)
>>     deallocate(xi,yi)
>>     deallocate(x,y)
>>     deallocate(xfourier)
>>
>>     deallocate(uvec,uxvec,uyvec)
>>     deallocate(vvec,vxvec,vyvec)
>>     deallocate(wvec,wxvec,wyvec)
>>
>>     deallocate(xx,yy)
>>     deallocate(D1X,D2X)
>>     deallocate(D1Y,D2Y)
>>     deallocate(DMX,DMY)
>>
>>     deallocate(xx1,yy1)
>>
>>     deallocate(usol,uxsol,uysol)
>>     deallocate(vsol,vxsol,vysol)
>>     deallocate(wsol,wxsol,wysol)
>>
>>     call MPI_FINALIZE(ierr)
>>
>>
>>   END PROGRAM Biglobal_Spectral
>>
>> MODULE BASEFLOW
>>
>>   USE TYPEDEF
>>   USE WRITE_PRINT
>>   USE FOURIER_CHEBYSHEV
>>
>>   IMPLICIT NONE
>>
>> CONTAINS
>>
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>
>>   SUBROUTINE Poiseuille_3D_PETSC(nx,ny,AA,usol,xx,yy)
>>
>> #include <petsc/finclude/petscsys.h>
>> #include <petsc/finclude/petscvec.h>
>> #include <petsc/finclude/petscmat.h>
>> #include <petsc/finclude/petscmat.h90>
>> #include <petsc/finclude/petscvec.h90>
>> #include <petsc/finclude/petscpc.h>
>> #include <petsc/finclude/petscksp.h>
>> #include <petsc/finclude/petscviewer.h>
>> #include <petsc/finclude/petscviewer.h90>
>>
>> #include <slepc/finclude/slepcsys.h>
>> #include <slepc/finclude/slepceps.h>
>>
>> !...Regular Variables
>>
>>     integer(i4b),intent(in) :: nx,ny
>>
>>     real(dp),intent(in) :: AA
>>     real(dp),dimension(nx,ny),intent(inout) :: usol,xx,yy
>>
>>     integer(i4b) :: i,j,ij
>>     integer(i4b) :: rank,size,ierr,source
>>     integer(i4b) :: nxy,nxm2,nym2,nxym2
>>
>>     integer(i4b),dimension(nx) :: li
>>     integer(i4b),dimension(nx-2) :: lim2
>>
>>     integer(i4b),dimension(:),allocatable  :: loc
>>
>>     real(dp) :: ymin,ymax,xmin,xmax
>>     real(dp) :: dxidx,d2xidx2,dyidy,d2yidy2
>>
>>     real(dp),dimension(nx) :: xi,x
>>     real(dp),dimension(ny) :: yi,y
>>
>>     real(dp),dimension(nx,nx) :: D2XB
>>     real(dp),dimension(ny,ny) :: D2YB
>>     real(dp),dimension(nx,nx,2) :: DMXB
>>     real(dp),dimension(ny,ny,2) :: DMYB
>>
>>     real(dp),dimension(nx*ny) :: usol_vec
>>     real(dp),dimension( nx-2,ny-2 ) :: sol_mat
>>
>>     complex(dpc) :: U00
>>     complex(dpc) :: atest
>>     complex(dpc),dimension(nx-2,nx-2) :: D2XBs
>>     complex(dpc),dimension(ny-2,ny-2) :: D2YBs
>>     complex(dpc),dimension( nx-2,ny-2 ) :: sol_mat_cmplx
>>     complex(dpc),dimension(:),allocatable  :: xwork1
>>
>> !!$    allocate(li(nx))
>> !!$    allocate(lim2(nx-2))
>>
>> !!! NEED TO TRANSFORM ALL THESE ARRAYS TO ALLOCATABLE!!!!!!
>>
>> !...Petsc Variables
>>     PetscInt ii,jj,ione,ct
>>     PetscInt jmin,jmax
>>     PetscInt Istart,Iend
>>     PetscInt IstartVec,IendVec
>>
>>     PetscReal   tol
>>     PetscScalar aone
>>
>>     PetscViewer viewer
>>
>>     PC         pc
>>     KSP        ksp
>>     VecScatter ctx
>>     Mat        DXX,DYY
>>     !Mat        DLOAD
>>     Vec        frhs,frhs2
>>     Vec        sol,sol_seq
>>
>>
>>     !For VecGetArrayReadF90
>>     !PetscScalar,pointer :: xx_v(:) !Fortran90 pointer to the array
>>     PetscOffset i_x
>>     PetscScalar sol_array(1)
>>
>> !...MPI
>>     !PetscMPIInt    rank,size
>>
>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!
>> !                         Beginning of program                           !
>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!
>>
>>     call MPI_COMM_SIZE(MPI_COMM_WORLD,size,ierr)
>>     call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
>>
>>     call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
>>
>>     !call PetscOptionsInsertString("-mat_superlu_dist_parsymbfact",ierr)
>>
>> !...Check if nz and ny are odd (necessary to set max velocity to 1)
>>
>>     if ( mod(nx,2)==0 .or.  mod(ny,2)==0   ) then
>>        write(*,*)
>>        STOP'STOPPING... IN 3D POISEUILLE BASEFLOW NZ,NY NEED TO BE ODD'
>>     endif
>>
>> !...Set Domain Sizes and Initialize other variables
>>
>>     xmin=-AA
>>     xmax=AA
>>
>>     ymin=-1
>>     ymax=1
>>
>>     ione=1
>>
>>     nxy=nx*ny
>>
>>     !Variables to Build DXX and DYY
>>     nxm2=nx-2
>>     nym2=ny-2
>>     nxym2 = nxm2*nym2
>>
>> !...Mat to Vec and Vec to Mat conversion
>>
>>     lim2=0
>>     do i=1,nxm2
>>        lim2(i) = (i-1)*nym2
>>     enddo
>>
>> !...Get collocation points and chebyshev differentiation matrices
>>
>>     call cheby(nx-1,xi,DMXB)
>>     call cheby(ny-1,yi,DMYB)
>>
>>     ! x-mapping from [-1,1]->[-XMAX,XMAX]
>>     x=xmax*xi
>>
>>     ! y-mapping (from [-1,1]->[-1,1])
>>     y=yi
>>
>>     !Compute the metric terms
>>     dxidx = 1/xmax
>>     d2xidx2 = 0
>>
>>     dyidy = 1
>>     d2yidy2 = 0
>>
>>     !Scale the differentiation matrix (due to the use of the mapping)
>>
>>     !x-direction
>>     D2XB  = d2xidx2*DMXB(:,:,1) + dxidx**2*DMXB(:,:,2)
>>     D2XBs = D2XB(2:nx-1,2:nx-1)
>>
>>     !y-direction
>>     D2YB  = d2yidy2*DMYB(:,:,1) + dyidy**2*DMYB(:,:,2)
>>     D2YBs = D2YB(2:ny-1,2:ny-1)
>>
>>     xx=0.0
>>     yy=0.0
>>
>>     do i=1,nx
>>        do j=1,ny
>>
>>           xx(i,j)=x(i)
>>           yy(i,j)=y(j)
>>
>>        enddo
>>     enddo
>>
>> !...Create Petsc right-hand-side vector and derivation matrices
>>     call VecCreateSeq(PETSC_COMM_SELF,nxym2,sol_seq,ierr)
>>     call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nxym2,frhs,ierr) ! right hand side vector
>>     call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nxym2,frhs2,ierr) ! right hand side vector # 2
>>
>>     call MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,nxym2,nxym2,nx,PETSC_NULL_INTEGER,nx,PETSC_NULL_INTEGER,DXX,ierr)
>>     call MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,nxym2,nxym2,ny,PETSC_NULL_INTEGER,ny,PETSC_NULL_INTEGER,DYY,ierr)
>>
>> !...Determine which rows of the vector/matrix are locally owned.
>>     call VecGetOwnershipRange(frhs,IstartVec,IendVec,ierr)
>>     call MatGetOwnershipRange(DXX,Istart,Iend,ierr)
>>
>>     if ( IstartVec /= Istart .or. IendVec /= Iend ) then
>>        stop 'Problem in Baseflow Computation'
>>     endif
>>
>> !...Fill in Sequential and Parallel Vectors
>>     allocate(xwork1(Iend-Istart))
>>     allocate(loc(Iend-Istart))
>>
>>     ct=0
>>     do i=Istart,Iend-1
>>        ct=ct+1
>>        loc(ct)=i
>>        xwork1(ct)=(-2.0d0,0.0d0)
>>     enddo
>>
>>     call VecSetValues(frhs,Iend-Istart,loc,xwork1,INSERT_VALUES,ierr)
>>     call VecZeroEntries(sol_seq,ierr)
>>
>> !...Assemble Vectors
>>     call VecAssemblyBegin(frhs,ierr)
>>     call VecAssemblyEnd(frhs,ierr)
>>
>>     call VecAssemblyBegin(sol_seq,ierr)
>>     call VecAssemblyEnd(sol_seq,ierr)
>>
>>     call VecDuplicate(frhs,sol,ierr)   ! parallel solution vector
>>
>> !...Display vector
>>     !call VecView(frhs,PETSC_VIEWER_STDOUT_WORLD,ierr) ! ==> seems very slow
>>
>> !...Build DXX in Parallel
>>
>> !!$    if (rank==0) then
>> !!$       call print_matrix('D2XB',nx,nx,D2XB)
>> !!$       call print_matrix('D2YB',ny,ny,D2YB)
>> !!$
>> !!$       call print_matrix_cmplx('D2XBs',nx-2,nx-2,D2XBs)
>> !!$       call print_matrix_cmplx('D2YBs',ny-2,ny-2,D2YBs)
>> !!$    endif
>>
>>     if (rank == 0) call cpu_time(start)
>>
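>>     ! Note: global row i (0-based) corresponds to x-index ii = 1+floor(i/nym2) and
>>     ! y-index mod(i,nym2)+1 in the lexicographic (x outer, y inner) ordering, so the
>>     ! columns coupled by the x-derivative are spaced nym2 apart.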
>>     do i=Istart,Iend-1
>>
>>        ii=1+floor((i+0.01)/nym2)
>>        jj=1
>>
>>        jmin=mod(i,nym2)
>>        jmax=jmin+(nxm2-1)*nym2
>>
>>        do j=jmin,jmax,nym2
>>
>>           call MatSetValues(DXX,ione,i,ione,j,D2XBs(ii,jj),INSERT_VALUES,ierr)
>>           jj=jj+1
>>
>>        enddo
>>
>>     enddo
>>
>> !...Build DYY in Parallel
>>
>>     do i=Istart,Iend-1
>>
>>        ii=mod(i,nym2)+1
>>        jj=1
>>
>>        jmin=floor((i+0.01)/nym2)*nym2
>>
>>        do j=jmin,jmin+(nym2-1)
>>
>>           call MatSetValues(DYY,ione,i,ione,j,D2YBs(ii,jj),INSERT_VALUES,ierr)
>>           jj=jj+1
>>
>>        enddo
>>
>>     enddo
>>
>> !...Assemble DXX and DYY
>>
>>     call MatAssemblyBegin(DXX,MAT_FINAL_ASSEMBLY,ierr)
>>     call MatAssemblyEnd(DXX,MAT_FINAL_ASSEMBLY,ierr)
>>
>>     call MatAssemblyBegin(DYY,MAT_FINAL_ASSEMBLY,ierr)
>>     call MatAssemblyEnd(DYY,MAT_FINAL_ASSEMBLY,ierr)
>>
>>     if (rank==0) then
>>        call cpu_time(finish)
>>        print '("Time for build and assembly of baseflow operator = ",f15.4," seconds.")',finish-start
>>     endif
>>
>>     !call MatView(DXX,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>     !call MatView(DYY,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>
>>     aone = (1.0d0,0.0d0)
>>
>>     if (rank == 0) call cpu_time(start)
>>     call MatAXPY(DXX,aone,DYY,DIFFERENT_NONZERO_PATTERN,ierr) ! on exit DXX = a*DYY + DXX
>>     if (rank==0) then
>>        call cpu_time(finish)
>>        print '("Time for Matrix summation = ",f15.4," seconds.")',finish-start
>>     endif
>>
>>
>> !...Write Matrix to ASCII and Binary Format
>> !!$    call PetscViewerASCIIOpen(PETSC_COMM_WORLD,"Amat.m",viewer,ierr)
>> !!$    call MatView(DXX,viewer,ierr)
>> !!$    call PetscViewerDestroy(viewer,ierr)
>> !!$
>> !!$    call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_WRITE,viewer,ierr)
>> !!$    call MatView(DXX,viewer,ierr)
>> !!$    call PetscViewerDestroy(viewer,ierr)
>>
>> !...Load a Matrix in Binary Format
>> !!$    call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_READ,viewer,ierr)
>> !!$    call MatCreate(PETSC_COMM_WORLD,DLOAD,ierr)
>> !!$    call MatSetType(DLOAD,MATAIJ,ierr)
>> !!$    call MatLoad(DLOAD,viewer,ierr)
>> !!$    call PetscViewerDestroy(viewer,ierr)
>>
>>     !call MatTranspose(DXX,MatReuse reuse,Mat *B)
>>     !call MatView(DXX,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !! SOLVE POISSON EQUATION FOR THE FIRST TIME !!
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>
>> !...Create linear solver context
>>     call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
>>
>> !...Set operators. Here the matrix that defines the linear system also serves as the preconditioning matrix.
>>     !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) !aha commented and replaced by next line
>>     call KSPSetOperators(ksp,DXX,DXX,ierr) ! remember: here DXX=DXX+DYY
>>
>>     !call KSPSetOperators(ksp,DLOAD,DLOAD,ierr) ! remember: here DXX=DXX+DYY
>>
>>     tol = 1.e-10
>>     !call KSPSetTolerances(ksp,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr) ! set relative tolerance and use defaults for absolute and divergence tol
>>     call KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr) ! set relative and absolute tolerances and use default for divergence tol
>>
>> !...Set the Direct (LU) Solver
>>     !call KSPSetType(ksp,KSPPREONLY,ierr)
>>     !call KSPGetPC(ksp,pc,ierr)
>>     !call PCSetType(pc,PCLU,ierr)
>>     !call PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS,ierr) ! MATSOLVERSUPERLU_DIST
>>
>> !...Set runtime options, e.g., -ksp_type <type> -pc_type <type> -ksp_monitor -ksp_rtol <rtol>
>>     !These options will override those specified above as long as KSPSetFromOptions() is called _after_
>>     !any other customization routines.
>>
>>     call KSPSetFromOptions(ksp,ierr)
>>
>> !...Sets an ADDITIONAL function to be called at every iteration to monitor the residual/error etc
>>     call KSPMonitorSet(ksp,KSPMonitorTrueResidualNorm,PETSC_NULL_OBJECT,PETSC_NULL_FUNCTION,ierr)
>>
>> !...Solve the linear system
>>     call KSPSolve(ksp,frhs,sol,ierr)
>>
>> !...View solver info (could use -ksp_view instead)
>>
>>     call KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>
>> !...Extract Maximum Value (at center of duct) from Solution Vector
>>
>>     call VecScatterCreateToZero(sol,ctx,sol_seq,ierr)
>>     !scatter as many times as you need
>>     call VecScatterBegin(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr)
>>     call VecScatterEnd(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr)
>>     !destroy scatter context and local vector when no longer needed
>>     call VecScatterDestroy(ctx,ierr)
>>
>>     !call VecView(sol_seq,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>
>>     call VecGetArray(sol_seq,sol_array,i_x,ierr)
>>
>>     if (rank==0) then
>>
>>        sol_mat_cmplx=0.0d0
>>
>>        do i=1,nxm2
>>           do j=1,nym2
>>              ij=lim2(i)+j
>>              sol_mat_cmplx(i,j) = sol_array(i_x + ij)
>>           enddo
>>        enddo
>>
>>        U00 = sol_mat_cmplx((nxm2+1)/2,(nym2+1)/2)
>>
>>     endif
>>
>>     source=0
>>     Call MPI_Bcast(U00,1,MPI_DOUBLE_COMPLEX,source,MPI_COMM_WORLD,ierr)
>>
>>     !call VecGetArrayReadF90(sol,xx_v,ierr) ! requires #include <petsc/finclude/petscvec.h90>
>>     !atest = xx_v(3)
>>     !call VecRestoreArrayReadF90(sol,xx_v,ierr)
>>     !write(*,*)'atest',atest
>>
>>     call VecRestoreArray(sol_seq,sol_array,i_x,ierr)
>>
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !! SOLVE POISSON EQUATION FOR THE SECOND TIME !!
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>
>> !...Fill in New Right-Hand-Side Vector (frhs2)
>>     call VecSetValues(frhs2,Iend-Istart,loc,xwork1/U00,INSERT_VALUES,ierr)
>>
>> !...Assemble New RHS Vector
>>     call VecAssemblyBegin(frhs2,ierr)
>>     call VecAssemblyEnd(frhs2,ierr)
>>
>> !...Solve System
>>     call KSPSolve(ksp,frhs2,sol,ierr)
>>
>> !...Get Final Solution
>>     call VecScatterCreateToZero(sol,ctx,sol_seq,ierr)
>>     call VecScatterBegin(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr)
>>     call VecScatterEnd(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr)
>>     call VecScatterDestroy(ctx,ierr)
>>
>>     call VecGetArray(sol_seq,sol_array,i_x,ierr)
>>
>>     if (rank==0) then
>>
>>        usol=0.0d0
>>        sol_mat_cmplx=0.0d0
>>
>>        do i=1,nxm2
>>           do j=1,nym2
>>              ij=lim2(i)+j
>>              sol_mat_cmplx(i,j) = sol_array(i_x + ij)
>>           enddo
>>        enddo
>>
>>        usol(2:nx-1,2:ny-1) = dble(sol_mat_cmplx)
>>
>>     endif
>>
>>     call VecRestoreArray(sol_seq,sol_array,i_x,ierr)
>>
>> !...Send usol to all other processes (vectorize, broadcast and matrixize)
>>     li=0
>>     do i=1,nx
>>        li(i) = (i-1)*ny
>>     enddo
>>
>>     if (rank==0) then
>>
>>        do i=1,nx
>>           do j=1,ny
>>              ij=li(i)+j
>>              usol_vec(ij)=usol(i,j)
>>           enddo
>>        enddo
>>
>>     endif
>>
>>     call MPI_Bcast(usol_vec,nx*ny,MPI_DOUBLE,source,MPI_COMM_WORLD,ierr)
>>
>>     if (rank/=0) then
>>
>>        do i=1,nx
>>           do j=1,ny
>>              ij=li(i)+j
>>              usol(i,j)=usol_vec(ij)
>>           enddo
>>        enddo
>>
>>     endif
>>
>> !...Free work space. All PETSc objects should be destroyed when they are no longer needed.
>>
>>     deallocate(loc)
>>     deallocate(xwork1)
>>
>>     call KSPDestroy(ksp,ierr)
>>     call VecDestroy(frhs,ierr)
>>     call VecDestroy(frhs2,ierr)
>>     call VecDestroy(sol,ierr)
>>     call VecDestroy(sol_seq,ierr)
>>     call MatDestroy(DXX,ierr)
>>     call MatDestroy(DYY,ierr)
>>
>>     call PetscFinalize(ierr)
>>
>>
>> END SUBROUTINE Poiseuille_3D_PETSC
>>
>>
>>
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>
>>
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !!!!!!! HIEMENZ FLOW SOLVER !!!!!!!!
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>
>>
>> !*************************************************************************************************
>> ! on Veltins: ifort hiemenz.f90 -convert big_endian -r8 -i4 -O3 -132 -o hiemenz.x
>> ! ************************************************************************************************
>>       !program hiemenz
>>
>>       subroutine hiemenz(ymax,eps,n,si,eta,u0,v0,v1,v2,v3,w0,w1,w2)
>>
>>       implicit none
>>
>>       real(dp) eps,ymax,si
>>       integer(i4b) n,i
>>
>> !      real(dp),allocatable,dimension(:) :: eta
>> !      real(dp),allocatable,dimension(:) :: u0,v0,v1,v2,v3
>> !      real(dp),allocatable,dimension(:) :: w0,w1,w2
>>
>>       real(dp),dimension(n) :: eta
>>       real(dp),dimension(n) :: u0,v0,v1,v2,v3
>>       real(dp),dimension(n) :: w0,w1,w2
>>
>>       write(*,*)
>>       write(*,*) '*******************************************************************'
>>       write(*,*) '                       HIEMENZ SOLVER                              '
>>       write(*,*) '*******************************************************************'
>>       write(*,*)
>>
>> !
>> !..... Numerical Parameters
>> !
>>
>> !Domain height and number of points
>>
>>       !ymax=100
>>       !n=500001
>>
>> !Tolerance for Newton iteration
>>
>>       !eps=1e-12
>>
>> !Initial guess for Newton iteration: f''(0)
>>
>>       !si = -1.232587656820134
>>
>> !
>> !..... Allocate arrays
>> !
>> !      allocate(eta(n))
>> !      allocate(u0(n),v0(n),v1(n),v2(n),v3(n))
>> !      allocate(w0(n),w1(n),w2(n))
>>
>> !
>> !..... Compute solution
>> !
>>
>>       call hiemenz_fct(ymax,eps,n,si,eta,u0,v0,v1,v2,v3,w0,w1,w2)
>>
>> !
>> !..... Write out solution in ASCII format
>> !
>>
>>       write(*,*)
>>       write(*,*) 'Writing solution to hiemenz.dat'
>>       write(*,*)
>>
>>       open(unit=15,file='hiemenz.dat',form='formatted')
>>
>>       do i=1,n
>>          write(15,'(9E21.11)') eta(i),u0(i),v0(i),v1(i),v2(i),v3(i),w0(i),w1(i),w2(i)
>>       end do
>>
>>       close(15)
>>
>>
>>
>>     end subroutine hiemenz
>>
>>
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>
>>
>>       subroutine hiemenz_fct(ymax,eps,n,si,eta,u0,v0,v1,v2,v3,w0,w1,w2)
>>
>>       implicit none
>>
>>       integer(i4b) i,iter,n,j
>>       real(dp) ymax,res,p,eps,s,si
>>
>>       real(dp),allocatable,dimension(:,:) :: v,w
>>
>>       real(dp) eta(n)
>>       real(dp) u0(n),v0(n),v1(n),v2(n),v3(n)
>>       real(dp) w0(n),w1(n),w2(n)
>>
>>       !external w_ode
>>
>>       write(*,*) 'eps=',eps
>>       write(*,*) 'ymax=',ymax
>>       write(*,*) 'N=',n
>>       write(*,*)
>>
>> !
>> !     Allocate arrays
>> !
>>       allocate(v (6,n))
>>       allocate(w (5,n))
>>
>> !
>> !     Create grid
>> !
>>       do i=1,n
>>          eta(i) = ymax * real(i-1)/real(n-1)
>>       end do
>> !
>> !     Newton-Raphson iteration for v
>> !
>>       iter = 0
>>       s = si
>>       res = 1.
>>       write(*,*) 'Newton-Raphson iteration for v: s=',s
>>
>>       do while (res.gt.eps)
>>
>>          iter=iter+1
>>
>>          call hiemenz_eval(s,eta,n,v)
>>
>>          s = s - (v(2,n) + 1.0) / v(5,n)
>>          res = abs(v(2,n) + 1.0)
>>
>>          write(*,*) iter,s,res
>>
>>       end do
>>
>>       write(*,*) 'Number of iterations ',iter
>>       write(*,*)
>>       write(*,*) "v''(0)=",v(3,1)
>>
>> !
>> !     Now compute w
>> !
>>       w(1,1) = 0.
>>       w(2,1) = v(3,1)
>>
>>       w(3,1) = 0.
>>       w(4,1) = 0.
>>       w(5,1) = v(3,1)
>>
>>       do i=2,n
>>          call rk4(w(1,i-1),w(1,i),eta(i)-eta(i-1),5,w_ode)
>>       end do
>>
>> !
>> !     Rescale w to match boundary conditions at infinity
>> !
>>       p = 1./w(1,n)
>>
>>       do i=1,n
>>          w(1,i) = w(1,i) * p
>>          w(2,i) = w(2,i) * p
>>       end do
>>
>>       write(*,*) "w'(0)=",w(2,1)
>>
>> !
>> !     Now compute v''', w'' from the RHS and copy all values to new arrays
>> !
>>
>>       do i=1,n
>>
>>          u0(i) = -v(2,i)
>>          v0(i) = v(1,i)
>>          v1(i) = v(2,i)
>>          v2(i) = v(3,i)
>>          v3(i) = -v(2,i)*v(2,i) + v(1,i)*v(3,i) + 1.0
>>          w0(i) = w(1,i)
>>          w1(i) = w(2,i)
>>          w2(i) = w(2,i)*v(1,i)
>>
>>       end do
>>
>>
>>       deallocate(v,w)
>>
>>     end subroutine hiemenz_fct
>>
>>
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>
>>
>>       subroutine hiemenz_eval(s,eta,n,f)
>>
>>       implicit none
>>
>>       integer(i4b) n,i
>>       real(dp) s
>>       real(dp) eta(n),f(6,n)
>>
>>       !external v_ode
>>
>>       f(1,1) = 0.
>>       f(2,1) = 0.
>>       f(3,1) = s
>>
>>       f(4,1) = 0.
>>       f(5,1) = 0.
>>       f(6,1) = 1.
>>
>>       do i=2,n
>>          call rk4(f(1,i-1),f(1,i),eta(i)-eta(i-1),6,v_ode)
>>       end do
>>
>>     end subroutine hiemenz_eval
>>
>>
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>
>>
>>       subroutine v_ode(y,dy)
>>
>>       implicit none
>>
>>       real(dp) y(6),dy(6)
>>
>>       dy(1) = y(2)
>>       dy(2) = y(3)
>>       dy(3) = -y(2)*y(2) + y(1)*y(3) + 1.0
>>
>>       dy(4) = y(5)
>>       dy(5) = y(6)
>>       dy(6) = y(3)*y(4) - 2.0*y(2)*y(5) + y(1)*y(6)
>>
>>       end subroutine v_ode
>>
>>
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>
>>
>>       subroutine w_ode(y,dy)
>>
>>       implicit none
>>
>>       real(dp) y(5),dy(5)
>>
>>       dy(1) = y(2)
>>       dy(2) = y(2)*y(3)
>>
>>       dy(3) = y(4)
>>       dy(4) = y(5)
>>       dy(5) = -y(4)*y(4) + y(3)*y(5) + 1.0
>>
>>       end subroutine w_ode
>>
>>
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>
>>
>>       subroutine rk4(y1,y2,h,nn,der)
>> !
>> !     Perform one RK4 step with step size h of a n-vector
>> !     Input:  y1(n)  old step
>> !             h      step size
>> !             nn     vector size
>> !             der    derivative function
>> !     Output: y2(nn) new step
>> !
>>       implicit none
>>
>>       integer(i4b) nn
>>
>>       real(dp) y1(nn),y2(nn),y(nn)
>>       real(dp) k1(nn),k2(nn),k3(nn),k4(nn)
>>       real(dp) h
>>
>>       external der
>> !
>> !     First RK substep
>> !
>>
>>       y(:) = y1(:)
>>       call der(y,k1)
>>       k1(:)=h*k1(:)
>>
>> !
>> !     Second RK substep
>> !
>>
>>       y(:) = y1(:) + .5*k1(:)
>>       call der(y,k2)
>>       k2(:)=h*k2(:)
>>
>> !
>> !     Third RK substep
>> !
>>
>>       y(:) = y1(:) + .5*k2(:)
>>       call der(y,k3)
>>       k3(:)=h*k3(:)
>>
>> !
>> !     Fourth RK substep
>> !
>>
>>       y(:) = y1(:) + k3(:)
>>       call der(y,k4)
>>       k4(:)=h*k4(:)
>>
>> !
>> !     Compose solution after full RK step
>> !
>>       y2(:) = y1(:) + ( k1(:) + 2.0*k2(:) + 2.0*k3(:) + k4(:) ) / 6.0
>>
>>       end subroutine rk4
>>
>>
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>
>>
>>       subroutine compute_baseflow_derivative(DXYZ,uvwvec,nx,ny,UVWxyz)
>>
>>         implicit none
>>
>>         integer(i4b) :: i,j
>>
>>         integer(i4b),intent(in) :: nx,ny
>>
>>         real(dp),dimension(nx*ny),intent(in) :: uvwvec
>>         real(dp),dimension(nx*ny,nx*ny),intent(in) :: DXYZ
>>         real(dp),dimension(nx*ny,nx*ny),intent(out) :: UVWxyz
>>
>>         UVWxyz = 0.0
>>
>>         do i=1,nx*ny
>>            do j=1,nx*ny
>>
>>               UVWxyz(i,i) = UVWxyz(i,i) + DXYZ(i,j)*uvwvec(j)
>>
>>            enddo
>>         enddo
>>
>>       end subroutine compute_baseflow_derivative
>>
>>
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>
>>
>>       subroutine compute_UVW_DXYZ(DXYZ,uvwvec,nx,ny,UVW_DXYZ)
>>
>>         implicit none
>>
>>         integer(i4b) :: i,j
>>
>>         integer(i4b),intent(in) :: nx,ny
>>
>>         real(dp),dimension(nx*ny),intent(in) :: uvwvec
>>         real(dp),dimension(nx*ny,nx*ny),intent(in) :: DXYZ
>>         real(dp),dimension(nx*ny,nx*ny),intent(out) :: UVW_DXYZ
>>
>>         do i=1,nx*ny
>>            do j=1,nx*ny
>>
>>               UVW_DXYZ(i,j) = uvwvec(i)*DXYZ(i,j)
>>
>>            enddo
>>         enddo
>>
>>       end subroutine compute_UVW_DXYZ
>>
>>
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>
>>
>>
>>
>> END MODULE BASEFLOW
>>
>>
>>
>> !!$
>> !!$
>> !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !!$
>> !!$  SUBROUTINE Poiseuille_3D(nz,ny,AA,usol,zz,yy)
>> !!$
>> !!$    character(128) :: file_name
>> !!$    integer(i4b) :: INFO,NRHS,ct,i,j,N,file_num,m1,n1,p1,q1
>> !!$
>> !!$    integer(i4b),intent(in) :: nz,ny
>> !!$    real(dp),intent(in) :: AA
>> !!$
>> !!$    real(dp) :: ymin,ymax,zmin,zmax,dzidz,d2zidz2,dyidy,d2yidy2,U00
>> !!$    real(dp) :: start,finish
>> !!$
>> !!$    real(dp),dimension(nz,nz,2) :: DMZ
>> !!$    real(dp),dimension(ny,ny,2) :: DMY
>> !!$
>> !!$    real(dp),dimension(nz,nz) :: D2Z
>> !!$    real(dp),dimension(ny,ny) :: D2Y
>> !!$
>> !!$    real(dp),dimension(nz-2,nz-2) :: D2Zs
>> !!$    real(dp),dimension(ny-2,ny-2) :: D2Ys
>> !!$
>> !!$    real(dp),dimension(nz-2,nz-2) :: IZ
>> !!$    real(dp),dimension(ny-2,ny-2) :: IY
>> !!$
>> !!$    real(dp),dimension(nz,ny) :: usol,zz,yy
>> !!$    real(dp),dimension(nz-2,ny-2) :: utmp
>> !!$
>> !!$    real(dp),dimension(nz) :: zi,z
>> !!$    real(dp),dimension(ny) :: yi,y
>> !!$
>> !!$    real(dp),dimension( (nz-2)*(ny-2) ) :: frhs
>> !!$    real(dp),dimension( (nz-2)*(ny-2),(nz-2)*(ny-2) ) :: AMAT,DZZ,DYY
>> !!$
>> !!$    integer(i4b),dimension((nz-2)*(ny-2)) :: IPIV
>> !!$
>> !!$    frhs=0.0
>> !!$
>> !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !!$!!!!!!! 3D POISEUILLE FLOW !!!!!!!!
>> !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !!$
>> !!$    zmin=-AA
>> !!$    zmax=AA
>> !!$
>> !!$    ymin=-1
>> !!$    ymax=1
>> !!$
>> !!$    !
>> !!$    !! .....Create Grid
>> !!$    !
>> !!$
>> !!$    if ( mod(nz,2)==0 .or.  mod(ny,2)==0   ) then
>> !!$       WRITE(*,*)
>> !!$       STOP'STOPPING... IN 3D POISEUILLE BASEFLOW NZ,NY NEED TO BE ODD'
>> !!$    endif
>> !!$
>> !!$    !
>> !!$    !! .....Get collocation points and chebyshev differentiation matrices
>> !!$    !
>> !!$
>> !!$    call cheby(nz-1,zi,DMZ)
>> !!$    call cheby(ny-1,yi,DMY)
>> !!$
>> !!$    ! z-mapping from [-1,1]->[-ZMAX,ZMAX]
>> !!$    z=zmax*zi
>> !!$
>> !!$    ! y-mapping (from [-1,1]->[-1,1])
>> !!$    y=yi
>> !!$
>> !!$    !Compute the metric terms
>> !!$    dzidz = 1/zmax
>> !!$    d2zidz2 = 0
>> !!$
>> !!$    dyidy = 1
>> !!$    d2yidy2 = 0
>> !!$
>> !!$    !Scale the differentiation matrix (due to the use of the mapping)
>> !!$
>> !!$    !z-direction
>> !!$    !D1Z = dzidz*DMZ(:,:,1)
>> !!$    D2Z  = d2zidz2*DMZ(:,:,1) + dzidz**2*DMZ(:,:,2)
>> !!$    D2Zs = D2Z(2:nz-1,2:nz-1)
>> !!$
>> !!$    !y-direction
>> !!$    !D1Y = dyidy*DMY(:,:,1)
>> !!$    D2Y  = d2yidy2*DMY(:,:,1) + dyidy**2*DMY(:,:,2)
>> !!$    D2Ys = D2Y(2:ny-1,2:ny-1)
>> !!$
>> !!$    IZ=0.0
>> !!$    IY=0.0
>> !!$
>> !!$    do i=1,nz-2
>> !!$       IZ(i,i)=1.0
>> !!$    enddo
>> !!$
>> !!$    do i=1,ny-2
>> !!$       IY(i,i)=1.0
>> !!$    enddo
>> !!$
>> !!$    m1=size(IY,1)
>> !!$    n1=size(IY,2)
>> !!$    p1=size(D2Zs,1)
>> !!$    q1=size(D2Zs,2)
>> !!$
>> !!$    call mykron(IY,D2Zs,m1,n1,p1,q1,DZZ)
>> !!$
>> !!$    m1=size(D2Ys,1)
>> !!$    n1=size(D2Ys,2)
>> !!$    p1=size(IZ,1)
>> !!$    q1=size(IZ,2)
>> !!$
>> !!$    call mykron(D2Ys,IZ,m1,n1,p1,q1,DYY)
>> !!$
>> !!$    AMAT = DYY + DZZ
>> !!$
>> !!$    !CALL PRINT_MATRIX( 'DZZ', (nz-2)*(ny-2), (nz-2)*(ny-2), DZZ )
>> !!$    !CALL PRINT_MATRIX( 'DYY', (nz-2)*(ny-2), (nz-2)*(ny-2),DYY )
>> !!$
>> !!$
>> !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !!$    !! SOLVE POISSON EQUATION FOR THE FIRST TIME !!
>> !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !!$
>> !!$
>> !!$
>> !!$    do i=1,(ny-2)*(nz-2)
>> !!$       frhs(i)=-2.0
>> !!$    enddo
>> !!$
>> !!$    INFO=0
>> !!$    NRHS=1 ! only 1 rhs here
>> !!$    N=(nz-2)*(ny-2)
>> !!$
>> !!$
>> !!$    !
>> !!$    ! Compute the LU factorization of A.
>> !!$    !
>> !!$
>> !!$    ! Note: on exit of dgetrf, AMAT is not AMAT anymore but contains the factors L and U
>> !!$    ! from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
>> !!$
>> !!$    !call PRINT_MATRIX( 'AMAT', N, N, AMAT )
>> !!$
>> !!$
>> !!$    call cpu_time(start)
>> !!$    CALL dgetrf( N, N, AMAT, N, IPIV, INFO )
>> !!$    call cpu_time(finish)
>> !!$
>> !!$    !write(*,*)'AMAT AFTER',AMAT
>> !!$
>> !!$    print '("Time LU = ",f15.4," seconds.")',finish-start
>> !!$
>> !!$    IF( INFO .NE. 0 ) THEN
>> !!$       WRITE(*,*)'INFO',INFO
>> !!$       STOP 'PROBLEM IN LAPACK (Poiseuille_3D)'
>> !!$    ENDIF
>> !!$
>> !!$
>> !!$    !
>> !!$    ! Solve the system A*X = B, overwriting B with X.
>> !!$    !
>> !!$
>> !!$    IF( INFO.EQ.0 ) THEN
>> !!$
>> !!$       call cpu_time(start)
>> !!$       CALL dgetrs( 'No transpose', n, nrhs, AMAT, N, IPIV, frhs, N, INFO )
>> !!$       call cpu_time(finish)
>> !!$
>> !!$       print '("Time Linear System Solve = ",f15.4," seconds.")',finish-start
>> !!$       write(*,*)
>> !!$
>> !!$    END IF
>> !!$
>> !!$    ct=0
>> !!$    do j=1,ny-2
>> !!$       do i=1,nz-2
>> !!$
>> !!$          ct=ct+1
>> !!$          utmp(i,j) = frhs(ct)
>> !!$
>> !!$       enddo
>> !!$    enddo
>> !!$
>> !!$    usol = 0.0
>> !!$    usol(2:nz-1,2:ny-1) = utmp(:,:)
>> !!$
>> !!$    zz=0.0
>> !!$    yy=0.0
>> !!$
>> !!$    do i=1,nz
>> !!$       do j=1,ny
>> !!$
>> !!$          zz(i,j)=z(i)
>> !!$          yy(i,j)=y(j)
>> !!$
>> !!$       enddo
>> !!$    enddo
>> !!$
>> !!$    U00 = usol( (nz+1)/2 , (ny+1)/2 )
>> !!$
>> !!$
>> !!$
>> !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !!$!! SOLVE POISSON EQUATION FOR THE SECOND TIME !!
>> !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !!$
>> !!$
>> !!$    utmp=0.0
>> !!$    usol=0.0
>> !!$    frhs=0.0
>> !!$
>> !!$    !New,scaled rhs
>> !!$    do i=1,(ny-2)*(nz-2)
>> !!$       frhs(i)=-2.0/U00
>> !!$    enddo
>> !!$
>> !!$    !
>> !!$    ! Solve the system A*X = B, overwriting B with X.
>> !!$    !
>> !!$
>> !!$    ! Note: here we already have the LU factorization so that we only need a call to dgetrs
>> !!$
>> !!$    CALL dgetrs( 'No transpose', N, NRHS, AMAT, N, IPIV, frhs, N, INFO )
>> !!$
>> !!$    IF( INFO .NE. 0 ) THEN
>> !!$       WRITE(*,*)'INFO',INFO
>> !!$       STOP 'PROBLEM IN LAPACK (Poiseuille_3D -- solve 2)'
>> !!$    ENDIF
>> !!$
>> !!$    !write(*,*)'frhs',frhs
>> !!$
>> !!$    ct=0
>> !!$    do j=1,ny-2
>> !!$       do i=1,nz-2
>> !!$
>> !!$          ct=ct+1
>> !!$          utmp(i,j) = frhs(ct)
>> !!$
>> !!$       enddo
>> !!$    enddo
>> !!$
>> !!$    usol(2:nz-1,2:ny-1) = utmp(:,:)
>> !!$
>> !!$
>> !!$  END SUBROUTINE Poiseuille_3D
>> !!$
>> !!$
>> !
>> !  Description: Build complex matrices A and B in the context of a generalized
>> !  eigenvalue problem (GEVP)
>> !
>> !
>> !/*T
>> !  Concepts: Build complex matrices
>> !  Processors: n
>> !T*/
>> !
>> ! -----------------------------------------------------------------------
>>
>> MODULE PETSC
>>
>>   USE TYPEDEF
>>   USE WRITE_PRINT
>>
>>   IMPLICIT NONE
>>
>> CONTAINS
>>
>>   Subroutine SetupSolveGEVP(nx,ny,nxy,alpha,beta,Re)
>>
>> #include <petsc/finclude/petscsys.h>
>> #include <petsc/finclude/petscvec.h>
>> #include <petsc/finclude/petscmat.h>
>> #include <petsc/finclude/petscmat.h90>
>> #include <petsc/finclude/petscpc.h>
>> #include <petsc/finclude/petscksp.h>
>>
>> #include <slepc/finclude/slepcsys.h>
>> #include <slepc/finclude/slepceps.h>
>>
>> !...Non PETSc/SLEPc Variables
>>
>>     real(dp),intent(in) :: alpha,beta,Re
>>     integer(i4b),intent(in) :: nx,ny,nxy
>>
>>     integer(i4b) :: ij
>>     integer(i4b) :: sizem,rank,source
>>
>>     integer(i4b),dimension(:),allocatable  :: idxm1,idxn1
>>     integer(i4b),dimension(:),allocatable  :: idxm2,idxn2
>>     integer(i4b),dimension(:),allocatable  :: idxm3,idxn3
>>
>>     integer(i4b),dimension(:),allocatable  :: li
>>     integer(i4b),dimension(:),allocatable  :: all_bc
>>     integer(i4b),dimension(:),allocatable  :: u_bc,v_bc,w_bc,p_bc
>>     integer(i4b),dimension(:),allocatable  :: fst_bc,wall_bc
>>     integer(i4b),dimension(:),allocatable  :: outlet_bc,inlet_bc
>>
>>
>>     complex(dpc) :: ibeta,Rec,alphac,betac,small,sigma
>>     complex(dpc),dimension(:,:),allocatable :: D1Xc,D2Xc,D1Yc,D2Yc
>>
>>     complex(dpc),dimension(nxy) :: uvec_cmplx,uxvec_cmplx,uyvec_cmplx
>>     complex(dpc),dimension(nxy) :: vvec_cmplx,vxvec_cmplx,vyvec_cmplx
>>     complex(dpc),dimension(nxy) :: wvec_cmplx,wxvec_cmplx,wyvec_cmplx
>>
>>     complex(dpc),dimension(3*nxy) :: wvec_cmplx3
>>     complex(dpc),dimension(ny,ny) :: VDIAG,VDY
>>
>> !...Temporary variables
>>     integer(i4b),dimension(:),allocatable  :: loc
>>     complex(dpc),dimension(:),allocatable  :: xwork1
>>
>>
>> !...SLEPC VARIABLES
>>     EPS eps
>>     EPSType tname
>>
>>     ST st
>>     PC pc
>>
>>     Vec xr,xi
>>
>>     PetscReal      error
>>     PetscScalar    kr,ki
>>
>>     PetscLogDouble mem
>>
>>     PetscInt       maxit,nconv
>>     PetscInt       nev,ncv,mpd
>>     PetscInt       three
>>     PetscInt       ct,ct1,ct2,ct3,ct4
>>
>>     PetscInt       icntl,ival,ival1
>>
>>     PetscReal tolslepc
>>     PetscInt maxiter
>>
>> !...PETSC VARIABLES
>>
>>     PetscReal      norm,tol,nrm,nrm1
>>     PetscInt       i,j,k,II,JJ,its
>>
>>     PetscInt       jmin,jmax
>>     PetscInt       ione,ntimes
>>     PetscErrorCode ierr
>>     PetscBool      flg
>>     PetscScalar    v,one,zero,rhs
>>
>>     KSP            ksp
>>     PetscRandom    rctx
>>
>> !...MPI
>>     !PetscMPIInt    rank,size
>>
>> !...Matrices indices
>>
>>     PetscInt       IstartA,IendA,IstartB,IendB
>>
>> !...Vectors
>>
>>     Vec            i_vec,frhs,sol
>>
>> !...Matrices
>>
>>     Mat            A,B
>>
>>     !  Note: Any user-defined Fortran routines (such as MyKSPMonitor)
>>     !  MUST be declared as external.
>>
>>     !external MyKSPMonitor,MyKSPConverged
>>
>> !#if !defined(PETSC_USE_COMPLEX)
>> !#error "this code requires PETSc --with-scalar-type=complex build"
>> !#endif
>> !#if !defined(PETSC_USE_REAL_DOUBLE)
>> !#error "this code requires PETSc --with-precision=real build"
>> !#endif
>>
>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!
>> !                           Beginning of program                         !
>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!
>>
>>     call SlepcInitialize(PETSC_NULL_CHARACTER,ierr)!;CHKERRQ(ierr)
>>
>>     call PetscMemorySetGetMaximumUsage(ierr)!;CHKERRQ(ierr)
>>
>>     call PetscOptionsInsertString("-mat_superlu_dist_parsymbfact",ierr)
>>
>>     call MPI_COMM_SIZE(MPI_COMM_WORLD,sizem,ierr)!;CHKERRQ(ierr)
>>     call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)!;CHKERRQ(ierr)
>>
>>     call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>     !call PetscMemoryGetMaximumUsage(mem,ierr)
>>     call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>     if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (entering module)',rank,'    memory:',mem
>>
>>     !call PetscOptionsGetInt(PETSC_NULL_CHARACTER,'-nx',nx,flg,ierr)
>>     !call PetscOptionsGetInt(PETSC_NULL_CHARACTER,'-ny',ny,flg,ierr)
>>
>>     ione=1 ! integer
>>     ibeta = imag_unit*beta
>>     one = (1.0d0,0.0d0)
>>     zero = (0.0d0,0.0d0)
>>     Rec=Re
>>     alphac=alpha
>>     betac=beta
>>
>>     source=0
>>
>>     ct=0
>>
>> !...Convert Arrays to Complex
>>
>>     uvec_cmplx=uvec
>>     uxvec_cmplx=uxvec
>>     uyvec_cmplx=uyvec
>>
>>     vvec_cmplx=vvec
>>     vxvec_cmplx=vxvec
>>     vyvec_cmplx=vyvec
>>
>>     wvec_cmplx=wvec
>>     wxvec_cmplx=wxvec
>>     wyvec_cmplx=wyvec
>>
>>     wvec_cmplx3(1:nxy)=wvec_cmplx
>>     wvec_cmplx3(nxy+1:2*nxy)=wvec_cmplx
>>     wvec_cmplx3(2*nxy+1:3*nxy)=wvec_cmplx
>>
>>     allocate(D1Xc(nx,nx),D2Xc(nx,nx),D1Yc(ny,ny),D2Yc(ny,ny))
>>
>>     D1Xc=D1X
>>     D1Yc=D1Y
>>
>>     D2Xc=D2X
>>     D2Yc=D2Y
>>
>>     !call MPI_Bcast(wvec_cmplx3,3*nxy,MPI_DOUBLE_COMPLEX,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>
>> !...Parallel Matrices
>>
>>     !Create A
>>     call MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4*nxy,4*nxy,2*(nx+ny),PETSC_NULL_INTEGER,max(nx,ny),PETSC_NULL_INTEGER,A,ierr)!;CHKERRQ(ierr)
>>
>>     !Create B
>>
>>     call MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4*nxy,4*nxy,1,PETSC_NULL_INTEGER,0,PETSC_NULL_INTEGER,B,ierr)!;CHKERRQ(ierr)
>>
>>     call MatGetOwnershipRange(A,IstartA,IendA,ierr)!;CHKERRQ(ierr)
>>     call MatGetOwnershipRange(B,IstartB,IendB,ierr)!;CHKERRQ(ierr)
>>
>>     !write(*,*)'IstartA,IendA',IstartA,IendA
>>     !write(*,*)'IstartB,IendB',IstartB,IendB
>>
>>     if (IstartA > 4*nxy .or. IstartB > 4*nxy .or. IendA > 4*nxy .or. IendB > 4*nxy) then
>>        stop 'indices problem in module_petsc'
>>     endif
>>
>> !...NEED TO TEST WITHOUT THIS NEXT LINE TO UNDERSTAND !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !...This line avoids error [0]PETSC ERROR: Argument out of range [0]PETSC ERROR: New nonzero at (470,470)...
>>     call MatSetOption(A,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE,ierr)!;CHKERRQ(ierr)
>>
>> !...Get Memory Usage
>>     call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>     !call PetscMemoryGetMaximumUsage(mem,ierr)
>>     call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>     if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (after matcreate)',rank,'    memory:',mem
>>
>> !...Build A in Parallel
>>
>>     if (rank == 0) call cpu_time(start)
>>
>>     do i=IstartA,IendA-1
>>
>>        if     ( i < nxy ) then
>>
>>           if (abs(uyvec_cmplx(i+1))>1e10) STOP '1'
>>           call MatSetValues(A,ione,i,ione,i+nxy,uyvec_cmplx(i+1),INSERT_VALUES,ierr)!;CHKERRQ(ierr) ! Uy (block 1,2)
>>
>>           !Build DX (block 1,4)
>>           ii=1+floor((i+0.01)/ny)
>>           jj=1
>>
>>           jmin=mod(i,ny)
>>           jmax=jmin+(nx-1)*ny
>>
>>           do j=jmin,jmax,ny
>>
>>              if (abs(D1Xc(ii,jj))>1e10) STOP '2'
>>              call MatSetValues(A,ione,i,ione,j+3*nxy,D1Xc(ii,jj),INSERT_VALUES,ierr) ! Build DX (block 1,4)
>>              jj=jj+1
>>
>>           enddo
>>
>>           !Build -DXX/Re (part of block 1,1)
>>           ii=1+floor((i+0.01)/ny)
>>           jj=1
>>
>>           jmin=mod(i,ny)
>>           jmax=jmin+(nx-1)*ny
>>
>>           do j=jmin,jmax,ny
>>
>>              if (abs(-D2Xc(ii,jj)/Rec)>1e10) STOP '3'
>>              call MatSetValues(A,ione,i,ione,j,-D2Xc(ii,jj)/Rec,INSERT_VALUES,ierr) !Build -DXX/Re (part of block 1,1)
>>              jj=jj+1
>>
>>           enddo
>>
>>
>>        elseif ( i > nxy-1 .and. i < 2*nxy ) then
>>
>>           if (abs(vxvec_cmplx(i-nxy+1))>1e10) STOP '4'
>>           call MatSetValues(A,ione,i,ione,i-nxy,vxvec_cmplx(i-nxy+1),INSERT_VALUES,ierr)!;CHKERRQ(ierr) ! Vx (block 2,1)
>>
>>           !Build DY (block 2,4)
>>           ii=mod(i,ny)+1
>>           jj=1
>>
>>           jmin=floor((i-nxy+0.01)/ny)*ny
>>
>>           do j=jmin,jmin+(ny-1)
>>
>>              if (abs(D1Yc(ii,jj))>1e10) STOP '5'
>>              call MatSetValues(A,ione,i,ione,j+3*nxy,D1Yc(ii,jj),INSERT_VALUES,ierr) ! Build DY (block 2,4)
>>              jj=jj+1
>>
>>           enddo
>>
>>
>>           !Build -DXX/Re (part of block 2,2)
>>           ii=1+floor((i-nxy+0.01)/ny)
>>           jj=1
>>
>>           jmin=mod(i,ny)
>>           jmax=jmin+(nx-1)*ny
>>
>>           do j=jmin,jmax,ny
>>
>>              if (abs(-D2Xc(ii,jj)/Rec)>1e10) STOP '6'
>>              call MatSetValues(A,ione,i,ione,j+nxy,-D2Xc(ii,jj)/Rec,INSERT_VALUES,ierr) !Build -DXX/Re (part of block 2,2)
>>              jj=jj+1
>>
>>           enddo
>>
>>
>>
>>
>>        elseif ( i > 2*nxy-1 .and. i < 3*nxy ) then
>>
>>           call MatSetValues(A,ione,i,ione,i+nxy,ibeta,INSERT_VALUES,ierr)! iBeta (block 3,4)
>>
>>           call MatSetValues(A,ione,i,ione,i-2*nxy,wxvec_cmplx(i-2*nxy+1),INSERT_VALUES,ierr)! Wx (block 3,1)
>>           call MatSetValues(A,ione,i,ione,i  -nxy,wyvec_cmplx(i-2*nxy+1),INSERT_VALUES,ierr)! Wy (block 3,2)
>>
>>
>>           !Build -DXX/Re (part of block 3,3)
>>           ii=1+floor((i-2*nxy+0.01)/ny)
>>           jj=1
>>
>>           jmin=mod(i,ny)
>>           jmax=jmin+(nx-1)*ny
>>
>>           do j=jmin,jmax,ny
>>
>>              call MatSetValues(A,ione,i,ione,j+2*nxy,-D2Xc(ii,jj)/Rec,INSERT_VALUES,ierr) !Build -DXX/Re (part of block 3,3)
>>              jj=jj+1
>>
>>           enddo
>>
>>
>>        elseif ( i > 3*nxy-1 ) then
>>
>>           call MatSetValues(A,ione,i,ione,i-nxy,ibeta,INSERT_VALUES,ierr)! iBeta (block 4,3)
>>
>>           !Build DX (block 4,1)
>>           ii=1+floor(((i-3*nxy)+0.01)/ny)
>>           jj=1
>>
>>           jmin=mod(i,ny)
>>           jmax=jmin+(nx-1)*ny
>>
>>           do j=jmin,jmax,ny
>>
>>              call MatSetValues(A,ione,i,ione,j,D1Xc(ii,jj),INSERT_VALUES,ierr)
>>              jj=jj+1
>>
>>           enddo
>>
>>           !Build DY (block 4,2)
>>           ii=mod(i,ny)+1
>>           jj=1
>>
>>           jmin=floor((i-3*nxy+0.01)/ny)*ny
>>
>>           do j=jmin,jmin+(ny-1)
>>
>>              call MatSetValues(A,ione,i,ione,j+nxy,D1Yc(ii,jj),INSERT_VALUES,ierr)
>>              jj=jj+1
>>
>>           enddo
>>
>>        endif
>>
>>     enddo
>>
>>     if (rank==0) then
>>
>>        call cpu_time(finish)
>>        write(*,*)
>>        print '("Time Partial Parallel Build A #1 = ",f21.3," seconds.")',finish-start
>>
>>     endif
>>
>>
>>     call MatAssemblyBegin(A,MAT_FLUSH_ASSEMBLY,ierr)!;CHKERRQ(ierr) ! necessary to switch between INSERT_VALUES and ADD_VALUES
>>     call MatAssemblyEnd(A,MAT_FLUSH_ASSEMBLY,ierr)!;CHKERRQ(ierr)   ! necessary to switch between INSERT_VALUES and ADD_VALUES
>>
>>
>>
>> !...Finish Blocks A11, A22 and A33
>>
>>     if (rank == 0) call cpu_time(start)
>>
>>     VDY=(0.0d0,0.0d0)
>>     VDIAG=(0.0d0,0.0d0)
>>
>>     do i=IstartA,IendA-1
>>        if (i < 3*nxy) then
>>           call MatSetValues(A,ione,i,ione,i,betac**2/Rec+imag_unit*betac*wvec_cmplx3(i+1),ADD_VALUES,ierr)!;CHKERRQ(ierr)
>>        endif
>>     enddo
>>
>>     do i=IstartA,IendA-1
>>
>>        if     ( i < nxy ) then
>>
>>           !Build -DYY/Rec (part of block 1,1)
>>           ii=mod(i,ny)+1
>>           jj=1
>>
>>           jmin=floor((i+0.01)/ny)*ny
>>
>>           do j=jmin,jmin+(ny-1)
>>
>>              call MatSetValues(A,ione,i,ione,j,-D2Yc(ii,jj)/Rec,ADD_VALUES,ierr)
>>              jj=jj+1
>>
>>           enddo
>>
>>
>>           !Build U*DX (part of block 1,1)
>>           ii=1+floor((i+0.01)/ny)
>>           jj=1
>>
>>           jmin=mod(i,ny)
>>           jmax=jmin+(nx-1)*ny
>>
>>           do j=jmin,jmax,ny
>>
>>              call MatSetValues(A,ione,i,ione,j,uvec_cmplx(i+1)*D1Xc(ii,jj),ADD_VALUES,ierr)
>>              jj=jj+1
>>
>>           enddo
>>
>>
>>           !Build V*DY (part of block 1,1)
>>           ii=mod(i,ny)+1
>>           jj=1
>>
>>           jmin=floor((i+0.01)/ny)*ny
>>
>>           do j=jmin,jmin+(ny-1)
>>              call MatSetValues(A,ione,i,ione,j,vvec_cmplx(i+1)*D1Yc(ii,jj),ADD_VALUES,ierr)
>>              jj=jj+1
>>
>>           enddo
>>
>>
>>           call MatSetValues(A,ione,i,ione,i,uxvec_cmplx(i+1),ADD_VALUES,ierr)!;CHKERRQ(ierr) ! Ux (block 1,1)
>>
>>
>>
>>
>>        elseif ( i > nxy-1 .and. i < 2*nxy ) then
>>
>>           !Build -DYY/Rec (part of block 2,2)
>>           ii=mod(i,ny)+1
>>           jj=1
>>
>>           jmin=floor((i-nxy+0.01)/ny)*ny
>>
>>           do j=jmin,jmin+(ny-1)
>>
>>              call MatSetValues(A,ione,i,ione,j+nxy,-D2Yc(ii,jj)/Rec,ADD_VALUES,ierr) ! Build -DYY/Rec (part of block 2,2)
>>              jj=jj+1
>>
>>           enddo
>>
>>
>>           !Build U*DX (part of block 2,2)
>>           ii=1+floor((i-nxy+0.01)/ny)
>>           jj=1
>>
>>           jmin=mod(i,ny)
>>           jmax=jmin+(nx-1)*ny
>>
>>           do j=jmin,jmax,ny
>>
>>              call MatSetValues(A,ione,i,ione,j+nxy,uvec_cmplx(i-nxy+1)*D1Xc(ii,jj),ADD_VALUES,ierr)
>>              jj=jj+1
>>
>>           enddo
>>
>>
>>           !Build V*DY (part of block 2,2)
>>           ii=mod(i,ny)+1
>>           jj=1
>>
>>           jmin=floor((i-nxy+0.01)/ny)*ny
>>
>>           do j=jmin,jmin+(ny-1)
>>              call MatSetValues(A,ione,i,ione,j+nxy,vvec_cmplx(i-nxy+1)*D1Yc(ii,jj),ADD_VALUES,ierr)
>>              jj=jj+1
>>
>>           enddo
>>
>>           call MatSetValues(A,ione,i,ione,i,vyvec_cmplx(i-nxy+1),ADD_VALUES,ierr)!;CHKERRQ(ierr) ! Vy (block 2,2)
>>
>>
>>        elseif ( i > 2*nxy-1 .and. i < 3*nxy ) then
>>
>>           !Build -DYY/Rec (part of block 3,3)
>>           ii=mod(i,ny)+1
>>           jj=1
>>
>>           jmin=floor((i-2*nxy+0.01)/ny)*ny
>>
>>           do j=jmin,jmin+(ny-1)
>>
>>              call MatSetValues(A,ione,i,ione,j+2*nxy,-D2Yc(ii,jj)/Rec,ADD_VALUES,ierr) ! Build -DYY/Rec (part of block 3,3)
>>              jj=jj+1
>>
>>           enddo
>>
>>
>>           !Build U*DX (part of block 3,3)
>>           ii=1+floor((i-2*nxy+0.01)/ny)
>>           jj=1
>>
>>           jmin=mod(i,ny)
>>           jmax=jmin+(nx-1)*ny
>>
>>           do j=jmin,jmax,ny
>>
>>              call MatSetValues(A,ione,i,ione,j+2*nxy,uvec_cmplx(i-2*nxy+1)*D1Xc(ii,jj),ADD_VALUES,ierr)
>>              jj=jj+1
>>
>>           enddo
>>
>>
>>           !Build V*DY (part of block 3,3)
>>           ii=mod(i,ny)+1
>>           jj=1
>>
>>           jmin=floor((i-2*nxy+0.01)/ny)*ny
>>
>>           do j=jmin,jmin+(ny-1)
>>              call MatSetValues(A,ione,i,ione,j+2*nxy,vvec_cmplx(i-2*nxy+1)*D1Yc(ii,jj),ADD_VALUES,ierr)
>>              jj=jj+1
>>
>>           enddo
>>
>>        endif
>>
>>     enddo
>>
>>
>>     if (rank==0) then
>>
>>        call cpu_time(finish)
>>        write(*,*)
>>        print '("Time Partial Parallel Build A #2 = ",f21.3," seconds.")',finish-start
>>
>>     endif
>>
>> !$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
>> !$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
>>
>>
>>     !...Artificially create diagonal element for A ==> TO BE REMOVED
>>     small = (0.000001d0,0.0d0)
>>     do i=IstartA,IendA-1
>>        call MatSetValues(A,ione,i,ione,i,small,ADD_VALUES,ierr)!;CHKERRQ(ierr)
>>     enddo
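>>     !The small shift makes every diagonal entry of A structurally nonzero; sparse
>>     !direct solvers such as SuperLU_DIST typically behave better when the diagonal
>>     !is present in the nonzero pattern. As noted above, this is a temporary workaround.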
>>
>>     call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>     !call PetscMemoryGetMaximumUsage(mem,ierr)
>>     call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>     if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (A just before assembly)',rank,'    memory:',mem
>>
>>
>>     if (rank==0) then
>>        call cpu_time(start)
>>     endif
>>
>> !...Assemble A
>>     call MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY,ierr)!;CHKERRQ(ierr)
>>     call MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY,ierr)!;CHKERRQ(ierr)
>>     !call MatView(A,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>
>>     if (rank==0) then
>>        call cpu_time(finish)
>>        write(*,*)
>>        print '("Time to assemble A = ",f21.3," seconds.")',finish-start
>>     endif
>>
>>
>>     call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>     !call PetscMemoryGetMaximumUsage(mem,ierr)
>>     call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>     if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (A built and assembled)',rank,'    memory:',mem
>>
>>
>> !...Build B in parallel
>>     if (rank==0) then
>>        call cpu_time(start)
>>     endif
>>
>>     do i=IstartB,IendB-1
>>        if (i < 3*nxy) then
>>           call MatSetValues(B,ione,i,ione,i,imag_unit,INSERT_VALUES,ierr)!;CHKERRQ(ierr)
>>        endif
>>     enddo
>>
>> !...Assemble B
>>     call MatAssemblyBegin(B,MAT_FINAL_ASSEMBLY,ierr)!;CHKERRQ(ierr)
>>     call MatAssemblyEnd(B,MAT_FINAL_ASSEMBLY,ierr)!;CHKERRQ(ierr)
>>     !call MatView(B,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>
>>     call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>     !call PetscMemoryGetMaximumUsage(mem,ierr)
>>     call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>     if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (B built and assembled)',rank,'    memory:',mem
>>
>>
>>     call MatNorm(B,NORM_FROBENIUS,nrm1,ierr)!;CHKERRQ(ierr)
>>
>> !!$    if (rank==0) then
>> !!$       write(*,*)'norm B:',nrm1
>> !!$    endif
>>
>>     if (rank==0) then
>>        call cpu_time(finish)
>>        write(*,*)
>>        print '("Time to build and assemble B = ",f21.3," seconds.")',finish-start
>>     endif
>>
>>
>> !...Set Boundary Conditions
>>
>>     if (rank==0) then
>>
>>        allocate(li(nx))
>>
>>        do i=1,nx
>>           li(i)=(i-1)*ny
>>        enddo
>>
>>        ct1=0
>>        ct2=0
>>        ct3=0
>>        ct4=0
>>
>>        do i=1,nx
>>           do j=1,ny
>>
>>              if (j==1) then ! fst_bc
>>                 ct1=ct1+1
>>              elseif (j==ny) then ! wall_bc
>>                 ct2=ct2+1
>>              elseif (i==1) then ! outlet_bc
>>                 ct3=ct3+1
>>              elseif (i==nx) then ! inlet_bc
>>                 ct4=ct4+1
>>              endif
>>
>>           enddo
>>        enddo
>>
>>        ct=ct1+ct2+ct3+ct4
>>
>>     endif ! if (rank == 0)
>>
>>     !MPI_INTEGER is the MPI datatype matching Fortran INTEGER buffers
>>     call MPI_Bcast(ct,1,MPI_INTEGER,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>     call MPI_Bcast(ct1,1,MPI_INTEGER,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>     call MPI_Bcast(ct2,1,MPI_INTEGER,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>     call MPI_Bcast(ct3,1,MPI_INTEGER,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>     call MPI_Bcast(ct4,1,MPI_INTEGER,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>
>>     allocate(fst_bc(ct1))
>>     allocate(wall_bc(ct2))
>>     allocate(outlet_bc(ct3))
>>     allocate(inlet_bc(ct4))
>>
>>     allocate(all_bc(3*ct))
>>
>>     allocate(u_bc(ct))
>>     allocate(v_bc(ct))
>>     allocate(w_bc(ct))
>>     allocate(p_bc(ct))
>>
>>     if (rank == 0) then
>>
>>        ct1=0
>>        ct2=0
>>        ct3=0
>>        ct4=0
>>
>>        do i=1,nx
>>           do j=1,ny
>>
>>              ij=li(i)+j-1
>>
>>              if (j==1) then ! fst_bc
>>                 ct1=ct1+1
>>                 fst_bc(ct1)=ij
>>              elseif (j==ny) then ! wall_bc
>>                 ct2=ct2+1
>>                 wall_bc(ct2)=ij
>>              elseif (i==1) then ! outlet_bc
>>                 ct3=ct3+1
>>                 outlet_bc(ct3)=ij
>>              elseif (i==nx) then ! inlet_bc
>>                 ct4=ct4+1
>>                 inlet_bc(ct4)=ij
>>              endif
>>
>>           enddo
>>        enddo
>>
>>        u_bc(1:ct1)=fst_bc
>>        u_bc(ct1+1:ct1+ct2)=wall_bc
>>        u_bc(ct1+ct2+1:ct1+ct2+ct3)=outlet_bc
>>        u_bc(ct1+ct2+ct3+1:ct1+ct2+ct3+ct4)=inlet_bc
>>
>>        v_bc=u_bc+nxy
>>        w_bc=v_bc+nxy
>>        p_bc=w_bc+nxy
>>
>>        all_bc=[u_bc,v_bc,w_bc]
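>>        !all_bc collects the global row indices of the boundary points for the three
>>        !velocity blocks only; the pressure indices in p_bc are not included here.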
>>
>>     endif ! if (rank == 0)
>>
>>     call MPI_Bcast(all_bc,3*ct,MPI_INTEGER,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>
>>     !In the parallel case, all processes that share the matrix MUST call this routine, regardless of whether they own any of the rows being zeroed.
>>     call MatZeroRows(A,3*ct,all_bc, one,zero,zero,ierr);CHKERRQ(ierr)
>>     call MatZeroRows(B,3*ct,all_bc,zero,zero,zero,ierr);CHKERRQ(ierr)
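>>     !MatZeroRows zeros the listed rows and places the given scalar (one for A, zero
>>     !for B) on their diagonals, so the boundary rows reduce to trivial equations.
>>     !Its last two arguments are the optional solution/right-hand-side vectors; with
>>     !no vectors involved they would normally be passed as PETSC_NULL_OBJECT, e.g.
>>     !(equivalent sketch, not active in this run):
>> !!$    call MatZeroRows(A,3*ct,all_bc, one,PETSC_NULL_OBJECT,PETSC_NULL_OBJECT,ierr)
>> !!$    call MatZeroRows(B,3*ct,all_bc,zero,PETSC_NULL_OBJECT,PETSC_NULL_OBJECT,ierr)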
>>
>>     !call MatView(B,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>
>>     ! cannot be inside an if (rank==0) statement!
>>     call MatNorm(A,NORM_FROBENIUS,nrm,ierr)!;CHKERRQ(ierr)
>>     call MatNorm(B,NORM_FROBENIUS,nrm1,ierr)!;CHKERRQ(ierr)
>>
>>     if (rank==0) then
>>        write(*,*)'norm A:',nrm
>>        write(*,*)'norm B:',nrm1
>>     endif
>>
>>
>>
>> !!!!!! TESTING HOW BIG LU CAN BE !!!!!!!!!!!!!!!!
>>
>> !!$    sigma = (1.0d0,0.0d0)
>> !!$    call MatAXPY(A,-sigma,B,DIFFERENT_NONZERO_PATTERN,ierr) ! on exit A = A-sigma*B
>> !!$
>> !!$    !...Create linear solver context
>> !!$    call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
>> !!$
>> !!$!...Set operators. Here the matrix that defines the linear system also serves as the preconditioning matrix.
>> !!$    !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) !aha commented and replaced by next line
>> !!$    call KSPSetOperators(ksp,A,A,ierr) ! remember: here A = A-sigma*B
>> !!$
>> !!$    tol = 1.e-10
>> !!$    call KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr) ! set relative and absolute tolerances and use the default for the divergence tol
>> !!$
>> !!$!...Set the Direct (LU) Solver
>> !!$    call KSPSetType(ksp,KSPPREONLY,ierr)
>> !!$    call KSPGetPC(ksp,pc,ierr)
>> !!$    call PCSetType(pc,PCLU,ierr)
>> !!$    call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr) ! MATSOLVERSUPERLU_DIST MATSOLVERMUMPS
>> !!$
>> !!$!...Create right-hand-side vector
>> !!$    call MatCreateVecs(A,frhs,PETSC_NULL_OBJECT,ierr)
>> !!$    call MatCreateVecs(A,sol,PETSC_NULL_OBJECT,ierr)
>> !!$
>> !!$    allocate(xwork1(IendA-IstartA))
>> !!$    allocate(loc(IendA-IstartA))
>> !!$
>> !!$    ct=0
>> !!$    do i=IstartA,IendA-1
>> !!$       ct=ct+1
>> !!$       loc(ct)=i
>> !!$       xwork1(ct)=(1.0d0,0.0d0)
>> !!$    enddo
>> !!$
>> !!$    call VecSetValues(frhs,IendA-IstartA,loc,xwork1,INSERT_VALUES,ierr)
>> !!$    call VecZeroEntries(sol,ierr)
>> !!$
>> !!$    deallocate(xwork1,loc)
>> !!$
>> !!$!...Assemble Vectors
>> !!$    call VecAssemblyBegin(frhs,ierr)
>> !!$    call VecAssemblyEnd(frhs,ierr)
>> !!$
>> !!$!...Solve the linear system
>> !!$    call KSPSolve(ksp,frhs,sol,ierr)
>> !!$
>> !!$    !call VecView(sol,PETSC_VIEWER_STDOUT_WORLD,ierr)
>> !!$
>> !!$    STOP
>>
>>
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> !!!!!!!!!!!!!!!!!!!!!!!           SLEPC         !!!!!!!!!!!!!!!!!!!!!!!!!!
>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>
>>     if (rank==0) call cpu_time(start)
>>
>>
>>     call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>     call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>     if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (begin slepc)',rank,'    memory:',mem
>>
>> !...Get vector(s) compatible with the matrix, i.e. with the same parallel layout
>>     call MatCreateVecs(A,xr,xi,ierr)
>> !!$    call MatCreateVecs(A,xr,PETSC_NULL_OBJECT,ierr)
>> !!$    call MatCreateVecs(A,xi,PETSC_NULL_OBJECT,ierr)
>>
>>
>> !...Create Eigensolver Context and Spectral Transformation context
>>     call EPSCreate(PETSC_COMM_WORLD,eps,ierr)
>>     !call STCreate(PETSC_COMM_WORLD,st,ierr)
>>
>> !...Set Operators and Problem Type - Generalized Eigenvalue Problem
>>     call EPSSetOperators(eps,A,B,ierr)
>>     !call EPSSetProblemType(eps,EPS_PGNHEP,ierr) ! THIS ONE LEADS TO AN ERROR --> CHECK WHY BECAUSE MY B SEEMS FITTING THAT CRITERION
>>     call EPSSetProblemType(eps,EPS_GNHEP,ierr)
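>>     !Note on the commented line above: EPS_PGNHEP assumes B is Hermitian positive
>>     !(semi-)definite. The B assembled here has a purely imaginary diagonal and zero
>>     !pressure/boundary rows, so it does not meet that assumption, which would
>>     !explain the error; EPS_GNHEP makes no such requirement.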
>>
>> !...Select Portion of Spectrum
>>     call EPSSetTarget(eps,(3.0d0,0.d0),ierr)
>>     call EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE,ierr)
>>
>> !...Select Algorithm for Eigenvalue Computation (default is Krylov-Schur)
>>     call EPSSetType(eps,EPSKRYLOVSCHUR,ierr) !EPSARNOLDI,EPSKRYLOVSCHUR
>>
>>     call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>     call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>     if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (slepc after EPSSetType)',rank,'    memory:',mem
>>
>> !...Set Tolerance and Maxiter for Solver
>>     tolslepc=1.0d-12 ! double-precision literal so the tolerance is not truncated to single precision
>>     maxiter=500
>>     !call EPSSetTolerances(eps,PETSC_DEFAULT_REAL,maxiter,ierr)
>>     call EPSSetTolerances(eps,tolslepc,maxiter,ierr)
>>
>> !...Sets number of eigenvalues to compute and dimension of subspace
>>     nev=200
>>     ncv=3*nev
>>     mpd=10*nev
>>     !call EPSSetDimensions(eps,nev,ncv,mpd,ierr) PETSC_DEFAULT
>>     call EPSSetDimensions(eps,nev,PETSC_DEFAULT_INTEGER,PETSC_DEFAULT_INTEGER,ierr)
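>>     !In EPSSetDimensions, nev is the number of eigenvalues to compute, ncv the
>>     !maximum dimension of the working subspace and mpd the maximum projected
>>     !dimension; passing PETSC_DEFAULT_INTEGER lets SLEPc choose ncv and mpd from nev.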
>>
>>     call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>     call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>     if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (before EPSGetST)',rank,'    memory:',mem
>>
>> !...Set the Spectral Transformation --> Use Shift-And-Invert
>>     call EPSGetST(eps,st,ierr)
>>     call STSetType(st,STSINVERT,ierr)
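>>     !Shift-and-invert maps each eigenvalue lambda of A*x = lambda*B*x to
>>     !1/(lambda-sigma), with sigma the target set above, so eigenvalues closest to
>>     !the target converge first; every iteration then requires a solve with
>>     !(A - sigma*B), which is why a direct LU factorization is configured below.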
>>
>> !...Set Linear Solver
>>     call STGetKSP(st,ksp,ierr)
>>     call KSPSetType(ksp,KSPPREONLY,ierr)
>>     call KSPGetPC(ksp,pc,ierr)
>>     call PCSetType(pc,PCLU,ierr) ! could try PCCHOLESKY
>>     call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr) ! MATSOLVERSUPERLU_DIST,MATSOLVERMUMPS
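>>     !Because EPSSetFromOptions is called below, these choices can still be
>>     !overridden at runtime, e.g. with -st_ksp_type, -st_pc_type and
>>     !-st_pc_factor_mat_solver_package.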
>>
>> !...Set Solver Parameters at Runtime
>>     call EPSSetFromOptions(eps,ierr)
>>
>>     call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>     call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>     if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (after EPSSetFromOptions)',rank,'    memory:',mem
>>
>> !...Solve the Eigensystem
>>     call EPSSolve(eps,ierr)
>>
>>     call EPSGetIterationNumber(eps,its,ierr)
>>     if (rank .eq. 0) then
>>        write(*,110) its
>>     endif
>> 110 format (/' Number of iterations of the method:',I4)
>>
>> !...Optional: Get some information from the solver and display it
>>     call EPSGetType(eps,tname,ierr)
>>     if (rank .eq. 0) then
>>        write(*,120) tname
>>     endif
>> 120 format (' Solution method: ',A)
>>
>>     nev=10
>>     call EPSGetDimensions(eps,nev,PETSC_NULL_INTEGER,PETSC_NULL_INTEGER,ierr)
>>     if (rank .eq. 0) then
>>        write(*,130) nev
>>     endif
>> 130 format (' Number of requested eigenvalues:',I4)
>>     call EPSGetTolerances(eps,tol,maxit,ierr)
>>     if (rank .eq. 0) then
>>        write(*,140) tol, maxit
>>     endif
>> 140 format (' Stopping condition: tol=',1P,E10.4,', maxit=',I4)
>>
>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>> !     Display solution and clean up
>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>
>> !...Get number of converged eigenpairs
>>     call EPSGetConverged(eps,nconv,ierr)
>>     if (rank .eq. 0) then
>>        write(*,150) nconv
>>     endif
>> 150 format (' Number of converged eigenpairs:',I4/)
>>
>> !...Display eigenvalues and relative errors
>>     if (nconv.gt.0) then
>>        if (rank .eq. 0) then
>>           write(*,*) '         kr               ki          ||Ax-kx||/||kx||'
>>           write(*,*) ' ----------------- -----------------  ------------------'
>>        endif
>>        do i=0,nconv-1
>>           !...Get converged eigenpairs: i-th eigenvalue is stored in kr (real part) and ki (imaginary part)
>>           call EPSGetEigenpair(eps,i,kr,ki,xr,xi,ierr)
>>
>>           !...Compute the relative error associated to each eigenpair
>>           call EPSComputeError(eps,i,EPS_ERROR_RELATIVE,error,ierr)
>>           if (rank .eq. 0) then
>>              write(*,160) PetscRealPart(kr),PetscImaginaryPart(kr), error
>>           endif
>> 160       format (1P,'   ',E21.11,'       ',E21.11,'      ',E21.11)
>>
>>        enddo
>>
>>     endif
>>
>> !...Free work space
>>
>>     if (rank==0) deallocate(li)
>>
>>     deallocate(D1Xc,D2Xc,D1Yc,D2Yc)
>>
>>     deallocate(fst_bc)
>>     deallocate(wall_bc)
>>     deallocate(outlet_bc)
>>     deallocate(inlet_bc)
>>
>>     deallocate(all_bc)
>>
>>     deallocate(u_bc)
>>     deallocate(v_bc)
>>     deallocate(w_bc)
>>     deallocate(p_bc)
>>
>>     !call STDestroy(st,ierr)
>>     call EPSDestroy(eps,ierr)
>>     call MatDestroy(A,ierr)
>>     call MatDestroy(B,ierr)
>>     !call VecDestroy(i_vec,ierr)
>>     call VecDestroy(xr,ierr)
>>     call VecDestroy(xi,ierr)
>>
>> !...Finalize
>>     call SlepcFinalize(ierr)
>>
>>     if (rank==0) then
>>        call cpu_time(finish)
>>        write(*,*)
>>        print '("Total time for SLEPC = ",f21.3," seconds.")',finish-start
>>     endif
>>
>>     END Subroutine SETUPSOLVEGEVP
>>
>>
>>     END MODULE PETSC
>>
>>
>>
>>
>>
>> !!$       write(*,*)'fst_bc',fst_bc
>> !!$       write(*,*)'wall_bc',wall_bc
>> !!$       write(*,*)'outlet_bc',outlet_bc
>> !!$       write(*,*)'inlet_bc',inlet_bc
>> !!$
>> !!$       write(*,*)'all_bc',all_bc
>>
>> !!$       write(*,*)'ct1+ct2+ct3+ct4',ct1+ct2+ct3+ct4
>>
>>
>>
>> !!$    call MatCreate(PETSC_COMM_WORLD,A,ierr)
>> !!$    call MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,4*nxy,4*nxy,ierr)
>> !!$    call MatSetFromOptions(A,ierr)
>> !!$    call MatSetUp(A,ierr)
>>
>>
>>
>>        !call VecCreateSeq(PETSC_COMM_SELF,nxy,i_vec,ierr) ! sequential vector
>>        !call VecSetValues(i_vec,nxy,loc,xwork1,INSERT_VALUES,ierr)
>>
>>        !  Assemble vector
>>        !call VecAssemblyBegin(i_vec,ierr)
>>        !call VecAssemblyEnd(i_vec,ierr)
>>
>>     !  Create parallel matrix, specifying only its global dimensions.
>>     !  When using MatCreate(), the matrix format can be specified at
>>     !  runtime. Also, the parallel partitioning of the matrix is
>>     !  determined by PETSc at runtime.
>>
>>
>>
>> !!$    call MatCreate(PETSC_COMM_WORLD,B,ierr)
>> !!$    call MatSetSizes(B,PETSC_DECIDE,PETSC_DECIDE,4*nxy,4*nxy,ierr)
>> !!$    call MatSetFromOptions(B,ierr)
>> !!$    call MatSetUp(B,ierr)
>>
>>     !  Currently, all PETSc parallel matrix formats are partitioned by
>>     !  contiguous chunks of rows across the processors.  Determine which
>>     !  rows of the matrix are locally owned.
>>
>>
>>
>>
>>
>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>> !                    Include files
>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>> !
>> !  This program uses CPP for preprocessing, as indicated by the use of
>> !  PETSc include files in the directory petsc/include/finclude.  This
>> !  convention enables use of the CPP preprocessor, which allows the use
>> !  of the #include statements that define PETSc objects and variables.
>> !
>> !  Use of the conventional Fortran include statements is also supported.
>> !  In this case, the PETSc include files are located in the directory
>> !  petsc/include/foldinclude.
>> !
>> !  Since one must be very careful to include each file no more than once
>> !  in a Fortran routine, application programmers must explicitly list
>> !  each file needed for the various PETSc components within their
>> !  program (unlike the C/C++ interface).
>> !
>>
>>
>>
>>
>>     !write(*,*)'IstartA,IendA',IstartA,IendA
>>     !write(*,*)'IstartB,IendB',IstartB,IendB
>>
>>
>>
>> !!$#include <finclude/petscsys.h>
>> !!$#include <finclude/petscvec.h>
>> !!$#include <finclude/petscmat.h>
>> !!$#include <finclude/petscmat.h90>
>> !!$#include <finclude/petscpc.h>
>> !!$#include <finclude/petscksp.h>
>> !!$
>> !!$#include <finclude/slepcsys.h>
>> !!$#include <finclude/slepceps.h>
>>
>>
>>
>>
>>
>>
>>
>>
>> !!$      do i=1,nx
>> !!$         do j=1,nx
>> !!$            D1X(i,j)=100+j+(i-1)*nx
>> !!$         enddo
>> !!$      enddo
>>
>> !!$      do i=1,ny
>> !!$         do j=1,ny
>> !!$            D1Y(i,j)=999.d0
>> !!$         enddo
>> !!$      enddo
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> !  Create parallel vectors.
>> !   - Here, the parallel partitioning of the vector is determined by
>> !     PETSc at runtime.  We could also specify the local dimensions
>> !     if desired -- or use the more general routine VecCreate().
>> !   - When solving a linear system, the vectors and matrices MUST
>> !     be partitioned accordingly.  PETSc automatically generates
>> !     appropriately partitioned matrices and vectors when MatCreate()
>> !     and VecCreate() are used with the same communicator.
>> !   - Note: We form 1 vector from scratch and then duplicate as needed.
>>
>>       !call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,m*n,u,ierr)
>>       !call VecSetFromOptions(u,ierr)
>>       !call VecDuplicate(u,b,ierr)
>>       !call VecDuplicate(b,x,ierr)
>>
>> !  Set exact solution; then compute right-hand-side vector.
>> !  By default we use an exact solution vector with all
>> !  elements equal to 1.0.  Alternatively, the runtime option
>> !  -random_exact_sol forms a solution vector with random components.
>>
>> !!$      call PetscOptionsHasName(PETSC_NULL_CHARACTER,'-random_exact_sol',flg,ierr)
>> !!$      if (flg) then
>> !!$         call PetscRandomCreate(PETSC_COMM_WORLD,rctx,ierr)
>> !!$         call PetscRandomSetFromOptions(rctx,ierr)
>> !!$         call VecSetRandom(u,rctx,ierr)
>> !!$         call PetscRandomDestroy(rctx,ierr)
>> !!$      else
>> !!$         call VecSet(u,one,ierr)
>> !!$      endif
>> !!$      call MatMult(A,u,b,ierr)
>>
>>
>> !  View the exact solution vector if desired
>>
>> !!$      call PetscOptionsHasName(PETSC_NULL_CHARACTER,"-view_exact_sol",flg,ierr)
>> !!$      if (flg) then
>> !!$         call VecView(u,PETSC_VIEWER_STDOUT_WORLD,ierr)
>> !!$      endif
>>
>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>> !         Create the linear solver and set various options
>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>
>> !  Create linear solver context
>>
>> !!$      call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
>>
>>
>> !  Set operators. Here the matrix that defines the linear system
>> !  also serves as the preconditioning matrix.
>>
>>       !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) !aha commented and replaced by next line
>> !!$      call KSPSetOperators(ksp,A,A,ierr)
>>
>>
>> !  Set linear solver defaults for this problem (optional).
>> !   - By extracting the KSP and PC contexts from the KSP context,
>> !     we can then directly call any KSP and PC routines
>> !     to set various options.
>> !   - The following four statements are optional; all of these
>> !     parameters could alternatively be specified at runtime via
>> !     KSPSetFromOptions(). All of these defaults can be
>> !     overridden at runtime, as indicated below.
>>
>> !     We comment out this section of code since the Jacobi
>> !     preconditioner is not a good general default.
>>
>> !      call KSPGetPC(ksp,pc,ierr)
>> !      ptype = PCJACOBI
>> !      call PCSetType(pc,ptype,ierr)
>> !      tol = 1.e-7
>> !      call KSPSetTolerances(ksp,tol,PETSC_DEFAULT_DOUBLE_PRECISION,
>> !     &     PETSC_DEFAULT_DOUBLE_PRECISION,PETSC_DEFAULT_INTEGER,ierr)
>>
>> !  Set user-defined monitoring routine if desired
>>
>>       !call PetscOptionsHasName(PETSC_NULL_CHARACTER,'-my_ksp_monitor',flg,ierr) ! flg set to true if that option has been specified
>>       !if (flg) then
>>       !  call KSPMonitorSet(ksp,MyKSPMonitor,PETSC_NULL_OBJECT,PETSC_NULL_FUNCTION,ierr) ! call MyKSPMonitor
>>       !endif
>>
>>       !tol = 1.e-10
>>       !call KSPSetTolerances(ksp,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr) ! set relative tolerance and use the defaults for absolute and divergence tol
>>       !call KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr) ! set relative and absolute tolerances and use the default for the divergence tol
>>
>> !  Set runtime options, e.g.,
>> !      -ksp_type <type> -pc_type <type> -ksp_monitor -ksp_rtol <rtol>
>> !  These options will override those specified above as long as
>> !  KSPSetFromOptions() is called _after_ any other customization
>> !  routines.
>>
>>       !call KSPSetFromOptions(ksp,ierr)
>>
>>
>> !  Set convergence test routine if desired
>>
>>       !call PetscOptionsHasName(PETSC_NULL_CHARACTER,'-my_ksp_convergence',flg,ierr)
>>       !if (flg) then
>>       !  call KSPSetConvergenceTest(ksp,MyKSPConverged,PETSC_NULL_OBJECT,PETSC_NULL_FUNCTION,ierr)
>>       !endif
>>
>>
>>
>>       !call KSPMonitorSet(ksp,KSPMonitorTrueResidualNorm,PETSC_NULL_OBJECT,PETSC_NULL_FUNCTION,ierr);
>>
>>
>> !
>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>> !                      Solve the linear system
>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>
>>       !call KSPSolve(ksp,b,x,ierr)
>>
>>
>> !
>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>> !                      View solver info (could use -ksp_view instead)
>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>
>>       !added by aha
>>
>> !!$      write(*,*)''
>> !!$      write(*,*)'Start of KSPView'
>> !!$      write(*,*)'----------------'
>> !!$      write(*,*)''
>> !!$      call KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD,ierr)
>> !!$      write(*,*)''
>> !!$      write(*,*)'End of KSPView'
>> !!$      write(*,*)'--------------'
>> !!$      write(*,*)''
>>
>>
>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>> !                     Check solution and clean up
>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>
>> !  Check the error
>> !!$      call VecAXPY(x,neg_one,u,ierr)
>> !!$      call VecNorm(x,NORM_2,norm,ierr)
>> !!$      call KSPGetIterationNumber(ksp,its,ierr)
>> !!$      if (rank .eq. 0) then
>> !!$        if (norm .gt. 1.e-12) then
>> !!$           write(6,100) norm,its
>> !!$        else
>> !!$           write(6,110) its
>> !!$        endif
>> !!$      endif
>> !!$  100 format('Norm of error ',e11.4,' iterations ',i5)
>> !!$  110 format('Norm of error < 1.e-12,iterations ',i5)
>>
>> !  Free work space.  All PETSc objects should be destroyed when they
>> !  are no longer needed.
>>
>> !!$      call KSPDestroy(ksp,ierr)
>> !!$      call VecDestroy(u,ierr)
>> !!$      call VecDestroy(x,ierr)
>> !!$      call VecDestroy(b,ierr)
>> !!$      call MatDestroy(A,ierr)
>>
>> !  Always call PetscFinalize() before exiting a program.  This routine
>> !    - finalizes the PETSc libraries as well as MPI
>> !    - provides summary and diagnostic information if certain runtime
>> !      options are chosen (e.g., -log_summary).  See PetscFinalize()
>> !      manpage for more information.
>>
>>
>> ! --------------------------------------------------------------
>> !
>> !  MyKSPMonitor - This is a user-defined routine for monitoring
>> !  the KSP iterative solvers.
>> !
>> !  Input Parameters:
>> !    ksp   - iterative context
>> !    n     - iteration number
>> !    rnorm - 2-norm (preconditioned) residual value (may be estimated)
>> !    dummy - optional user-defined monitor context (unused here)
>> !
>> !!$      subroutine MyKSPMonitor(ksp,n,rnorm,dummy,ierr)
>> !!$
>> !!$      implicit none
>> !!$
>> !!$#include <finclude/petscsys.h>
>> !!$#include <finclude/petscvec.h>
>> !!$#include <finclude/petscksp.h>
>> !!$
>> !!$      KSP              ksp
>> !!$      Vec              x
>> !!$      PetscErrorCode ierr
>> !!$      PetscInt n,dummy
>> !!$      PetscMPIInt rank
>> !!$      double precision rnorm
>> !!$
>> !!$!  Build the solution vector
>> !!$
>> !!$      call KSPBuildSolution(ksp,PETSC_NULL_OBJECT,x,ierr)
>> !!$
>> !!$!  Write the solution vector and residual norm to stdout
>> !!$!   - Note that the parallel viewer PETSC_VIEWER_STDOUT_WORLD
>> !!$!     handles data from multiple processors so that the
>> !!$!     output is not jumbled.
>> !!$
>> !!$      call MPI_Comm_rank(PETSC_COMM_WORLD,rank,ierr)
>> !!$      if (rank .eq. 0) write(6,100) n
>> !!$      call VecView(x,PETSC_VIEWER_STDOUT_WORLD,ierr)
>> !!$      if (rank .eq. 0) write(6,200) n,rnorm
>> !!$
>> !!$ 100  format('iteration ',i5,' solution vector:')
>> !!$ 200  format('iteration ',i5,' residual norm ',e11.4)
>> !!$      ierr = 0
>> !!$
>> !!$      end subroutine MyKSPMonitor
>>
>> ! --------------------------------------------------------------
>> !
>> !  MyKSPConverged - This is a user-defined routine for testing
>> !  convergence of the KSP iterative solvers.
>> !
>> !  Input Parameters:
>> !    ksp   - iterative context
>> !    n     - iteration number
>> !    rnorm - 2-norm (preconditioned) residual value (may be estimated)
>> !    dummy - optional user-defined monitor context (unused here)
>> !
>> !!$      subroutine MyKSPConverged(ksp,n,rnorm,flag,dummy,ierr)
>> !!$
>> !!$      implicit none
>> !!$
>> !!$#include <finclude/petscsys.h>
>> !!$#include <finclude/petscvec.h>
>> !!$#include <finclude/petscksp.h>
>> !!$
>> !!$      KSP              ksp
>> !!$      PetscErrorCode ierr
>> !!$      PetscInt n,dummy
>> !!$      KSPConvergedReason flag
>> !!$      double precision rnorm
>> !!$
>> !!$      if (rnorm .le. .05) then
>> !!$        flag = 1
>> !!$      else
>> !!$        flag = 0
>> !!$      endif
>> !!$      ierr = 0
>> !!$
>> !!$      end subroutine MyKSPConverged
>>
>>
>>
>> <valgrind.log.31371><valgrind.log.31372>


